PulsarRPA: The AI-Powered, Lightning-Fast Browser Automation Solution!
- AI Integration with LLMs: smarter automation powered by large language models.
- Ultra-Fast Automation: coroutine-safe browser automation concurrency with spider-level crawling performance.
- Web Understanding: deep comprehension of dynamic web content.
- Data Extraction APIs: powerful tools to extract structured data effortlessly.
Automate the browser and extract data at scale with simple text.
Go to https://www.amazon.com/dp/B0C1H26C46
After browser launch: clear browser cookies.
After page load: scroll to the middle.
Summarize the product.
Extract: product name, price, ratings.
Find all links containing /dp/.
Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC
curl -L -o PulsarRPA.jar https://github.com/platonai/PulsarRPA/releases/download/v3.0.14/PulsarRPA.jar
# Make sure an LLM API key is set. VOLCENGINE_API_KEY and OPENAI_API_KEY are also supported.
echo $DEEPSEEK_API_KEY
java -D"DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY}" -jar PulsarRPA.jar
Tip: Make sure DEEPSEEK_API_KEY or another LLM API key is set in your environment, or AI features will not be available.
Tip: Windows PowerShell syntax: $env:DEEPSEEK_API_KEY (environment variable) vs. $DEEPSEEK_API_KEY (script variable).
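A small preflight check can make the tips above concrete. The Kotlin sketch below looks for the three key names mentioned in the setup notes and reports the first one that is set. The helper name `firstConfiguredKey` is hypothetical and not part of PulsarRPA, and the lookup order is an assumption rather than the order PulsarRPA itself uses.

```kotlin
// Key names taken from the setup notes above; the lookup order is an assumption.
val supportedKeys = listOf("DEEPSEEK_API_KEY", "VOLCENGINE_API_KEY", "OPENAI_API_KEY")

// Hypothetical helper: returns the name of the first supported key with a non-blank value.
fun firstConfiguredKey(env: Map<String, String> = System.getenv()): String? =
    supportedKeys.firstOrNull { !env[it].isNullOrBlank() }

fun main() {
    val key = firstConfiguredKey()
    if (key == null) {
        println("No LLM API key found; AI features will not be available.")
    } else {
        println("Using LLM API key from $key")
    }
}
```

Passing the environment as a map keeps the check easy to exercise without touching real environment variables.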
Resources
- GitHub Release Download
- Mirror / Backup Download
- LLM Configuration Guide
- Configuration Guide
- Open the project in your IDE
- Run the ai.platon.pulsar.app.PulsarApplicationKt main class
# Make sure an LLM API key is set. VOLCENGINE_API_KEY and OPENAI_API_KEY are also supported.
echo $DEEPSEEK_API_KEY
docker run -d -p 8182:8182 -e DEEPSEEK_API_KEY=${DEEPSEEK_API_KEY} galaxyeye88/pulsar-rpa:latest
Use the commands API to perform browser operations, extract web data, analyze websites, and more.
WebUI: http://localhost:8182/command.html
REST API
curl -X POST "http://localhost:8182/api/commands/plain" -H "Content-Type: text/plain" -d '
Go to https://www.amazon.com/dp/B0C1H26C46
After browser launch: clear browser cookies.
After page load: scroll to the middle.
Summarize the product.
Extract: product name, price, ratings.
Find all links containing /dp/.
'
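The same plain-text call can be made from JVM code. The following is a minimal Kotlin sketch that uses only the JDK 11+ `java.net.http` client. The helper name `plainCommandRequest` is hypothetical, the base URL is the local address used in this README, and nothing is assumed about the response format.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical helper: builds a text/plain POST to /api/commands/plain,
// mirroring the curl call above.
fun plainCommandRequest(command: String, baseUrl: String = "http://localhost:8182"): HttpRequest =
    HttpRequest.newBuilder(URI.create("$baseUrl/api/commands/plain"))
        .header("Content-Type", "text/plain")
        .POST(HttpRequest.BodyPublishers.ofString(command))
        .build()

fun main() {
    val command = """
        Go to https://www.amazon.com/dp/B0C1H26C46
        After page load: scroll to the middle.
        Summarize the product.
    """.trimIndent()

    // Requires a running PulsarRPA server; the body is printed as-is.
    val response = HttpClient.newHttpClient()
        .send(plainCommandRequest(command), HttpResponse.BodyHandlers.ofString())
    println("HTTP ${response.statusCode()}")
    println(response.body())
}
```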
curl -X POST "http://localhost:8182/api/commands" -H "Content-Type: application/json" -d '{
"url": "https://www.amazon.com/dp/B0C1H26C46",
"onBrowserLaunchedActions": ["clear browser cookies"],
"onPageReadyActions": ["scroll to the middle"],
"pageSummaryPrompt": "Provide a brief introduction of this product.",
"dataExtractionRules": "product name, price, and ratings",
"uriExtractionRules": "all links containing `/dp/` on the page"
}'
Tip: You don't need to fill in every field, only the ones you need.
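To illustrate sending only the fields you need, here is a Kotlin sketch that assembles the JSON body and leaves out anything unset. The field names come from the JSON example above. The helpers `jsonString` and `commandJson` are hypothetical, and the hand-written escaping is a simplification that a JSON library would normally replace.

```kotlin
// Escapes the characters that must be escaped inside a JSON string literal.
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) {
        when (c) {
            '"' -> append("\\\"")
            '\\' -> append("\\\\")
            '\n' -> append("\\n")
            '\r' -> append("\\r")
            '\t' -> append("\\t")
            else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
        }
    }
    append('"')
}

// Hypothetical helper: builds a request body for /api/commands.
// Null or empty fields are omitted from the output.
fun commandJson(
    url: String,
    onBrowserLaunchedActions: List<String> = emptyList(),
    onPageReadyActions: List<String> = emptyList(),
    pageSummaryPrompt: String? = null,
    dataExtractionRules: String? = null,
    uriExtractionRules: String? = null,
): String {
    fun array(items: List<String>) = items.joinToString(",", "[", "]") { jsonString(it) }
    val fields = buildList<String> {
        add("\"url\":" + jsonString(url))
        if (onBrowserLaunchedActions.isNotEmpty()) add("\"onBrowserLaunchedActions\":" + array(onBrowserLaunchedActions))
        if (onPageReadyActions.isNotEmpty()) add("\"onPageReadyActions\":" + array(onPageReadyActions))
        pageSummaryPrompt?.let { add("\"pageSummaryPrompt\":" + jsonString(it)) }
        dataExtractionRules?.let { add("\"dataExtractionRules\":" + jsonString(it)) }
        uriExtractionRules?.let { add("\"uriExtractionRules\":" + jsonString(it)) }
    }
    return fields.joinToString(",", "{", "}")
}

fun main() {
    // Only the URL and the extraction rules are set; every other field is left out.
    println(commandJson(
        url = "https://www.amazon.com/dp/B0C1H26C46",
        dataExtractionRules = "product name, price, and ratings",
    ))
}
```

The resulting string is meant to be posted to `/api/commands` with `Content-Type: application/json`.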
Harness the power of the x/e API for highly precise, flexible, and intelligent data extraction.
curl -X POST "http://localhost:8182/api/x/e" -H "Content-Type: text/plain" -d "
select
llm_extract(dom, 'product name, price, ratings') as llm_extracted_data,
dom_base_uri(dom) as url,
dom_first_text(dom, '#productTitle') as title,
dom_first_slim_html(dom, 'img:expr(width > 400)') as img
from load_and_select('https://www.amazon.com/dp/B0C1H26C46', 'body');
"
Example of the extracted data:
{
"llm_extracted_data": {
"product name": "Apple iPhone 15 Pro Max",
"price": "$1,199.00",
"ratings": "4.5 out of 5 stars"
},
"url": "https://www.amazon.com/dp/B0C1H26C46",
"title": "Apple iPhone 15 Pro Max",
"img": "<img src=\"https://example.com/image.jpg\" />"
}
- X-SQL Guide: X-SQL
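When many pages share a layout, the query text itself can be generated. The Kotlin sketch below builds an X-SQL statement from a map of column aliases to CSS selectors, using only `dom_first_text` and `load_and_select` as they appear in the query above. The helper names are hypothetical, and doubling single quotes inside literals is an assumption that X-SQL follows the standard SQL escaping convention.

```kotlin
// Assumption: single quotes inside a literal are escaped by doubling them, as in standard SQL.
fun sqlLiteral(s: String): String = "'" + s.replace("'", "''") + "'"

// Hypothetical helper: one dom_first_text column per (alias, CSS selector) pair.
fun buildXSql(url: String, selectors: Map<String, String>, restrictCss: String = "body"): String {
    require(selectors.isNotEmpty()) { "at least one column is required" }
    require(selectors.keys.all { it.matches(Regex("[A-Za-z_][A-Za-z0-9_]*")) }) {
        "column aliases must be plain identifiers"
    }
    val columns = selectors.entries.joinToString(",\n") { (alias, css) ->
        "  dom_first_text(dom, ${sqlLiteral(css)}) as $alias"
    }
    return "select\n$columns\nfrom load_and_select(${sqlLiteral(url)}, ${sqlLiteral(restrictCss)});"
}

fun main() {
    val query = buildXSql(
        "https://www.amazon.com/dp/B0C1H26C46",
        mapOf("title" to "#productTitle", "brand" to "#bylineInfo"),
    )
    println(query)
}
```

The generated string can be sent as the `text/plain` body of a POST to the x/e endpoint.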
PulsarRPA enables high-speed parallel web scraping with coroutine-based concurrency, delivering efficient data extraction while minimizing resource overhead.
val args = "-refresh -dropContent -interactLevel fastest"
val resource = "seeds/amazon/best-sellers/leaf-categories.txt"
val links =
LinkExtractors.fromResource(resource).asSequence().map { ListenableHyperlink(it, "", args = args) }.onEach {
it.eventHandlers.browseEventHandlers.onWillNavigate.addLast { page, driver ->
driver.addBlockedURLs(blockingUrls)
}
}.toList()
session.submitAll(links)
Example: View Kotlin Code
PulsarRPA implements coroutine-safe browser control.
val prompts = """
move cursor to the element with id 'title' and click it
scroll to middle
scroll to top
get the text of the element with id 'title'
"""
val eventHandlers = DefaultPageEventHandlers()
eventHandlers.browseEventHandlers.onDocumentActuallyReady.addLast { page, driver ->
val result = session.instruct(prompts, driver)
}
session.open(url, eventHandlers)
Example: View Kotlin Code
PulsarRPA provides flexible robotic process automation capabilities.
val options = session.options(args)
val event = options.eventHandlers.browseEventHandlers
event.onBrowserLaunched.addLast { page, driver ->
warnUpBrowser(page, driver)
}
event.onWillFetch.addLast { page, driver ->
waitForReferrer(page, driver)
waitForPreviousPage(page, driver)
}
event.onWillCheckDocumentState.addLast { page, driver ->
driver.waitForSelector("body h1[itemprop=name]")
driver.click(".mask-layer-close-button")
}
session.load(url, options)
Example: View Kotlin Code
PulsarRPA provides X-SQL for complex data extraction.
select
llm_extract(dom, 'product name, price, ratings, score') as llm_extracted_data,
dom_first_text(dom, '#productTitle') as title,
dom_first_text(dom, '#bylineInfo') as brand,
dom_first_text(dom, '#price tr td:matches(^Price) ~ td') as price,
dom_first_text(dom, '#acrCustomerReviewText') as ratings,
str_first_float(dom_first_text(dom, '#reviewsMedley .AverageCustomerReviews span:contains(out of)'), 0.0) as score
from load_and_select('https://www.amazon.com/dp/B0C1H26C46 -i 1s -njr 3', 'body');
Example Code:
- REST API Examples
- LLM Configuration Guide
- Configuration Guide
- Build from Source
- Expert Guide
Set the environment variable PROXY_ROTATION_URL to the URL provided by your proxy service:
export PROXY_ROTATION_URL=https://your-proxy-provider.com/rotation-endpoint
Each time the rotation URL is accessed, it should return a response containing one or more fresh proxy IPs. Ask your proxy provider for such a URL.
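Before starting a crawl, it can help to confirm that the rotation URL actually returns something that looks like proxies. The Kotlin sketch below fetches the URL once and extracts IPv4 `host:port` pairs. The response format is not specified above, so the pattern is purely an assumption about what a provider returns, and `extractProxies` is a hypothetical helper rather than part of PulsarRPA.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Assumption: proxies appear somewhere in the response body as IPv4 host:port pairs.
val proxyPattern = Regex("""\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b""")

// Hypothetical helper: returns the distinct host:port pairs found in the body, in order.
fun extractProxies(body: String): List<String> =
    proxyPattern.findAll(body).map { it.value }.distinct().toList()

fun main() {
    val rotationUrl = System.getenv("PROXY_ROTATION_URL")
        ?: error("PROXY_ROTATION_URL is not set")
    val request = HttpRequest.newBuilder(URI.create(rotationUrl)).GET().build()
    val body = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
    val proxies = extractProxies(body)
    println(if (proxies.isEmpty()) "No host:port pairs found in the response" else "Found: $proxies")
}
```

If your provider returns hostnames, IPv6 addresses, or credentials, the pattern would need to be adjusted accordingly.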
Web Spider
- Scalable crawling
- Browser rendering
- AJAX data extraction
AI-Powered
- Automatic field extraction
- Pattern recognition
- Accurate data capture
LLM Integration
- Natural language web content analysis
- Intuitive content description
Text-to-Action
- Simple language commands
- Intuitive browser control
RPA Capabilities
- Human-like task automation
- SPA crawling support
- Advanced workflow automation
Developer-Friendly
- One-line data extraction
- SQL-like query interface
- Simple API integration
X-SQL Power
- Extended SQL for web data
- Content mining capabilities
- Web business intelligence
Bot Protection
- Advanced stealth techniques
- IP rotation
- Privacy context management
Performance
- Parallel page rendering
- High-efficiency processing
- Block-resistant design
Cost-Effective
- 100,000+ pages/day
- Minimal hardware requirements
- Resource-efficient operation
Quality Assurance
- Smart retry mechanisms
- Precise scheduling
- Complete lifecycle management
Scalability
- Fully distributed architecture
- Massive-scale capability
- Enterprise-ready
Storage Options
- Local File System
- MongoDB
- HBase
- Gora support
Monitoring
- Comprehensive logging
- Detailed metrics
- Full transparency
- WeChat: galaxyeye
- Weibo: galaxyeye
- Email: [email protected], [email protected]
- Twitter: galaxyeye8
- Website: platon.ai