Scrapy is a simple, easy-to-use scraping library for Go.
To use Scrapy directly, create a scrape client and call its 'Scrape' method with the endpoint to scrape, a list of selectors, and a pointer to the slice that will receive the results, e.g.:
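c := scrapy.NewScrapeClient()
endpoint := "http://localhost:5555/"

// Select every <div> element on the page using XPath.
selector := scrapy.Selector{
	Name:           "xpath-scrape",
	TypeOfSelector: "xpath",
	Value:          "//div",
}

// Scrape the endpoint; the results are written into result.
var result []scrapy.ScrapeResult
c.Scrape(endpoint, []scrapy.Selector{selector}, &result)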
ScrapyBoss is a lightweight scheduler for scrapes performed by Scrapy. It can be configured with a YAML file or by building the configuration directly in code. Once started, it scrapes each configured endpoint with that endpoint's selectors every 'ScrapeIntervalInSeconds' seconds. To build the configuration in code:
config := scrapyboss.ScrapyBossConfig{
	ScrapeIntervalInSeconds: 10,
	ScrapeTimeoutInSeconds:  3,
	IdleConnectionPool:      1,
	ScrapeEndpoints: []scrapyboss.ScrapeEndpoint{
		{
			Endpoint: "http://localhost:5555",
			Selectors: []scrapy.Selector{
				{
					Name:           "test",
					Value:          "//div",
					TypeOfSelector: "xpath",
				},
			},
		},
	},
}

// Start scraping the configured endpoints on the configured interval.
scrapyBoss := scrapyboss.NewScrapyBoss(config)
scrapyBoss.Start()
To load the configuration from a YAML file instead:
data, err := os.ReadFile("/path/to/the/config/file.yaml")
if err != nil {
	log.Fatal(err)
}

// Parse the YAML contents into a ScrapyBoss configuration.
config, err := scrapyboss.ParseConfig(data)
if err != nil {
	log.Fatal(err)
}

scrapyBoss := scrapyboss.NewScrapyBoss(config)
scrapyBoss.Start()
Scrapy supports the following selector types, set via the 'TypeOfSelector' field:
- xpath
- regex
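How Scrapy evaluates a 'regex' selector is not specified here. As a hypothetical sketch, a regex selector could be applied to a response body with Go's standard regexp package, returning every match. The helper `applyRegexSelector` and the convention of returning the first capture group when the pattern has one are assumptions for illustration, not Scrapy's documented behaviour.

```go
package main

import (
	"fmt"
	"regexp"
)

// applyRegexSelector is a hypothetical helper: it compiles pattern and returns
// every match in body. If the pattern contains a capture group, the first
// group is returned instead of the full match.
func applyRegexSelector(pattern, body string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range re.FindAllStringSubmatch(body, -1) {
		if len(m) > 1 {
			out = append(out, m[1]) // first capture group
		} else {
			out = append(out, m[0]) // full match
		}
	}
	return out, nil
}

func main() {
	body := `<div>first</div><div>second</div>`
	matches, err := applyRegexSelector(`<div>(.*?)</div>`, body)
	if err != nil {
		fmt.Println("invalid pattern:", err)
		return
	}
	fmt.Println(matches) // [first second]
}
```

Note that Go's regexp package uses RE2 syntax, so features such as backreferences are not available in this sketch.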
The Scrapi project (https://github.com/Vorstenbosch/scrapi) is an example application built on the Scrapy library.