This repository contains all the scripts and files related to the blogforever-crawler-publication. It is organized as follows:
- /tex contains the LaTeX and gnuplot files, together with instructions on how to compile the paper from source,
- /dataset explains how to extract our test set from the Spinn3r Dataset,
- /success-rates has the scripts we used to obtain the "extraction success rates" data,
- /running-times contains the code we used for running time measurements.
Please keep in mind that the scripts you will find here were written as "single use code" and are anything but beautiful. If you have any issues compiling the paper or running the experiments, just let me know!