HTTRUTA Crawler Project
=======================

Download all links from a feed using tools like [httrack](http://www.httrack.com) and [wkhtmltopdf](https://wkhtmltopdf.org).

This is the engine behind the [Cache](https://cache.fluxo.info) feature used by the [Semantic Scuttle](http://semanticscuttle.sourceforge.net/) instance known as [Fluxo de Links](https://links.fluxo.info).

Installation
------------

    git clone https://git.fluxo.info/httruta

Dependencies
------------

Recommended:

    sudo apt install httrack wkhtmltopdf

Configuration
-------------

The default configuration is tuned to archive everything newly added to [Fluxo de Links](https://links.fluxo.info), but you can use httruta to archive any other website that offers an RSS feed.

To customize httruta, copy the file `config.default` to `config` and edit it to suit your needs (see the quickstart sketch at the end of this README).

Usage
-----

Place this script somewhere and set up a cronjob like this:

    */5 * * * * /var/sites/cache/httruta/httracker &> /dev/null

Alternatives
------------

- https://github.com/webrecorder/pywb/
- https://github.com/chfoo/wpull
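Quickstart
----------

A minimal sketch of a first run, assuming the `httracker` script sits at the repository root (as the cron example above suggests) and is executable; the option names inside `config` come from `config.default` and are not shown here:

    git clone https://git.fluxo.info/httruta
    cd httruta

    # Create a local config from the shipped defaults and edit it
    # to point at the feed you want to archive.
    cp config.default config
    "${EDITOR:-vi}" config

    # One manual run to verify things work before scheduling via cron.
    ./httracker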
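Under the hood
--------------

For orientation, the kind of commands httruta drives for each link in the feed look roughly like the following. This is a hedged sketch: the example URL, output paths, and flags are illustrative assumptions, not the script's actual invocation, which is assembled from `config`:

    # Hypothetical single-link archive: mirror one bookmarked page with
    # httrack (recursion depth 1, output under a cache directory), then
    # render a PDF snapshot of the same page with wkhtmltopdf.
    httrack "https://example.org/some-article/" -O /var/sites/cache/mirror -r1
    wkhtmltopdf "https://example.org/some-article/" /var/sites/cache/some-article.pdf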