HTTRUTA Crawler Project
Download all links from a feed using tools like httrack and wkhtmltopdf. This is the engine behind the Cache feature used by the Semantic Scuttle instance known as Fluxo de Links.
git clone https://git.fluxo.info/httruta
sudo apt install httrack wkhtmltopdf
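Before the first run it can help to confirm that both tools are actually on the PATH. The snippet below is a minimal sketch of such a check; the function name check_deps is made up for this example and is not part of httruta.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with httruta.
# Prints the name of every given command that is missing from PATH,
# one per line, and returns non-zero if at least one is missing.
check_deps() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" > /dev/null 2>&1; then
            echo "$cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

A call such as `check_deps httrack wkhtmltopdf` prints nothing and succeeds when both packages from the apt line above are installed.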
The default config is optimized for fetching every new link added to Fluxo de Links.
You can also use httruta to archive any other website that offers an RSS feed. To customize httruta, copy the file config and edit it to suit your needs.
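For a rough idea of what archiving an RSS feed involves, here is an illustrative dry-run sketch. It is not httruta's actual code: the function names feed_links and archive_cmds, the checksum-based directory layout, and the naive text matching are all assumptions made for this example. It only prints the httrack and wkhtmltopdf commands instead of running them, and it leaves out any options (depth limits, filters) a real crawl would likely need.

```shell
#!/bin/sh
# Illustrative only: not httruta's implementation.

# Print every <link> URL found in an RSS 2.0 file, one per line.
# Naive text matching: assumes each <link>...</link> pair sits on one line.
# The channel's own <link> is printed too. Atom feeds are not handled.
feed_links() {
    grep -o '<link>[^<]*</link>' "$1" | sed -e 's/<[^>]*>//g'
}

# Print (do not run) the commands that would archive one URL under a
# destination directory: a mirror via httrack and a PDF via wkhtmltopdf.
# The output is for display and is not shell-quoted.
archive_cmds() {
    url="$1"
    dest="$2"
    # Hypothetical layout: name each archive after the URL's checksum.
    id="$(printf '%s' "$url" | cksum | cut -d ' ' -f 1)"
    echo "httrack $url -O $dest/$id"
    echo "wkhtmltopdf $url $dest/$id.pdf"
}
```

Given a saved feed, `feed_links feed.xml | while read -r url; do archive_cmds "$url" /tmp/archive; done` lists what would be fetched; both feed.xml and /tmp/archive are placeholder names.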
Place this script somewhere and set up a cron job like this:
*/5 * * * * /var/sites/cache/httruta/httracker > /dev/null 2>&1
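With a five-minute schedule, a crawl that takes longer than five minutes could overlap with the next one. This README does not say whether httracker guards against that itself, so the following is only a sketch of one possible wrapper, assuming it does not. The function name run_locked and the lock path are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of httruta.
# Runs the given command only if the lock directory can be created.
# mkdir either creates the directory or fails atomically, which makes
# it usable as a simple lock without extra tools.
run_locked() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2> /dev/null; then
        echo "already running, skipping" >&2
        return 75
    fi
    "$@"
    status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lockdir"
    return "$status"
}
```

The cron entry would then call a small script that runs something like `run_locked /tmp/httruta.lock /var/sites/cache/httruta/httracker`. If the process is killed before the lock is released, the directory stays behind and has to be removed by hand.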