Fantastic stuff!
Not sure how you're rendering the pages, but if you're using something like Selenium, you may be able to load an ad blocker to strip site annoyances like popups before rendering the final HTML file.
Awesome
I'm using a headless Chrome engine.
So adding popup and ad blocking is in the works.
The only problem is with websites behind Cloudflare: I need to write a function to randomize the User-Agent, since CF seems to start blocking the archiving after a while.
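A minimal sketch of User-Agent rotation for a headless Chrome run, using only the standard library. It assumes Chrome is launched from the command line as `google-chrome` (the binary name varies by platform). The UA pool and the helper names `random_user_agent` and `chrome_args` are hypothetical. `--headless`, `--user-agent=` and `--dump-dom` are real Chrome switches. Rotating the UA alone may not be enough to get past Cloudflare, which also looks at other request signals.

```python
import random

# Hypothetical pool of Chrome User-Agent strings. Only Chrome UAs are listed so
# the advertised browser matches the engine actually doing the rendering.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]


def random_user_agent(pool=USER_AGENTS, rng=random):
    """Return one User-Agent string chosen uniformly at random from pool."""
    return rng.choice(pool)


def chrome_args(url, user_agent=None, binary="google-chrome"):
    """Build a headless Chrome command line that prints the rendered DOM of url.

    A random UA is picked per call unless one is passed explicitly.
    The default binary name is an assumption; adjust it for your system.
    """
    ua = user_agent or random_user_agent()
    return [
        binary,
        "--headless",
        f"--user-agent={ua}",
        "--dump-dom",
        url,
    ]
```

Usage would be something like `subprocess.run(chrome_args("https://example.com"), capture_output=True, text=True).stdout` to get the rendered HTML. Because a fresh UA is picked on every call, consecutive archive requests don't all present the same fingerprint.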
You could look at cfscrape (github.com) for some inspiration.
Edit: this could help you out (github.com)
Thanks! I saw that earlier, but I'm busy preparing the Christmas Eve dinner menu and stuff ;)
I'll pick it back up in a few days.