Archive: https://archive.today/mgKQl

From the post:

>In February, the online image repository DiscoverLife, which contains nearly 3 million photographs of different species, started to receive millions of hits to its website every day — a much higher volume than normal. At times, this spike in traffic was so high that it slowed the site down to the point that it became unusable. The culprit? Bots. These automated programs, which attempt to ‘scrape’ large amounts of content from websites, are increasingly becoming a headache for scholarly publishers and researchers who run sites hosting journal papers, databases and other resources.

(post is archived)

[–] 1 pt

> But such blanket bans can cause problems for legitimate users. Academics often access journal websites in a way that can seem bot-like, Mulvany explains — by using proxy servers to remotely browse journals through institutional libraries (meaning many requests can come through a single IP address).

Anubis would help with this.

It treats everything like a bot on the first request. Once your browser solves a small JavaScript proof-of-work challenge, it gets a pass for about a week. Legitimate users sharing a connection would be fine.
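
For anyone curious what that JavaScript operation amounts to, here is a minimal sketch of the general proof-of-work idea, not Anubis's actual code. The server hands the browser a random challenge string and a difficulty; the function names, challenge value, and difficulty below are all made up for illustration.

```ts
// Hypothetical sketch of a proof-of-work challenge, NOT Anubis's real code.
// The server sends a random challenge and a difficulty; the browser burns a
// little CPU finding a nonce whose hash has enough leading zeros, posts it
// back, and receives a signed cookie good for roughly a week.

async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await crypto.subtle.digest("SHA-256", bytes);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

// Find a nonce so that sha256(challenge + nonce) starts with `difficulty`
// hex zeros. Cheap for one real browser; expensive at scraping scale.
async function solveChallenge(challenge: string, difficulty: number): Promise<number> {
  const target = "0".repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    if ((await sha256Hex(challenge + nonce)).startsWith(target)) {
      return nonce;
    }
  }
}

// The server only hashes once to verify the submitted nonce, then it can
// set the week-long pass cookie.
solveChallenge("challenge-string-from-server", 4).then((nonce) => {
  console.log("solved:", nonce);
});
```

The asymmetry is the point: a real user pays the cost once a week, while a scraper opening millions of fresh sessions pays it on every one, and users behind a shared institutional IP each solve it in their own browser, so nobody gets collectively banned.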

Currently Anubis blocks anything that doesn’t run JavaScript, but I’m pretty sure search engine crawlers are fine.
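
On the crawler point: the usual pattern for letting search engines through a wall like this is to match the User-Agent against a short allowlist and then confirm the claim with reverse DNS, since the header alone is trivially spoofed. The sketch below shows that general technique under my own assumptions (the domain list and helper are not Anubis's policy engine).

```ts
// Hypothetical crawler allowlist, not Anubis's actual policy engine.
// A User-Agent is trivially faked, so major crawlers are confirmed by
// reverse DNS: the client IP must resolve to a hostname under the
// crawler's official domain. (Full verification would also re-resolve
// that hostname forward and check it matches the IP; omitted here.)
import { promises as dns } from "node:dns";

const CRAWLER_DOMAINS: Record<string, string[]> = {
  Googlebot: ["googlebot.com", "google.com"],
  bingbot: ["search.msn.com"],
};

async function isVerifiedCrawler(userAgent: string, ip: string): Promise<boolean> {
  for (const [name, domains] of Object.entries(CRAWLER_DOMAINS)) {
    if (!userAgent.includes(name)) continue;
    try {
      const hostnames = await dns.reverse(ip);
      return hostnames.some((host) =>
        domains.some((d) => host === d || host.endsWith("." + d)),
      );
    } catch {
      return false; // no PTR record: treat as an ordinary client
    }
  }
  return false; // unknown User-Agent: fall through to the JS challenge
}
```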