
©2025 Poal.co


Feed AI-generated data back to AIs to drive them insane...

Archive: https://archive.today/OAMvo

From the post:

>This is a tarpit intended to catch web crawlers. Specifically, it's targeting crawlers that scrape data for LLMs - but really, like the plants it is named after, it'll eat just about anything that finds its way inside.

1 pt (edited)

I like it.

One of his Mastodon posts on this (chirp.zadzmo.org) is interesting.

>How much am I spending on this? About a Raspberry Pi, give or take, and I have Facebook, Amazon, and Anthropic slamming it hard at the same time.

Buy a domain. Rent a small VM. Set up a small, simple site with some random content and a link into a Nepenthes tarpit. Check in on it periodically to see which bots are crawling it.
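
For the "check in on it" step, here is a minimal sketch of tallying which user agents are hitting the tarpit. It assumes the web server writes the usual "combined" access log format and that the tarpit is mounted under a hypothetical `/nepenthes/` path prefix; neither detail comes from the post, so adjust both to your setup.

```python
import re
from collections import Counter

# Assumption: "combined" access log format (the common nginx/Apache style).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

TARPIT_PREFIX = "/nepenthes/"  # hypothetical path; use wherever the tarpit is mounted


def tarpit_hits_by_agent(lines, prefix=TARPIT_PREFIX):
    """Count requests under the tarpit prefix, grouped by User-Agent string."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        # Lines that do not parse, or that hit the normal site, are skipped.
        if match and match.group("path").startswith(prefix):
            counts[match.group("agent")] += 1
    return counts
```

Feed it the lines of an access log and call `.most_common(10)` on the result to see the heaviest visitors. User-Agent strings can be faked, so grouping by the `ip` field instead is probably the more honest count.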

As he said, we could also use the logs from these tar pit sites to identify crawlers so other legitimate sites can block them.

Imagine we had a few hundred of these sites running on various private VMs wasting crawler time and filling their training data with garbage, and we had tens of thousands of sites full of legitimate content blocking those crawlers—both for server stability and to avoid feeding these beasts for free.

Update: He beat me to the published crawler list (chirp.zadzmo.org) (direct link (zadzmo.org)).

>By happenstance, if anyone wants a list of what is absolutely a crawler as a JSON array, here it is - updated every 15 minutes:
>
>https://zadzmo.org/code/nepenthes/crawlers.json
>
>What does "absolutely a crawler" mean? More than 100 hits in the tarpit.
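
The "more than 100 hits" rule is just a threshold over per-client hit counts. A rough sketch of that rule, plus a helper for checking clients against a published list, follows. The shape of the entries in crawlers.json isn't described here, so this assumes a flat JSON array of strings (IP addresses, say), and the function names are made up for illustration.

```python
import json

HIT_THRESHOLD = 100  # from the post: "More than 100 hits in the tarpit"


def absolutely_a_crawler(hit_counts, threshold=HIT_THRESHOLD):
    """Return the sorted clients whose tarpit hit count exceeds the threshold."""
    return sorted(client for client, hits in hit_counts.items() if hits > threshold)


def load_crawler_list(json_text):
    """Parse a JSON array into a set for fast membership checks.

    Assumption: entries are plain strings; the real file's format may differ.
    """
    entries = json.loads(json_text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array")
    return {str(entry) for entry in entries}
```

A legitimate site could fetch the list on a timer, pass the response body to `load_crawler_list`, and refuse requests from anything in the resulting set.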