
©2024 Poal.co


Anyone else feel like they should have to pay for your bandwidth? They are already taking the data and training on it. Whatever, but when they crash your website or download excessively, they should have to pay for the cost.

If someone is not already doing it, we need a list of the IPs of all AI crawlers so we can blackhole them, or just feed them their OWN data back to make them go insane.
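Blocking by IP list is one layer, but since crawler IP ranges churn, many sites also filter on the User-Agent strings these bots announce. Below is a minimal sketch of that idea; the substrings are examples of names these crawlers have been reported to use, not a complete or current list, and a real deployment would keep the list updated and combine it with IP-range rules.

```python
# Sketch: deny-list check against known AI crawler User-Agent substrings.
# The list below is illustrative and would need ongoing upkeep; note that
# a hostile crawler can always spoof its User-Agent, so this only stops
# the bots that identify themselves honestly.
AI_CRAWLER_SUBSTRINGS = (
    "GPTBot",         # OpenAI's web crawler
    "ChatGPT-User",   # OpenAI on-demand page fetches
    "CCBot",          # Common Crawl
    "ClaudeBot",      # Anthropic
    "Bytespider",     # ByteDance
    "PerplexityBot",  # Perplexity
)


def is_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent matches a known AI crawler substring."""
    ua = (user_agent or "").lower()
    return any(token.lower() in ua for token in AI_CRAWLER_SUBSTRINGS)


# A web server or middleware would call this per request and answer with
# 403 (or a deliberately slow "tarpit" response) when it returns True.
```

The same substrings can be mirrored in robots.txt `User-agent` rules for bots that honor it; the code path above is the fallback for those that don't.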

Archive: https://archive.today/737zS

From the post:

>On Saturday, Triplegangers CEO Oleksandr Tomchuk was alerted that his company's e-commerce site was down. It looked to be some kind of distributed denial-of-service attack. He soon discovered the culprit was a bot from OpenAI that was relentlessly attempting to scrape his entire, enormous site. "We have over 65,000 products, each product has a page," Tomchuk told TechCrunch. "Each page has at least three photos." OpenAI was sending "tens of thousands" of server requests trying to download all of it, hundreds of thousands of photos, along with their detailed descriptions. "OpenAI used 600 IPs to scrape data, and we are still analyzing logs from last week, perhaps it's way more," he said of the IP addresses the bot used to attempt to consume his site. "Their crawlers were crushing our site," he said. "It was basically a DDoS attack."

1 pt · 19d

This is the newcomers (hopefully) learning things that major web crawlers like Google learnt years ago. You rate limit your army of crawlers so that they don't swamp a site. Instead of hammering a single site from 600 IP addresses at once, crawl 600 separate sites simultaneously. You build your dataset just as fast and you don't become an internet nuisance that gets blocked by distributed protection services like Cloudflare.

If these idiots run their bot swarm against a Cloudflare-protected site, a sizable portion of the web will become off limits to them.
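The politeness scheme the comment describes, spread requests across many hosts and never hammer any one of them, boils down to tracking the last fetch time per host. A minimal sketch (class and parameter names are my own, not from any particular crawler):

```python
import time
from urllib.parse import urlparse


class PerHostRateLimiter:
    """Enforce a minimum delay between requests to the *same* host.

    Different hosts can be fetched back to back, so a crawler working
    through many sites in parallel stays fast overall while each
    individual site sees only an occasional request.
    """

    def __init__(self, min_delay_s=1.0):
        self.min_delay_s = min_delay_s
        self._last_fetch = {}  # host -> timestamp of last request

    def wait_time(self, url, now=None):
        """Seconds to sleep before this URL may be fetched (0.0 if ready)."""
        host = urlparse(url).netloc
        if host not in self._last_fetch:
            return 0.0  # never-seen host: fetch immediately
        now = time.monotonic() if now is None else now
        return max(0.0, self.min_delay_s - (now - self._last_fetch[host]))

    def record_fetch(self, url, now=None):
        """Note that a request to this URL's host just went out."""
        host = urlparse(url).netloc
        self._last_fetch[host] = time.monotonic() if now is None else now
```

A crawl loop would call `wait_time()` before each request, and when it is nonzero, move on to a URL from a different host instead of sleeping, which is exactly the "crawl 600 separate sites simultaneously" behavior.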