fawzi's Techdirt Profile

About fawzi

Jul 15, 2025 @ 02:30pm

on Tackling The AI Bots That Threaten To Overwhelm The Open Web

The problem is non-textual data. Images, for example, are not in CommonCrawl. Because you are much better protected against copyright and other infringement claims by hosting only the URLs rather than the content itself, basically everybody distributes collections as lists of URLs, which forces every user to re-download everything. COYO, for example, takes that approach, and many of its images are unreachable: https://github.com/kakaobrain/coyo-dataset/tree/main/download##missing-images

Jul 15, 2025 @ 11:57am

on The Magical Thinking That’s Killing Our Humanity

I enjoyed your post, but I agree with others that it was a bit long and somewhat repetitive; it would have benefited from being more succinct. Still, I agree with most of its gist.