Jim Collinsworth's Techdirt Profile

Jim Collinsworth

About Jim Collinsworth

https://www.linkedin.com/in/Jimcollinsworth/

Jim Collinsworth's Comments

  • Mar 31, 2026 @ 05:53am

    Let's save 2022 Archive, and stop.

    The Internet Archive circa 2022 should be everyone's trusted source. Given where AI is rapidly taking online content, eventually even the archives will degrade. Let's just use static pre-2022 info going forward; it would handle 99.99% of most questions. And do we really want to save what's happened after 2022? Maybe a break is OK. So someone (not me, it's big) should package up a 2022 archive copy (~100 petabytes), add an LLM for access and magic. Scroll like it's 2022, with none of that annoying and increasingly made-up post-2022 information to worry about. Packaging it up means more than just making a copy; it needs to be distributed, secured, and trusted. Maybe there's some use for blockchain here, to ensure no bad actors can ever alter the archive and it remains a reliable and incredibly useful repository of world knowledge up to 2022. Internet 3.0, first and last release.

  • Feb 18, 2025 @ 06:57am

    Elon, master of social engineering

    Musk has used social engineering masterfully: he simply talked people into giving him access to our most sensitive data, right in front of the entire world's eyes. The CFAA is worth a try.

  • Dec 04, 2024 @ 07:39am

    Generative AI systems are very modular

    The article conflates systems and models (LLMs). It's correct that each foundation model (Grok, Gemini, ...) is somewhat monolithic and difficult to change, but this new age of generative AI is as open as everything before it. A 'system' can use one or more LLMs as components, but any non-trivial system will have hundreds of components, often full of open source. Applications still need data to work with; it doesn't all just get generated. To a developer, an LLM is just another available super-function to use within a system, maybe simply to change the format of a date, generate test data, or summarize notes from a police interview. The system could be a very simple chat application: grab user input, send it to the model, return the results. Good for a demo or two. But more common now, and more powerful, is splitting a request into multiple calls to the model along with calls to traditional processing and database operations. This in itself is 'modularity'. There is already a thriving market of open-source frameworks and tools for generative AI. Finally, most foundation models are open; take a look at https://huggingface.co/ to see how the community shares AI models along with the source data used. All of these models can be built upon and customized. I've been a software developer since 1980 and am working with AI now. From punch cards -> personal computers -> internet -> AI.
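    The "LLM as just another super-function inside a larger system" idea can be sketched in a few lines. This is a hypothetical illustration: `call_llm` and `lookup_account` are stand-in stubs, not any real model or database API, so the shape of the system is visible without depending on a particular vendor.

    ```python
    # Toy "system" where an LLM is one component among several.
    # call_llm is a hypothetical stub; a real system would invoke a model API here.

    def call_llm(prompt: str) -> str:
        """Stand-in for a model call, stubbed so the sketch runs as-is."""
        return f"[model output for: {prompt}]"

    def lookup_account(user_id: int) -> dict:
        """Stand-in for a traditional database operation."""
        return {"id": user_id, "name": "Alice", "notes": "long interview transcript..."}

    def handle_request(user_id: int) -> dict:
        # One user request is split into traditional processing plus a model call.
        account = lookup_account(user_id)                      # database component
        summary = call_llm(f"Summarize: {account['notes']}")   # LLM as a super-function
        greeting = account["name"].upper()                     # plain old code
        return {"greeting": greeting, "summary": summary}

    print(handle_request(42))
    ```

    Swapping the stubbed `call_llm` for a different model, or adding more traditional components around it, changes nothing else in the system, which is the modularity the comment describes.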

  • Sep 26, 2024 @ 07:14am

    do these AI bills apply to the government too?

    General question: how do this and other AI bills apply to government use? Business abuse of AI might just result in bad advertisements; government abuse (with immunity, of course) leads to individual ruin. AI laws will ensure consumers and businesses have to use safe models, disclose their use of AI, use it ethically, and verify results. Guaranteed, under penalty of law. AI developers will only build safe systems.

  • May 10, 2024 @ 07:54am

    AI will also reduce resource usage, not just increase it

    - It's not just all new resources and applications; AI will replace and significantly alter existing systems. Some product categories and systems will completely disappear.
    - Smartphones will (should) be where much of the AI processing occurs: local on the device, secure, private, and not using the cloud.
    - New dedicated AI chips and related software to run and share models will reduce costs; it won't just be millions of Nvidia cards.
    - Existing data centers will be repurposed/refactored for AI, so that's not adding power requirements.
    - Maybe we won't all be saving/creating as much content, so that cloud storage goes down. We'll consume more dynamically generated content. (Sad.)

  • Jan 29, 2024 @ 01:34pm

    So why would it need to purchase something it can obtain (more legitimately[?]) from its own dragnets and risk having part of its collection techniques exposed?
    The purchased data would provide an excellent source to cross-check internal data collection and fill in gaps. But I'll agree with the author: they do it because they can. And now that they have, the full force of the federal government will protect that ability.

  • Sep 13, 2023 @ 08:45am

    FTC could probably take down any tech company with their power and process

    This can happen to any person or company that the FTC or federal government sets its sights on. Musk/Twitter is just an unlucky, easy target now. Every reasonably sized company using technology has broken some federal law on data handling. As someone in the software industry, I can say every place has technical debt, process flaws, and human error; most of it is never exposed. Through an initial data leak (common now), a company gets forced into a consent decree and now has to follow many more, stricter data-use and reporting requirements, under penalty of federal law. Then the FTC can continuously escalate with more requirements and additional penalties: hundreds of professionals producing and reviewing thousands of documents, involving multiple courts, for years now. In the end, of course, it's just a cash fine, maybe enforced. There's got to be a better, cheaper way to get the company to behave in the first place.

  • Aug 22, 2023 @ 10:25am

    On the other hand, this might help slow down mis-information

    Automatically copying the content of a link makes it effortless to promote misinformation; headlines are always designed for maximum impact, good or bad. Making the re-poster think for a minute about the link and type in their own interpretation sounds like a possible improvement.

  • Jun 21, 2023 @ 11:05am

    Then just tell me what others paid last month

    Too hard to tell me what I will pay? Then it's just as good to show me what everybody else is paying. Make the providers show de-identified, current, real account data instead. Before anyone purchases or researches prices, they could see a random sampling of actual current invoices within their region. It's easy for the provider: no government-regulated fee schedule or documentation to deal with, just being transparent and showing the relevant data to the consumer. Consumers would see examples of actual bills they could receive, and I would take that information over any marketing or regulated reporting.

  • May 25, 2022 @ 01:04pm

    Spam and false accounts are not necessarily content moderation

    I think it is misleading to call all of this moderation. The risks and costs of dealing with the 3.4 billion spam messages and fake accounts are very different from those of the other ~130 million messages. There is much more automation involved in spam and fake-account management. Facebook stating that it deleted 1.8 billion spam messages doesn't mean much; that's the easy part. Spam management has been essentially solved for years in email, and likewise there is automation for much of fake-account management. If this were difficult, Facebook would not have been able to delete 1.6 billion fake accounts. The two are also linked, since spam is often sent by fake accounts.

    That leaves the ~130 million messages containing sexual activity, nudity and physical abuse, terrorism, violent and graphic content, suicide and self-injury, hate speech, or bullying and harassment. None of this type of content can be automatically filtered reliably, and all of it can be political. This is where moderation is done, using a combination of technology and humans who can evaluate each message, understand its content, context, and history, and then make a determination. Still a big unsolved problem, and we haven't even gotten into misinformation moderation.