Community Notes Is A Useful Tool For Some Things… But Not As A Full Replacement For Trust & Safety
from the it's-a-tool,-not-the-tool dept
When Twitter first launched what it called “Birdwatch,” I was hopeful that it would turn into a useful alternative approach to helping with trust & safety/content moderation questions, but I noted that there were many open questions, in particular with how it would deal with malicious actors seeking to game the system. When Elon took over Twitter, he really seemed to embrace Birdwatch, though he changed the name to the pointlessly boring “Community Notes.”
I still think the concept is a good one, and consider it one of Elon’s few good moves. I think other social media sites should experiment with some similar ideas as well.
The problem, though, is that Elon seems to think that Community Notes is an effective replacement for a comprehensive trust & safety program. At the heart of so many of Elon’s decisions in firing the vast majority of the company’s trust & safety staff was that “Community Notes can handle it.”
As we’re in the midst of a series of major crises around the globe, where the flow of information has proven incredibly important, one thing we’re clearly learning is that Community Notes is not up to the task. Just to drive this point home, over the weekend Elon himself posted some fucking nonsense (as he’s prone to do) and many hours later Community Notes pointed out it was hogwash. Elon, as he’s done in the past when he’s been “Noted,” claimed he was happy it happened to him… before claiming that his post was “obviously a joke meme” and that “there is more than a grain of truth to it.”

So, first of all, there isn’t “more than a grain of truth to it.” The whole thing is simply false. But, more importantly, a look at the top replies to his “obviously a joke meme” suggests that Elon’s biggest fans did not, even remotely, think that this was “obviously a joke meme,” but rather took it entirely seriously, cheering him on for “telling the truth.” Here’s just one of the top replies to his original tweet:

Also, it took quite some time for the note to appear on Elon’s post. And, look, content moderation at scale is impossible to do well and all that, but Community Notes seems like the exact wrong approach in situations like this one. Especially at a time when the accounts pushing out the most viewed news these days seem to be made up of a combination of grifters and idiots:
Online we have seen many users of X describe their experience of this crisis as different. Some of that may result from the more ambiguous nature of the larger conflict, especially as the news cycle moves from the unambiguous horror of the initial attack to concerns about Israel’s response. However, our investigation here suggests an additional factor: in Musk’s short tenure as owner of the platform, a new set of news elites has emerged. These elites post frequently, many sharing unvetted content and emotionally charged media. While sharing no single political ideology, many embrace a similar culture of rapid production of unlinked or ambiguously sourced content, embracing a “firehose of media” ethos that places the onus of verification on the end-user. This occurs in an environment that has been shorn of many of the “credibility signals” that served to ground users in the past — checkmarks that indicated notability, fact-checks distributed through Twitter Trends, and Twitter/X-based labeling of deceptive content. Even fundamental affordances of the web — such as simple sourcing through links — have been devalued by the platform, and, perhaps as a result, by the new elites that now direct its users’ attention.
Leaving aside the significant concern of taking away professional, trained trust & safety employees, and replacing them with random (often hand-picked) untrained volunteers, there are serious concerns coming to light about how Community Notes actually works in practice.
Multiple reports have come out lately highlighting the limitations of Community Notes on important breaking news in the midst of various conflicts around the world, where you have malicious actors seeking to deliberately spread misinformation. A report at Wired found that Community Notes is actually making some of the problems worse, rather than better.
On Saturday, the company wrote on its own platform that “notes across the platform are now being seen tens of millions of times per day, generating north of 85 million impressions in the last week.” It added that thousands of new contributors had been enrolled in the system. However, a WIRED investigation found that Community Notes appears to be not functioning as designed, may be vulnerable to coordinated manipulation by outside groups, and lacks transparency about how notes are approved. Sources also claim that it is filled with in-fighting and disinformation, and there appears to be no real oversight from the company itself.
“I understand why they do it, but it doesn’t do anything like what they say it does,” one Community Notes contributor tells WIRED. “It is prone to manipulation, and it is far too slow and cumbersome. It serves no purpose as far as I can see. I think it’s probably making the disinformation worse, to be honest.”
The report isn’t based just on the accounts of random Community Notes users; it also looks more closely at how the program works and how easily it can be gamed. Wired found that it wasn’t difficult for a single person to set up multiple accounts that all had access to Community Notes, meaning that a small group of users controlling multiple accounts could manufacture support for a position.
It also points to earlier (pre-Elon) research showing that Birdwatch (as it was then called) wasn’t used nearly as much for standard fact checking as it was in political debates, by users who politically disagreed with someone who had tweeted.
Back during the summer, the Poynter Institute had a good analysis of the limitations of Community Notes for dealing with real-time misinformation campaigns during crises. Specifically, the design of the current Community Notes has some, well, questionable assumptions built in. Apparently, it looks over your tweeting history, assigns you to either a “left” or a “right” camp, and then only allows a Community Note to go public if enough people from both camps agree on the note.
“It has to have ideological consensus,” he said. “That means people on the left and people on the right have to agree that that note must be appended to that tweet.”
Essentially, it requires a “cross-ideological agreement on truth,” and in an increasingly partisan environment, achieving that consensus is almost impossible, he said.
Another complicating factor is the fact that a Twitter algorithm is looking at a user’s past behavior to determine their political leanings, Mahadevan said. Twitter waits until a similar number of people on the political right and left have agreed to attach a public Community Note to a tweet.
While that may work on issues where there isn’t any kind of culture war, it’s completely useless for culture war issues, where plenty of disinformation flows. Indeed, the Poynter report notes that a huge percentage of the most highly rated notes inside the system are never seen by the public because they don’t have “cross-ideological agreement.”
The problem is that regular Twitter users might never see that note. Sixty percent of the most-rated notes are not public, meaning the Community Notes on “the tweets that most need a Community Note” aren’t public, Mahadevan said.
The “cross-ideological” consensus requirement seems almost perfectly designed to make sure that the absolute worst nonsense will never have a Community Note shown publicly.
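To make concrete why that consensus gate tends to starve exactly the most contentious posts of visible notes, here’s a deliberately simplified sketch. To be clear, this is not X’s actual scoring model (which reportedly uses matrix factorization over the full rating history rather than explicit “left”/“right” labels); the function, thresholds, and two-camp labels below are all invented purely for illustration of the failure mode.

```python
# Toy illustration only (not X's real algorithm): a simplified
# "cross-ideological consensus" gate for publishing a note.

from collections import Counter

def note_should_publish(ratings, min_per_side=5, min_helpful_share=0.8):
    """ratings: list of (rater_leaning, found_helpful) tuples,
    where rater_leaning is 'left' or 'right' and found_helpful is a bool."""
    helpful_by_side = Counter()
    total_by_side = Counter()
    for leaning, helpful in ratings:
        total_by_side[leaning] += 1
        if helpful:
            helpful_by_side[leaning] += 1

    for side in ("left", "right"):
        # Both camps must weigh in at all...
        if total_by_side[side] < min_per_side:
            return False
        # ...and both camps must mostly agree the note is helpful.
        if helpful_by_side[side] / total_by_side[side] < min_helpful_share:
            return False
    return True

# On a culture-war topic, one camp reliably rates the correction unhelpful,
# so the note stays hidden no matter how many ratings pile up:
culture_war = [("left", True)] * 40 + [("right", False)] * 40
print(note_should_publish(culture_war))   # False

# On a non-partisan factual claim, both camps agree and the note publishes:
factual = [("left", True)] * 10 + [("right", True)] * 9 + [("right", False)]
print(note_should_publish(factual))       # True
```

The point of the toy example is the failure mode, not the mechanics: any topic polarized enough that one camp won’t rate a correction as helpful is precisely the topic where the correction never clears the gate.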
Meanwhile, a report from NBC News also highlights how even when Community Notes is able to help debunk false information, it often comes way too late.
NBC News focused on two prominent pieces of Israel-Hamas misinformation that have already been debunked: a fake White House news release that was posted to X claiming the Biden administration had granted Israel $8 billion in emergency aid and false reports that St. Porphyrius Orthodox Church in Gaza was destroyed.
Only 8% of 120 posts related to those stories had published community notes, while 26% had unpublished notes from volunteers that had yet to be approved. About two-thirds of the top posts NBC News reviewed had no proposed or published Community Notes on them.
The findings echo what a Community Notes volunteer said was X’s lack of response to efforts to debunk misleading posts.
“All weekend we were furiously vetting, writing, and approving Community Notes on hundreds of posts which were demonstrably fake news,” Kim Picazio, a Community Notes volunteer, wrote on Instagram’s Threads. “It took 2+ days for the backroom to press whatever button to finally make all our warnings publicly viewable. By that time… You know the rest of that sentence.”
And when the Community Notes don’t show up until much later, a ton of nonsense can spread:
A post about the debunked White House news release published by a verified account had nearly 500,000 views and no proposed or appended note Tuesday afternoon. The Community Notes system also showed that a user tried to submit a fact-check Sunday on another post including the same known misinformation but that it had yet to be approved, saying, “Needs more ratings.” The post had accrued 80,000 views since Sunday.
In a search for St. Porphyrius Orthodox Church in Gaza, only five Community Notes had been applied to the top 42 posts echoing the debunked misinformation. Several posts from verified users with no notes repeated the claim and got over 100,000 views, while 13 Community Notes had been proposed on posts of the debunked claims but had not yet been approved for publishing.
Another deep dive into how Community Notes handled the first few days of the Israel/Palestine mess showed just how ineffective it was:
During the first 5 days of the conflict, just 438 Community Notes (attached to 309 posts from 223 unique accounts) earned a “HELPFUL” rating and ended up being displayed publicly to users. Although it’s impossible to know what percentage of content about the war this represents, the fact that trending topics related to the conflict have routinely involved hundreds of thousands or even millions of posts suggests that a few hundred posts is just a drop in the bucket. The visible notes were generally attached to popular posts — the 309 posts in question earned a combined total of 2,147,081 likes, an average of 6,948 likes per post. The majority of the posts that earned Community Notes (222 of 309 posts, 71.8%) came from paid X Premium/Twitter Blue subscribers, and the majority of the accounts posting them (147 of 223, 65.9%) are X Premium subscribers, who are potentially earning a share of X’s ad revenue based on the number of times their posts are seen and who therefore have a financial motive to never delete misleading content. (Overall, roughly 7% of posts that received Community Notes were deleted during the period studied, but there’s no reliable way of knowing how many of these posts were related to the Israel/Hamas war.)
Again, I really like the concept of Community Notes. I think it’s a very useful tool — and one example (of many) of trust & safety tools beyond simply “taking down” content. But it needs to be part of a wider strategy, not the only strategy. And the program can’t be set up with such a huge blind spot for culture war issues.
But, that’s exactly how things currently work, and it’s a shame, in part because I fear it’s going to discourage others from creating their own versions of Community Notes.
Filed Under: birdwatch, community notes, content moderation, crowdsourcing, disinformation, elon musk, trust & safety
Companies: twitter, x