Exploratory Analysis of Fact Checks and URLs in Community Notes on X

Published on Mon May 05 2025 by Aatman Vaidya


We wanted to understand what role, if any, fact-checking websites play in community notes on X. The community notes and user ratings data can be downloaded officially from here, and the data fields are documented here. The notes data covers the period from 28th Jan 2021 to 25th Apr 2025, during which approximately 1.85 million notes were written. This was a short exploratory analysis to identify possible research directions.

NOTE - We used the urlextract Python library to extract URLs from each community note and tldextract to find the domains of those URLs. These tools work well in most cases, but they may have limitations, particularly with complex or unusual URLs. Please keep this in mind when interpreting the figures and numbers presented below.
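The extraction step can be approximated as shown below. This is a minimal sketch, not the pipeline used for this post. It swaps urlextract and tldextract for a simple regular expression and Python's standard `urllib.parse`, and the helper names `extract_urls` and `naive_domain` are hypothetical. The domain step keeps only the last two host labels. That is exactly the shortcut that tldextract's use of the Public Suffix List avoids.

```python
import re
from urllib.parse import urlparse

# Hypothetical stand-in for urlextract: matches http(s):// URLs and bare "www." hosts.
# It misses scheme-less links such as "en.wikipedia.org/wiki/...", which urlextract catches.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>\"')\]]+", re.IGNORECASE)


def extract_urls(text: str) -> list[str]:
    """Return URL-like substrings found in a note's text."""
    # Drop punctuation that commonly trails a URL at the end of a sentence.
    return [m.rstrip(".,;:!?") for m in URL_PATTERN.findall(text)]


def naive_domain(url: str) -> str:
    """Approximate the registered domain by keeping the last two host labels.

    Unlike tldextract, this ignores the Public Suffix List, so multi-part
    suffixes such as .co.uk are handled incorrectly.
    """
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url  # urlparse only finds the host after a scheme
    host = (urlparse(url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


note = "Source: https://x.com/user/status/1 and www.reuters.com/world."
urls = extract_urls(note)
print(urls)                             # ['https://x.com/user/status/1', 'www.reuters.com/world']
print([naive_domain(u) for u in urls])  # ['x.com', 'reuters.com']
```

For example, `naive_domain("https://bbc.co.uk/news")` returns `co.uk` rather than `bbc.co.uk`. That error is why a suffix-aware library matters at the scale of millions of notes.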

We started by asking which websites or domains are linked in notes. The top 20 website domains referenced in community notes are shown below.

Figure: Top 20 website domains referenced in community notes

We found that 81.23% of all community notes contained URLs. Of the notes that contained a URL, 30.81% had two or more links. The majority of those URLs pointed to X itself (x.com and twitter.com). Wikipedia came second with 6.12%, followed by YouTube, AP News, Google, BBC, Reuters, Instagram and others.
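Given the domains extracted from each note, the percentages in this paragraph reduce to a few counts. The sketch below makes three assumptions:

- Each note has already been turned into a list of domains, with one entry per URL.
- Domain shares are computed over all URL occurrences rather than over distinct notes. The post does not say which convention was used.
- `link_stats` is a hypothetical name.

```python
from collections import Counter


def link_stats(note_domains: list[list[str]], top_n: int = 20) -> dict:
    """Summarise link usage across notes.

    `note_domains` holds one list per note with the domain of every URL in
    that note (an empty list if the note has no URL). Assumes at least one
    note contains a URL.
    """
    with_url = [doms for doms in note_domains if doms]
    multi_link = [doms for doms in with_url if len(doms) >= 2]
    # Every URL occurrence counts, so a note linking x.com twice adds 2.
    domain_counts = Counter(dom for doms in note_domains for dom in doms)
    total_urls = sum(domain_counts.values())
    return {
        "pct_notes_with_url": 100 * len(with_url) / len(note_domains),
        "pct_multi_link_of_url_notes": 100 * len(multi_link) / len(with_url),
        "top_domains": [
            (dom, 100 * n / total_urls) for dom, n in domain_counts.most_common(top_n)
        ],
    }


sample = [["x.com", "wikipedia.org"], [], ["x.com"], ["bbc.com"]]
stats = link_stats(sample)
print(stats["pct_notes_with_url"])  # 75.0
print(stats["top_domains"][0])      # ('x.com', 50.0)
```

The reader also has to decide whether to merge x.com and twitter.com into one entry. This sketch keeps them separate.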

Next, we repeated the same analysis to count the community notes that link to websites of International Fact-Checking Network (IFCN) signatories. We collected the list of active IFCN signatories and used it to manually build a list of their domains. As of May 2025 there are 159 active IFCN signatories. While building the list, we could only find the domains of 157 of them, of which 19 are India-based. Note that many IFCN signatories also publish news stories alongside their fact-checking reports. The graph below therefore aggregates all links to the websites of IFCN-certified fact checkers, including links to news articles on those websites that are not fact checks.

We found that 2.85% of all community notes contained at least one IFCN link. Below is the distribution of the top 20 most-linked websites of IFCN signatories in notes.

Figure: Top 20 most-linked websites of IFCN signatories in community notes
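Matching notes against the IFCN domain list then becomes a set lookup. In the sketch below, `IFCN_DOMAINS` is a hypothetical three-entry subset taken from the table of India-based signatories in this post. It stands in for the 157 domains the authors collected, and the function names are also illustrative. As with the figure, a match counts any link to the domain, whether or not it is a fact check.

```python
from collections import Counter

# Hypothetical subset standing in for the full list of IFCN signatory domains.
IFCN_DOMAINS = {"factly.in", "boomlive.in", "newschecker.in"}


def pct_notes_citing(note_domains: list[list[str]], domains: set[str]) -> float:
    """Percentage of notes linking at least one domain from `domains`."""
    if not note_domains:
        return 0.0
    hits = sum(1 for doms in note_domains if any(d in domains for d in doms))
    return 100 * hits / len(note_domains)


def top_cited(note_domains: list[list[str]], domains: set[str], n: int = 20):
    """Most-linked domains from `domains`, counted per URL occurrence."""
    counts = Counter(d for doms in note_domains for d in doms if d in domains)
    return counts.most_common(n)


notes = [["x.com", "factly.in"], ["boomlive.in", "factly.in"], [], ["bbc.com"]]
print(pct_notes_citing(notes, IFCN_DOMAINS))  # 50.0
print(top_cited(notes, IFCN_DOMAINS))         # [('factly.in', 2), ('boomlive.in', 1)]
```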

We then ran the same analysis for the 19 active India-based IFCN signatories and found that all 19 domains were referenced in community notes. In total, 2,003 URLs from these domains were present, approximately 0.08% of all URLs extracted from the complete set of community notes.
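As a quick consistency check, the per-domain counts in the table of India-based signatories in this post add up to the 2,003 URLs quoted here. Assuming 0.08% was rounded to two decimal places, it only pins the total number of extracted URLs to a range. The sketch below computes that implied range from the two published figures. This is back-of-the-envelope arithmetic on the reported numbers, not a result reported by the analysis.

```python
# Link counts copied from the table of India-based IFCN signatories in this post.
INDIA_IFCN_COUNTS = {
    "factly.in": 328,
    "boomlive.in/fact-check": 298,
    "indiatoday.in/fact-check": 280,
    "newschecker.in": 244,
    "thequint.com/news/webqoof": 230,
    "factcrescendo.com": 200,
    "dfrac.org": 167,
    "newsmeter.in/fact-check, newsmeter.in/ai-deepfake": 102,
    "youturn.in/factcheck/": 49,
    "vishvasnews.com": 30,
    "thip.media/health-news-fact-check": 25,
    "ptinews.com/fact-detail": 21,
    "telugupost.com": 10,
    "newsmobile.in/nm-fact-checker": 9,
    "firstcheck.in": 3,
    "digiteye.in": 2,
    "thelallantop.com/factcheck": 2,
    "manoramaonline.com/fact-check": 2,
    "medicaldialogues.in/fact-check": 1,
}

india_urls = sum(INDIA_IFCN_COUNTS.values())
print(len(INDIA_IFCN_COUNTS), india_urls)  # 19 2003

# Assumption: "approximately 0.08%" means a share in [0.075%, 0.085%).
low_share, high_share = 0.075 / 100, 0.085 / 100
implied_min = india_urls / high_share
implied_max = india_urls / low_share
print(f"{implied_min:,.0f} to {implied_max:,.0f} URLs")
```

Under that rounding assumption, the total number of extracted URLs falls roughly between 2.36 and 2.67 million.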

Wherever possible, we have tried to distinguish between fact checks and news stories. For example, thequint.com typically publishes fact checks under the URL path thequint.com/news/webqoof, which lets us identify them more reliably. For others, like factly.in, it is harder to tell fact checks apart from regular news articles based on the URL alone. In those cases we could not make a clear distinction, so the counts include both kinds of content. The table below marks which counts this applies to.
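One way to implement the path-based distinction is to pair each host with an optional path prefix, as sketched below. The rules are a hypothetical subset modelled on the table of India-based signatories, and `match_rule` is an illustrative name. The sketch assumes that stripping a leading "www." is the only host normalisation needed. Real links can also use other subdomains, mobile hosts or shortened URLs, which this does not handle.

```python
from collections import Counter
from typing import Optional
from urllib.parse import urlparse

# Hypothetical rules modelled on the table: (host, path prefix).
# An empty prefix counts every URL on the host (fact checks and stories alike).
RULES = [
    ("factly.in", ""),
    ("boomlive.in", "/fact-check"),
    ("thequint.com", "/news/webqoof"),
    ("newsmeter.in", "/fact-check"),
    ("newsmeter.in", "/ai-deepfake"),
    ("youturn.in", "/factcheck"),
]


def match_rule(url: str) -> Optional[tuple[str, str]]:
    """Return the first (host, prefix) rule the URL falls under, or None."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").removeprefix("www.")
    path = parsed.path.rstrip("/")
    for rule_host, prefix in RULES:
        if host != rule_host:
            continue
        # Match whole path segments, so /news/webqoof-x does not count as /news/webqoof.
        if not prefix or path == prefix or path.startswith(prefix + "/"):
            return rule_host, prefix
    return None


urls = [
    "https://www.thequint.com/news/webqoof/some-claim",
    "https://www.thequint.com/news/politics/some-story",
    "factly.in/any-article/",
    "https://newsmeter.in/ai-deepfake/some-video",
]
print(Counter(r for r in map(match_rule, urls) if r is not None))
```

A URL on a listed host that falls outside its fact-check prefix, such as the second one above, is simply not counted. This mirrors the "Only Fact Checks" rows of the table.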

In our analysis, we focused only on currently active India-based IFCN signatories. As a result, some well-known fact-checking websites such as Alt News, The Logical Indian, and Factchecker are not included in the dataset.

The table below gives the link counts for each India-based IFCN signatory.

| Domain | Type of links | Count |
|---|---|---|
| factly.in | Fact Checks and Stories | 328 |
| boomlive.in/fact-check | Only Fact Checks | 298 |
| indiatoday.in/fact-check | Only Fact Checks | 280 |
| newschecker.in | Only Fact Checks | 244 |
| thequint.com/news/webqoof | Only Fact Checks | 230 |
| factcrescendo.com | Only Fact Checks | 200 |
| dfrac.org | Fact Checks and Stories | 167 |
| newsmeter.in/fact-check, newsmeter.in/ai-deepfake | Only Fact Checks | 102 |
| youturn.in/factcheck/ | Only Fact Checks | 49 |
| vishvasnews.com | Fact Checks and Stories | 30 |
| thip.media/health-news-fact-check | Only Fact Checks | 25 |
| ptinews.com/fact-detail | Only Fact Checks | 21 |
| telugupost.com | Fact Checks and Stories | 10 |
| newsmobile.in/nm-fact-checker | Only Fact Checks | 9 |
| firstcheck.in | Fact Checks and Stories | 3 |
| digiteye.in | Only Fact Checks | 2 |
| thelallantop.com/factcheck | Only Fact Checks | 2 |
| manoramaonline.com/fact-check | Only Fact Checks | 2 |
| medicaldialogues.in/fact-check | Only Fact Checks | 1 |

Possible Future Directions:

  • Cross-tabulating with other data sources, such as Google's ClaimReview data, could help identify which links to IFCN-certified domains are fact checks rather than other content.
  • Topic analysis of all notes that link to IFCN-affiliated websites, to understand the broader topics, discussions or contexts in which fact-checking sources are cited. We could also compare user approval ratings for notes that include IFCN links with those that don't.

An older version of this blog post was published on 30th April. Based on feedback from some early readers about the number of links for specific sites, we have updated the numbers. Unlike our peer-reviewed papers and reports, the Tattle blog is intended for updates about work in progress. Blogging about intermediate results is part of our ethos of working in the open. However, we realize that data analysis that is a 'work in progress' is more open to misinterpretation than software that is a 'work in progress'. Going forward, we will make sure that all data analysis published as work in progress carries a note saying so at the top of the post.

Here is what changed between the first version of the blog and this one:

  • In the first version, when extracting domains from the community notes, we did not separate the fact-checking-specific path from the domain name. For instance, a website's domain could be the-true-guys.com while its fact-check articles live under the path the-true-guys.com/factcheck-articles/. In this version, wherever the fact-checking-specific path was easy to find, we analyzed links to that path. The table indicates whether each count refers only to fact-check reports or includes both fact checks and news articles.
  • We updated the list of IFCN websites in India to reflect active signatories as of May 2025. The previous analysis included websites that were no longer signatories and excluded some that were.

Text and illustrations on the website are licensed under a Creative Commons 4.0 License. The code is licensed under GPL. For data, please look at the respective licenses.