Twitter, Reddit and 4chan: The Web's Fake News Centipede

The New York Times recently demonstrated how information that appeared in a “fringe” community, such as a parody website, managed to creep into mainstream web communities, such as Facebook and Fox News. Figure 1 depicts how the fake information flowed between the communities involved in this particular example, forming a sort of centipede of web communities. The example highlights how fringe communities can influence mainstream web communities by manipulating and spreading unconfirmed information. Before our study, this phenomenon had not been rigorously investigated, and there was no thorough measurement and analysis of how information flows between online communities.

Figure 1: An example of a Web Centipede as presented by The New York Times.

Nevertheless, anecdotal evidence and the aforementioned example suggest that mainstream web communities, such as Facebook and Twitter, as well as “fringe” web communities, such as 4chan and Reddit, are part of the Web centipede of information diffusion. In a recent technical report, we systematically study Twitter, Reddit, and 4chan with respect to the influence they have on each other, aiming to understand their impact on the greater information ecosystem. More specifically, we study the temporal dynamics of URLs from 99 mainstream and alternative news domains as they appear in these three communities.

The three platforms are quite different, and worthy of a brief explanation:

  • Twitter. It is a mainstream micro-blogging platform and directed social network where users can share 140-character tweets with their followers.
  • Reddit. It is a social news aggregator where users can post URLs to content along with a title. Users can upvote or downvote a post. The aggregate of the votes determines the ranking of the post, and thus the popularity of the content and its reach among the users of the platform. The platform is organized into sub-communities called subreddits. Users can create subreddits, each with its own topic and moderation policy. In our paper, we focus on 6 specific subreddits that substantially contribute to the dissemination of alternative and mainstream news: The_Donald (for Donald Trump supporters), politics, news, worldnews, AskReddit and conspiracy.
  • 4chan. It is an imageboard-style discussion forum: users create a new thread by making a post with a single image attached, and perhaps some text, in one of several boards (69 as of May 2017) devoted to different topics of interest. Users on 4chan are anonymous, and boards are monitored by “janitors” who ensure that conversation remains on-topic. However, janitors are generally not concerned about the language used, hence 4chan is often quite a hostile place. Furthermore, there are several examples that highlight 4chan's impact on the Web and the community. For example, a hoax that originated on 4chan about the death of Ethereum's founder resulted in losses of $4 billion in Ethereum's market value. In this work, we looked at the Politically Incorrect board (/pol/), which focuses on discussion of world news and politics.

Below, we briefly describe some of our findings from temporal dynamics and influence estimation analyses.

Temporal Dynamics

Given the set of unique URLs across all platforms and the time they first pop up, we analyze their appearance in one, two, or three platforms, and the order in which this happens. For each URL, we find the first occurrence on each platform and build corresponding “sequences.” E.g., if a URL first appears on Reddit and subsequently on 4chan, the sequence is Reddit→4chan (R→4). These sequences allow us to study and visualize the information ecosystem by utilizing graph modelling and analysis techniques.
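The sequence construction described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the input format (tuples of URL, platform name, and timestamp), the function name, and the example URLs are all assumptions, and ties between platforms are broken arbitrarily here.

```python
from collections import defaultdict

# Hypothetical abbreviations, following the R→4 notation used in the text.
ABBREV = {"Reddit": "R", "Twitter": "T", "4chan": "4"}

def first_appearance_sequences(posts):
    """Map each URL to its platform sequence, ordered by first occurrence.

    `posts` is an iterable of (url, platform, timestamp) tuples; this input
    format is an assumption made for the sketch.
    """
    first_seen = defaultdict(dict)  # url -> {platform: earliest timestamp}
    for url, platform, ts in posts:
        prev = first_seen[url].get(platform)
        if prev is None or ts < prev:
            first_seen[url][platform] = ts

    sequences = {}
    for url, per_platform in first_seen.items():
        # Order platforms by the time the URL first appeared on each one.
        ordered = sorted(per_platform, key=per_platform.get)
        sequences[url] = "→".join(ABBREV[p] for p in ordered)
    return sequences

# Made-up example data.
posts = [
    ("http://example.com/a", "Reddit", 100),
    ("http://example.com/a", "4chan", 250),
    ("http://example.com/a", "Reddit", 300),  # later repost, ignored
    ("http://example.com/b", "Twitter", 50),
]
sequences = first_appearance_sequences(posts)
print(sequences)
```

A URL seen on only one platform yields a single-letter sequence, which corresponds to the "appears in one platform" case mentioned above.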

Specifically, we create two directed graphs, one for each type of news, where the vertices represent alternative or mainstream domains, as well as the three platforms, and the edges represent the set of sequences that consider only the first-hop of the platforms. For example, if a breitbart.com URL appears first on Twitter and later on Reddit, we add an edge from breitbart.com to Twitter, and from Twitter to Reddit. We also add weights on these edges based on the number of such unique URLs.
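As a rough sketch of this graph construction, the snippet below counts weighted edges from a hypothetical mapping of each URL to the time it first appeared on each platform. The function name, the input format, and the reading of "first-hop" as "domain to first platform, then first platform to second platform" are assumptions made for illustration; in the study one such graph would be built per news type.

```python
from collections import Counter
from urllib.parse import urlparse

def build_first_hop_graph(first_seen):
    """Return a Counter mapping (source, target) edges to weights.

    `first_seen` maps url -> {platform: first timestamp} (assumed format).
    Each unique URL adds 1 to the edge domain -> first platform and, if it
    reached a second platform, 1 to the edge first -> second platform.
    """
    edges = Counter()
    for url, per_platform in first_seen.items():
        if not per_platform:
            continue
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]
        ordered = sorted(per_platform, key=per_platform.get)
        edges[(domain, ordered[0])] += 1
        if len(ordered) > 1:
            edges[(ordered[0], ordered[1])] += 1
    return edges

# Made-up example: three breitbart.com URLs with different first appearances.
first_seen = {
    "http://www.breitbart.com/x": {"Twitter": 10, "Reddit": 20},
    "http://breitbart.com/y": {"Twitter": 5, "Reddit": 9, "4chan": 30},
    "http://breitbart.com/z": {"Reddit": 1},
}
edges = build_first_hop_graph(first_seen)
for (src, dst), weight in sorted(edges.items()):
    print(f"{src} -> {dst}: {weight}")
```

Because the mapping is keyed by URL, each unique URL contributes at most once to any edge, which matches the weighting by number of unique URLs described above.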

By examining the paths, we can discern the domains whose URLs tend to appear first on each of the platforms. The graphs below show the flow of information involving alternative and mainstream domains.

[Figures: graph for mainstream news domains; graph for alternative news domains]

Comparing the thickness of the outgoing edges, we see that breitbart.com URLs appear first on Reddit more often than on Twitter, and more often than on 4chan. However, for other popular alternative domains, such as infowars.com, rt.com, and sputniknews.com, URLs appear first on Twitter more often than on Reddit or 4chan. Also, 4chan is rarely the platform where a URL first shows up.

For the mainstream news domains, we note that URLs from nytimes.com and cnn.com tend to appear first more often on Reddit than Twitter and 4chan. On the other hand, URLs from other domains like bbc.com and theguardian.com tend to appear first more often on Twitter than Reddit. As is the case with the alternative domains graph, there is no domain where 4chan dominates in terms of first URL appearance.

Influence Estimation

To estimate how the individual platforms influence the media shared on other platforms, we use a class of statistical models known as Hawkes processes. This method enables us to say with confidence that a particular event (i.e., the posting of a URL) is caused by a previously occurring event. For example, we can be confident that an event that happened on 4chan’s /pol/ board (i.e., a URL is posted in a 4chan thread) resulted in an event on Twitter (i.e., the same URL is posted on Twitter), thus denoting influence from 4chan to Twitter. This is particularly useful for modelling the influence of the three platforms, since they are not independent; each one is influenced by the others as well as the greater Web (more details about the Hawkes methodology can be found in the manuscript).
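To give a feel for how a Hawkes process links an event to earlier ones, here is a minimal sketch of a generic multivariate Hawkes model with an exponential kernel. It is not the model fitted in the manuscript: the kernel choice, the function names, and every parameter value (`mu`, `W`, `beta`) are made up for illustration. In this formulation, the rate of events on a platform is a background rate plus a decaying contribution from each earlier event, and the share of the rate due to a given earlier event can be read as the probability that it caused the new one.

```python
import math

def intensity_terms(t, target, events, mu, W, beta):
    """Split the event rate on `target` at time t into its components.

    events: list of (time, platform) pairs for earlier postings of a URL.
    mu[k]: background rate on platform k (events from the greater Web).
    W[j][k]: expected number of events on k triggered by one event on j.
    beta: decay rate of the exponential kernel.
    """
    contributions = []
    for t_i, src in events:
        if t_i < t:
            rate = W[src][target] * beta * math.exp(-beta * (t - t_i))
            contributions.append(((t_i, src), rate))
    return mu[target], contributions

def parent_probabilities(t, target, events, mu, W, beta):
    """Probability that an event on `target` at time t came from each source."""
    background, contributions = intensity_terms(t, target, events, mu, W, beta)
    total = background + sum(rate for _, rate in contributions)
    probs = {"background": background / total}
    for event, rate in contributions:
        probs[event] = rate / total
    return probs

# Made-up parameters: a URL posted on /pol/ at t=0, then tweeted at t=1.
mu = {"4chan": 0.1, "Twitter": 0.5}
W = {
    "4chan": {"4chan": 0.1, "Twitter": 0.3},
    "Twitter": {"4chan": 0.05, "Twitter": 0.2},
}
beta = 1.0
events = [(0.0, "4chan")]
probs = parent_probabilities(1.0, "Twitter", events, mu, W, beta)
print(probs)
```

In practice the parameters are not chosen by hand but inferred from the observed timestamps, and the entries of `W` are what would be interpreted as influence between platforms.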

For instance, there are some examples of crazy conspiracy theories that originate from “fringe” communities and have an impact on other web communities, sometimes even ending up being reported by both mainstream and alternative news users.

In our experiments, we measure the influence of the 6 selected subreddits from Reddit (The_Donald, politics, worldnews, AskReddit, conspiracy, news), the /pol/ board from 4chan, and the Twitter platform. Figure 2 shows the estimated total impact of the three platforms on each other, for both mainstream URLs (M) and alternative URLs (A).

Twitter contributes heavily to both types of events on the other platforms, and is in fact the most influential single source for most of the other communities. After Twitter, The_Donald and /pol/ also have a strong influence on the alternative URLs that get posted in other communities. The_Donald has a stronger effect for alternative URLs on all communities except Twitter. Yet it still has the largest alternative influence on Twitter, causing an estimated 2.72% of the alternative URLs tweeted. Interestingly, we observe that The_Donald causes 8% of /pol/’s alternative URLs, while /pol/’s influence on The_Donald is smaller, at 5.7%. For the mainstream URLs, the strength of influence is reversed. Specifically, /pol/’s influence on The_Donald is 8.61%, whereas The_Donald’s influence on /pol/ is 6.13%. The following figure shows the percentage of influence for each combination of communities.

Figure 2: Estimation of influence between the platforms for alternative and mainstream URLs.

Conclusion

This work is a first attempt at characterizing the dissemination of mainstream and alternative news across multiple social media platforms, and at estimating a quantifiable influence between them. To achieve this, we analyze thousands of URLs obtained from millions of posts on Reddit, Twitter and 4chan. Ultimately, we find that fringe communities have a surprisingly large effect on well-known platforms.

That said, there is certainly more to learn. In the future, we aim to include other information modalities in our analysis, namely textual and image characteristics, to further assess the influence between communities.