The New York Times recently demonstrated how information that appeared in a “fringe” community, such as a parody website, managed to creep into mainstream web communities, such as Facebook and Fox News. Figure 1 depicts how the fake information flowed between the communities involved in this particular example, forming a sort of centipede of web communities. This example highlights how fringe communities can influence mainstream web communities by manipulating and spreading unconfirmed information. Before our study, this phenomenon had not been rigorously investigated, and there was no thorough measurement or analysis of how information flows between online communities.
Nevertheless, anecdotal evidence and the aforementioned example suggest that mainstream web communities, such as Facebook and Twitter, as well as “fringe” web communities, such as 4chan and Reddit, are part of the Web centipede of information diffusion. In a recent technical report, we systematically study Twitter, Reddit, and 4chan with respect to the influence they have on each other, aiming to understand their impact on the greater information ecosystem. More specifically, we study the temporal dynamics of URLs from 99 mainstream and alternative news domains as they appear in these three communities.
The three platforms are quite different, and worthy of a brief explanation:
Below, we briefly describe some of our findings from temporal dynamics and influence estimation analyses.
Given the set of unique URLs across all platforms and the time they first pop up, we analyze their appearance in one, two, or three platforms, and the order in which this happens. For each URL, we find the first occurrence on each platform and build corresponding “sequences.” E.g., if a URL first appears on Reddit and subsequently on 4chan, the sequence is Reddit→4chan (R→4). These sequences allow us to study and visualize the information ecosystem by utilizing graph modelling and analysis techniques.
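The sequence construction described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names `first_appearance_sequences` and `format_sequence`, the `(url, platform, timestamp)` record layout, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def first_appearance_sequences(posts):
    """Map each URL to the platforms it appeared on, ordered by first appearance.

    `posts` is an iterable of (url, platform, timestamp) tuples. This record
    layout is a hypothetical one chosen for the sketch.
    """
    first_seen = defaultdict(dict)  # url -> {platform: earliest timestamp}
    for url, platform, ts in posts:
        seen = first_seen[url]
        if platform not in seen or ts < seen[platform]:
            seen[platform] = ts

    # Order the platforms of each URL by the time of their first occurrence.
    return {
        url: sorted(seen, key=seen.get)
        for url, seen in first_seen.items()
    }


def format_sequence(platforms):
    """Render an ordered platform list in the arrow notation used in the text."""
    return "→".join(platforms)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("http://example.com/story", "Reddit", 100),
        ("http://example.com/story", "4chan", 250),
        ("http://example.com/story", "Reddit", 300),
    ]
    for url, platforms in first_appearance_sequences(sample).items():
        print(url, format_sequence(platforms))
```

Later reposts on the same platform are ignored, since only the earliest timestamp per platform is kept.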
Specifically, we create two directed graphs, one for each type of news, where the vertices represent alternative or mainstream domains, as well as the three platforms, and the edges represent the set of sequences that consider only the first-hop of the platforms. For example, if a breitbart.com URL appears first on Twitter and later on Reddit, we add an edge from breitbart.com to Twitter, and from Twitter to Reddit. We also add weights on these edges based on the number of such unique URLs.
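As a rough sketch of this graph construction, the weighted edges can be accumulated in a counter keyed by (source, target) pairs. The function name `build_first_hop_edges`, the input format (a dict from URL to its platforms in order of first appearance), and the `www.` stripping are assumptions made for illustration; the study's actual implementation may differ.

```python
from collections import Counter
from urllib.parse import urlparse


def build_first_hop_edges(url_sequences):
    """Return a Counter mapping (source, target) edges to URL counts.

    `url_sequences` maps each unique URL (including its scheme) to the list
    of platforms ordered by first appearance. For every URL we add one edge
    from its news domain to the first platform and, if there is a second
    platform, one edge from the first platform to the second.
    """
    weights = Counter()
    for url, platforms in url_sequences.items():
        if not platforms:
            continue
        domain = urlparse(url).netloc.lower()
        if domain.startswith("www."):
            domain = domain[4:]  # treat www.example.com and example.com alike
        weights[(domain, platforms[0])] += 1
        if len(platforms) > 1:
            weights[(platforms[0], platforms[1])] += 1
    return weights


if __name__ == "__main__":
    # Made-up sample input, for illustration only.
    sample = {
        "http://www.breitbart.com/a": ["Twitter", "Reddit"],
        "http://www.breitbart.com/b": ["Twitter", "Reddit", "4chan"],
        "http://www.breitbart.com/c": ["Reddit"],
    }
    for (src, dst), weight in build_first_hop_edges(sample).items():
        print(f"{src} -> {dst}: {weight}")
```

One such counter would be built per news type (alternative and mainstream), and the counts would then serve as the edge weights, shown as edge thickness when the graph is drawn.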
By examining the paths, we can discern the domains whose URLs tend to appear first on each of the platforms. The graphs below show the flow of information involving alternative and mainstream domains.
Comparing the outgoing edges’ thickness, we see that breitbart.com URLs appear first on Reddit more often than on Twitter, and more frequently than they do on 4chan. However, for other popular alternative domains, such as infowars.com, rt.com, and sputniknews.com, URLs appear first on Twitter more often than on Reddit or 4chan. Also, 4chan is rarely the platform where a URL first shows up.
For the mainstream news domains, we note that URLs from nytimes.com and cnn.com tend to appear first more often on Reddit than Twitter and 4chan. On the other hand, URLs from other domains like bbc.com and theguardian.com tend to appear first more often on Twitter than Reddit. As is the case with the alternative domains graph, there is no domain where 4chan dominates in terms of first URL appearance.
To estimate how the individual platforms influence the media shared on other platforms, we use a statistical model known as Hawkes processes. This method enables us to say with confidence that a particular event (i.e., the posting of a URL) is caused by a previously occurring event. For example, we can be confident that an event that happened on 4chan’s /pol/ board (i.e., a URL is posted in a 4chan thread) resulted in an event on Twitter (i.e., the same URL is posted on Twitter), thus denoting influence from 4chan to Twitter. This is particularly useful for modelling the influence of the three platforms, since they are not independent; each one is influenced by the others as well as the greater Web (more details about the Hawkes methodology can be found in the manuscript).
For instance, there are some examples of crazy conspiracy theories that originate from “fringe” communities and have an impact on other web communities, sometimes even ending up being reported by both mainstream and alternative news users.
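To give a feel for how a Hawkes process ties an event on one platform to later events on another, here is a minimal sketch of the conditional intensity of a multivariate Hawkes process with an exponential kernel. It is a generic textbook form, not the specific model or fitting procedure used in the manuscript, and the names `intensity`, `mu`, `W`, and `beta` as well as all numeric values are assumptions chosen for the example.

```python
import math


def intensity(t, target, events, mu, W, beta=1.0):
    """Rate of new events on `target` at time t.

    events: dict platform -> list of past event times (e.g. URL postings)
    mu:     dict platform -> background rate (events driven by the wider Web)
    W:      dict source -> dict target -> expected number of events on
            `target` triggered by one event on `source`
    beta:   decay rate of the exponential kernel
    """
    rate = mu[target]
    for source, times in events.items():
        for s in times:
            if s < t:
                # Each past event raises the rate, and the boost decays over time.
                rate += W[source][target] * beta * math.exp(-beta * (t - s))
    return rate


if __name__ == "__main__":
    # Illustrative numbers only; they are not estimates from the study.
    events = {"4chan": [0.0], "Twitter": []}
    mu = {"4chan": 0.1, "Twitter": 0.2}
    W = {
        "4chan": {"4chan": 0.0, "Twitter": 0.5},
        "Twitter": {"4chan": 0.0, "Twitter": 0.0},
    }
    print(intensity(1.0, "Twitter", events, mu, W))
```

In this formulation, the weight `W[source][target]` is what captures influence: the larger it is, the more a posting on the source platform raises the short-term rate of postings on the target platform, over and above the background rate `mu`.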
In our experiments, we measure the influence of six selected subreddits from Reddit (The_Donald, politics, worldnews, AskReddit, conspiracy, news), the /pol/ board from 4chan, and the Twitter platform. Figure 2 shows the estimated total impact of the three platforms on each other, for both mainstream URLs (M) and alternative URLs (A).
Twitter contributes heavily to both types of events on the other platforms, and is in fact the most influential single source for most of the other communities. After Twitter, The_Donald and /pol/ also have a strong influence on the alternative URLs that get posted on other communities. The_Donald has a stronger effect for alternative URLs on all communities except Twitter. Yet it still has the largest alternative influence on Twitter, causing an estimated 2.72% of alternative URLs tweeted. Interestingly, we observe that The_Donald causes 8% of /pol/’s alternative URLs, while /pol/’s influence on The_Donald is lower, at 5.7%. For the mainstream URLs, the strength of influence is reversed: /pol/’s influence on The_Donald is 8.61%, whereas The_Donald’s influence on /pol/ is 6.13%. The following figure shows the percentage of influence for each combination of communities.
This work is a first attempt at characterizing the dissemination of mainstream and alternative news across multiple social media platforms, and at estimating a quantifiable influence between them. To achieve this, we analyze thousands of URLs obtained from millions of posts on Reddit, Twitter, and 4chan. Ultimately, we find that fringe communities have a surprisingly large effect on well-known platforms.
That said, there is certainly more to learn. In the future, we aim to include other information modalities in our analysis, namely textual and image characteristics, to further assess the influence between communities.