In this third and final installment of my series of blog posts on research about how information spreads online, I cover the latest study from Stanford University, implications and findings for the tech sector, and offer general takeaways and advice for marketers.
First, here is a quick recap of the first two posts: In my first post I discussed an article in the New York Times that sparked my initial interest and led to my interview with Jure Leskovec, a Stanford researcher referenced in the article. The second post describes the Stanford researchers’ test bed and early research. The Stanford team identified six distinct popularity curves, i.e. information dissemination patterns that were functions of what type of site broke a story.
The third study is described in the paper Modeling Information Diffusion in Implicit Networks – it builds on the earlier research. The researchers wanted to see if it was possible to infer networks of diffusion and influence without knowing about the network topology.
Taking this approach was an important breakthrough. Using the viral analogy, it is possible to track a disease outbreak by seeing who gets infected – but it is very hard to know the source of contagion and exactly who infected whom, and in what order. Similarly, in the online world, it is relatively easy to see who is “infected” with a meme – that is, you can track who mentions something – but it is hard to know whether the person was first exposed to the information by reading a blog, hearing a newscast, talking to a friend on the phone, etc.
The researchers wanted to estimate a popularity curve for each media type and see if it was possible to accurately predict how a story spreads online based on where it first appears. So they developed algorithms that inferred underlying social networks, and also discovered that a given node (that is, a site that mentions a story) can have a dramatic and predictable impact on how many others mention the same thing over time.
“If you can see who mentioned the news today, we can accurately predict how many will mention it tomorrow,” said Jure Leskovec, the Stanford researcher. This was pretty impressive because, as Jure said, there are a “super exponential” number of possible connections and networks; developing such an algorithm was a real breakthrough.
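To make the prediction idea a bit more concrete, here is a minimal sketch of my own; it is not the researchers' code. It assumes that each site that mentions a story contributes its own "influence curve" of follow-on mentions over the next few time steps, and that the predicted volume at any moment is simply the sum of those curves. The function name, the site names and all of the numbers are made up for illustration.

```python
from typing import Dict, List


def predict_volume(influence_curves: Dict[str, List[float]],
                   mention_times: Dict[str, int],
                   t: int) -> float:
    """Sum the influence of every site that mentioned the story before time t.

    influence_curves[site][k] is the (hypothetical) number of new mentions
    that the site triggers k + 1 time steps after its own mention.
    mention_times[site] is the time step at which the site mentioned the story.
    """
    total = 0.0
    for site, t_mention in mention_times.items():
        lag = t - t_mention  # steps elapsed since this site's mention
        curve = influence_curves[site]
        if 1 <= lag <= len(curve):
            total += curve[lag - 1]
    return total


# Illustrative, invented influence curves: a strong early spike for a
# news agency, a weaker but flatter curve for a personal blog.
curves = {
    "news_agency": [50.0, 30.0, 10.0],
    "personal_blog": [5.0, 8.0, 6.0],
}
# The agency mentions the story at step 0, the blog picks it up at step 1.
mentions = {"news_agency": 0, "personal_blog": 1}

for step in range(1, 5):
    print(step, predict_volume(curves, mentions, step))
```

In this toy setup, knowing who has already mentioned the story (and when) is all you need to estimate the next step's volume, which is the spirit of Jure's quote. Estimating the curves themselves from real data is the hard part, and the sketch does not attempt it.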
Of course, there are other important factors that determine how far, fast and wide a story spreads. The researchers were careful to take into account variables such as freshness of information, novelty, and imitation – i.e. a certain number of people or sites will echo information because they see that it is becoming popular. Also, they learned that influence can vary based on the topic, and explored relative influence for technology, politics, business, sports, nation and entertainment (see the chart below).
Research Conclusions and Source Data
So what data did the researchers review, which sites were most influential, and what can we learn from their work?
My second post detailed the test bed used. In terms of this latest paper and study, the researchers explored a massive body of Web data over time – 500 million tweets and 170 million articles. They categorized sites as newspapers, professional blogs (e.g. Salon, Huffington Post), TV stations, news agencies (e.g. AP, Reuters) and personal blogs, and tracked the influence of specific sites.
The team also evaluated how influence works on Twitter. They explored adoption of hashtags over a set of 10,000 users, categorized tweeters into 100 groups of 100 users each, and ranked them based on activity and volume of followers.
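For readers who want to picture that grouping step, here is a rough sketch under my own assumptions. It ranks users by follower count only (the study also considered activity) and cuts the ranked list into equal-sized groups. The user names and follower counts are randomly generated, and `group_by_rank` is a hypothetical helper, not something from the paper.

```python
import random
from typing import Dict, List


def group_by_rank(followers: Dict[str, int], group_size: int) -> List[List[str]]:
    """Sort users by follower count (highest first) and split into groups."""
    ranked = sorted(followers, key=followers.get, reverse=True)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]


# Synthetic data: 10,000 made-up users with random follower counts.
random.seed(0)
followers = {f"user{i}": random.randint(0, 100_000) for i in range(10_000)}

groups = group_by_rank(followers, group_size=100)
print(len(groups), "groups of", len(groups[0]), "users each")
```

Once users are bucketed like this, you can compare how well each bucket spreads a hashtag, which is how a finding such as "around 1,000 followers is the sweet spot" could be read off.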
Here are some specific takeaways – the following bullets are excerpted from the paper:
- …These results suggest that there are a relatively small number of media sites that have large influence on the adoption of textual phrases.
- …[They] align well with the two-step theory of information flow … as the information and influence “flows” from the mass media through opinion leaders to the public.
- … the influence of bloggers tends to be lower at start, but tends to last longer (in particular for entertainment and technology). This confirms the intuition that blogs tend to be echo chambers while mainstream media play the dominant force in the news cycle.
- This is further confirmed by the fact that politics, business, technology and the nation tend to be dominated by news agencies. Professional blogs are the second in terms of total influence in politics and national news, newspapers are the second in business, and personal blogs are in technology.
- …we find the strong influence of the USA Today on technology [edging out the New York Times and Wall Street Journal] to be surprising.
- …We find the half-life to be 32.2 hours, which… suggests that people consume news on a daily basis.
- …we find that the mainstream media holds the most influential position in the dissemination of news content.
- …On the other hand, hashtags on Twitter are a very different type of contagion… our results… suggest that users with the highest follower count are not the most influential in terms of information diffusion. Rather, users with the number of followers of around 1,000 tend to be most effective in diffusion and adoption of hashtags.
The bottom line for marketers? I asked Jure this and he said: “Our research lets you see the effects of online media… we are making it a hard science, that you can measure and quantify precisely… the influence curve gives you some sense, depending on what topic you have, of where you may want info to appear so the story spreads nicely.”