FAST: Predictive Web Analytics

How does it work?

The FAST (Forecast and Analytics of Social Media and Traffic) platform is based on research by Carlos Castillo (QCRI), Mohammed El-Haddad (Al Jazeera), Jürgen Pfeffer (CMU), and Matt Stempeck (MIT). The software has been developed at QCRI by Carlos Castillo and Kiran Garimella.

Predicting user behavior online is a well-established research topic; our paper acknowledges at least 10 previous works that predict other quantities on the web, such as the number of comments, tweets, votes, or links.

We focus specifically on a problem relevant to news websites: predicting how many pageviews an article will receive over its first 3 days, shortly after it is posted on the web. Our system works through a series of steps:

How good are the predictions?

Most news articles follow fairly predictable trajectories, almost like a ballistic trajectory, with visits per minute rising and then falling along a smooth curve. However, not all articles start at the same speed or provoke the same reaction from the audience.
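The page does not say which curve, if any, FAST fits to these trajectories. Purely as an illustrative assumption, a rise-then-decay shape can be written as r(t) = A·(t/τ)·exp(1 − t/τ), which is zero at publication and peaks at exactly A visits per minute when t = τ. Varying A and τ mimics articles that start at different speeds or reach different audiences. The function names below are hypothetical.

```python
import math


def visits_per_minute(t: float, peak_rate: float, peak_time: float) -> float:
    """Hypothetical rise-and-decay curve: 0 at t=0, maximum peak_rate at t=peak_time (minutes)."""
    if t <= 0:
        return 0.0
    x = t / peak_time
    return peak_rate * x * math.exp(1.0 - x)


def cumulative_visits(minutes: int, peak_rate: float, peak_time: float) -> float:
    """Approximate total visits in the first `minutes` minutes by summing the per-minute rate."""
    return sum(visits_per_minute(m, peak_rate, peak_time) for m in range(1, minutes + 1))


THREE_DAYS = 3 * 24 * 60  # minutes

# Two toy articles with made-up parameters: a fast starter and a slow starter.
fast_1h = cumulative_visits(60, peak_rate=200.0, peak_time=30.0)
slow_1h = cumulative_visits(60, peak_rate=40.0, peak_time=180.0)
fast_3d = cumulative_visits(THREE_DAYS, peak_rate=200.0, peak_time=30.0)
slow_3d = cumulative_visits(THREE_DAYS, peak_rate=40.0, peak_time=180.0)
```

With these toy parameters the slow starter trails badly after one hour yet ends the three days ahead, which illustrates why a trajectory's shape matters as much as its first few minutes.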

The accuracy of the predictions improves as time passes: the longer we wait for an article to accumulate pageviews and social media reactions, the better the predictions. At the same time, the value of a prediction decreases with time. There is a sweet spot between early (but less accurate) predictions and late (but more accurate) ones.

In our platform, that sweet spot lies somewhere between 1 and 6 hours after publication. Most news articles behave fairly predictably, with visits slowing down rather quickly. For those articles, predictions made after 1 hour already give valuable hints about whether the article will be a high-traffic one. After 6 hours, we have a clear picture of how articles rank against each other, and for high-traffic articles the predictions are rarely more than 50% off the mark.
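The page does not describe the model behind these predictions. As an assumption for illustration only, a common baseline for this kind of problem regresses the logarithm of the final count on the logarithm of the count observed at a fixed checkpoint (for example 1 or 6 hours after publication), with one such model fitted per checkpoint. The sketch below does this with plain least squares and expresses "no more than 50% off the mark" as a relative-error check. All names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_log_linear(early: List[float], final: List[float]) -> Tuple[float, float]:
    """Least squares for log(final) = a + b * log(early).

    Counts must be positive and `early` needs at least two distinct values.
    """
    xs = [math.log(e) for e in early]
    ys = [math.log(f) for f in final]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict_final(early_count: float, a: float, b: float) -> float:
    """Predicted 3-day pageviews from the count observed at the checkpoint."""
    return math.exp(a + b * math.log(early_count))


def within_tolerance(predicted: float, actual: float, tolerance: float = 0.5) -> bool:
    """True if the prediction is at most `tolerance` (50% by default) away from the actual value."""
    return abs(predicted - actual) <= tolerance * actual


# Toy history (made-up numbers) where the 3-day total is always 10x the early count.
a, b = fit_log_linear([100, 200, 400, 800], [1000, 2000, 4000, 8000])
estimate = predict_final(300, a, b)
```

Fitting one model on 1-hour counts and another on 6-hour counts, then comparing how often `within_tolerance` holds on held-out articles, is one way to locate the sweet spot empirically.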

What happens if reader habits change?

Our system learns continuously from new articles, and over time the weight of older articles in the algorithm goes to zero.
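The page does not say how older articles are down-weighted. One common choice, assumed here purely for illustration, is exponential time decay: each past article's weight halves every `half_life_days` (a hypothetical parameter), so it approaches zero as the article ages. The sketch applies these weights to a deliberately simple model in which the 3-day total is a growth factor times the early count; that model is also an assumption, not FAST's method.

```python
import math
from typing import List


def recency_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every half_life_days and tends to zero for old articles."""
    return 0.5 ** (age_days / half_life_days)


def weighted_growth_factor(early: List[float], final: List[float],
                           ages_days: List[float], half_life_days: float = 30.0) -> float:
    """Recency-weighted geometric mean of final/early ratios over past articles."""
    weights = [recency_weight(age, half_life_days) for age in ages_days]
    log_ratios = [math.log(f / e) for e, f in zip(early, final)]
    total_weight = sum(weights)
    return math.exp(sum(w * r for w, r in zip(weights, log_ratios)) / total_weight)


# A year-old article grew 20x; a fresh one grew 5x. Under the assumed decay,
# the factor is dominated by the recent article, reflecting current reader habits.
factor = weighted_growth_factor(early=[100, 100], final=[2000, 500], ages_days=[365, 0])
```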

What happens if a slow-burning story catches fire later?

Predictions are revised every few hours to incorporate new events, such as a link from a high-traffic web page or a new cascade of activity on social media.
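How FAST revises its predictions is not specified here. As a hypothetical sketch, a revised 3-day forecast can add the pageviews observed so far to the traffic still expected if the current hourly rate decays exponentially from now on; the decay half-life is an assumed parameter. A new link from a high-traffic page shows up as a jump in the current rate, which immediately raises the revised forecast.

```python
import math


def revised_forecast(observed_so_far: float, current_rate_per_hour: float,
                     age_hours: float, horizon_hours: float = 72.0,
                     decay_half_life_hours: float = 4.0) -> float:
    """Observed pageviews plus the traffic expected before the 3-day horizon,
    assuming the current hourly rate decays exponentially (hypothetical model)."""
    remaining = max(horizon_hours - age_hours, 0.0)
    lam = math.log(2) / decay_half_life_hours  # decay constant per hour
    # Integral of rate * exp(-lam * t) for t from 0 to `remaining`.
    expected_rest = current_rate_per_hour * (1.0 - math.exp(-lam * remaining)) / lam
    return observed_so_far + expected_rest


# Made-up numbers: a quiet article at hour 6, then a spike after a prominent link at hour 9.
before = revised_forecast(10_000, 200, age_hours=6)
after = revised_forecast(12_000, 2_000, age_hours=9)
```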

Does this system replace the editor's work?

Absolutely not. Editors should rely on their own knowledge and instincts. Editors who also take into account shifts in their visitors' interests may make better-informed editorial decisions.

This site provides guest access, for testing purposes, to predictions based on a sample of data from Al Jazeera English. Note that some information is only visible to logged-in users.

For inquiries, including how we can predict traffic to your website, please contact Carlos Castillo: