Who’s viewing my blog?

Like many bloggers, I’m interested in whether my blogs are being read. Most blogging platforms, including Blogger and WordPress, have some kind of interface for looking at your blog’s statistics. When I looked at the stats for my site Protists in Singapore, it seemed that there was some kind of pattern to the numbers. I wanted to investigate further, but with the tools available on the WordPress dashboard, there’s not much analysis that one can do.

Fortunately, it is possible to export pageview data from a WordPress blog. You’ll need to send a query to the WordPress stats server. This forum post explains how. The only drawback is that API keys are being phased out by WordPress.com and are no longer distributed with new WordPress accounts.

I set up my WordPress account before that, so I could use this method to obtain my pageview statistics. The output is a .csv file with two columns: ‘date’ and ‘views’. I imported this .csv file into the statistical computing environment R for further analysis. What’s great about R is that it’s freely available, and there are packages for many specialized tasks. There’s also plenty of tutorial material on the web. I referred to two of these: a time-series analysis intro from the University of Göttingen (pdf), and lecture notes from the University of Bristol.

My pageview data is aggregated by day: each ‘views’ number is the total number of views on a given day. Put together, the data form a series covering more than 600 days, i.e. more than a year and a half of the blog’s existence.
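
The import step itself isn’t shown here (I used R). As a rough Python analogue, here is a minimal sketch that reads a two-column ‘date’,‘views’ CSV into a gap-free daily series. It assumes a header row and ISO dates (YYYY-MM-DD), and it fills any missing days with 0. Those are assumptions about the export format, not documented WordPress behaviour, and the sample numbers are made up.

```python
import csv
import io
from datetime import date, timedelta


def load_daily_views(text):
    """Parse CSV text with 'date' and 'views' columns into (start_date, counts).

    Assumptions (not confirmed for the WordPress export): there is a header
    row, dates are ISO formatted (YYYY-MM-DD), and days absent from the
    file had zero views.
    """
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        counts[date.fromisoformat(row["date"].strip())] = int(row["views"])
    if not counts:
        raise ValueError("no rows found")
    start, end = min(counts), max(counts)
    n_days = (end - start).days + 1
    # One entry per calendar day, in order, so a "lag" later means a number of days.
    series = [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]
    return start, series


# Made-up sample in the assumed format; 2012-01-03 is deliberately missing.
sample = """date,views
2012-01-01,12
2012-01-02,30
2012-01-04,25
"""
start, views = load_daily_views(sample)
print(start, views)  # 2012-01-01 [12, 30, 0, 25]
```

Filling gaps matters because the time-series tools below assume evenly spaced observations.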


As you can see, there is a small but steady level of traffic, with two spikes around week 40 and week 60. The series looks noisy but possibly periodic. But first, is there any long-term trend? I smoothed the time series with a filter, i.e. a running average over 21 days, and overlaid this smoothed line on the original plot.

[Figure: daily pageviews with the smoothed line overlaid]

There are some warm periods and some cold periods, but nothing that I would call a directional trend. I’ll have to work harder on the marketing…
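
The 21-day running average described above is easy to sketch. It isn’t stated whether the window is centred or trailing. This Python version assumes a centred window, comparable to what R’s `filter(x, rep(1/21, 21), sides = 2)` computes, so the first and last 10 days have no full window and are left empty (`None`).

```python
def running_mean(values, window=21):
    """Centred moving average over `window` consecutive days.

    Positions within window // 2 of either end have no complete window
    and are returned as None (R's filter() returns NA there).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        smoothed[i] = sum(values[i - half:i + half + 1]) / window
    return smoothed


print(running_mean([1, 2, 3, 4, 5], window=3))  # [None, 2.0, 3.0, 4.0, None]
```

A 21-day window spans exactly three weeks, so any purely weekly up-and-down pattern averages out completely and only slower changes remain in the smoothed line.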

Another exploratory tool is the correlogram, i.e. a plot of autocorrelation against lag. When two random variables are correlated, they are not independent: their values are related to each other in some way. In autocorrelation, we compare pairs of points along the same time series, spaced a fixed distance (the “lag”) apart. For example, if we calculate the autocorrelation at a lag of 7 days, we are asking whether values in the series depend on the values from one week earlier. By plotting the correlation coefficients for a range of lags, we can identify dependencies in time.
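
To make the definition concrete, here is a small Python sketch of the standard sample autocorrelation estimator (the same quantity R’s `acf()` computes by default). The made-up test series repeats a “five busy days, two quiet days” week, so it should show a strong autocorrelation at lag 7.

```python
def acf(values, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag.

    r_k = sum_{t=1}^{n-k} (x_t - m)(x_{t+k} - m) / sum_{t=1}^{n} (x_t - m)^2,
    where m is the series mean. Each pair of points is k days apart.
    """
    n = len(values)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(values) - 1")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant series: autocorrelation is undefined")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Made-up series: 8 weeks of "busy weekdays, quiet weekends".
weekly = [10, 10, 10, 10, 10, 4, 4] * 8
r = acf(weekly, 14)
print(round(r[0], 3), round(r[7], 3))  # 1.0 0.875
```

Lag 0 is always 1, and the lag-7 value falls short of 1 only because the sum has fewer overlapping pairs (49 instead of 56) as the lag grows.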

[Figure: acf (correlogram of daily pageviews)]

In this correlogram, the blue dashed lines mark the 95% confidence interval. Values beyond the interval are statistically significant (i.e. if there were no real autocorrelation, the probability of getting a value beyond this interval by chance alone would be only 1 in 20). Some features: the autocorrelation at lag 0 is 1, which will always be true because values are perfectly correlated with themselves! The ACF is significant at lags of 1 and 2 days, suggesting that the previous day’s traffic could have predictive value for the next day’s. There are also significant peaks at 1 week, 2 weeks, 3 weeks (etc.), suggesting some sort of weekly pattern in the pageviews. A look at the raw data shows that weekends have lower traffic than weekdays. Why would people browse my protist blog more on weekdays than on weekends?
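
Two checks from this paragraph can also be sketched in Python. In R’s default correlogram the dashed lines sit at about ±1.96/√n, the 95% bound under the assumption that the series is pure noise, so a lag is flagged when |r_k| exceeds that. The weekday/weekend comparison is just a pair of averages. The helper names and the made-up inputs below are illustrative, not taken from the original analysis.

```python
import math
from datetime import date, timedelta


def significant_lags(r, n, z=1.96):
    """Lags k >= 1 whose autocorrelation lies outside +/- z / sqrt(n).

    r[k] is the sample autocorrelation at lag k; n is the series length.
    The bound assumes white noise, as in R's default correlogram.
    """
    bound = z / math.sqrt(n)
    return [k for k in range(1, len(r)) if abs(r[k]) > bound]


def weekday_vs_weekend(start, views):
    """Mean daily views on Mon-Fri versus Sat-Sun, given the first date."""
    weekday, weekend = [], []
    for i, v in enumerate(views):
        # date.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        is_weekend = (start + timedelta(days=i)).weekday() >= 5
        (weekend if is_weekend else weekday).append(v)
    return sum(weekday) / len(weekday), sum(weekend) / len(weekend)


# Made-up inputs: with n = 100 the bound is 1.96 / 10 = 0.196.
print(significant_lags([1.0, 0.30, 0.05, -0.25, 0.10], n=100))  # [1, 3]
# 2024-01-01 was a Monday.
print(weekday_vs_weekend(date(2024, 1, 1), [10, 10, 10, 10, 10, 4, 4] * 2))  # (10.0, 4.0)
```

With 600+ days of data the bound is roughly ±0.08, which is why even modest day-to-day and weekly correlations show up as significant.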

The answer is revealed by the top search engine terms that bring people here, each listed with the number of views it brought in:

euglena 233
euglena animation 220
scenedesmus 185
amoeba 182
cyanobacteria 107
euglena image 106
euglenids 94
vacuole 93
vorticella 86
ciliate 86
trachelomonas 79
ochromonas 77
opercularia 76
heliozoans 74
halteria 71
why are protists important 68
closterium 67
scuticociliate 66
litonotus 63
cinetochilum 58
what are protists 55
staurastrum 55
oxytricha 52
diatoms 51
flagellar movement in euglena animation 48

People want to know “what are protists”, and they want to find pictures of particular species. My suspicion is that the people visiting my page are mostly students who have to look up these organisms for school work. That would explain both the weekday-heavy pattern of web traffic and the search terms that bring them here!


