marketing experience required

the developmental journey of a 21 y/o full stack marketer

Maximize Your Facebook Post Reach with Statistics (in R)

When working with clients on social media marketing projects, I’ve always been fascinated by ways to bring quantifiable results and hard statistics into consultant-client conversations. Seeing numbers and graphs to represent the progress that a brand has made on its social media properties is incredibly reassuring to a client, especially when compared to the vague assertions that other so-called “experts” will pass off as justification for their fees (e.g. “You’re getting a lot more user engagement now that you hired me!”).

I recently saw a great opportunity to pull out some of my old high school AP Statistics learnings when brainstorming ways to improve Facebook post reach. Any page owner could tell you that it’s discouraging to see what a low proportion of their audience actually sees any particular post- in my experience an average (unpromoted) post reaches less than 30% of a page’s audience. While there’s certainly a time and place for paid promoted posts (and other ads) in any brand’s Facebook strategy, it’s important to do everything possible to make sure that every post reaches as many customers as possible.

I’ve heard a number of different ideas for strategies on how to accomplish that- some marketers say to make sure you post a lot of a particular kind of content, like photos. Some assert that it’s all about frequency of posting and post scheduling. A very popular strategy in the live music industry is to have a core street team that will engage with every post on the page- liking, commenting, and/or sharing it. All of these strategies have their own benefits- having an optimal content mix and posting schedule is important from an audience interest and engagement perspective, and making sure that every post is engaged with can create powerful social proof.

But do any of these things actually influence Facebook page reach? I set out to find out using some statistical tests (calculated with the statistical programming language R) and some dreadfully boring data collection work (hint: maybe outsource this bit). I’ll share some of my generalized findings below, but more importantly, here’s how to run a similar test for yourself. By completing these tests, you’ll gain insights that will help with choosing post types and scheduling, and with seeing whether engagement numbers have a significant relationship with your reach.


Collecting Post Reach Data

For each page I analyzed, I went through a couple hundred posts (it’s important to get as many data points as possible) and created a spreadsheet that tracked the following variables for each post:

  • Date (for my own referencing purposes)
  • Reach (what we’ll be calculating correlation against)
  • Day of Week
  • Time (make a few categories of time- overnight, early/late morning, early/late afternoon, night, etc)
  • Post Type (text only, link w/ preview, photo, video, event, etc, with separate categories for if the post was shared from another page)
  • # of Shares
  • # of Likes
  • # of Comments
  • # of Pages Tagged in post
  • # of Hashtags used in post

It’s important to note that any promoted posts should be treated as outliers and ignored for the purposes of these calculations.

Here’s what one of my data collection spreadsheets looked like

Make sure that your spreadsheet can be saved as a .csv, as that’s how it’ll be imported into R.

Again, it’s important that you have as many data points as possible for your results to be as accurate as possible. I’d recommend collecting a year’s worth of data, assuming that your page posts semi-frequently. Go back further than a year and who knows what changes Facebook has made to their News Feed algorithms in the meantime. This can be a pretty time-consuming data entry task, so I’d highly recommend outsourcing it to a contractor on oDesk or the site of your choosing.

Analyzing and Interpreting Data using R

Once you have a spreadsheet with all the data for us to look at and play with, it’s only a few lines of code to run significance tests using R, the statistical programming language. If you’re new to R, the good news is that there’s a wealth of fantastic resources for beginners available online (I myself am a complete newbie) and a Google search will find you just about any help you need. You can download and install R from its home page, and I’d also highly recommend the RStudio interface.

Once you’ve gotten acquainted and figured out where the heck to type the code in, here’s some things you can plug in and play with:

First things first, let’s import your .csv spreadsheet and define it as a variable. Mine is called “FBDATA.csv”, and I’m naming the variable FBDATA (please note that variable names are case-sensitive in R).
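The import step might look something like the sketch below. It assumes base R's read.csv. The first few lines only write a tiny made-up FBDATA.csv into a temporary folder so the example runs on its own, and the column headers are illustrative guesses rather than required names. With real data you'd skip that part and point read.csv at your own file.

```r
# Made-up rows so the example runs by itself; with real data, skip this
# part and point read.csv() at your own export instead.
csv_path <- file.path(tempdir(), "FBDATA.csv")
writeLines(c(
  "Date,Reach,Day,Time,Post,Shares,Likes,Comments,Tagged,Hashtags",
  "2015-01-05,412,Mon,Morning,Photo,3,25,4,1,2",
  "2015-01-06,655,Tue,Night,Text,5,41,9,0,0",
  "2015-01-08,120,Thu,Afternoon,Event,0,6,1,2,1"
), csv_path)

# Import the spreadsheet and store it in a variable named FBDATA
FBDATA <- read.csv(csv_path)

# Peek at the first rows to confirm the import worked
head(FBDATA)
```

If your file lives somewhere else, pass its full path in quotes instead of csv_path.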


Now we have to create a second variable that formats our .csv as a data frame, so that we can pull our variables from it.
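A minimal sketch of that step, assuming base R's data.frame. The name DATA.FRAME is just a placeholder and the inline numbers are made up so the snippet runs by itself. Worth knowing: read.csv already hands back a data frame, so this step is harmless but mostly a copy.

```r
# Inline stand-in for the imported spreadsheet (made-up numbers)
FBDATA <- read.csv(text = "Reach,Post,Likes
412,Photo,25
655,Text,41
120,Event,6")

# Store the import as a data frame; DATA.FRAME is a placeholder name
DATA.FRAME <- data.frame(FBDATA)

# Both checks should come back TRUE, since read.csv() returns a data frame
is.data.frame(FBDATA)
is.data.frame(DATA.FRAME)
```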


One last piece of formatting-type stuff- we need to pull out each individual variable from our spreadsheet and define it as a vector inside R. Note that the variable names you define (stuff on the left side of the arrow) are up to you, while the column names you pull from your spreadsheet (stuff after the $) depend on what you typed in your actual spreadsheet. Let’s do it:
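Here's a hedged sketch of what those lines could look like. The data frame is built inline with made-up numbers so the snippet runs alone, and the column names (Reach, Post, Day, Likes, Tagged, Hashtags) are assumptions: swap in whatever headers your own spreadsheet uses.

```r
# Made-up stand-in for the data frame built from your spreadsheet
DATA.FRAME <- data.frame(
  Reach    = c(412, 655, 120, 530),
  Post     = c("Photo", "Text", "Event", "Text"),
  Day      = c("Mon", "Tue", "Thu", "Fri"),
  Likes    = c(25, 41, 6, 33),
  Tagged   = c(1, 0, 2, 1),
  Hashtags = c(2, 0, 1, 3)
)

# Pull each column out into its own vector
Reach    <- as.vector(DATA.FRAME$Reach)
Post     <- as.vector(DATA.FRAME$Post)
Day      <- as.vector(DATA.FRAME$Day)
Likes    <- as.vector(DATA.FRAME$Likes)
Tagged   <- as.vector(DATA.FRAME$Tagged)
Hashtags <- as.vector(DATA.FRAME$Hashtags)
```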




See what’s going on here? Pretty basic- the variable name goes on the left side of the <- arrow, “as.vector” is the command that formats what you want, and what you’re formatting is a column inside of your spreadsheet’s data frame (DATA.FRAME$Column.Name).

Now to actually look at the data and see if we can figure out- does any of this stuff actually influence how many people your posts are reaching? We’ll use two tests to do so- cor.test (looking for Pearson’s product moment correlation coefficient) on any variables that are numeric/quantitative (# of Shares, # of Likes, # of Comments, # of Pages Tagged, # of Hashtags used), and oneway.test (an Analysis of Variance test) on categorical variables (Post Type, Day of Week).

Let’s plug in some quantitative variables first- let’s look at Likes vs. Reach
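A sketch of that test, using base R's cor.test (Pearson by default). The two vectors are made-up numbers so the snippet runs alone. In practice they'd be the Likes and Reach vectors pulled from your spreadsheet.

```r
# Made-up example values; use your own vectors in practice
Reach <- c(412, 655, 120, 530, 298, 710, 185, 460)
Likes <- c(25, 41, 6, 33, 15, 52, 9, 27)

# Print only the p-value of the correlation test
cor.test(Likes, Reach)$p.value

# Print only the correlation coefficient (Pearson's r, from -1 to 1)
cor.test(Likes, Reach)$estimate
```

Running cor.test(Likes, Reach) with nothing after it prints the full report, including the coefficient, the p-value, and a confidence interval.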


Notice the “$p.value” bit above- this is telling R to only output the resulting p-value of the cor.test, just to make things cleaner for you to look at. If you know some statistics, or are trying to impress somebody, you can omit that part and see the full results including all sorts of fancy stuff.

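With the $p.value, you’ll see a single number. Mine was .9424952. Be careful about which number you’re reading, though: the scale where 1 is total positive correlation, 0 is no correlation, and -1 is total negative correlation describes the correlation coefficient, which cor.test stores under $estimate. The p-value works differently: it’s the chance of seeing a correlation at least this strong if there were really none, so a small p-value (commonly below .05) is what signals a significant correlation. I took my .94 to mean a high correlation between the reach of a post and the number of times it’s been liked, which holds if that number is the coefficient; as a p-value, it would mean no evidence of a correlation at all.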

Here’s an important failure of this experiment though- correlation does not imply causation. Sure, getting more likes on a post could increase its reach, but isn’t it potentially even more possible that a post that reaches more people will get liked by more people? We can’t prove that any type of engagement (likes, comments, or shares) increases post reach, we can only calculate the correlation and see if there is one.

We can, however, test variables that we set and control BEFORE a post’s reach is determined: # of pages tagged, # of hashtags used, post type, and day of week published. None of these things could be changed by how many people a post reaches, so they’re good variables to test here. Let’s check out how reach correlates with # of pages tagged, and then with # of hashtags
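As a rough sketch, those two tests could be run like this. The numbers are made up so the snippet runs by itself, and Tagged and Hashtags are assumed names for the vectors holding the per-post counts.

```r
# Made-up example values; use your own vectors in practice
Reach    <- c(412, 655, 120, 530, 298, 710, 185, 460)
Tagged   <- c(1, 0, 2, 1, 0, 3, 0, 1)
Hashtags <- c(2, 0, 1, 3, 0, 2, 1, 1)

# Reach vs. number of pages tagged
cor.test(Tagged, Reach)$p.value

# Reach vs. number of hashtags used
cor.test(Hashtags, Reach)$p.value
```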



For these tests, I got values back of .536545 and .4352989, respectively. Read as correlation coefficients, those imply some positive correlation, but not necessarily a very strong one (read as p-values, they’d mean the data show no significant link, so double-check which number you’re looking at). From a marketing standpoint, I could say that it certainly can’t hurt to tag pages and use hashtags in my Facebook posts, and I might even be more encouraged to use them regularly after seeing these numbers.

Now let’s try using oneway.test to look at categorical variables- this one isn’t going to be as straightforward to explain, so bear with me and I’ll make sure to quote people smarter than me to explain the math. Easy part first- plugging in the code:

oneway.test(Reach ~ Post)

Looking at the output here, you’ll see another p-value, but this one means something very different. Aforementioned quoting incoming:

The P value tests the null hypothesis that data from all groups are drawn from populations with identical means. Therefore, the P value answers this question:

If all the populations really have the same mean (the treatments are ineffective), what is the chance that random sampling would result in means as far apart (or more so) as observed in this experiment?

If the overall P value is large, the data do not give you any reason to conclude that the means differ. Even if the population means were equal, you would not be surprised to find sample means this far apart just by chance. This is not the same as saying that the true means are the same. You just don’t have compelling evidence that they differ.

If the overall P value is small, then it is unlikely that the differences you observed are due to random sampling. You can reject the idea that all the populations have identical means. This doesn’t mean that every mean differs from every other mean, only that at least one differs from the rest. Look at the results of post tests to identify where the differences are. (Source)

Basically, a smaller p-value is evidence that reach really does differ between post types or days of the week (the same logic applies to cor.test, where a small p-value is evidence that a correlation exists). When I plugged in oneway.test(Reach ~ Post) to test post types, I got a p-value of .00323, suggesting that there was a definite link between post type and reach. But which post type performed the best? What should I be posting more of on my page?

R has graphing commands that can help you visualize your data, and just plugging in boxplot(Reach ~ Post) will show you what you’re looking for (you may have to click “Zoom” in the viewer window in RStudio to see all your post types on the X axis). Assuming you know how to read boxplots, you can easily see which post type has the highest median (the thick line inside each box)- in my case, Text posts (or posts without any media or link previews) outperformed just about everything else. Very interesting, I’ll have to use those more.
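To tie the categorical test and the plot together, here's a small sketch with made-up numbers (three post types, three posts each, chosen only so the example runs alone). The tapply line is my own addition: it prints the per-type medians so you can read them as numbers instead of eyeballing the plot.

```r
# Made-up example values; use your own Reach and Post vectors in practice
Reach <- c(640, 700, 610, 400, 450, 380, 130, 160, 110)
Post  <- c("Text", "Text", "Text",
           "Photo", "Photo", "Photo",
           "Event", "Event", "Event")

# Does mean reach differ between post types? Small p-value = evidence it does
oneway.test(Reach ~ Post)$p.value

# Median reach per post type, as plain numbers
tapply(Reach, Post, median)

# One box per post type; the thick line in each box is the median
boxplot(Reach ~ Post, xlab = "Post type", ylab = "Reach")
```

Swapping Post for a Day vector gives the same breakdown by day of the week.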

Doing oneway.test(Reach ~ Day), on the other hand, gave me a p-value of .5917. Looking at a boxplot shows me that Post Reach is pretty consistent every day of the week, which is pretty reassuring.


Obviously, results will vary here- sometimes you’ll find actionable insights (like I did with post types- I learned that posting event links is incredibly ineffective, and text posts usually perform above average), and sometimes you won’t. At the very least, this experiment is worth performing once to give you a more in-depth understanding of your organic page reach.

That being said, we all know how the game is played on Facebook nowadays- if you truly want solid post reach on all of your key marketing messages, pay in and boost the post- even $5 can bring a 5000% increase in reach.

Need help with your social media reach? I’ve helped brands and businesses all over the world grow their audiences, improve key metrics, and drive real measurable customers from social media- contact me today and find out how I can help.

photo credit: 2009 – 9/365 – Tirages avec remise and Discovering Statistics Using R via photopin (license)
