A self-proclaimed SEO and statistics expert recently told me that I can’t correlate ranking factors using 100 samples when Google estimates there are 2.3 million results for a keyword, because my sample size is too small.

This expert then proceeded to shit on the whole field of statistics for any business purpose. Some expert, right?

This person could not be more wrong. We are talking about using math to determine the “sort order” factors of a result set. If you went to Amazon, which has millions of products, and someone picked a secret sort order for a category… how many samples would you need to reasonably determine whether the category is sorted by price? Or alphabetically by name? Do you really need a 10% sample of tens of thousands of products in that category? No… you can probably make a gut call after roughly ten samples. You probably wouldn’t even think of checking 100. That is all we are doing when we use correlation in SEO: we use math to determine whether Google appears to be sorting results by certain factors.
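Here is a minimal Python sketch of the Amazon thought experiment, under stated assumptions: the catalog size, the random prices, and the helper names (`sampled_values`, `is_non_decreasing`, `chance_of_accidental_order`) are all hypothetical. It samples ten positions from a price-sorted catalog and from a shuffled one, and shows that the odds of ten unsorted items lining up by accident depend only on the sample size, not on how many products the category holds.

```python
import math
import random


def sampled_values(catalog, k, rng):
    """Look at k randomly chosen positions, kept in rank order, and return their values."""
    positions = sorted(rng.sample(range(len(catalog)), k))
    return [catalog[p] for p in positions]


def is_non_decreasing(values):
    """True if each value is less than or equal to the one after it."""
    return all(a <= b for a, b in zip(values, values[1:]))


def chance_of_accidental_order(k):
    """Probability that k distinct values in random order happen to come out sorted."""
    return 1 / math.factorial(k)


rng = random.Random(7)
N_ITEMS = 200_000  # hypothetical category size; the odds below do not depend on it
prices = [rng.uniform(1, 500) for _ in range(N_ITEMS)]

sorted_by_price = sorted(prices)  # the "secret" sort order
shuffled = prices[:]
rng.shuffle(shuffled)             # the same products with no price ordering

K = 10
print(is_non_decreasing(sampled_values(sorted_by_price, K, rng)))  # True
print(is_non_decreasing(sampled_values(shuffled, K, rng)))
print(f"1 in {math.factorial(K):,}")  # 1 in 3,628,800
```

When the category really is sorted by price, the ten sampled prices always come out ascending. When it is not, ten distinct prices line up by accident only about 1 time in 3,628,800, whether the category holds 200,000 products or 2.3 million.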

Why It Works

To start, we assume that nothing is a factor.

Below is a hypothetical chart of a factor. The X axis is position in the Google search results and the Y axis is the factor measurement. When we assume something is not a factor, we mean we expect its measurements not to appear in any sorted order with respect to rank position in Google.

Now when the measurements look like this, we think Google might be sorting by them:

Statistical correlation describes the direction and consistency of the trend line: its sign tells you whether the line slopes up or down, and its magnitude tells you how consistently the measurements follow that trend. Because position 1 is the top result, a lower position number means a better rank. So for SEO factors you want a strong negative correlation in your natural data: as the factor measurement goes up, the position number goes down, which means the more of the factor you have, the better you tend to rank. A strong positive correlation means the more of the factor you have, the worse you tend to rank. What we typically see in Google is that strong negative correlations show up for the same factors over and over again. Strong positive correlations tend to be one-time flukes that do not reappear from one keyword to the next, which suggests they are coincidental outliers. So in general we are far more interested in the strong negative correlations than the strong positive ones.
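To make the sign convention concrete, here is a minimal Python sketch that correlates rank position with one factor’s measurements. It assumes Spearman’s rank correlation (Pearson correlation computed on ranks, with tied values sharing an average rank); the post does not say which correlation statistic Cora uses, and the ten measurements are made up for illustration.

```python
import math


def average_ranks(values):
    """Rank values 1..n (smallest = 1); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # convert 0-based slots to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined when one side never varies
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical measurements of one factor for the top 10 results of one keyword.
positions = list(range(1, 11))
factor = [42, 38, 40, 31, 29, 30, 22, 18, 20, 15]

rho = spearman(positions, factor)
print(round(rho, 2))  # -0.96
```

The result is close to -1 even though the measurements are not perfectly sorted, which is the “more of the factor, better (lower) position” pattern described above. Ranks are used instead of raw values so that one page with an extreme measurement cannot dominate the result.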

Definitive Proof vs. Good Enough

People try to paint a picture in which, by using correlation, we claim to definitively prove that something is or is not a factor. That is not the case at all.

In general, a strongly correlating factor is worthy of consideration and everything else is INCONCLUSIVE.
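One way to put a number on “worthy of consideration” versus “inconclusive” is a permutation test. This is a swapped-in technique for illustration, not something the post says Cora does: shuffle the measurements many times (which simulates the starting assumption that the factor is not used to sort results) and count how often chance alone produces a correlation at least as strong as the one observed. The data, trial count, and function names below are hypothetical, and the simple Spearman formula used here assumes no tied values.

```python
import random


def ranks(values):
    """Map each value to its 1-based rank; assumes all values are distinct."""
    return {v: r for r, v in enumerate(sorted(values), start=1)}


def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def permutation_p_value(positions, values, trials=10_000, seed=0):
    """Share of random shuffles whose |rho| is at least as large as the observed |rho|.

    Shuffling the measurements simulates the starting assumption that the
    factor is NOT used to sort the results.
    """
    rng = random.Random(seed)
    observed = abs(spearman_no_ties(positions, values))
    shuffled = list(values)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        # Small tolerance so float rounding can't hide an exact tie with the observed value.
        if abs(spearman_no_ties(positions, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / trials


positions = list(range(1, 11))
sorted_looking = [42, 38, 40, 31, 29, 30, 22, 18, 20, 15]  # hypothetical
scattered = [5, 9, 2, 8, 1, 7, 3, 10, 4, 6]                # hypothetical

print(permutation_p_value(positions, sorted_looking))  # near 0: worth consideration
print(permutation_p_value(positions, scattered))       # far from 0: inconclusive
```

A large share does not prove the factor is irrelevant; it only means this data cannot tell, which is exactly the INCONCLUSIVE bucket.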

We are hunting for clues that improve our guesses about where to spend time to achieve positive, measurable results. With Cora SEO Software we measure over 540 on-page and off-page SEO factors for your keywords. Cora is one of several SEO measurement tools on the market. We use correlation to narrow the field from more than 540 possible areas to work on down to one or two dozen likely suspects that have empirical evidence supporting them. So while the rest of the world is blindly guessing what to work on and how much to invest in that work, we are using measurements and correlation to make much better guesses and to know exactly when to stop investing in certain factors.
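Here is a minimal Python sketch of narrowing a long factor list down to likely suspects, assuming you already have one correlation per factor per keyword. The factor names, the values, and both cut-offs (`threshold`, `min_share`) are hypothetical and chosen only for illustration; they are not Cora’s factor list or settings. Following the observation that strong negative correlations recur across keywords while strong positive ones tend to be one-off flukes, it keeps only factors that are strongly negative for most keywords.

```python
from statistics import mean

# Hypothetical Spearman correlations (position vs. factor), one per keyword.
# Factor names are illustrative only, not Cora's actual factor list.
CORRELATIONS = {
    "keyword_in_title": [-0.42, -0.38, -0.51, -0.29, -0.45],
    "referring_domains": [-0.55, -0.47, -0.61, -0.40, -0.52],
    "image_count": [0.48, -0.05, 0.02, -0.11, 0.07],
    "word_count": [-0.12, 0.09, -0.31, 0.04, -0.08],
}


def shortlist(correlations, threshold=-0.3, min_share=0.6):
    """Keep factors that are strongly negative for at least min_share of keywords.

    threshold and min_share are arbitrary cut-offs chosen for illustration.
    Returns (factor, share, mean_rho) tuples, most negative mean first.
    """
    picks = []
    for factor, rhos in correlations.items():
        share = sum(rho <= threshold for rho in rhos) / len(rhos)
        if share >= min_share:
            picks.append((factor, share, mean(rhos)))
    return sorted(picks, key=lambda pick: pick[2])


for factor, share, avg in shortlist(CORRELATIONS):
    print(f"{factor}: strong in {share:.0%} of keywords, mean rho {avg:.2f}")
```

In this made-up data, `image_count` has one strong positive correlation (0.48) that never repeats, so it drops out, as does `word_count`, which is strongly negative for only one keyword.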

Correlation Is Not Causation

This is a concern for statisticians the world over. It is a cautionary statement about double-checking your data and methodologies, and about being very strict about what you are testing and what conclusions you can infer from the data.

When most people in SEO use this statement (and I have had a LOT of people in SEO throw it in my face), what they are actually saying is “I am justifying my practice of NOT using math, data, or evidence-based methodologies in my SEO.” Even if you don’t agree with my methods or data, you should NOT be condemning math and science in the SEO industry. The people who do this are charlatans. They are typically SEOs who built a reputation for themselves years ago and have been selling that reputation ever since. When measurements and math work in SEO, it devalues their shaman-esque divine knowledge. These SEOs work on the principle that what helped them rank well in the past still works today and will keep working in the future. They are shamans throwing the chicken bones for each new client they acquire. Many of these shamans vehemently attack math and data in SEO because it threatens their belief system and devalues their magical ability to somehow know the mind of Google past, present, and future.

I Do Not Care What Is or Isn’t a Ranking Factor 

I have no stake in something being a factor or not being a factor. I only care whether there is data to support it. I have seen factors change. I have seen “SEO facts” that were logical, reasonable, and repeated on hundreds of SEO blogs proven false. I have seen Google claim things to be one way when measurements in the field showed otherwise. None of it is malicious. It is just what human beings do when they aren’t using science. If it is a factor, cool. If it is not a factor, cool. Either way I know something the rest of the world doesn’t, and that adds up to an advantage. I do not mind being proven wrong about factors. In fact, it is very refreshing: free, effortless learning. Normally, to learn something new I have to invest a lot of time and effort.

More often than not, following the data leads to better results for me than following the SEO herd. I no longer trust the advice of blogs, SEO pundits, or Google proclamations. In my opinion, that kind of guidance has done more harm than good, and none of the claims made in those places came with proof. I call the people who operate solely on that kind of advice the “Blog Believers”. They have no data. They don’t care about evidence. They simply defer to the most acclaimed moniker, and that is good enough for them. More power to you. That is not good enough for me.

Chart provided by Josh Bachynski from White Hat versus Black Hat

I Do Not Care about Google Algorithm Updates

I measure. I have an archive of measurements. When an algorithm update is suspected, I simply compare the correlations from before the update with the correlations from after it, and the factors that got stronger or weaker are my clues on how to respond.
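The before/after comparison can be sketched in a few lines of Python. This is a minimal illustration under assumptions: it supposes the archive can be reduced to one mean correlation per factor for each period, and the factor names, values, and the `min_change` cut-off are hypothetical.

```python
# Hypothetical mean correlations per factor, before and after a suspected update.
BEFORE = {"keyword_in_title": -0.41, "referring_domains": -0.51, "word_count": -0.10}
AFTER = {"keyword_in_title": -0.22, "referring_domains": -0.58, "word_count": -0.35}


def correlation_shifts(before, after, min_change=0.1):
    """List factors whose correlation moved by at least min_change.

    "stronger" / "weaker" compare magnitudes, so a factor drifting toward 0
    is reported as weaker whichever sign it has. Largest moves come first.
    """
    shifts = []
    for factor in before:
        if factor not in after:
            continue
        delta = after[factor] - before[factor]
        if abs(delta) >= min_change:
            direction = "stronger" if abs(after[factor]) > abs(before[factor]) else "weaker"
            shifts.append((factor, round(delta, 2), direction))
    return sorted(shifts, key=lambda shift: abs(shift[1]), reverse=True)


for factor, delta, direction in correlation_shifts(BEFORE, AFTER):
    print(f"{factor}: {direction} ({delta:+.2f})")
```

In this made-up archive, `word_count` becomes a stronger correlate and `keyword_in_title` a weaker one, while the small move in `referring_domains` falls under the cut-off and is ignored.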

Since I started eliminating factor deficits, my websites rarely take a hit from a Google update. It appears that poorly tuned sites have the most to worry about with updates. So while the rest of the community loses its mind over an update, I usually have a typical day instead.

I’ll stop my rant here. In my next post I will blow your mind with SEO Correlation Data. I dare you not to use it.