QueryStash

QueryStash

Spike Detection for Website Visits

To demonstrate one approach to spike detection, we are exploring website visitor data from the BigQuery Public Google Analytics sample data bigquery-public-data.google_analytics_sample.ga_sessions_*.

More specifically, the number of daily visits.

Website Visits

Now looking at the above graph, you don't need to be a statistician to work out where the spikes occurred. Visually obvious signs lead us to the conclusion that there have been two major spikes across the time series data.

When we talk about spikes what do we actually mean? In simple terms, the value for that day is out side of the expected range for what would be normal.

Now visually spotting spikes is great, but in the real world not actually that useful. Spikes often mean something is happening that you should know about. Now waiting until you have had a chance to manually look at some data isn't a great way to monitor anything.

So, how do we write a simple query to detect spikes in our time series data?

Spike Detection Process

First we aggregate the data we need, in this case daily visits.

SELECT
    date
    , SUM(totals.visits) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date

We then find the moving average and standard deviation across the data using a 7 day window prior to the current day. The time window could be larger or smaller depending on your data.

SELECT 
    *
    , AVG(visits) OVER(ORDER BY date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS visits_ma
    , STDDEV(visits) OVER(ORDER BY date ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS visits_stddev
FROM aggregation

Finally we use all of the above to detect spikes for a given day. Pay attention to this particular line

IF(visits > visits_ma + 2 * visits_stddev, 1, 0) AS spike

What we are saying here is if the current days visits is greater than the moving average + 2 times the standard deviation, we flag it as a spike.

The full query can be found below.

Resulting in:

Website Visits Spikes

Thats its, see its quite simple. The output of this query could trigger notifications or email alerts or sound alarms. Whatever the needs this simple technique can be applied to any time series data.

Notes

Adjusting the sensitivity of the spike detection can be done by changing the number of standard deviations, increase the value for detection of more extreme values.

Finally with regards to the Google Analytics data, vishalg19 on Reddit kindly pointed out that on real GA data using totals.visits doesn't match up to the numbers you see in the dashboard for sessions. Instead he suggested the work around of COUNT(DISTINCT CONCAT(fullvisitorid, CAST(visitstarttime AS string))) AS sessions as the aggregation metric and also include WHERE totals.visit > 0 as well.

Public
spikes bigquery

QueryStash

Created: Nov 5th 2020

QueryStash

QueryStash

Made with ❤️ by @Yekalb

© 2024 Query Stash. All rights reserved

TwitterFacebookEmail

Resources

BlogPricingYour First QuerySearch Your Stash

Stay up to date

Terms of ServicePrivacy Policy