Metrics Maven: Calculating an Exponentially Weighted Moving Average in PostgreSQL

In our Metrics Maven series, Compose's data scientist shares database features, tips, tricks, and code you can use to get the metrics you need from your data. In this article, we'll walk through how and why to calculate an exponentially weighted moving average.

We've covered a few different kinds of averages in this series. We had a look at the mean, dug into weighted averages, showed a couple of methods for calculating a simple moving average, generated a cumulative moving average in the same article, and also produced a 7-day weighted moving average. In this article, we're going to add an exponentially weighted moving average to the group.

We'll start by getting a basic understanding of what an exponentially weighted moving average is and why we would want to use it.

Exponentially weighted moving average

The exponentially weighted moving average, sometimes just called the exponential moving average (EWMA or EMA for short), is used for smoothing trend data like the other moving averages we've reviewed. Similar to the weighted moving average we covered in our last article, weights are applied to the data so that dates further in the past receive less weight (and therefore have less impact on the result) than more recent dates. Rather than decreasing linearly, however, as we saw with the weighted moving average, the weight for an EWMA decreases exponentially for each time period further in the past. Additionally, the result of an EWMA is cumulative because each period's EWMA is calculated from the previous period's EWMA. Because of this, every data value contributes something to the result, though that contribution diminishes with each subsequent period.

An exponentially weighted moving average is often applied when there is a large variance in the trend data, such as with volatile stock prices. It can reduce the noise and make the trend clearer. It also has the benefit of staying truer to the trend than other types of moving averages, which can over- or under-correct or smooth things out too much.

Let's get into our example and see how this works.

Our data

For EWMA, we're going to use the same daily summary data from our hypothetical pet supply company that we used in the previous article on weighted moving averages.

Our data table is called "daily_orders_summary" and looks like this:

date       | total_orders | total_order_items | total_order_value | average_order_items | average_order_value  
--------------------------------------------------------------------------------------------------------------
2017-01-01 | 14           | 18                | 106.84            | 1.29                | 7.63  
2017-01-02 | 10           | 21                | 199.79            | 2.10                | 19.98  
2017-01-03 | 12           | 17                | 212.98            | 1.42                | 17.75  
2017-01-04 | 12           | 15                | 100.93            | 1.25                | 8.41  
2017-01-05 | 10           | 13                | 108.54            | 1.30                | 10.85  
2017-01-06 | 14           | 20                | 216.78            | 1.43                | 15.48  
2017-01-07 | 13           | 16                | 198.32            | 1.23                | 15.26  
2017-01-08 | 10           | 12                | 124.67            | 1.20                | 12.47  
2017-01-09 | 10           | 16                | 140.88            | 1.60                | 14.09  
2017-01-10 | 17           | 19                | 136.98            | 1.12                | 8.06  
2017-01-11 | 12           | 14                | 99.67             | 1.17                | 8.31  
2017-01-12 | 11           | 15                | 163.52            | 1.36                | 14.87  
2017-01-13 | 10           | 18                | 207.43            | 1.80                | 20.74  
2017-01-14 | 14           | 20                | 199.68            | 1.43                | 14.26  
2017-01-15 | 16           | 22                | 207.56            | 1.38                | 12.97  
2017-01-16 | 14           | 19                | 176.76            | 1.36                | 12.63  
2017-01-17 | 13           | 18                | 184.48            | 1.38                | 14.19  
2017-01-18 | 14           | 25                | 265.98            | 1.79                | 19.00  
2017-01-19 | 10           | 17                | 178.42            | 1.70                | 17.84  
2017-01-20 | 19           | 24                | 139.67            | 1.26                | 7.35  
2017-01-21 | 15           | 21                | 187.66            | 1.40                | 12.51  
2017-01-22 | 19           | 24                | 226.98            | 1.26                | 11.95  
2017-01-23 | 17           | 24                | 212.64            | 1.41                | 12.51  
2017-01-24 | 16           | 21                | 187.43            | 1.31                | 11.71  
2017-01-25 | 19           | 27                | 244.67            | 1.42                | 12.88  
2017-01-26 | 20           | 29                | 267.44            | 1.45                | 13.37  
2017-01-27 | 17           | 25                | 196.43            | 1.47                | 11.55  
2017-01-28 | 21           | 28                | 234.87            | 1.33                | 11.18  
2017-01-29 | 18           | 29                | 214.66            | 1.61                | 11.93  
2017-01-30 | 14           | 20                | 199.68            | 1.43                | 14.26  
2017-02-01 | 19           | 27                | 189.98            | 1.42                | 10.00  
2017-02-02 | 22           | 31                | 274.98            | 1.41                | 12.50  
2017-02-03 | 20           | 28                | 213.76            | 1.40                | 10.69  
2017-02-04 | 21           | 30                | 242.78            | 1.43                | 11.56  
2017-02-05 | 22           | 34                | 267.88            | 1.55                | 12.18  
2017-02-06 | 19           | 24                | 209.56            | 1.26                | 11.03  
2017-02-07 | 21           | 33                | 263.76            | 1.57                | 12.56  

It's all about the lambda

As mentioned above, the weight for an EWMA decreases exponentially for each time period further in the past: the further back a value is, the less weight it's given. To apply the weights to our data, we'll need a smoothing parameter (also called lambda) which will act as a multiplier on the data values. This smoothing parameter is a value between 0 and 1 and is typically calculated as 2 divided by the number of periods plus 1. Since we'll stick with a 7-day range, our lambda is 2 / (7 + 1), which comes out to 0.25.
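
If you want to sanity check the lambda in SQL, a quick calculation (assuming the same 7-day range we're using here) looks like this:

SELECT 2.0 / (7 + 1) AS lambda;  -- 0.25 (using 2.0 rather than 2 avoids integer division)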

The formula for calculating an EWMA boils down to this:

(Current period data value * lambda) + (Previous period EWMA * (1 - lambda)) = Current period EWMA

An alternative formula, which is just the first one rearranged algebraically and produces the same result, is:

((Current period data value - Previous period EWMA) * lambda) + Previous period EWMA = Current period EWMA
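
As a quick sanity check of the formula before we get to the full query, here's a single step computed directly in SQL. The values are taken from the results further down: 12.47 is the January 8 average order value and 13.62 is the previous period EWMA (the starting value we'll explain below).

SELECT ROUND((12.47 * 0.25) + (13.62 * (1 - 0.25)), 2) AS ewma;  -- 13.33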

Now that we know what our lambda is and we have the formula we're going to apply, it's time to run our query:

WITH RECURSIVE exponentially_weighted_moving_average
(date, average_order_value, ewma, rn)
AS (

 -- Initialize the ewma using the 7-day simple moving average (sma)
    SELECT rows.date, rows.average_order_value, sma.sma AS ewma, rows.rn
    FROM (
        SELECT date, average_order_value, ROW_NUMBER() OVER(ORDER BY date) rn
        FROM daily_orders_summary
    ) rows
    JOIN (
        SELECT date,  
           ROUND(AVG(average_order_value)
              OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS sma
        FROM daily_orders_summary
    ) sma ON sma.date = rows.date
    WHERE rows.rn = 7 -- start on the 7th day since we're using the 7-day sma

    UNION ALL

 -- Perform the ewma calculation for all the following rows
    SELECT rows.date, rows.average_order_value
    , ROUND((rows.average_order_value * 0.25) + (ewma.ewma * (1 - 0.25)), 2) AS ewma
    --, ROUND((((rows.average_order_value - ewma.ewma) * 0.25) + ewma.ewma), 2) AS ewma -- alternative formula
    , rows.rn
    FROM exponentially_weighted_moving_average ewma
    JOIN (
        SELECT date, average_order_value, ROW_NUMBER() OVER(ORDER BY date) rn
        FROM daily_orders_summary
    ) rows ON ewma.rn + 1 = rows.rn
    WHERE rows.rn > 7 -- rows following the 7th day
)

SELECT date, average_order_value, ewma  
FROM exponentially_weighted_moving_average  
;

That's a lot to take in all at once, so let's break it down and learn how it works.

Using a recursive CTE

First, we're creating a recursive CTE (common table expression, using WITH RECURSIVE) called "exponentially_weighted_moving_average" that returns four fields: date, average order value, the EWMA, and a row number. We're using this approach because the EWMA calculation requires the previous period's EWMA, and a recursive CTE can provide that previously calculated EWMA to us for each period.
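
If recursive CTEs are new to you, here's a minimal, unrelated example (a counter from 1 to 5) just to illustrate the structure we'll use: an anchor query, UNION ALL, and a recursive query that references the CTE itself.

WITH RECURSIVE counter (n) AS (
    SELECT 1            -- anchor: the starting row
    UNION ALL
    SELECT n + 1        -- recursive term: builds on the previous row
    FROM counter
    WHERE n < 5
)
SELECT n FROM counter;  -- returns 1 through 5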

Note that using a recursive CTE on a large data set is not going to be your best option. Performance will take a big dive since the query will recurse through all the data. If you have a large data set that you need to calculate EWMA for, then you should consider using the procedural language options for PostgreSQL such as PL/Python or PL/Perl. You can learn more about recursive CTEs and procedural language options in the official PostgreSQL documentation.
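
As a rough illustration of that idea, here's a minimal sketch of a single-pass EWMA written in PL/pgSQL (another of PostgreSQL's procedural languages). The function name is our own, and for simplicity it seeds the EWMA with the first data value rather than the 7-day simple moving average used below:

CREATE OR REPLACE FUNCTION ewma_average_order_value(lambda numeric)
RETURNS TABLE(order_date date, ewma numeric) AS $$
DECLARE
    rec  RECORD;
    prev numeric := NULL;
BEGIN
    FOR rec IN
        SELECT d.date, d.average_order_value
        FROM daily_orders_summary d
        ORDER BY d.date
    LOOP
        IF prev IS NULL THEN
            -- seed with the first value (the query below seeds with the 7-day SMA instead)
            prev := rec.average_order_value;
        ELSE
            prev := (rec.average_order_value * lambda) + (prev * (1 - lambda));
        END IF;
        order_date := rec.date;
        ewma := ROUND(prev, 2);
        RETURN NEXT;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

-- SELECT * FROM ewma_average_order_value(0.25);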

Initializing the EWMA

The first query in our WITH block is the EWMA initialization. Because the calculation requires the previous period's EWMA, we have to give it something to start with. A common approach is to use the simple moving average over the same length of time as the initial EWMA, and that's what we've done here. Because we're calculating a 7-day EWMA (our lambda is based on a 7-day range), we have a sub-query called "sma" that calculates the 7-day simple moving average, and we take its value on the 7th day (the first day with a full 7 days of data) as our EWMA starting point. If you're not familiar with the simple moving average or you just need a refresher, check out our article on basic moving averages.
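
If you want to verify that starting point on its own, you can run the "sma" sub-query by itself and look at the 7th day; with this data it comes out to 13.62:

SELECT date, sma
FROM (
    SELECT date,
       ROUND(AVG(average_order_value)
          OVER(ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW), 2) AS sma
    FROM daily_orders_summary
) sma
WHERE date = '2017-01-07';  -- sma = 13.62, the first ewma in the results below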

You could also simply initialize the EWMA with the actual data value for that date. Of course, the results will be different. Play around with how you initialize the EWMA and see what works best for your data, since the starting point will affect all the rest of your calculations.

Most of the data in that first query comes from another sub-query called "rows" that employs the ROW_NUMBER() window function. We're using this sub-query to generate row numbers for each of the rows, which lets us identify the 7th row for initialization. If you're not familiar with window functions, take a gander at our article on window functions.
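
To see what the "rows" sub-query produces on its own, you can run it directly; the first few rows of this data look like this:

SELECT date, average_order_value,
   ROW_NUMBER() OVER(ORDER BY date) AS rn
FROM daily_orders_summary
ORDER BY date
LIMIT 3;

-- 2017-01-01 | 7.63  | 1
-- 2017-01-02 | 19.98 | 2
-- 2017-01-03 | 17.75 | 3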

Recursing through the data to calculate EWMA

The UNION ALL and the next query in the WITH block are where the recursion and the calculation of the EWMA occur. We've got the same "rows" sub-query, but in this case we only care about the rows following the 7th row since that's where we'll apply our EWMA calculation. We're joining the "rows" sub-query to our recursive CTE "exponentially_weighted_moving_average" (aliased as "ewma") on row number, where the "ewma" row number is 1 less than the "rows" row number. In this way we can use the previously calculated EWMA from the "ewma" CTE together with the current data value (average_order_value in this case) from the "rows" sub-query.

To get the calculated EWMA for the current row, we're applying the formula in SQL as:

ROUND((rows.average_order_value * 0.25) + (ewma.ewma * (1 - 0.25)), 2) AS ewma  

What we're doing here is multiplying the current day's average order value by our lambda (0.25), multiplying the previous day's EWMA by 1 minus lambda (0.75), adding the two together, and rounding the result to two decimal places.

Note that we've also included the alternative formula in the SQL, just commented out. You can use either one.

Returning results

Finally we're selecting the fields we're interested in for our report from the recursive CTE.

Here's what those results look like:

date       | average_order_value | ewma  
----------------------------------------
2017-01-07 | 15.26               | 13.62  
2017-01-08 | 12.47               | 13.33  
2017-01-09 | 14.09               | 13.52  
2017-01-10 | 8.06                | 12.16  
2017-01-11 | 8.31                | 11.20  
2017-01-12 | 14.87               | 12.12  
2017-01-13 | 20.74               | 14.28  
2017-01-14 | 14.26               | 14.28  
2017-01-15 | 12.97               | 13.95  
2017-01-16 | 12.63               | 13.62  
2017-01-17 | 14.19               | 13.76  
2017-01-18 | 19.00               | 15.07  
2017-01-19 | 17.84               | 15.76  
2017-01-20 | 7.35                | 13.66  
2017-01-21 | 12.51               | 13.37  
2017-01-22 | 11.95               | 13.02  
2017-01-23 | 12.51               | 12.89  
2017-01-24 | 11.71               | 12.60  
2017-01-25 | 12.88               | 12.67  
2017-01-26 | 13.37               | 12.85  
2017-01-27 | 11.55               | 12.53  
2017-01-28 | 11.18               | 12.19  
2017-01-29 | 11.93               | 12.13  
2017-01-30 | 14.26               | 12.66  
2017-02-01 | 10.00               | 12.00  
2017-02-02 | 12.50               | 12.13  
2017-02-03 | 10.69               | 11.77  
2017-02-04 | 11.56               | 11.72  
2017-02-05 | 12.18               | 11.84  
2017-02-06 | 11.03               | 11.64  
2017-02-07 | 12.56               | 11.87  

Let's look at just one date to review how the EWMA was calculated. We'll use January 21. By applying the formula for EWMA, we get:

(Current period data value * lambda) + (Previous period EWMA * (1 - lambda)) = Current period EWMA
         (12.51 * 0.25)              +         (13.66 * (1 - 0.25))          =      13.3725

which rounds to the 13.37 we see in the results above.

Seeing trends

Now that we have our EWMA, let's plot our average order value, the simple moving average, the weighted moving average, and the EWMA together to see how the trend lines compare:

[Chart: trend comparison of the average order value, simple moving average, weighted moving average, and EWMA]

We can see that the average order value by itself is pretty volatile, making it difficult to see the overall trend. The simple moving average helps smooth things out, but over- or under-corrects in some places. The weighted moving average smooths the trend further and makes it easier to see the rise that happened until about the 3rd week of January and then the slight decline after that. The exponentially weighted moving average follows the true data values better than the other two metrics while still smoothing the trend line.

Wrapping up

In this article, we learned how to calculate an exponentially weighted moving average using a recursive CTE. We discussed when it's useful to apply one and compared the results to the other average types we covered in previous articles. With each metric, we're better able to zero in on just how our business is performing.

In our next article, we'll be taking a look at CROSSTAB again and covering some aspects there that we didn't have a chance to get to in our previous article on pivot tables in Postgres.

Image by: msandersmusic
