Back in the pre windowing function days, if you wanted to do a running count, you either had to run a subquery or you could use a variable. This was slow because for each row the query that did the sum would be executed. With windowing functions in PostgreSQL, this is now running much faster.
Let's take a look, first create the following table
CREATE Temp TABLE test(Id int,SomeDate date, Charge decimal(20,10)); insert into test values( 1,to_date('20120101','YYYYMMDD'),1000); insert into test values( 1,to_date('20120401','YYYYMMDD'),200); insert into test values( 1,to_date('20120501','YYYYMMDD'),300); insert into test values( 1,to_date('20120601','YYYYMMDD'),600); insert into test values( 2,to_date('20120101','YYYYMMDD'),100); insert into test values( 2,to_date('20120101','YYYYMMDD'),500); insert into test values( 2,to_date('20120101','YYYYMMDD'),-800); insert into test values( 3,to_date('20120101','YYYYMMDD'),100);
let's check that data we just inserted into the temporary table
SELECT * FROM test
The output looks like this
Id SomeDate Charge 1 2012-01-01 1000.0000000000 1 2012-04-01 200.0000000000 1 2012-05-01 300.0000000000 1 2012-06-01 600.0000000000 2 2012-01-01 100.0000000000 2 2013-01-01 500.0000000000 2 2014-01-01 -800.0000000000 3 2012-01-01 100.0000000000
What we want is the following
id StartDate Enddate Charge RunningTotal 1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000 1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000 1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000 1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000 2 2012-01-01 2012-12-31 100.0000000000 100.0000000000 2 2013-01-01 2013-12-31 500.0000000000 600.0000000000 2 2014-01-01 9999-12-31 -800.0000000000 -200.0000000000 3 2012-01-01 9999-12-31 100.0000000000 100.0000000000
For each row, we want to have the date that the row starts on and also the date when it ends, we also want a running total as well. If there is no row after the current row for that id, we want the end date to be 9999-12-31.
So we will use a couple of functions. The first one is LEAD, LEAD accesses data from a subsequent row in the same result set without the use of a self-join. So the LEAD part looks like this
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231') OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
What we are doing is subtracting 1 from the date in the subsequent row (SomeDate + -1 * INTERVAL '1 day' )
We are using 1 as the offset since we want to apply this to the next row. Finally if there is no subsequent row, we want to use the date 9999-12-31 instead of NULL
To do the running count, we will do the following
SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS RunningTotal
What this means in English is for each id ordered by date, sum up the charge values for the rows between the preceding rows and the current row. Here is what all that stuff means.
ROWS BETWEEN
Specifies the rows that make up the range to use as implied by
UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. UNBOUNDED PRECEDING can only be specified as window starting point.
CURRENT ROW
Specifies that the window starts or ends at the current row when used with ROWS or the current value when used with RANGE.
CURRENT ROW can be specified as both a starting and ending point.
And here is the query
SELECT id, someDate as StartDate, LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231') OVER (PARTITION BY id ORDER BY SomeDate) as Enddate, Charge, SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM test ORDER BY id, SomeDate
And running that query, gives us the running count as well as the end dates
id StartDate Enddate Charge RunningTotal 1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000 1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000 1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000 1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000 2 2012-01-01 2011-12-31 100.0000000000 100.0000000000 2 2012-01-01 2011-13-31 500.0000000000 600.0000000000 2 2012-01-01 9999-12-31 -800.0000000000 -200.0000000000 3 2012-01-01 9999-12-31 100.0000000000 100.0000000000
Here is what it looks like if you execute the query in PGAdmin
If you don't want the last row to have the end date filled in, just omit the default value in the LEAD function. Instead of
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231')Make it
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1)
Here is the whole query again
SELECT id, someDate as StartDate, LEAD((SomeDate + -1 * INTERVAL '1 day' ),1) OVER (PARTITION BY id ORDER BY SomeDate) as Enddate, Charge, SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal FROM test ORDER BY id, SomeDateAnd here is what the output looks like after we made the changeAs you can see the rows for an id that doesn't have a row with a date greater than the current date will have a null end dateThat is all for this post