Sunday, November 25, 2018

Easy running totals with windowing functions in PostgreSQL




Back in the pre windowing function days, if you wanted to do a running count, you either had to run a subquery or you could use a variable. This was slow because for each row the query that did the sum would be executed. With windowing functions in PostgreSQL, this is now running much faster. 

Let's take a look, first create the following table


CREATE Temp TABLE test(Id int,SomeDate date, Charge decimal(20,10));


insert into test
values( 1,to_date('20120101','YYYYMMDD'),1000);
insert into test
values( 1,to_date('20120401','YYYYMMDD'),200);
insert into test
values( 1,to_date('20120501','YYYYMMDD'),300);
insert into test
values( 1,to_date('20120601','YYYYMMDD'),600);
insert into test
values( 2,to_date('20120101','YYYYMMDD'),100);
insert into test
values( 2,to_date('20120101','YYYYMMDD'),500);
insert into test
values( 2,to_date('20120101','YYYYMMDD'),-800);
insert into test
values( 3,to_date('20120101','YYYYMMDD'),100);


let's check that data we just inserted into the temporary table


SELECT * FROM test


The output looks like this

Id SomeDate Charge
1 2012-01-01 1000.0000000000
1 2012-04-01 200.0000000000
1 2012-05-01 300.0000000000
1 2012-06-01 600.0000000000
2 2012-01-01 100.0000000000
2 2013-01-01 500.0000000000
2 2014-01-01 -800.0000000000
3 2012-01-01 100.0000000000


What we want is the following

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2012-12-31 100.0000000000 100.0000000000
2 2013-01-01 2013-12-31 500.0000000000 600.0000000000
2 2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000

For each row, we want to have the date that the row starts on and also the date when it ends, we also want a running total as well. If there is no row after the current row for that id, we want the end date to be 9999-12-31.

So we will use a couple of functions. The first one is LEAD, LEAD accesses data from a subsequent row in the same result set without the use of a self-join. So the LEAD part looks like this

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231') OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,

What we are doing is subtracting 1 from the date in the subsequent row (SomeDate + -1 * INTERVAL '1 day' )
We are using 1 as the offset since we want to apply this to the next row. Finally if there is no subsequent row, we want to use the date 9999-12-31 instead of NULL

To do the running count, we will do the following

SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS RunningTotal

What this means in English is for each id ordered by date, sum up the charge values for the rows between the preceding rows and the current row. Here is what all that stuff means.

ROWS BETWEEN
Specifies the rows that make up the range to use as implied by

UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. UNBOUNDED PRECEDING can only be specified as window starting point.

CURRENT ROW
Specifies that the window starts or ends at the current row when used with ROWS or the current value when used with RANGE.
CURRENT ROW can be specified as both a starting and ending point.

And here is the query


SELECT id, someDate as StartDate,
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231')
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM test
  ORDER BY id, SomeDate


And running that query, gives us the running count as well as the end dates 

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2011-12-31 100.0000000000 100.0000000000
2 2012-01-01 2011-13-31 500.0000000000 600.0000000000
2 2012-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000


Here is what it looks like if you execute the query in PGAdmin



If you don't want the last row to have the end date filled in, just omit the default value in the LEAD function. Instead of

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231')

Make it

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1)

Here is the whole query again


SELECT id, someDate as StartDate,
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1)
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM test
  ORDER BY id, SomeDate
And here is what the output looks like after we made the change
As you can see the rows for an id that doesn't have a row with a date greater than the current date will have a null end date
That is all for this post

Friday, November 23, 2018

Happy Fibonacci day, here is how to generate a Fibonacci sequence in PostgreSQL


Image by Jahobr - Own work, CC0, Link


Since today is Fibonacci day I decided to to a short post about how to do generate a Fibonacci sequence in PostgreSQL. But first let's take a look at what a Fibonacci sequence actually is.

In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones:

 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Often, especially in modern usage, the sequence is extended by one more initial term:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

November 23 is celebrated as Fibonacci day because when the date is written in the mm/dd format (11/23), the digits in the date form a Fibonacci sequence: 1,1,2,3.

So here is how you can generate a Fibonacci sequence in PostgreSQL, you can do it by using s recursive table expression.  Here is what it looks like if you wanted to generate the Fibonacci sequence to up to a value of 1 million

;WITH RECURSIVE Fibonacci (Prev, Next) as
(
     SELECT 0, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000




That will generate a Fibonacci sequence that starts with 0 if you need the Fibonacci sequence to start at 1, all you have to do is replace the 1 to 0 in the first select statement

;WITH RECURSIVE Fibonacci (Prev, Next) as
(
     SELECT 1, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000


Here is what it looks like in PGAdmin when you run the query



Happy Fibonacci day!!


Here is pretty much the same post that I created for SQL Server: Happy Fibonacci day, here is how to generate a Fibonacci sequence in SQL

Tuesday, November 20, 2018

This brought back memories



The other day I walked past the building in the picture above and I felt a little sad. The reason I felt sad is because I have great memories about this building. When I first came to the US from the Netherlands I was amazed at how big this store was and how they had so many things on display. In Amsterdam where I arrived from, you didn’t really have a store like this. The story was named J&R, there was a J&R Music World store and a J&R Computer World store. J&R stands for the founders Joe and Rachelle Friedman.

There were several items I bought at this store

The first camera I ever owned I bought in this place, it was a Minolta but I don’t remember the model. I later sold the camera because a co-worker needed the camera while he went on vacation to Equador. So I only had it for 2 or 3 months.

A couple of years later I replaced that camera with a Canon 50E  I think this Canon camera might have also been known as an Elan. The E in the model name stands for eye-control, this was an enhanced version of the 3-zone eye-controlled autofocus system that was first seen on the EOS 5 camera. This was very high tech back then.




Another item that I have fond memories about was the first MP3 player that I purchased; It was the first MP3 player that was available to buy. It was the Rio PMP-300 portable MP3 player. I believe it was around $200 and you could hold about 10 songs or so since it only came with 32 MB of memory. You could store more songs if you inserted a smartmedia card. But even then you could not store hundreds of songs. I remember encoding the songs in WMA 64k format so I could store even more songs…… lol  1st world problems.




I bought many other things at J&R like camera lenses, filters for the camera, a tripod, RAM to upgrade the PC, I even purchased Plus! lol  Plus had themes and utilities for Windows



The nice thing about J&R was that they had prices unlike 42 street photo or whatever those stores were called where you had to negotiate the price. Also unlike at the Nobody Beats The Wiz store, they did not try to sell you insurance with every item you bought.

These days I guess you would have to go to B&H to get a  similar experience, just be aware that they are not open on Saturdays.