Musings, Rants and Ponderings Of A DB Architect: howto

Showing posts with label howto. Show all posts

Tuesday, April 30, 2019

A quick and easy way to count the percentage of nulls without a where clause in PostgreSQL

This came up the other day, someone wanted to know the percentage of NULL values in a column

Then I said "I bet you I can run that query without using a NULL in the WHERE clause, as a matter of fact, I can run that query without a WHERE clause at all!!"

I created a SQL Server version of this post here: How to count NULLS without using IS NULL in a WHERE clause, this is the PostgreSQL version

To start, first create this table and verify you have 9 rows

CREATE TABLE foo(bar int);
INSERT INTO foo values(1),(null),(2),(3),(4),
 (null),(5),(6),(7);

SELECT * FROM foo;

Here is what the output should look like

To get the NULL values and NON NULL values, you can do something like this

SELECT COUNT(*) as CountAll FROM foo WHERE bar IS NOT NULL;
SELECT COUNT(*) as CountAll FROM foo WHERE bar IS  NULL;

However, there is another way

Did you know that COUNT behaves differently if you use a column name compared to when you use *

Take a look

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn
FROM foo;

If you ran that query, the result is the following

CountAll CountColumn
----------- -----------
9 7

Here is what it looks like in pgAdmin3

Let's see what the PostgreSQL documentation has to say

COUNT(*) number of input rows

COUNT(expression) number of input rows for which the value of expression is not null

This is thus indeed documented behavior, count(*) counts all rows, count(column) skips the rows with NULLS

So now, lets change our query to return the percentage of non null values in the column

Here is what the query looks like

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues 
FROM foo;

Running that gives us this output

CountAll CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9 7 0

Here is what it looks like in pgAdmin3

As you can see, we have a 0 in the percentage column. The reason is that if you do math with integers, you will get an integer back

We can fix this easy by multiplying one of the integers by 1.0, this will convert the integer to a numeric data type . So instead of this line

 (COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues

We will change it to be like this

(COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues

Here is the changed query

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues 
FROM foo;

Here is the output

CountAll CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9 7 77.7777777777800

Here is what it looks like in pgAdmin3

Instead of doing * 1.0, you can also explicitly cast the integer like this

(CAST (COUNT(bar) as numeric(20,10))/COUNT(*))*100 as PercentageOfNonNullValues

The query now looks like this

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (CAST (COUNT(bar) as numeric(20,10))/COUNT(*))*100 as PercentageOfNonNullValues
FROM foo;

Here is what it looks like in pgAdmin3

That's it for this short post.. hopefully you knew this, if not, then you know it now :-)

Monday, February 18, 2019

Calculating Sexy Primes, Prime Triplets and Sexy Prime Triplets in PostgreSQL

The other day I was reading something on Hackernews and someone posted a link to a Sexy Primes wikipedia article. I looked at that and then decided to do this in SQL Server because.. why not? Then I decided to see how different this would be to do in PostgreSQL. For the first method to create the prime numbers it's different. For the method with the CTE it is very similar

From the Sexy Primes wikipedia link: https://en.wikipedia.org/wiki/Sexy_prime

In mathematics, sexy primes are prime numbers that differ from each other by six. For example, the numbers 5 and 11 are both sexy primes, because 11 minus 5 is 6.

The term "sexy prime" is a pun stemming from the Latin word for six: sex.

If p + 2 or p + 4 (where p is the lower prime) is also prime, then the sexy prime is part of a prime triplet.

Ok I did a couple of versions of this and over the weekend.. here is what I ended up with

So first we need a table that will just have the prime numbers

I decided to populate a table with numbers from 2 till 500 and then use the sieve of Eratosthenes method to delete the non primes

This will look like this

Create this table

CREATE  TABLE  PrimeNumbers  (N INT);

In one window run this to create the function/proc

CREATE OR REPLACE FUNCTION MakePrime() RETURNS void AS $$
DECLARE I integer := 2;
BEGIN
WHILE I <= SQRT(500) LOOP
    DELETE FROM PrimeNumbers WHERE N % I = 0 AND N > I;
    I := I + 1 ; 
END LOOP;

END;

$$ LANGUAGE plpgsql;



In a another window populate the table by making the call to the function
 

INSERT  INTO PrimeNumbers(n)
SELECT N
 FROM (SELECT generate_series(2,500) as n) x;


SELECT MakePrime() ; -- Yes that is a proc call

SELECT * FROM PrimeNumbers

Thinking about it a little more I decided to do it with a CTE instead of a loop with delete statements, if your tables will be big then the delete method is probably better... it's for you to test that out :-)

What we are doing is a NOT EXISTS query against the same cte and we are filtering out numbers that are greater than the number in the current row and are not divisible by the current number

CREATE TABLE IF NOT EXISTS  PrimeNumbers  (N INT);


;WITH cte AS (
  SELECT * FROM generate_series( 2, 500 )  n
  )

INSERT INTO PrimeNumbers
SELECT n
FROM cte
WHERE NOT EXISTS (
  SELECT n FROM  cte as cte2
WHERE cte.n > cte2.n AND cte.n % cte2.n = 0)
;

SELECT * FROM PrimeNumbers;

If we run that last select statement, we should have 95 rows

2
3
5
7
.....
.....
463
467
479
487
491
499

Now that we have our table filled with prime numbers till 500, it's time to run the queries

Sexy prime pairs
The sexy primes (sequences OEIS: A023201 and OEIS: A046117 in OEIS) below 500 are:

(5,11), (7,13), (11,17), (13,19), (17,23), (23,29), (31,37), (37,43), (41,47), (47,53), (53,59), (61,67), (67,73), (73,79), (83,89), (97,103), (101,107), (103,109), (107,113), (131,137), (151,157), (157,163), (167,173), (173,179), (191,197), (193,199), (223,229), (227,233), (233,239), (251,257), (257,263), (263,269), (271,277), (277,283), (307,313), (311,317), (331,337), (347,353), (353,359), (367,373), (373,379), (383,389), (433,439), (443,449), (457,463), (461,467).

Here is that query for the sexy prime pairs

-- 46 rows.. sexy primes
SELECT t1.N,t2.N 
 FROM PrimeNumbers t1
join PrimeNumbers t2 on t2.N - t1.N = 6 
order by 1

It's very simple.. a self join that returns rows where the number from one table alias and the number from the other table alias differ by 6

Prime triplets
The first prime triplets below 500 (sequence A098420 in the OEIS) are

(5, 7, 11), (7, 11, 13), (11, 13, 17), (13, 17, 19), (17, 19, 23), (37, 41, 43), (41, 43, 47), (67, 71, 73), (97, 101, 103), (101, 103, 107), (103, 107, 109), (107, 109, 113), (191, 193, 197), (193, 197, 199), (223, 227, 229), (227, 229, 233), (277, 281, 283), (307, 311, 313), (311, 313, 317), (347, 349, 353), (457, 461, 463), (461, 463, 467)

A prime triplet contains a pair of twin primes (p and p + 2, or p + 4 and p + 6), a pair of cousin primes (p and p + 4, or p + 2 and p + 6), and a pair of sexy primes (p and p + 6).

So we need to check that the 1st and 3rd number have a difference of 6, we also check that that difference between number 1 and 2 is 2 or 4. That query looks like this

-- 22 rows.. Prime Triplets
SELECT t1.N AS N1,t2.N AS N2, t3.N AS N3
 FROM PrimeNumbers t1
join PrimeNumbers t2 on t2.N > t1.N 
join PrimeNumbers t3 on t3.N - t1.N = 6
and t3.N > t2.N
and t2.n - t1.n IN (2,4)
order by 1

Here is what it looks like from pgAdmin

Sexy prime triplets
Triplets of primes (p, p + 6, p + 12) such that p + 18 is composite are called sexy prime. p p, p+6 and p+12 are all prime, but p+18 is not

Those below 500 (sequence OEIS: A046118) are:

(7,13,19), (17,23,29), (31,37,43), (47,53,59), (67,73,79), (97,103,109), (101,107,113), (151,157,163), (167,173,179), (227,233,239), (257,263,269), (271,277,283), (347,353,359), (367,373,379)

The query looks like this.. instead of a self join, we do a triple self join, we also check that p + 18 is not a prime number in the line before the order by

-- 14 rows.. Sexy prime triplets
SELECT t1.N AS N1,t2.N AS N2, t3.N AS N3
 FROM PrimeNumbers t1
join PrimeNumbers t2 on t2.n - t1.n = 6
join PrimeNumbers t3 on t3.N - t1.N = 12
and t3.N > t2.N
AND NOT EXISTS( SELECT null FROM PrimeNumbers p WHERE p.n = t1.n +18)
order by 1

And that's it for this post. If you are interested in the SQl Server version, you can find it here: Calculating Sexy Primes, Prime Triplets and Sexy Prime Triplets in SQL Server

More PostgreSQL posts can be found here: /label/PostgreSQL

Monday, December 3, 2018

Using PostgreSQL's Interval to mimic SQL Server's DATEADD function

I wanted to do some date calculations in PostgreSQL and was doing some research on if something like the DATEDIFF function that exists in SQL Server is available in PostgreSQL. These notes are mostly for me so I can refer back to them..but maybe they are useful for someone else as well

In SQL Server to add dates, minutes and other fractions of a date to a date, you can use the DATEADD function

Here are some quick examples, if you want to run this in SQL Server, create this table with one row first

CREATE  TABLE test(SomeDate date);

INSERT INTO test
VALUES('20120101');

And here is a simple DATEADD query, that adds 1 or 2 days to a date by using both the datepart and the abbreviated datepart

SELECT SomeDate,
  DATEADD(dd,1,SomeDate) as Interval1dd,
  DATEADD(dd,2,SomeDate) as Interval2dd,
  DATEADD(day,1,SomeDate) as Interval1Day,
  DATEADD(day,2,SomeDate) as Interval2Day
FROM test

That will give us the following output

SomeDate Interval1dd Interval2dd Interval1Day Interval2Day

2012-01-01 2012-01-02 2012-01-03 2012-01-02 2012-01-03

If you want to go negative, all you have to do is place a minus sign in front of the number

SELECT SomeDate,
  DATEADD(dd,-1,SomeDate) as Interval1dd,
  DATEADD(dd,-2,SomeDate) as Interval2dd,
  DATEADD(day,-1,SomeDate) as Interval1Day,
  DATEADD(day,-2,SomeDate) as Interval2Day
FROM test

Here is the output of that query

SomeDate Interval1dd Interval2dd Interval1Day Interval2Day
2012-01-01 2011-12-31 2011-12-30 2011-12-31 2011-12-30

Here are all the valid datepart arguments in SQL Server

datepart	Abbreviations
year	yy, yyyy
quarter	qq, q
month	mm, m
dayofyear	dy, y
day	dd, d
week	wk, ww
weekday	dw, w
hour	hh
minute	mi, n
second	ss, s
millisecond	ms
microsecond	mcs
nanosecond	ns

Now let's take a look at PostgreSQL

In PostgreSQL, there is no DATEPART function, but you can use interval literals to accomplish something that behaves the same

Let's do something similar like we did with the SQL Server queries, first create this temp table with one row

CREATE Temp TABLE test(SomeDate date);


INSERT INTO test
VALUES( to_date('20120101','YYYYMMDD'));

Now let's run this query

SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 D')  as IntervalD,
  (SomeDate + 2 * INTERVAL '1 Day' ) as Interval2Times1Day,
  (SomeDate + 1 * INTERVAL '2 Day' ) as Interval2Days,
  (SomeDate + 2 * INTERVAL '2 Days' ) as Interval2Times2Days,
  (SomeDate +  INTERVAL '1 Day' )    as IntervalD
FROM test

Here is the output

"2012-01-01";"2012-01-02 00:00:00";"2012-01-02 00:00:00";"2012-01-03 00:00:00";"2012-01-03 00:00:00";"2012-01-05 00:00:00";"2012-01-02 00:00:00"

When you copy from pgAdmin , you don't get the column aliases, so below is a screenshot of what it looks like(Click on the image for a bigger sized image)

As you can see there are two parts where you can supply a number

I think I prefer the top one from the query below, since it resembles the DATEPART function more

(SomeDate + 2 * INTERVAL '1 Day' ) as Interval2Times1Day,
(SomeDate + 1 * INTERVAL '2 Day' ) as Interval2Days,

But as you saw,it was possible to add 4 days by using a one in both places like shown below

(SomeDate + 2 * INTERVAL '2 Days' ) as Interval2Times2Days,

And of course if you want, you can just use the number inside the string like in this example below

(SomeDate +  INTERVAL '1 Day' )    as IntervalD

It's up to you, but I don't like changing numbers inside a string

To do negative numbers, you just change the positive number to a negative number, here is the same query from before but now with negative numbers

SELECT SomeDate,(SomeDate + -1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + -1 * INTERVAL '1 D')    as IntervalD,
  (SomeDate + -2 * INTERVAL '1 Day' ) as Interval2Times1Day,
  (SomeDate + -1 * INTERVAL '2 Day' ) as Interval2Days,
  (SomeDate + -2 * INTERVAL '-2 Days')as Interval2Times2Days,
  (SomeDate +  INTERVAL '-1 Day' )    as IntervalD
FROM test

Here is the output
"2012-01-01";"2011-12-31 00:00:00";"2011-12-31 00:00:00";"2011-12-30 00:00:00";"2011-12-30 00:00:00";"2012-01-05 00:00:00";"2011-12-31 00:00:00"

As you can see that all works as expected, did you notice that the we get +4 when we do -2 * -2?

Here is the output also from pgAdmin so that you can see the column aliases

Besides using days, you can also use these parts of a date

Abbreviation	Meaning
Y	Years
M	Months (in the date part)
W	Weeks
D	Days
H	Hours
M	Minutes (in the time part)
S	Seconds

Let's take a look by using some of these in a query

SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 Week' ) as Interval1Week,
  (SomeDate + 1 * INTERVAL '1 Month' ) as Interval1Month,
  (SomeDate + 1 * INTERVAL '1 Year' ) as Interval1Year
  
FROM test
UNION ALL
SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 D' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 W' ) as Interval1Week,
  (SomeDate + 1 * INTERVAL '1 M' ) as Interval1Month,
  (SomeDate + 1 * INTERVAL '1 Y' ) as Interval1Year
  
FROM test

Here is the output

"2012-01-01";"2012-01-02 00:00:00";"2012-01-08 00:00:00";"2012-02-01 00:00:00";"2013-01-01 00:00:00"
"2012-01-01";"2012-01-02 00:00:00";"2012-01-08 00:00:00";"2012-01-01 00:01:00";"2013-01-01 00:00:00"

Here is the output also from pgAdmin so that you can see the column aliases

As you can see when using M it used minute not month. I would recommend to always use the full name and not the abbreviated part so as not to create confusion

That's all for this post

Friday, November 23, 2018

Happy Fibonacci day, here is how to generate a Fibonacci sequence in PostgreSQL

Image by Jahobr - Own work, CC0, Link

Since today is Fibonacci day I decided to to a short post about how to do generate a Fibonacci sequence in PostgreSQL. But first let's take a look at what a Fibonacci sequence actually is.

In mathematics, the Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones:

1, 1, 2, 3, 5, 8, 13, 21, 34, ...

Often, especially in modern usage, the sequence is extended by one more initial term:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

November 23 is celebrated as Fibonacci day because when the date is written in the mm/dd format (11/23), the digits in the date form a Fibonacci sequence: 1,1,2,3.

So here is how you can generate a Fibonacci sequence in PostgreSQL, you can do it by using s recursive table expression. Here is what it looks like if you wanted to generate the Fibonacci sequence to up to a value of 1 million

;WITH RECURSIVE Fibonacci (Prev, Next) as
(
     SELECT 0, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000

That will generate a Fibonacci sequence that starts with 0 if you need the Fibonacci sequence to start at 1, all you have to do is replace the 1 to 0 in the first select statement

;WITH RECURSIVE Fibonacci (Prev, Next) as
(
     SELECT 1, 1
     UNION ALL
     SELECT Next, Prev + Next
     FROM Fibonacci
     WHERE Next < 1000000
)
SELECT Prev as Fibonacci
     FROM Fibonacci
     WHERE Prev < 1000000

Here is what it looks like in PGAdmin when you run the query

Happy Fibonacci day!!

Here is pretty much the same post that I created for SQL Server: Happy Fibonacci day, here is how to generate a Fibonacci sequence in SQL