Tuesday, April 30, 2019

A quick and easy way to count the percentage of nulls without a where clause in PostgreSQL


This came up the other day, someone wanted to know the percentage of NULL values in a column

Then I said "I bet you I can run that query without using a NULL in the WHERE clause, as a matter of fact, I can run that query without a WHERE clause at all!!"

I created a SQL Server version of this post here: How to count NULLS without using IS NULL in a WHERE clause, this is the PostgreSQL version


To start, first create this table and verify you have 9 rows

CREATE TABLE foo(bar int);
INSERT INTO foo values(1),(null),(2),(3),(4),
 (null),(5),(6),(7);

SELECT * FROM foo;

Here is what the output should look like



To get the NULL values and NON NULL values, you can do something like this


SELECT COUNT(*) as CountAll FROM foo WHERE bar IS NOT NULL;
SELECT COUNT(*) as CountAll FROM foo WHERE bar IS  NULL;

However, there is another way

Did you know that COUNT behaves differently if you use a column name compared to when you use *

Take a look

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn
FROM foo;

If you ran that query, the result is the following

CountAll    CountColumn
----------- -----------
9           7

Here is what it looks like in pgAdmin3




Let's see what the PostgreSQL documentation has to say


COUNT(*) number of input rows

COUNT(expression) number of input rows for which the value of expression is not null

This is thus indeed documented behavior, count(*) counts all rows, count(column) skips the rows with NULLS

So now, lets change our query to return the percentage of non null values in the column

Here is what the query looks like

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues 
FROM foo;

Running that gives us this output

CountAll    CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9           7                  0

Here is what it looks like in pgAdmin3



As you can see, we have a 0 in the percentage column. The reason is that if you do math with integers, you will get an integer back


We can fix this easy by multiplying one of the integers by 1.0, this will convert the integer to a numeric data type . So instead of this line

 (COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues 

We will change it to be like this

(COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues


Here is the changed query

SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues 
FROM foo;

Here is the output

CountAll    CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9           7           77.7777777777800


Here is what it looks like in pgAdmin3



Instead of doing * 1.0, you can also explicitly cast the integer like this

(CAST (COUNT(bar) as numeric(20,10))/COUNT(*))*100 as PercentageOfNonNullValues

The query now looks like this


SELECT COUNT(*) as CountAll, 
  COUNT(bar) as CountColumn, 
  (CAST (COUNT(bar) as numeric(20,10))/COUNT(*))*100 as PercentageOfNonNullValues
FROM foo;

Here is what it looks like in pgAdmin3






That's it for this short post.. hopefully you knew this, if not, then you know it now  :-)

2 comments:

Craig S. Mullins said...

Well, you are actually returning the percentage of non-NULL column values (77+%) whereas you specified you want the percentage of NULL column values (which would be 22+%). You need to subtract the result you get from 100

Denis said...

Craig, yes, you are absolutely right.. it should have been the following

SELECT COUNT(*) as CountAll,
COUNT(bar) as CountColumn,
(COUNT(bar)*1.0/COUNT(*))*100.0 as PercentageOfNullValues,
100- (COUNT(bar)*1.0/COUNT(*))*100.0 as PercentageOfNonNullValues
FROM foo;