This came up the other day, someone wanted to know the percentage of NULL values in a column
Then I said "I bet you I can run that query without using a NULL in the WHERE clause, as a matter of fact, I can run that query without a WHERE clause at all!!"
I created a SQL Server version of this post here: How to count NULLS without using IS NULL in a WHERE clause, this is the PostgreSQL version
To start, first create this table and verify you have 9 rows
CREATE TABLE foo(bar int); INSERT INTO foo values(1),(null),(2),(3),(4), (null),(5),(6),(7); SELECT * FROM foo;
Here is what the output should look like
To get the NULL values and NON NULL values, you can do something like this
SELECT COUNT(*) as CountAll FROM foo WHERE bar IS NOT NULL; SELECT COUNT(*) as CountAll FROM foo WHERE bar IS NULL;
However, there is another way
Did you know that COUNT behaves differently if you use a column name compared to when you use *
Take a look
SELECT COUNT(*) as CountAll, COUNT(bar) as CountColumn FROM foo;
If you ran that query, the result is the following
CountAll CountColumn
----------- -----------
9 7
Here is what it looks like in pgAdmin3
Let's see what the PostgreSQL documentation has to say
COUNT(*) number of input rows
COUNT(expression) number of input rows for which the value of expression is not null
This is thus indeed documented behavior, count(*) counts all rows, count(column) skips the rows with NULLS
So now, lets change our query to return the percentage of non null values in the column
Here is what the query looks like
SELECT COUNT(*) as CountAll, COUNT(bar) as CountColumn, (COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues FROM foo;Running that gives us this output
CountAll CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9 7 0
Here is what it looks like in pgAdmin3
As you can see, we have a 0 in the percentage column. The reason is that if you do math with integers, you will get an integer back
We can fix this easy by multiplying one of the integers by 1.0, this will convert the integer to a numeric data type . So instead of this line
We will change it to be like this(COUNT(bar)/COUNT(*))*100 as PercentageOfNonNullValues
(COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues
Here is the changed query
SELECT COUNT(*) as CountAll, COUNT(bar) as CountColumn, (COUNT(bar)*1.0/COUNT(*))*100 as PercentageOfNonNullValues FROM foo;
Here is the output
CountAll CountColumn percentageOfNonNullValues
----------- ----------- ---------------------------------------
9 7 77.7777777777800
Here is what it looks like in pgAdmin3
Instead of doing * 1.0, you can also explicitly cast the integer like this
(CAST (COUNT(bar) as numeric(20,10))/COUNT(*))*100 as PercentageOfNonNullValuesThe query now looks like this
2 comments:
Well, you are actually returning the percentage of non-NULL column values (77+%) whereas you specified you want the percentage of NULL column values (which would be 22+%). You need to subtract the result you get from 100
Craig, yes, you are absolutely right.. it should have been the following
SELECT COUNT(*) as CountAll,
COUNT(bar) as CountColumn,
(COUNT(bar)*1.0/COUNT(*))*100.0 as PercentageOfNullValues,
100- (COUNT(bar)*1.0/COUNT(*))*100.0 as PercentageOfNonNullValues
FROM foo;
Post a Comment