Monday, February 4, 2019

TWID Feb 3rd 2019, Huawei stealing, billion solar panels, big puzzle, beethoven, kodak

This is a post detailing some stuff I did, learned, posted and tweeted this week. I call this TWID (This Week in Denis). I am doing this mostly for myself... a kind of online journal so that I can look back on it later. I will use the label TWID for these posts



This Week I Learned

Continued with the Getting Started with Docker Swarm Mode Pluralsight course

Started reading the book The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on Computability and the Turing Machine by Charles Petzold


This Week I Tweeted

Huawei is accused of attempting to copycat a T-Mobile robot, and the charges read like a comical spy movie

The US on Monday charged the Chinese phone giant Huawei with trying to steal trade secrets from T-Mobile, among other crimes.
One Justice Department indictment includes internal emails between Huawei's US and Chinese employees who prosecutors said were trying to copy a T-Mobile device-testing robot.
The emails read like a comical spy movie, with one set of employees trying to avoid wrongdoing and another engineer getting caught putting part of the robot into his bag.
Huawei said that it hasn't violated any US laws and that it already settled with T-Mobile in a civil lawsuit.

What the heck is going on here?


How Do You Count Every Solar Panel in the U.S.? Machine Learning and a Billion Satellite Images

The DeepSolar Project, developed by engineers and computer scientists at Stanford University, is a machine learning framework that analyzes a dataset of satellite images in order to identify the size and location of installed solar panels.

To accurately count the panels, the DeepSolar team used a machine learning algorithm to analyze more than a billion high-resolution satellite images. The algorithm identified what the team believes to be almost every solar power installation across the contiguous 48 states.

The DeepSolar analysis reached a total of 1.47 million solar installations in the U.S., a much higher number than either of the two most commonly cited estimates.

“We can use recent advances in machine learning to know where all these assets are, which has been a huge question, and generate insights about where the grid is going and how we can help get it to a more beneficial place,” said Ram Rajagopal, associate professor of civil and environmental engineering, who supervised the project with Arun Majumdar, professor of mechanical engineering.

This is cool..but do they really need to use a billion images.. can't they use a subset and come pretty close to the real answer as well?


40x faster hash joiner with vectorized execution

For the past four months, I’ve been working with the incredible SQL Execution team at Cockroach Labs as a backend engineering intern to develop the first prototype of a batched, column-at-a-time execution engine. During this time, I implemented a column-at-a-time hash join operator that outperformed CockroachDB’s existing row-at-a-time hash join by 40x. In this blog post, I’ll be going over the philosophy, challenges, and motivation behind implementing a column-at-a-time SQL operator in general, as well as some specifics about hash join itself.

In CockroachDB, we use the term “vectorized execution” as a short hand for the batched, column-at-a-time data processing that is discussed throughout this post.

I love it when you see drastic speed improvements like this, I remember one time when we upgraded some hardware to use SSDs and more RAM.. a reporting query that took a minute now finished in less than a second.. I thought the results were wrong because it finished too fast lol


Some cool stuff you might enjoy


Kodak Premium Puzzle Presents: The World's Largest Puzzle 51,300 Pieces 27 Wonders from Around The World 28.5 Foot x 6.25 Foot Jigsaw Puzzle 

That is 8.68 meters wide for the metric people....aka 1 big white shark length



I don't always listen to classical music.. but when I do.. I make sure it's played on an electric guitar, enjoy this piece of Ludwig van Beethoven - Moonlight Sonata ( 3rd Movement ) by Tina S


Sunday, January 27, 2019

TWID Jan 27th, Salah, SSMS emoji, franz josef, mourning stamps,american gods season 2, PostgreSQL

This is a post detailing some stuff I did, learned, posted and tweeted this week. I call this TWID (This Week in Denis). I am doing this mostly for myself... a kind of online journal so that I can look back on it later. I will use the label TWID for these posts



This Week I Learned


Continued with the Getting Started with Docker Swarm Mode Pluralsight course
Finished A History of Japan by R.H.P. Mason and J.G. Caiger


Did you know, you can use emojis in output from SSMS?


More info, including the code to run can be found here: Did you know, you can use emojis in output from SSMS?


This Week I Tweeted


Why are glasses so expensive? The eyewear industry prefers to keep that blurry

It’s a question I get asked frequently, most recently by a colleague who was shocked to find that his new pair of prescription eyeglasses cost about $800. Why are these things so damn expensive? The answer: Because no one is doing anything to prevent a near-monopolistic, $100-billion industry from shamelessly abusing its market power. Prescription eyewear represents perhaps the single biggest mass-market consumer ripoff to be found. The stats tell the whole story.

Totally agree




The Internals of PostgreSQL for database administrators and system developers

In this document, the internals of PostgreSQL for database administrators and system developers are described.

PostgreSQL is an open source multi-purpose relational database system which is widely used throughout the world. It is one huge system with the integrated subsystems, each of which has a particular complex feature and works with each other cooperatively. Although understanding of the internal mechanism is crucial for both administration and integration using PostgreSQL, its hugeness and complexity prevent it. The main purposes of this document are to explain how each subsystem works, and to provide the whole picture of PostgreSQL.

This document is based on the second part of the book I wrote in Japanese in 2012 (ISBN-13: 978-4774153926) which is composed of seven parts, and covers version 11 and earlier.

Contents 

What are you waiting for.. start reading.....


Since we are talking about PostgreSQL.....


I am thrilled to announce that we have acquired Citus Data, a leader in the PostgreSQL community. Citus is an innovative open source extension to PostgreSQL that transforms PostgreSQL into a distributed database, dramatically increasing performance and scale for application developers. Because Citus is an extension to open source PostgreSQL, it gives enterprises the performance advantages of a horizontally scalable database while staying current with all the latest innovations in PostgreSQL. Citus is available as a fully-managed database as a service, as enterprise software, and as a free open source download.

Since the launch of Microsoft’s fully managed community-based database service for PostgreSQL in March 2018, its adoption has surged. Earlier this month, PostgreSQL was named DBMS of the Year by DB-Engines, for the second year in a row. The acquisition of Citus Data builds on Azure’s open source commitment and enables us to provide the massive scalability and performance our customers demand as their workloads grow.

Together, Microsoft and Citus Data will further unlock the power of data, enabling customers to scale complex multi-tenant SaaS applications and accelerate the time to insight with real-time analytics over billions of rows, all with the familiar PostgreSQL tools developers know and love.

That's pretty cool..hopefully they will leave the team alone and not do a FoxPro type move.....  But since Ballmer is not in charge anymore, I have faith

The bucket contained 21 files containing 23,000 pages of PDF documents stitched together — or about 1.3 gigabytes in size. Diachenko said that portions of the data in the exposed Elasticsearch database on Wednesday matched data found in the Amazon S3 bucket, confirming that some or all of the data is the same as what was previously discovered. Like in Wednesday’s report, the server contained documents from banks and financial institutions across the U.S., including loans and mortgage agreements. We also found documents from the U.S. Department of Housing and Urban Development, as well as W-2 tax forms, loan repayment schedules and other sensitive financial information.


This is really, really messed up


Google Search Operators: The Complete List (42 Advanced Operators)

Did you know that Google is constantly killing useful operators?
That’s why most existing lists of Google search operators are outdated and inaccurate.
For this post, I personally tested EVERY search operator I could find.

Here is a complete list of all working, non‐working, and “hit and miss” Google advanced search operators as of 2018.

All you need to step up your Google Fu






Some cool stuff you might enjoy


Austro-Hungarian emperor Franz Josef defaced

Franz Joseph defaced....

See more info here: The revenge of Yugoslavia on Bosnian stamps of 1906

If you are interested in philately, see also, why do some of these stamps have black perforations?


Took this pic of a mural of Liverpool player Mo Salah near Times Square

Mo Salah ...   Larger Than Life



American Gods | Season 2 Official Trailer

 

Can't wait to watch that. I have watched season 1 and also read the book last year



Sunday, January 20, 2019

TWID Jan 20th 2019.. ... Crypts of Winterfell, Brexit, re:MARS, remars, myths, trace flags, agile bs

This is a post detailing some stuff I did, learned, posted and tweeted this week. I call this TWID (This Week in Denis). I am doing this mostly for myself... a kind of online journal so that I can look back on it later. I will use the label TWID for these posts

This Week I Learned

Continued with the Getting Started with Docker Swarm Mode Pluralsight course
Finished D'Aulaires' Book of Greek Myths by Ingri d'Aulaire, Edgar Parin d'Aulaire



This Week I Tweeted

MSSQL Tiger Team:   Let’s talk about trace flags

So why do we even have trace flags to begin with?
Some trace flags are used to enable enhanced debugging features such as additional logging, memory dumps etc. and are used only when you are working with Microsoft Support to provide additional data for troubleshooting. These trace flags are not ones you want to leave turned on in a production system as they may have a negative impact on your workload. An example of one of these flags would be TF 2551 which is used to trigger a filtered memory dump whenever there is an exception or assertion in the SQL Server process. These trace flags are only used for a short period of time and typically only at the recommendation of Microsoft Support, so they will likely always be around.

Other trace flags are used to alter the behavior of the server in other ways, such as to turn a feature ON or OFF or change the way the database engine manages resources. One example of this type of trace flag is 7752 which was introduced as a knob for something that is default behavior in Azure SQL DB. In a SQL Server (on-prem or IaaS) database that is undergoing recovery and has Query Store (QDS) enabled, user queries will be blocked until all the data required for QDS to start is loaded. This ensures QDS doesn't miss any queries that are executed in that database. In some cases, this can take a long time to complete and you'll see sessions with a wait type of QDS_LOADDB until the QDS becomes available. Turning on TF 7752 makes this process asynchronous so that user queries can proceed while the QDS starts. It's not something we want to make the default behavior in general because it means that some query executions won't be captured by QDS, but it's a tradeoff you might be willing to make in order to reduce the time it takes for a database to become available upon restart or failover. This is the sort of trace flag that we are trying to incorporate into the product in some other way, such as to provide a database-scoped configuration. Moving forward, we hope not to introduce any new trace flags like this, and over time the number of flags of this type should approach zero.

Make sure to read the rest of the post and read the links to the knowledge base articles as well



Broadcast group Discovery shifts jobs from London to Amsterdam

Discovery has become the latest international broadcast group to move European operations to the Netherlands following Britain’s decision to pull out of the EU. Discovery has had a Dutch base since 1989 but has now applied for EU licences for its paid channel portfolio through the Netherlands. 

‘This decision ensures continuity of our services for the viewers across Europe,’ the company said in a statement.


I have seen a lot of these job shifts over the last year because of Brexit, even the company I work for has moved people to Amsterdam from London. This will be very costly for London.


Bullet Journal Progress Walls and Progress Grids

The one thing I like to have in my Bullet Journal is a progress wall or progress grid.
What a progress wall can do for you is visually let you know how far you are from your goal. You can make it look neat and pretty or, as in my case, a little messy


Bullet Journal Progress Grid


Detecting Agile BS (from the US Department of Defense) [pdf]

Agile is a buzzword of software development, and so all DoD software development projects are, almost by default, now declared to be “agile.” The purpose of this document is to provide guidance to DoD program executives and acquisition professionals on how to detect software projects that are really using agile development versus those that are simply waterfall or spiral development in agile clothing (“agile-scrum-fall”).

Pretty cool from the Department of Defense


MongoDB removed from RHEL 8 beta due to license

Note that the NoSQL MongoDB database server is not included in RHEL 8.0 Beta because it uses the Server Side Public License (SSPL).

Didn't AWS roll out their own version as well?


Lego collecting delivers huge and uncorrelated market returns

A lot of fancy things can be built with Lego sets nowadays. Such as a diversifying portfolio that loads on the Fama-French size factor. Collecting Lego -- yes, the plastic toys made of interlocking bricks that become cars and castles and robots -- returned more than large stocks, bonds and gold in the three decades ending in 2015, says a study by Victoria Dobrynskaya, an assistant professor at Russia’s Higher School of Economics. Aspects of the performance even align with returns sought by owners of smart-beta ETFs.

Man, I wish I still had my space sets from the 70s intact


Some cool stuff you might enjoy

Apple Open-Sources FoundationDB Record Layer

Today, we are excited to announce the open source release of the FoundationDB Record Layer!

From its inception, FoundationDB was designed as a highly scalable key-value store with a simple API. Layers extend the functionality of the database by adding features and data models to allow new storage access patterns. Today we are releasing the FoundationDB Record Layer, which provides relational database semantics on top of FoundationDB. This layer features schema management, indexing facilities, and a rich set of query capabilities. The Record Layer is used in production at Apple to support applications and services for hundreds of millions of users.

Supports ACID transactions


re:MARS, a new AI event for machine learning, automation, robotics, and space

Machine Learning (ML) and Artificial Intelligence (AI) are behind almost everything we do at Amazon. Some of this work is highly visible, such as autonomous Prime Air delivery drones, eliminating checkout lines at Amazon Go, and making everyday life more convenient for customers with Alexa. But much of what we do with AI and ML happens beneath the surface – from the speed in which we deliver packages, to the broad selection and low prices we’re able to offer customers, to automatic extraction of characters and places from books and videos with X-Ray. This is in addition to the unrivaled breadth and depth of AI and ML services that AWS offers businesses of all sizes. And it’s still day one for AI at Amazon.

Today we are excited to announce we are bringing together some of the brightest leaders across science, academia, and business to explore innovation, scientific advancements, and practical applications of AI and ML. re:MARS, Amazon’s new AI event focused on Machine learning, Automation, Robotics, and Space, will take place June 4-7, 2019 at the ARIA Resort & Casino in Las Vegas. Business leaders and technical builders will learn, share, and further imagine how these four fields of study will shape the future of AI.

See also here:  https://remars.amazon.com/


Game of Thrones | Season 8 | Official Tease: Crypts of Winterfell (HBO)



We have all been waiting for this... but where are the books??

Monday, January 14, 2019

Bullet Journal Progress Walls and Progress Grids

I got a Bullet Journal as a gift in December and have been messing around with it a little.
The one thing I like to have in my Bullet Journal is a progress wall or progress grid.
What a progress wall can do for you is visually let you know how far you are from your goal. You can make it look neat and pretty or, as in my case, a little messy


I decided to run 750 miles this year and also do 17,500 push-ups. This means I will do about 100 push-ups every other day and run about 4 miles 4 times a week

What I did for the 17,500 push ups is divide it up by 100 so that I need a wall with 175 bricks

For the 750 mile run, I divided it by 2.5 miles so that I ended up with 300 bricks

Once you have done the work for a brick, fill it in with your favorite color. I chose red


Here is what my messy walls look like



Once you finish a row, put a date next to it, this way you can quickly see when you completed that section


If you prefer a neater version, you can of course do a grid, here is what the progress grid I created for my SQL Server blog posts looks like





My goal is to write 100 SQL Server posts in 2019. Here is what the grid looks like right now; so far I have created 4 posts and colored those bricks in red

You can use a progress wall for all kinds of things... maybe you are saving for a vacation and putting $20 away each week... divide the amount you need by 20 and you have how many bricks you need in your wall.
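
The brick math is easy to do by hand, but here is a minimal Python sketch of it. The function name bricks_needed and the $1,000 vacation amount are made up for illustration; the push-up and mile numbers are the ones from my walls above.

```python
import math


def bricks_needed(goal, per_brick):
    """Number of bricks for a goal when each brick stands for per_brick units."""
    # Round up so a partial last brick still gets drawn on the wall.
    return math.ceil(goal / per_brick)


print(bricks_needed(17_500, 100))  # push-ups, 100 per brick -> 175
print(bricks_needed(750, 2.5))     # miles, 2.5 per brick -> 300
print(bricks_needed(1_000, 20))    # hypothetical $1,000 vacation, $20 per brick -> 50
```

Rounding up is a choice I made for the sketch: if the goal does not divide evenly, the last brick just stands for a smaller amount.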

A progress wall is a fun and creative way to keep track of how far you are from your goal 

Let me know in a comment what you would use a progress wall to track


Sunday, January 13, 2019

TWID Jan 13th 2019.. Rules of 3. hacked photosynthesis. Containers Killed The Virtual Machine

This is a post detailing some stuff I did, learned, posted and tweeted this week. I call this TWID (This Week in Denis). I am doing this mostly for myself... a kind of online journal so that I can look back on it later. I will use the label TWID for these posts

This Week I Learned


Finished the Getting Started with Docker Pluralsight course


Rules of 3... for survival
3 Minutes for air
3 Days for water
3 Weeks for food


This Week I Tweeted

Scientists Have 'Hacked Photosynthesis' In Search Of More Productive Crops

"This is a very important finding," she says. "It's really the first major breakthrough showing that one can indeed engineer photosynthesis and achieve a major increase in crop productivity."

It will be many years, though, before any farmers plant crops with this new version of photosynthesis. Researchers will have to find out whether it means that a food crop like soybeans actually produces more beans — or just more stalks and leaves.

Then they'll need to convince government regulators and consumers that the crops are safe to grow and eat.


Times are tough in 2019 thanks to the US-China trade war and an escalating war of words between Washington and Beijing over tech leadership
Chinese companies at CES all agreed though that while the trade war has adversely impacted their business in the US, it remains a very important market

“We are definitely affected by the tariffs, in fact one of our big US customers is moving their manufacturing operations outside of China to Vietnam to avoid an increase in the cost of doing business,” said Yuki, a saleswoman from Dongguan-based Ruiheng Electronic Co. Ltd., which manufactures power adaptors and circuit boards.



Costco Sells Out of 27-Pound ‘Storage Bucket’ Mac and Cheese With 20-Year Shelf Life

Bad news for those looking to stock up on Costco’s food hit, the 27-pound bucket of macaroni and cheese with 20 years of shelf life: it’s now sold out on the warehouse’s website after going viral on social media.

Who buys this stuff?


Containers Killed The Virtual Machine Star

We predict new enterprise application development will pass a tipping point in 2019 and shift away from legacy virtual machines (VMs) and strongly toward containers and Kubernetes container orchestration.

To be precise, we predict that:
  • The future is multi-cloud, and multi-cloud means Docker containers with Kubernetes orchestration. Every public cloud has its own APIs, and in that sense, they are all new versions of proprietary mainframes.
  • “Lift-and-shift” of VMware virtual machines will be more expensive than customers realize. Paying no money up front, in this case to refactor and port applications to Kubernetes, typically means paying more during operations.
  • Java is not dead. It may play an important middle ground between lift-and-shift and the expense of completely refactoring applications for cloud-native environments. Java may be a light touch version of “move-and-improve.”
  • It seems likely that Lenovo will take a look at acquiring SUSE, with Supermicro perhaps also in the mix.

Docker/Kubernetes and other container technologies are all the rage now...especially with DevOps


As the U.S. federal shutdown continues, dozens of U.S. government websites have been rendered either insecure or inaccessible due to expired transport layer security (TLS) certificates that have not been renewed.

In fact, .gov websites are using more than 80 TLS certificates that have expired, according to a new Thursday report by Netcraft. That’s because funding for renewals has been paused. That opens the impacted sites to an array of cyber-attacks; most notably, man-in-the-middle attacks, which allow bad actors to intercept exchanges between a user and a web application, either to eavesdrop or to impersonate the website and steal any data that the user may input.

That's not good... shouldn't these people be on the essential employee list?
 


Some cool stuff you might enjoy

500 Top PDFs posted to Hacker News in 2018

The top 5




Mongo song....

Hey Mongo I just met you And this is craaazy but here's my data so store it maybe?

B bu but.. it's webscale :-)




The ideal amount of sunlight for growing your garden








Seems very overwhelming to me




Alexander Hall as seen from Blair Hall at Princeton University

Alexander Hall


Took this on my way to pick up the kids from an event

Monday, December 3, 2018

Using PostgreSQL's Interval to mimic SQL Server's DATEADD function




I wanted to do some date calculations in PostgreSQL and was researching whether something like the DATEADD function that exists in SQL Server is available in PostgreSQL. These notes are mostly for me so I can refer back to them.. but maybe they are useful for someone else as well

In SQL Server, to add days, minutes and other date parts to a date, you can use the DATEADD function

Here are some quick examples, if you want to run this in SQL Server, create this table with one row first

CREATE  TABLE test(SomeDate date);

INSERT INTO test
VALUES('20120101');


And here is a simple DATEADD query that adds 1 or 2 days to a date by using both the abbreviated datepart (dd) and the full datepart (day)

SELECT SomeDate,
  DATEADD(dd,1,SomeDate) as Interval1dd,
  DATEADD(dd,2,SomeDate) as Interval2dd,
  DATEADD(day,1,SomeDate) as Interval1Day,
  DATEADD(day,2,SomeDate) as Interval2Day
FROM test


That will give us the following output

SomeDate Interval1dd Interval2dd Interval1Day Interval2Day
2012-01-01 2012-01-02 2012-01-03 2012-01-02 2012-01-03


If you want to go negative, all you have to do is place a minus sign in front of the number

SELECT SomeDate,
  DATEADD(dd,-1,SomeDate) as Interval1dd,
  DATEADD(dd,-2,SomeDate) as Interval2dd,
  DATEADD(day,-1,SomeDate) as Interval1Day,
  DATEADD(day,-2,SomeDate) as Interval2Day
FROM test

Here is the output of that query

SomeDate Interval1dd Interval2dd Interval1Day Interval2Day
2012-01-01 2011-12-31 2011-12-30 2011-12-31 2011-12-30

Here are all the valid datepart arguments in SQL Server


datepart Abbreviations
year yy, yyyy
quarter qq, q
month mm, m
dayofyear dy, y
day dd, d
week wk, ww
weekday dw, w
hour hh
minute mi, n
second ss, s
millisecond ms
microsecond mcs
nanosecond ns


Now let's take a look at PostgreSQL

In PostgreSQL, there is no DATEADD function, but you can use interval literals to accomplish something that behaves the same

Let's do something similar to what we did with the SQL Server queries. First create this temp table with one row


CREATE Temp TABLE test(SomeDate date);


INSERT INTO test
VALUES( to_date('20120101','YYYYMMDD'));



Now let's run this query

SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 D')  as IntervalD,
  (SomeDate + 2 * INTERVAL '1 Day' ) as Interval2Times1Day,
  (SomeDate + 1 * INTERVAL '2 Day' ) as Interval2Days,
  (SomeDate + 2 * INTERVAL '2 Days' ) as Interval2Times2Days,
  (SomeDate +  INTERVAL '1 Day' )    as IntervalD
FROM test



Here is the output

"2012-01-01";"2012-01-02 00:00:00";"2012-01-02 00:00:00";"2012-01-03 00:00:00";"2012-01-03 00:00:00";"2012-01-05 00:00:00";"2012-01-02 00:00:00"

When you copy from pgAdmin, you don't get the column aliases, so below is a screenshot of what it looks like





As you can see there are two parts where you can supply a number

I think I prefer the top one of the two lines below, since it resembles the DATEADD function more

(SomeDate + 2 * INTERVAL '1 Day' ) as Interval2Times1Day,
(SomeDate + 1 * INTERVAL '2 Day' ) as Interval2Days,



But as you saw, it was possible to add 4 days by using a two in both places, as shown below

(SomeDate + 2 * INTERVAL '2 Days' ) as Interval2Times2Days,

And of course if you want, you can just use the number inside the string like in this example below

(SomeDate +  INTERVAL '1 Day' )    as IntervalD


It's up to you, but I don't like changing numbers inside a string
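
If you want to sanity check the arithmetic outside the database, here is a rough Python model of the three expressions above. It uses timedelta as a stand-in for INTERVAL, which is only an analogy: Python returns a plain date here, while the PostgreSQL query above returned a timestamp. The variable names are made up for the sketch.

```python
from datetime import date, timedelta

some_date = date(2012, 1, 1)
one_day = timedelta(days=1)    # stand-in for INTERVAL '1 Day'
two_days = timedelta(days=2)   # stand-in for INTERVAL '2 Days'

# The number can sit outside the interval, inside it, or in both places.
multiplier_outside = some_date + 2 * one_day   # like 2 * INTERVAL '1 Day'
number_inside = some_date + 1 * two_days       # like 1 * INTERVAL '2 Day'
both_places = some_date + 2 * two_days         # like 2 * INTERVAL '2 Days'

print(multiplier_outside)  # 2012-01-03
print(number_inside)       # 2012-01-03
print(both_places)         # 2012-01-05
```

The first two land on the same date, and the third adds 4 days because the two numbers multiply.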

To do negative numbers, you just change the positive number to a negative number, here is the same query from before but now with negative numbers

SELECT SomeDate,(SomeDate + -1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + -1 * INTERVAL '1 D')    as IntervalD,
  (SomeDate + -2 * INTERVAL '1 Day' ) as Interval2Times1Day,
  (SomeDate + -1 * INTERVAL '2 Day' ) as Interval2Days,
  (SomeDate + -2 * INTERVAL '-2 Days')as Interval2Times2Days,
  (SomeDate +  INTERVAL '-1 Day' )    as IntervalD
FROM test

Here is the output
"2012-01-01";"2011-12-31 00:00:00";"2011-12-31 00:00:00";"2011-12-30 00:00:00";"2011-12-30 00:00:00";"2012-01-05 00:00:00";"2011-12-31 00:00:00"

As you can see, that all works as expected. Did you notice that we get +4 when we do -2 * -2?

Here is the output also from pgAdmin so that you can see the column aliases


Besides using days, you can also use these parts of a date


Abbreviation Meaning
Y Years
M Months (in the date part)
W Weeks
D Days
H Hours
M Minutes (in the time part)
S Seconds


Let's take a look by using some of these in a query


SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 Day' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 Week' ) as Interval1Week,
  (SomeDate + 1 * INTERVAL '1 Month' ) as Interval1Month,
  (SomeDate + 1 * INTERVAL '1 Year' ) as Interval1Year
  
FROM test
UNION ALL
SELECT SomeDate,(SomeDate + 1 * INTERVAL '1 D' ) as Interval1Day,
  (SomeDate + 1 * INTERVAL '1 W' ) as Interval1Week,
  (SomeDate + 1 * INTERVAL '1 M' ) as Interval1Month,
  (SomeDate + 1 * INTERVAL '1 Y' ) as Interval1Year
  
FROM test

Here is the output

"2012-01-01";"2012-01-02 00:00:00";"2012-01-08 00:00:00";"2012-02-01 00:00:00";"2013-01-01 00:00:00"
"2012-01-01";"2012-01-02 00:00:00";"2012-01-08 00:00:00";"2012-01-01 00:01:00";"2013-01-01 00:00:00"


Here is the output also from pgAdmin so that you can see the column aliases



As you can see, when using M, PostgreSQL used minute, not month. I would recommend always using the full name rather than the abbreviation so as not to create confusion

That's all for this post

Sunday, November 25, 2018

Easy running totals with windowing functions in PostgreSQL




Back in the pre-windowing-function days, if you wanted to do a running total, you either had to run a subquery or use a variable. This was slow because the query that did the sum would be executed for each row. With windowing functions in PostgreSQL, this now runs much faster.

Let's take a look, first create the following table


CREATE Temp TABLE test(Id int,SomeDate date, Charge decimal(20,10));


insert into test
values( 1,to_date('20120101','YYYYMMDD'),1000);
insert into test
values( 1,to_date('20120401','YYYYMMDD'),200);
insert into test
values( 1,to_date('20120501','YYYYMMDD'),300);
insert into test
values( 1,to_date('20120601','YYYYMMDD'),600);
insert into test
values( 2,to_date('20120101','YYYYMMDD'),100);
insert into test
values( 2,to_date('20130101','YYYYMMDD'),500);
insert into test
values( 2,to_date('20140101','YYYYMMDD'),-800);
insert into test
values( 3,to_date('20120101','YYYYMMDD'),100);


Let's check the data we just inserted into the temporary table


SELECT * FROM test


The output looks like this

Id SomeDate Charge
1 2012-01-01 1000.0000000000
1 2012-04-01 200.0000000000
1 2012-05-01 300.0000000000
1 2012-06-01 600.0000000000
2 2012-01-01 100.0000000000
2 2013-01-01 500.0000000000
2 2014-01-01 -800.0000000000
3 2012-01-01 100.0000000000


What we want is the following

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2012-12-31 100.0000000000 100.0000000000
2 2013-01-01 2013-12-31 500.0000000000 600.0000000000
2 2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000

For each row, we want the date that the row starts on and the date when it ends, as well as a running total. If there is no row after the current row for that id, we want the end date to be 9999-12-31.

So we will use a couple of functions. The first one is LEAD, which accesses data from a subsequent row in the same result set without the use of a self-join. The LEAD part looks like this

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231') OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,

What we are doing is subtracting 1 day from the date in the subsequent row (SomeDate + -1 * INTERVAL '1 day' )
We are using 1 as the offset since we want to apply this to the next row. Finally, if there is no subsequent row, we want to use the date 9999-12-31 instead of NULL
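
To make the LEAD logic concrete, here is a small Python model of what it computes for one id. This is only a sketch of the idea, not how PostgreSQL runs it, and the function name lead_end_dates is made up for the example.

```python
from datetime import date, timedelta


def lead_end_dates(start_dates, default=date(9999, 12, 31)):
    """For the sorted start dates of one id, return each row's end date."""
    end_dates = []
    for i in range(len(start_dates)):
        if i + 1 < len(start_dates):
            # Next row's start date minus one day, like LEAD(... , 1).
            end_dates.append(start_dates[i + 1] - timedelta(days=1))
        else:
            # No subsequent row: use the default, like LEAD's third argument.
            end_dates.append(default)
    return end_dates


id1 = [date(2012, 1, 1), date(2012, 4, 1), date(2012, 5, 1), date(2012, 6, 1)]
for start, end in zip(id1, lead_end_dates(id1)):
    print(start, end)
```

For id 1 this prints the same start and end dates as the first four rows of the result we are after.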

To do the running total, we will do the following

SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
          AS RunningTotal

What this means in English is: for each id, ordered by date, sum up the charge values from the first row of the partition through the current row. Here is what all that stuff means.

ROWS BETWEEN
Specifies the rows that make up the window frame, given as a starting point and an ending point

UNBOUNDED PRECEDING
Specifies that the window starts at the first row of the partition. UNBOUNDED PRECEDING can only be specified as a window starting point.

CURRENT ROW
Specifies that the window starts or ends at the current row when used with ROWS or the current value when used with RANGE.
CURRENT ROW can be specified as both a starting and ending point.
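
Here is a rough Python model of that frame, in case it helps to see it outside of SQL. It is a sketch of what the window computes, not of how PostgreSQL executes it; the function name running_totals is made up, and the rows are the ones from the table output above with the charges written as plain integers.

```python
from itertools import accumulate, groupby

# (id, date, charge)
rows = [
    (1, "2012-01-01", 1000),
    (1, "2012-04-01", 200),
    (1, "2012-05-01", 300),
    (1, "2012-06-01", 600),
    (2, "2012-01-01", 100),
    (2, "2013-01-01", 500),
    (2, "2014-01-01", -800),
    (3, "2012-01-01", 100),
]


def running_totals(rows):
    # ORDER BY id, date (ISO date strings sort chronologically).
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    # PARTITION BY id: each id gets its own total that starts from zero.
    for _, group in groupby(ordered, key=lambda r: r[0]):
        group_rows = list(group)
        # UNBOUNDED PRECEDING .. CURRENT ROW: sum everything up to this row.
        totals = accumulate(charge for _, _, charge in group_rows)
        for row, total in zip(group_rows, totals):
            result.append(row + (total,))
    return result


for row in running_totals(rows):
    print(row)
```

The last value in each printed tuple is the running total, and it resets whenever the id changes.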

And here is the query


SELECT id, someDate as StartDate,
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231')
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM test
  ORDER BY id, SomeDate


And running that query gives us the running total as well as the end dates

id StartDate Enddate         Charge         RunningTotal
1 2012-01-01 2012-03-31 1000.0000000000 1000.0000000000
1 2012-04-01 2012-04-30 200.0000000000 1200.0000000000
1 2012-05-01 2012-05-31 300.0000000000 1500.0000000000
1 2012-06-01 9999-12-31 600.0000000000 2100.0000000000
2 2012-01-01 2012-12-31 100.0000000000 100.0000000000
2 2013-01-01 2013-12-31 500.0000000000 600.0000000000
2 2014-01-01 9999-12-31 -800.0000000000 -200.0000000000
3 2012-01-01 9999-12-31 100.0000000000 100.0000000000


Here is what it looks like if you execute the query in pgAdmin



If you don't want the last row to have the end date filled in, just omit the default value in the LEAD function. Instead of

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1,'99991231')

Make it

LEAD((SomeDate + -1 * INTERVAL '1 day' ),1)

Here is the whole query again


SELECT id, someDate as StartDate,
LEAD((SomeDate + -1 * INTERVAL '1 day' ),1)
 OVER (PARTITION BY id ORDER BY SomeDate) as Enddate,
  Charge,
  SUM(Charge) OVER (PARTITION BY id ORDER BY SomeDate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM test
  ORDER BY id, SomeDate

And here is what the output looks like after we made the change

As you can see, the last row for each id, which has no later row to take an end date from, now has a null end date

That is all for this post