Use a Time Series to Quickly Generate A Date Dimension in Postgresql

Steven NgDaily Debug BlogLeave a Comment

generate_series

A neat feature in Postgresql is the generate_series function. generate_series is classified as a “Set Returning Function”, which in plain English means that it returns a bunch of rows. How it works is very similar to a for..next loop. You basically set up a start and stop point, and optionally add a step interval.

The syntax for generate_series is generate_series(start, stop, [step_interval])

Let’s say you wanted three rows, with values 1 to 3. To do this, you would use this:

select * from generate_series(1,3)

Which would return:

generate_series
1
2
3

Let’s try an example with a step interval. Let’s say you wanted alternating numbers between 1 and 10.

select * from generate_series(1,10,2)
generate_series
1
3
4
7
9

By using a step interval of 2, it means that from the starting number (1), it will jump every other number (in this case, the even numbers) until it reaches the end number. The start value always appears, but depending on the step interval, the end value may not.

While the last two examples are using numbers, the generate_series function also works with dates.

Let’s try generating rows for every month in 2019:

select * from generate_series( ‘2019-01-01’::timestamp, ‘2019-12-31’::timestamp, ‘1 month’::interval)

generate_series
2019-01-01 00:00:00
2019-02-01 00:00:00
2019-03-01 00:00:00
2019-04-01 00:00:00
2019-05-01 00:00:00
2019-06-01 00:00:00
2019-07-01 00:00:00
2019-08-01 00:00:00
2019-09-01 00:00:00
2019-10-01 00:00:00
2019-11-01 00:00:00
2019-12-01 00:00:00

Now that is pretty cool. But you might be asking, how can this be useful?

Well, if you’re building a data warehouse, you can quickly create a date dimension for any date range. This is similar to Vince’s post on how to generate a Date Dimension Table with Pentaho, except that it is written in native Postgresql SQL.

Creating a Date Dimension with generate_series

To create a date dimension, all you need to do is use generate_series for a given start and end date in a subquery. For example:

SELECT
  to_char(dimension_date, 'YYYYMMDD')::integer as date_id,
  dimension_date,
  to_char(dimension_date, 'YYYY')::integer as year,
  to_char(dimension_date, 'Q')::integer as quarter,
  to_char(dimension_date, 'MM')::integer as month,
  to_char(dimension_date, 'DD')::integer as day,
  to_char(dimension_date, 'Q')::integer * 10000 + to_char(dimension_date, 'MMDD')::integer as quarter_month_day,
  to_char(dimension_date, 'MMDD')::integer as month_day,
  to_char(dimension_date, 'Month') as month_name,
  to_char(dimension_date, 'Mon') as month_abbreviation,
  to_char(dimension_date, 'Day') as weekday_name,
  to_char(dimension_date, 'Dy') as weekday_abbreviation,
  to_char(dimension_date, 'W')::integer as week_in_month,
  to_char(dimension_date, 'WW')::integer as week_in_year,
  to_char(dimension_date, 'D')::integer as day_in_week,
  (dimension_date + 1) - to_date((date_part('year', dimension_date)::integer * 10000 + 101)::char(8),'YYYYMMDD') as day_in_year,
  case when to_char(dimension_date - INTERVAL '1 day','D')::integer < 6 then 1 else 0 end as is_working_day,
  to_char(dimension_date, 'YYYY')::integer - to_char(dimension_date - (day_in_dimension - 1), 'YYYY')::integer + 1 as year_in_dimension,
  (to_char(dimension_date, 'YYYY')::integer - to_char(dimension_date - (day_in_dimension - 1), 'YYYY')::integer) * 12 + to_char(dimension_date, 'MM')::integer as month_in_dimension,
  day_in_dimension,
  to_char(dimension_date, 'IYYY')::integer as iso_year,
  to_char(dimension_date, 'IW')::integer as iso_week_in_year,
  (to_char(dimension_date, 'IW')::integer - 1) * 7 + to_char(dimension_date - INTERVAL '1 day','D')::integer as iso_day_in_year,
  to_char(dimension_date - INTERVAL '1 day','D')::integer iso_day_in_week
FROM
  (
    SELECT
      date_trunc('day', generated_date):: date as dimension_date,
      (row_number() OVER ())::integer as day_in_dimension
    FROM
      generate_series( '2010-01-01'::timestamp, '2030-12-31'::timestamp, '1 day'::interval) generated_date
  ) time_series

The SQL above creates a query for a sample date dimension. Note in the subquery, I’ve also added a (row_number() OVER ())::integer column. This is simply a counter, which gives you a serial integer for every row in the subquery that I use for the “in dimension” column. I’ll explain that in my list of all the columns produced by the date dimension.

In my main query, I basically perform simple calculations on the date to derive all of my columns. Feel free to add to or change the calculations to suit your own use case.

List of Date Dimension Columns

ColumnUsage
date_idThe date expressed as an integer in YYYYMMDD format. This can be used as the primary key for the table.
dimension_dateThe date as a date type.
yearThe year
quarterThe quarter
monthThe month
dayThe day of the month
quarter_month_dayThe date expressed as an integer in QMMDD format. This is here to make multi-year quarter-to-date comparisons easier in your where clauses.
month_dayThe date expressed as in integer in MDD format. This is here to make your multi-year year-to-date or month-to-date comparisons easier in your where clauses.
month_nameThe full name of the month (e.g., “February”)
month_abbreviationThe three letter abbreviation of the month (e.g., “Feb”)
weekday_nameThe full name of the weekday (e.g., “Tuesday”)
weekday_abbreviationThe three letter abbreviation of the weekday (e.g., “Tue”)
week_in_monthThe week number in the month
week_in_yearThe week number in the year
day_in_weekThe day in the current week. Sunday = 1
day_in_yearThe day in the year
is_working_day1 if the day is a weekday, 0 if the day is on the weekend
year_in_dimensionThe ordinal year in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
month_in_dimensionThe ordinal calendar month in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
day_in_dimensionThe ordinal day in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
iso_yearISO 8601 year. This may not be in sync with traditional calendar years.
iso_week_in_yearThe week number in the ISO 8601 year.
iso_day_in_yearThe day number in the ISO 8601 year.
iso_day_in_weekThe ISO day in the current week. Monday = 1

You’ll notice that I have a bunch of seemingly pointless columns, like the quarter_month_day, month_day and “in dimension” columns. I put columns like that in to make my SQL where statements cleaner, or to do integer math instead of date math.

Let me give you an example. Let’s say today is April 10, 2019 and I want to do a year to date for this year and last year.

My where statement condition would be:

select 
  * from some_table left join 
  date_dimension 
  on 
  some_table.date_id = date_dimension.date_id 
where
  date_dimension.year in (2018, 2019)
  and 
  date_dimension.month_day <= 410

That’s a lot simpler than:

select 
  * 
from 
  some_table left join 
  date_dimension 
  on some_table.date_id = date_dimension.date_id 
where 
  (date_dimension.date_id between 20180101 and 20180410)
or
  (date_dimension.date_id between 20190101 and 20190410)

Depending on the types of reporting you do, you may find those columns useless, so it might make sense to exclude them from your implementation of this date dimension.

Because you’re probably going to modify the SQL to your situation, I didn’t include a CREATE TABLE statement to accompany the date dimension SQL. It’s also because you may actually choose to implement it without a table.

You can also use the code to create a view (materialized or not) or even use it as a subquery. Generally speaking, I wouldn’t go with those two options, but I don’t know your data situation, so go with the solution that works best for your particular situation.

 

 

Quick Hit: Microsoft Edge with Chromium is Now Ready to Preview

Steven NgDaily Debug BlogLeave a Comment

If you haven’t heard the news, Microsoft is moving the engine behind the Edge browser to Chromium, the engine behind Chrome. You can now try it out, as the preview program has gone live.

So you might be wondering how this will impact you. I don’t have an answer for that, but I can tell you how it will impact me.

  1. Assign It To Me, while capable of running on Edge, always had issues with performance when running in Edge compared to Chrome and Firefox. With the adoption of Chromium (which we use as a baseline for our development and testing) will make Assign It To Me much more performant for our users.
  2. If you’re a Windows laptop user, Microsoft’s own browsers tend to be better on battery consumption than other browsers. Chrome, has always been for lack of a better word, pig, on Windows. It eats up CPU cycles like you don’t believe, and of course, battery. Even with those flaws, I still can’t abandon Chrome for my own browsing use. I’m hoping this new incarnation of Edge will change that.
  3. If you’re a Cognos+Microsoft shop, you’ll no longer be forced to use IE to access Cognos. Cognos’s more recent user interfaces are better suited for Chromium/Webkit rendering engines, and now you’ll have a Cognos friendly Microsoft browser supporting modern web standards that is IT approved.

 

Quick Hit: Public Sans, A Free Font That’s Great for Reporting

Steven NgDaily Debug BlogLeave a Comment

The United States Web Design System released a free font called Public Sans. It’s available in multiple weights, but it has a few things going for it that make it a solid choice :

  • It’s a very nice sans serif font that is very legible
  • It has tabular figures, which are great for displaying numbers in charts and tables
  • It’s free and open source – the source is on Github, and you can report bugs and issues with the font (and from the looks of it, you’ll even get a prompt response)
  • With exception of the capital ‘O’ and zero ‘0’, I don’t believe that there are any visually ambiguous characters

If you’re having trouble finding a good font for your reporting system, give Public Sans a shot.

Quick Hit: PostgreSQL performance on Raspberry Pi

Steven NgDaily Debug BlogLeave a Comment

I’m sure they’re answering a question that almost nobody is asking, but Rustproof Labs did some benchmarks of Postgresql running on Raspberry Pi.

I don’t think I’d recommend that anyone run a relational database on a Pi, especially using SD as storage. But having said that, a lot of businesses with modest reporting needs could actually use a Pi as a database server.

Heck, for Assign It To Us, a Pi running Postgresql is more than enough for the data volume we need for our internal reporting (although I’d definitely want to use something other than an SD card for storage).

Practically speaking though, a lot of small businesses could probably get away with an inexpensive NUC for their business intelligence/data warehousing needs.

Quick Hit: Chef Extends Open Source Licensing to All Its Software

Steven NgDaily Debug BlogLeave a Comment

Chef announced today that they plan on open sourcing all of their software using a model similar to Red Hat’s. There is still a paid Enterprise version for commercial use.

If you’re not familiar with Chef, it’s a good tool for provisioning servers consistently and predictably. Having used it for Assign It To Me, I would say that it’s better for complex, constantly changing environments than it is for simple ones. If you’ve got smaller unchanging environments, you can still get by with a bunch of shell scripts or Capistrano.

It looks like they’ve cleaned up their product line too. Instead of figuring out which of the different chef tool versions, it looks like there’s now a simpler product called Chef Workstation.

If you want to the Chef apps without having to give up contact information, you can go to downloads.chef.io.

Update: Looking at some of the comments (including clarifications by the Chef team) on the related Hacker News thread, it appears that the downloadable versions Chef provides will not be free for commercial release.

If you plan on using a version of Chef for production servers, you will need to wait for another group to build downloadable binaries for you to use for free. As I mentioned earlier, this is just like Red Hat. If you don’t want to pay for Red Hat Enterprise, you use CentOS.

I don’t imagine it will be long before an organization like CentOS will be providing free downloadable versions of Chef under a slightly different name.