Getting Started With Windows Subsystem for Linux

Steven NgDaily Debug BlogLeave a Comment

The Problem

My Windows text editor was choking while prettifying a complicated XML file, and I wanted to be able to use XMLint without having to set up a Linux VM.

One of the cooler features of Windows 10 is the Windows Subsystem for Linux, or WSL for short.

WSL basically lets you have a full, real Linux installation running on Windows, and you can access it from a command prompt. It’s has its own filesystem, but you can interact with your Windows filesystem and vice versa.

The WSL is especially handy for people switching from Mac to Windows, as it gives them a worthy substitute to OSX’s Terminal app (although I would argue CMDer/ConEMU do too).

So how it works is this. You need to enable the Windows Subsystem for Linux on your computer, and then you can pick the flavor(s) of Linux to install, many of which are available for free from the Windows store.

Once you’ve installed the WSL and a Linux distro for WSL, all you need to do is fire it up from the command line. Neat.

 

Enable WSL

In Windows, go to the Control Panel, and select Programs. In the Programs and Features section, click on “Turn on Windows features on or off”.

In the resulting dialog, make sure Windows Subsystem for Linux is ticked, and click OK. You’ll probably get another dialog telling you to reboot, so save any open files and reboot.

That’s all it takes to get setup, so now you need to add one or more Linux distributions.

 

Install a Linux Distribution

Open up the Windows Store app on your computer and search for “wsl” (without the quotes). You should get a list of available distributions for WSL. The big ones are free, but some people have put up other distributions for a fee.

Just pick a distro and click  on the “Get” button that appears on the page. For this example, I’m going to assume that you picked “Ubuntu”, which is probably the most popular Linux Distribution.

Note that I did not pick an Ubuntu LTS version, but if you choose one of those, the launch command may be different. The launch command is usually in the Windows Store description for the distro.

You will be prompted for a location to install the Linux distro. The distro needs to be installed on a system drive, so C:\ is usually a good destination.

It’s that simple.

 

Launching the Linux Shell

Once it’s done, fire up your favorite terminal app. I use CMDer, but the built in command prompt is fine.

Enter this command:

ubuntu

The command prompt will change to look like

{yourusername}@{machinename}: $

From there, you’re able to the command line as though you are SSH’ed into a Linux box. You can install packages and services etc. just as though it was running in a virtual machine. If you are planning on running services, make sure that you use port numbers that don’t conflict with any services you are running on Windows.

 

Common Activities

 

To access Windows files from WSL

All of your Windows drives are accessible from the /mnt path. For example, your C drive is at /mnt/c

 

To access your Linux Files from Windows

If you’re a version of Windows older than Windows 10 Version 1903*, you will have to set up Samba or something like SFTP to access your Linux files. It’s inconvenient, but not for long.

Microsoft has received a lot of feedback and fixed this in Windows 10 Version 1903 by allowing you to use Windows Explorer to access your files natively from the WSL command prompt. To access your Linux files from 1903, you open up an Explorer instance from your WSL command prompt using:

explorer.exe .

There is a gotcha with respect to accessing Linux files from Windows, irrespective of your chosen method. Your Linux instance must be running for Windows to be able to see the files.

* At the time of posting, Windows 10 Version 1903 is due to arrive by the end of May 2019.

To upgrade your Distro

In many cases, you can update your Linux distro from the WSL command line. For the Ubuntu distro, you can use this command:

do-release-upgrade

If you’re using another distro, the command may differ.

Use a Time Series to Quickly Generate A Date Dimension in Postgresql

Steven NgDaily Debug BlogLeave a Comment

generate_series

A neat feature in Postgresql is the generate_series function. generate_series is classified as a “Set Returning Function”, which in plain English means that it returns a bunch of rows. How it works is very similar to a for..next loop. You basically set up a start and stop point, and optionally add a step interval.

The syntax for generate_series is generate_series(start, stop, [step_interval])

Let’s say you wanted three rows, with values 1 to 3. To do this, you would use this:

select * from generate_series(1,3)

Which would return:

generate_series
1
2
3

Let’s try an example with a step interval. Let’s say you wanted alternating numbers between 1 and 10.

select * from generate_series(1,10,2)
generate_series
1
3
4
7
9

By using a step interval of 2, it means that from the starting number (1), it will jump every other number (in this case, the even numbers) until it reaches the end number. The start value always appears, but depending on the step interval, the end value may not.

While the last two examples are using numbers, the generate_series function also works with dates.

Let’s try generating rows for every month in 2019:

select * from generate_series( ‘2019-01-01’::timestamp, ‘2019-12-31’::timestamp, ‘1 month’::interval)

generate_series
2019-01-01 00:00:00
2019-02-01 00:00:00
2019-03-01 00:00:00
2019-04-01 00:00:00
2019-05-01 00:00:00
2019-06-01 00:00:00
2019-07-01 00:00:00
2019-08-01 00:00:00
2019-09-01 00:00:00
2019-10-01 00:00:00
2019-11-01 00:00:00
2019-12-01 00:00:00

Now that is pretty cool. But you might be asking, how can this be useful?

Well, if you’re building a data warehouse, you can quickly create a date dimension for any date range. This is similar to Vince’s post on how to generate a Date Dimension Table with Pentaho, except that it is written in native Postgresql SQL.

Creating a Date Dimension with generate_series

To create a date dimension, all you need to do is use generate_series for a given start and end date in a subquery. For example:

SELECT
  to_char(dimension_date, 'YYYYMMDD')::integer as date_id,
  dimension_date,
  to_char(dimension_date, 'YYYY')::integer as year,
  to_char(dimension_date, 'Q')::integer as quarter,
  to_char(dimension_date, 'MM')::integer as month,
  to_char(dimension_date, 'DD')::integer as day,
  to_char(dimension_date, 'Q')::integer * 10000 + to_char(dimension_date, 'MMDD')::integer as quarter_month_day,
  to_char(dimension_date, 'MMDD')::integer as month_day,
  to_char(dimension_date, 'Month') as month_name,
  to_char(dimension_date, 'Mon') as month_abbreviation,
  to_char(dimension_date, 'Day') as weekday_name,
  to_char(dimension_date, 'Dy') as weekday_abbreviation,
  to_char(dimension_date, 'W')::integer as week_in_month,
  to_char(dimension_date, 'WW')::integer as week_in_year,
  to_char(dimension_date, 'D')::integer as day_in_week,
  (dimension_date + 1) - to_date((date_part('year', dimension_date)::integer * 10000 + 101)::char(8),'YYYYMMDD') as day_in_year,
  case when to_char(dimension_date - INTERVAL '1 day','D')::integer < 6 then 1 else 0 end as is_working_day,
  to_char(dimension_date, 'YYYY')::integer - to_char(dimension_date - (day_in_dimension - 1), 'YYYY')::integer + 1 as year_in_dimension,
  (to_char(dimension_date, 'YYYY')::integer - to_char(dimension_date - (day_in_dimension - 1), 'YYYY')::integer) * 12 + to_char(dimension_date, 'MM')::integer as month_in_dimension,
  day_in_dimension,
  to_char(dimension_date, 'IYYY')::integer as iso_year,
  to_char(dimension_date, 'IW')::integer as iso_week_in_year,
  (to_char(dimension_date, 'IW')::integer - 1) * 7 + to_char(dimension_date - INTERVAL '1 day','D')::integer as iso_day_in_year,
  to_char(dimension_date - INTERVAL '1 day','D')::integer iso_day_in_week
FROM
  (
    SELECT
      date_trunc('day', generated_date):: date as dimension_date,
      (row_number() OVER ())::integer as day_in_dimension
    FROM
      generate_series( '2010-01-01'::timestamp, '2030-12-31'::timestamp, '1 day'::interval) generated_date
  ) time_series

The SQL above creates a query for a sample date dimension. Note in the subquery, I’ve also added a (row_number() OVER ())::integer column. This is simply a counter, which gives you a serial integer for every row in the subquery that I use for the “in dimension” column. I’ll explain that in my list of all the columns produced by the date dimension.

In my main query, I basically perform simple calculations on the date to derive all of my columns. Feel free to add to or change the calculations to suit your own use case.

List of Date Dimension Columns

ColumnUsage
date_idThe date expressed as an integer in YYYYMMDD format. This can be used as the primary key for the table.
dimension_dateThe date as a date type.
yearThe year
quarterThe quarter
monthThe month
dayThe day of the month
quarter_month_dayThe date expressed as an integer in QMMDD format. This is here to make multi-year quarter-to-date comparisons easier in your where clauses.
month_dayThe date expressed as in integer in MDD format. This is here to make your multi-year year-to-date or month-to-date comparisons easier in your where clauses.
month_nameThe full name of the month (e.g., “February”)
month_abbreviationThe three letter abbreviation of the month (e.g., “Feb”)
weekday_nameThe full name of the weekday (e.g., “Tuesday”)
weekday_abbreviationThe three letter abbreviation of the weekday (e.g., “Tue”)
week_in_monthThe week number in the month
week_in_yearThe week number in the year
day_in_weekThe day in the current week. Sunday = 1
day_in_yearThe day in the year
is_working_day1 if the day is a weekday, 0 if the day is on the weekend
year_in_dimensionThe ordinal year in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
month_in_dimensionThe ordinal calendar month in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
day_in_dimensionThe ordinal day in the dimension based on the start date. This is here for reference but can be handy for quick calculations between dates.
iso_yearISO 8601 year. This may not be in sync with traditional calendar years.
iso_week_in_yearThe week number in the ISO 8601 year.
iso_day_in_yearThe day number in the ISO 8601 year.
iso_day_in_weekThe ISO day in the current week. Monday = 1

You’ll notice that I have a bunch of seemingly pointless columns, like the quarter_month_day, month_day and “in dimension” columns. I put columns like that in to make my SQL where statements cleaner, or to do integer math instead of date math.

Let me give you an example. Let’s say today is April 10, 2019 and I want to do a year to date for this year and last year.

My where statement condition would be:

select 
  * from some_table left join 
  date_dimension 
  on 
  some_table.date_id = date_dimension.date_id 
where
  date_dimension.year in (2018, 2019)
  and 
  date_dimension.month_day <= 410

That’s a lot simpler than:

select 
  * 
from 
  some_table left join 
  date_dimension 
  on some_table.date_id = date_dimension.date_id 
where 
  (date_dimension.date_id between 20180101 and 20180410)
or
  (date_dimension.date_id between 20190101 and 20190410)

Depending on the types of reporting you do, you may find those columns useless, so it might make sense to exclude them from your implementation of this date dimension.

Because you’re probably going to modify the SQL to your situation, I didn’t include a CREATE TABLE statement to accompany the date dimension SQL. It’s also because you may actually choose to implement it without a table.

You can also use the code to create a view (materialized or not) or even use it as a subquery. Generally speaking, I wouldn’t go with those two options, but I don’t know your data situation, so go with the solution that works best for your particular situation.

 

 

Quick Hit: Microsoft Edge with Chromium is Now Ready to Preview

Steven NgDaily Debug BlogLeave a Comment

If you haven’t heard the news, Microsoft is moving the engine behind the Edge browser to Chromium, the engine behind Chrome. You can now try it out, as the preview program has gone live.

So you might be wondering how this will impact you. I don’t have an answer for that, but I can tell you how it will impact me.

  1. Assign It To Me, while capable of running on Edge, always had issues with performance when running in Edge compared to Chrome and Firefox. With the adoption of Chromium (which we use as a baseline for our development and testing) will make Assign It To Me much more performant for our users.
  2. If you’re a Windows laptop user, Microsoft’s own browsers tend to be better on battery consumption than other browsers. Chrome, has always been for lack of a better word, pig, on Windows. It eats up CPU cycles like you don’t believe, and of course, battery. Even with those flaws, I still can’t abandon Chrome for my own browsing use. I’m hoping this new incarnation of Edge will change that.
  3. If you’re a Cognos+Microsoft shop, you’ll no longer be forced to use IE to access Cognos. Cognos’s more recent user interfaces are better suited for Chromium/Webkit rendering engines, and now you’ll have a Cognos friendly Microsoft browser supporting modern web standards that is IT approved.

 

Quick Hit: Public Sans, A Free Font That’s Great for Reporting

Steven NgDaily Debug BlogLeave a Comment

The United States Web Design System released a free font called Public Sans. It’s available in multiple weights, but it has a few things going for it that make it a solid choice :

  • It’s a very nice sans serif font that is very legible
  • It has tabular figures, which are great for displaying numbers in charts and tables
  • It’s free and open source – the source is on Github, and you can report bugs and issues with the font (and from the looks of it, you’ll even get a prompt response)
  • With exception of the capital ‘O’ and zero ‘0’, I don’t believe that there are any visually ambiguous characters

If you’re having trouble finding a good font for your reporting system, give Public Sans a shot.

Quick Hit: PostgreSQL performance on Raspberry Pi

Steven NgDaily Debug BlogLeave a Comment

I’m sure they’re answering a question that almost nobody is asking, but Rustproof Labs did some benchmarks of Postgresql running on Raspberry Pi.

I don’t think I’d recommend that anyone run a relational database on a Pi, especially using SD as storage. But having said that, a lot of businesses with modest reporting needs could actually use a Pi as a database server.

Heck, for Assign It To Us, a Pi running Postgresql is more than enough for the data volume we need for our internal reporting (although I’d definitely want to use something other than an SD card for storage).

Practically speaking though, a lot of small businesses could probably get away with an inexpensive NUC for their business intelligence/data warehousing needs.