Postgresql Tip: A ranked list that includes some values irrespective of their actual rank

Steven NgDaily Debug BlogLeave a Comment

The Problem

You want to create a ranked list (like a top 10 list) that always includes one or more values.

What do I mean by this? The best example is during the Olympic Games. You’ll often see the top 10 countries with a medal count.

Let’s say you’re reading that list from the Toronto Star, and on that particular day, Canada is not in the top 10 for medal counts. Because the Toronto Star’s readership is mostly Canadian, they always want to include Canada in that list. Which is why you’ll generally still see a line for Canada as the 11th row, showing whatever ranking value Canada has with respect to medal counts.

For my particular use case, I’m looking at geographic origins for sessions on our web site.  For my ranking list, I always want Canada and the United States to show up on top 10 lists, irrespective of their actual rankings.

The Solution

The solution is pretty straightforward. My Google Analytics geographic data is stored in a table, and I want to rank hits by country for my entire data set. Before aggregation, it looks like this:

First, I need to create a query to rank all of the sessions by country.

select
  country,
  sum(sessions) as sessions,
  rank() over (order by sum(sessions) desc)
from
  fact.ga_geographic_visitors
group by country

That narrowed my output to something like this:

On a normal list, all I need to do is filter down the rank to be <=10 (you don’t necesarily want to use limit 10 because the rank() function in Postgresql spots ties).

Because my sample data includes the US and Canada in the top 10, I’m going to have to change my rules to better illustrate my example. Let’s force China and Mexico into my top ten list, since they fall at 13 and 17, respectively.

All I need to do is make my previous query a subquery and then add a couple of filters – one to limit the rank to 10 or below, and another to ensure that Mexico and China are always included (using an or boolean operator).

select * from
  (
    select
      country,
      sum(sessions) as sessions,
      rank() over (order by sum(sessions) desc)
    from
      fact.ga_geographic_visitors
    group by country
  ) ranked_query
  where
  ranked_query.rank <= 10
  or
  ranked_query.country in ('Mexico','China')

And that yields the following result:

And there you have it! If China and Mexico happen to be in the top 10, then only the values with a rank of 10 or lower will appear in the list (the number of actual items in the list will vary depending on whether there are any ties) . If China and Mexico are below the threshold, they’ll appear no matter what.

 

Pro Tip: Juggling remote desktop windows? Avoid destructive actions in the wrong window with this easy trick.

Steven NgDaily Debug BlogLeave a Comment

Are you a multitasker who juggles multiple Remote Desktop windows at the same time? It’s easy to fall into the trap of performing the wrong action in the wrong window when all of your windows look the same. It happens to the best of us.

So what’s an easy way to differentiate between the different remote desktop screens?

If your remote machines are using a modern version of Windows, you can use task bar and window title bar background colors to differentiate between them. A desktop background color can help too, but it’s not very useful if you maximize your windows.

  1. Go into Settings > Personalization > Colors
  2. Pick a color in the Windows colors section
  3. Turn off Transparency Effects (otherwise the color is less prominent)
  4. Make sure Start, taskbar and action center is checked
  5. Make sure Title bars and window borders is checked

So for servers you shouldn’t make changes to willy-nilly, make the background red. For servers you need to be cautious on, make the background gold or yellow. Setting the colors won’t prevent errors all of the time, but they’ll sure help.

Javascript Regex: How to remove all instances of tags that only appear within a specific tag

Steven NgDaily Debug BlogLeave a Comment

One of the things I’ve been working on for Extrata is a service that can parse out a Cognos Framework Model. While a Cognos Framework model is simply an XML file, its highly nested structure can make it hard to generate readable documentation out of it.

Before I prepare the model to be output more simply into a tabular format, I sanitize the XML to make the conversion to JSON easier as it specifically relates to the Javascript XML to JSON conversion library I’m using. For the most part, it’s a straightforward process, because the XML structure of a Framework Model is also (thankfully) pretty straightforward.

The Problem

There was, however, one area that gave me a bit of a headache. In a Cognos Framework model, the <refobj> tag is usually a child of another element, with one exception. And that exception is annoying. When a <refobj> is in an <expression> tag, it’s not a child element, it’s actually part of the text value:

<expression>(<refobj>[Dimensional view].[Sales].[Revenue]</refobj> -<refobj>[Dimensional view].[Sales].[Product cost]</refobj> )/<refobj>[Dimensional view].[Sales].[Revenue]</refobj> </expression>

It looks like these <refobj> tags in the context of an expression are used for syntax highlighting in the Framework Manager UI. This usage of the tag confuses the XML library I’m using, the library will break the expression node out into a useless object full of child elements.

Because I don’t need those <refobj> tags in Extrata, I need to eliminate them without eliminating the legitimate <refobj> tags that are truly child elements of other tags.

Because you can have multiple <refobj> tags within a single <expression> node, you can’t use a simple regex for the replacement. You need a more advanced replacement function call.

When I was doing my proof of concept, I did a quick and dirty match to find all the expression nodes and then iterated through each of them to do the string replacement. This was inefficient and slow. On my test model, I had over 8,000 nodes, and it took a couple of minutes to iterate through them. My temporary solution was not a good long term solution.

While I knew there was a better way of doing it with regex, but I was more concerned about getting my proof of concept working first, as it was more important for me to determine if it was even worthwhile parsing out a Cognos Framework Model in the first place.

The Solution

A “common” regex replacement looks like this:

xml="<xml><expression>(<refobj>[Dimensional view].[Sales].[Revenue]</refobj> -<refobj>[Dimensional view].[Sales].[Product cost]</refobj> )/<refobj>[Dimensional view].[Sales].[Revenue]</refobj> </expression></xml>";
xml = xml.replace(/<\/?refobj>/g,"") // finds  all <refobj> and </refobj> tags from the xml and replaces it with an empty string

The structure of the command is:

xml.replace(/regex expression/g, replacementString)

But the expression gets a lot more complicated when you are trying to replace multiple elements between an open and close tag.

Fortunately, the replace function can accept a Javascript function in place of replacementString. This lets you do additional processing as your replacement string. So this is the function I used:

xml="<xml><expression>(<refobj>[Dimensional view].[Sales].[Revenue]</refobj> -<refobj>[Dimensional view].[Sales].[Product cost]</refobj> )/<refobj>[Dimensional view].[Sales].[Revenue]</refobj> </expression></xml>";
xml = xml.replace(/(<expression[\s\S]*?>)([\s\S]*?)(<\/expression>)/g, function(match, captureGroup1, captureGroup2, captureGroup3) {
  return captureGroup1 + captureGroup2.replace(/<.*?refobj>/g, '') + captureGroup3;
});

Let’s break it down. My first regular expression /(<expression[\s\S]*?>)([\s\S]*?)(<\/expression>)/can be visualized below (courtesy of Regulex, which is an awesome site):

As shown in the diagram above, my first regular expression has 3 capture groups:

  1. The open <expression> tag
  2. The text value in the <expression> node that includes <refobj> tags
  3. The close </expression> tag

If you’re not familiar with a capture group, it is basically a search term that converts its matched result into a variable.

When passing a function as your replacement string in a Javascript regex replace, the format of the command is:

function (match, captureGroup1, captureGroup2,... captureGroupN, offset, sourceString) { 
  return replacementString 
}

The first parameter is the found match. In our particular case, we need to put the parameter in, but we can ignore it.

The next parameters are the capture groups. You need one for every capture group in the regular

The last two parameters, offset and sourceString are not required. The offset is the position of the match, and the sourceString is the original string you are searching with the regex.

Our replacement string function simply does an additional regex replacement /<.*?refobj>/ that gets rid of all <refobject> tags in captureGroup2 and re-encloses it in the same <expression> open and close tags (captureGroup1 and captureGroup3, respectively).

After wiring in this replacement code in place of my janky iterating code, replacement time dropped down from minutes to a couple of seconds.

And that’s it! I hope you find this piece of code handy for your own projects.

Quick Hit: HockeyViz – Incredible Infographics for Hockey Fans

Steven NgDaily Debug BlogLeave a Comment

I stumbled on a game summary for a recent game involving the Toronto Maple Leafs, and was super impressed by the visualization used (see below).

I’m a much bigger fan of great visualizations than I am of hockey, so I felt that hockeyviz.com deserved a shoutout.

The chart is simple, easy to read and has a high degree of information density. Mathematician and freelance data visualizer Micah Blake McCurdy has done a fantastic job of creating these game summary infographics. In the example above, I especially love how he uses varying line thicknesses to show comparisons of how much time players spend with each other.

I don’t know how he manages to produce these for every game (I assume he’s figured out how to automate the process), but this is top notch work. Simply outstanding.

 

Quick Hit: Designing Complex Tables for the Web (Smashing Magazine)

Steven NgDaily Debug BlogLeave a Comment

For the data nerds out there, Smashing Magazine has a pretty good article on how to design complex tables for the web.

In this age of multi-sized screens, designing tables that can provide the right amount of information density is very hard.

I’ve been recently tweaking some reports for Assign It To Me, and deciding what to hide and show on small screens is no easy chore. When your users have diverse reporting needs for a project, you’re simply not going to get it right for everyone.