Wednesday 4 September 2019

Using Level of Detail (LOD) to Determine Completeness


Recently a colleague and myself found ourselves in a situation where we needed to produce a report on whether something was 'complete' or not. But oh wait, there's a bit more to that than a simple TRUE/FALSE flag.

Take phone specs for example, I found a data source that lists approximately 30 different phone specification categories. Imagine if this information needs to be 'complete before we can show it to people, we'll need to know where the gap is in order to know where to put our efforts.In plain language: 'what's complete and what still missing'?

Ok, so let's say I go ask my leader, out of the 30 various specs that each phone has, which ones must be complete? The "mandatory fields" so to speak. And this is his/her requirements:
  1. Year must not be null/blank
  2. Out of the 2G, 3G, and 4G fields, we need at least 1 of them to be filled in
  3. Weight must not be null/blank
  4. OS must not be null/blank
  5. RAM must not be null/blank
  6. Primary camera must not be null/blank
Also, in addition to the specific phones that are incomplete, they also wants to know the % of "brands" and "models" that's complete. A sort of status report so to speak. In essence, they wants a report that gives the following information:
  • Which brands/models are complete vs incomplete? (if one phone within a brand is incomplete, then mark the entire brand incomplete.)
  • Click on brand or model to see the phone within each grouping
  • Which phones are complete vs incomplete? (if one field within a phone is incomplete, then mark the entire phone incomplete.)
  • Click on phone to see which mandatory fields are complete vs incomplete

First, for all the fields that cannot be null/blank, I quickly made calculated fields for each of them:

     IF ISNULL([Field])

     THEN "Incomplete"
     ELSE "Complete"
     END



But for the bandwidth, only 1 out of 3 fields need to be filled in. So the calculated field is slightly different:

     IF ISNULL([2G bands])
     THEN 
         (IF ISNULL([3G bands])
         THEN 
             (IF ISNULL([4G bands])
             THEN "Incomplete"
             ELSE "Complete"
             END)
         ELSE "Complete"
         END)
     ELSE "Complete"
     END


Now I need to report on this completeness based on the following 3 dimensions:
  • Per phone
  • Per brand
  • Per model
For phone, that's easy enough, all I have to do is create a calculated field based on each row:

     IF [C. Bandwidth]"Complete"
     THEN
         (IF [C. OS]"Complete"
         THEN
             (IF [C. PrimeCam]"Complete"
             THEN
                 (IF [C. RAM]"Complete"
                 THEN
                     (IF [C. Weight]"Complete"
                     THEN
                         (IF [C. Year]"Complete"
                         THEN "Complete"
                         ELSE "Incomplete"
                         END)
                     ELSE "Incomplete"
                     END)
                 ELSE "Incomplete"
                 END)
             ELSE "Incomplete"
             END)
         ELSE "Incomplete"
         END)
     ELSE "Incomplete"
     END


This essentially looks at all the columns and if one of the column is 'incomplete' then mark the entire record as incomplete. To me, on a visual level, it looks like this:


I'm using the information between Category 1 - 4 to determine what the overall flag should be for each row/phone.

But when it comes to Model or Brand, we have to look at a number of rows, and if out of that grouping of rows, one of them is 'incomplete', then mark that entire grouping of rows incomplete. Basically like this:


In this instance, I'm looking at each of the categories as well as the brand. as long as one of the overall flag for a brand is 'incomplete', flag all the rows in that particular brand 'incomplete'.

To do that, you could use the LOD function in Tableau. However, that only works on measures. How we did it, was to flag each row's complete/incomplete indication using numbers, then convert those numbers back into a string.

First, create a calculated field that turns the result per phone into a measurable number:

     IF [C. Bandwidth] = "Complete"
     THEN
         (IF [C. OS] = "Complete"
         THEN
             (IF [C. PrimeCam] = "Complete"
             THEN
                 (IF [C. RAM] = "Complete"
                 THEN
                     (IF [C. Weight] = "Complete"
                     THEN
                         (IF [C. Year] = "Complete"
                         THEN 0
                         ELSE 1
                         END)
                     ELSE 1
                     END)
                 ELSE 1
                 END)
             ELSE 1
             END)
         ELSE 1
         END)
     ELSE 1
     END


Now we can add them up. If all the records in the given brand are complete, the sum of all of them should still be 0. But if any of the rows are incomplete, then the sum would be greater than 0.

Create an LOD calculated field based on that measure:

IF
{FIXED [Brand]:SUM([C. Phone (measure)])} = 0
THEN "Complete"
ELSE "Incomplete"
END


Apply this to Model, and now we can now look at this set of data in the way that was required (dashboard embedded). Hopefully this was helpful. Let me know what other ways have you used LODs before in the comments below.

Monday 17 June 2019

Alphabet Browsing (Filter) Hack


Recently I've been requested to created a dimension-heavy dashboard and provide various ways for users to search for information. Other than the obvious wildcard search boxes and single/multi select filters, I created an alphabet that allowed users to click on and quickly navigate to a specific alphabet letter.

Using the WWF endangered species data as an example, at first, I simply used the LEFT function, created a calculated field with the first letter of each 'common name' and set them as the filter, but what happens is that the letters that does not correspond with a 'common name' (in this case: K, Q, U, X, and Z) simply didn't show up.



There are a few downsides to this, one, it is simply a filter and not an alphabet browser, two, if more data gets added in the future, the width of the alphabets will increase, hence making it hard for me to determine the final width I should aim for. In my real life scenario, I didn't have enough space to put all the alphabets in a single row, and had to find a way to cut them into two 13 letter rows. The goal was to create a filter with a full set of alphabet letters, and color the letters that has data a certain color, and the ones that don't another color.

After some thought, the solution came to me. What I ended up doing was creating a separate data source (using Excel) that contains all the letters in the alphabet, then blended the data source together to determine which letters has a 'common name' (teal) and which doesn't (red).


Details:
  • Drag the Alphabets pill from the List of Alphabets data source to Columns and Text
  • Drag the Alphabets pill from the Main data source to Color
  • Color null red, color all other letters teal
If you are working with your final data source, this is pretty much all you have to do. However, in my case, I had to develop the dashboard using test data from a test server, then another team actually moves my dashboard from the test server to UAT, then finally connecting to production. The problem with this, is that when a new letter appears (from null), Tableau will automatically pick a colour from the chosen colour palette, and 9 out of 10 times it won't be teal.

Not knowing which letters will be Null and which letter will be populated, this is what I did:

  • In test server, using existing data, colour null values one colour (red) and non-null values another colour (teal)
  • Use the user interface and manually created dummy data for each of the letters that were null. In this data set, it would be K, Q, U, X and Z.
  • Go back to tableau. You now see no nulls as every letter has a value. You should see that K, Q, U, X, and Z are all various colours. Colour them all teal.
  • Connect to final data source, and the colours updated accordingly. Since we coloured all the letters teal, whenever a letter has a value, it automatically colours it teal. And since we coloured null values red, whenever there isn't a null value, it colours it red!

The last step is to set the proper dashboard action to enable the alphabets to filter the results, and you're done!


Hopefully this helps you. It's just a cute little hack that make dashboard that much more user friendly.

Wednesday 10 October 2018

Data Visualization & Data Design Talks

Great talks on data visualization and information design you should watch.








More Hans Rosling on TED: https://www.ted.com/search?q=rosling

Thursday 21 December 2017

Universal Search Bar and URL Action


How to navigate around this dashboard:

  1. Type the name of a director or actor/actress and/or type a plot keyword, hit enter
  2. Click on any movie in the table under the search bars (hover over the menu icon "..." for more info)
  3. The movie title of your selection should show up in the rectangular box (large font), hover over it (also other information about the movie of your choice should be displayed)
  4. Your movie, if in the right format, should show up in the Wikipedia page underneath (more about this below).


This was a super fun dashboard to make. I had already decided that I was going to make a dashboard on universal search bars and just had to find the right data set (this part of the work always takes the longest). Luckily I came across this movies data set on Kaggle. After massaging the data a bit, I was able to make the dashboard I had envisioned.


Universal Search Bar

The universal search bar isn't actually "universal", but instead, it allows users to search across a defined number of columns. In the case of a movies data set, 2 types of searches stood out for me:

  • People, and
  • Keywords

In this data set, there are 4 people columns (director, actors 1-3) and 5 plot keyword columns. I will walk through setting up the universal search bar using the people columns.

First, set up a parameter; I called mine "People Search Parameter". Set to "string" and delete the current value.




Then set up a Boolean calculated field, which I called "Universal People Search Filter". Syntax is as follows:

CONTAINS(LOWER([First People Field]),LOWER([People Search Parameter]))
OR
CONTAINS(LOWER([Second People Field]),LOWER([People Search Parameter]))
OR
CONTAINS(LOWER([Nth People Field]),LOWER([People Search Parameter]))




Drag this calculated filed to filter and select only "TRUE", and away we go. Next time anyone types a name into the parameter, anything in these columns that contain the text will show up.


URL Action

To get your URL action to work within your dashboard, simply drag a web page object onto your dashboard. Don't worry too much about the URL that you input here.




Then create a dashboard action by clicking on "Dashboard" → "Actions..." → "Add Action >" → "URL"





I knew I wanted to link to Wikipedia, and needed to know how the links looked before I continued on. So, I first opened a browser and did a test search for a film in Wikipedia, and copied the URL using the same format in the URL field. The format of most recent movies were as follows:

https://en.wikipedia.org/wiki/<Movie Title>_(<Title Year> film)

But the design of a regular webpage was too wide for my tiny blog-sized dashboard. So I went the mobile route:

https://en.m.wikipedia.org/wiki/<Movie Title>_(<Title Year> film)

An important thing to note is that this format is locked down. If any movies were listed in Wikipedia under a different format, the URL action will result in a page error. 





For instance, the web page for the movie "2 Fast 2 Furious" on Wikipedia is actually:

https://en.m.wikipedia.org/wiki/2_Fast_2_Furious

and not

https://en.m.wikipedia.org/wiki/2 Fast 2 Furious_(2003 film)




Therefore, the URL action did not work on the dashboard. This is something to keep in mind when choosing web pages you want to link to. Web sites that has consistent naming conventions will generally work better, and maximize the success of your URL actions.


A good one is a simple Google search URL, the generic URL is as follows:

https://www.google.ca/search?q=<search term 1>+<search term 2>+<search term n>

In addition to the fields that you have in your data set, you can add other search terms in combination with your fields to better direct your search results, such as:

https://www.google.ca/search?q=<Movie Title>+film


Now that you have your universal search bar and your URL action set up, the rest is just some creative formatting. I hope these tips help you with your next task, and that this dashboard shows how even with a tiny 600 X 800 dashboard, you can still pack a lot in. Keep on playing!

Wednesday 9 August 2017

Using HTML as Your Datasource

I'm not a programmer and I don't have an IT background, so when I heard of this method of connection, I was super stoked to try it out!

In a nutshell, you would structure your data connections this way:
HTML to Google Sheets to Tableau

Selecting Your Source - The Web Page

Obviously it would be best to connect to a web page that updates over time. Meaning, the same URL will hold data that updates and thus update your data source.

I am working on a workbook related to the Canadian Federal budget and how they spend our hard earn tax dollars, so I connected my Google Sheets to the Authorities and Expenditures by Program page.



This is, however, not a good example as the government posts by groupings of years. It is shown in the URL and thus I don't believe this particular page will be updated with new information but rather, the new information will be published onto a new page that has a new URL altogether. Having the year in the URL kind of gave it away.


A better web page to connect to would be, for instance, like a Wikipedia page where tables are updated without the URL changing.


The Formula & The Table

In Google Sheets, use the following syntax:


= IMPORTHTML(url, query, index)

The URL is easy, simply copy and paste.
The query is "table" or "list depending on what type of structure contains the desired data.
The index, however, may take a little bit of work.

Open up HTML viewer and look for all the "tables" until you find the table in question. For instance, if your table is the 3rd out of 4 tables, you would put "3" in the formula for index. The federal expenditure web page only had 1 table so I put "1" for my formula.



*Open HTML in Google Chrome by going to Settings à More Tools à Developer Tools
  Chrome: https://developers.google.com/web/tools/chrome-devtools/
  IE: https://msdn.microsoft.com/en-us/library/dd565627(v=vs.85).aspx#htmltool
  Firefox: https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Examine_and_edit_HTML

In the end, my full formula looked like this:


=IMPORTHTML("https://www.canada.ca/en/treasury-board-secretariat/services/planned-government-spending/budgets-expenditures/expenditures-program-2016.html","table",1)

You'll see the formula load for a few seconds, and then the table should load.




Connect to Tableau

Open Tableau and connect to Google Sheets. A pop up window will prompt you to sign into your google account, then it should connect to Google Drive automatically. You can also copy and paste the URL of your Google Sheet directly in the search bar.


Canadian Federal Fiscal Budget by Type

Wednesday 12 April 2017

Mapping & Polygons

Tableau makes mapping pretty easy. The most common ones I've seen are:
  1. Proportional symbol maps - ex. coloured dots on a map based on sales/profit 
  2. Choropleth (or filled) maps - ex. coloured states based on population
See these maps and other types of maps on the Tableau online help page "Mapping in Tableau".

Tableau has most countries, states / provinces, and cities built in already (more info for the U.S. compared to other countries) so all you need to do is double click on the field and it should populate on the map. However, it's a bit of a different story when it comes to polygons, and I've been itching to try it out.

It's important to clarify, that Tableau is not a GIS tool. Although it has mapping capabilities, it can't to spatial analysis (like calculate area size or create spatial relationships). There are also limitations when it comes to drawing points with lines and polygons, though there are some workarounds. A topic for another day.


Converters

I wanted to work with Calgary communities but first, I had to find the data. I thought the City of Calgary's polygons must be super accurate, so I went to this page and got myself some community boundaries. There were 7 formats available for download, and I started with the CSV one. But I soon realized that it's not a Tableau-consumable format, so my next task was to search for a way to get any of these 7 formats into a workable format.



Through chatting with some folks at work, I discovered that there are online converters that does it for you. I'm sure there are more, but the 2 that I use are:


Both worked with the community boundaries shapefile I downloaded from the City of Calgary Open Data web site, which consisted of 4 files (.dbf, .prj, .shp, .shx). But when I was working with another project for work, the file I used had 3 sets of shapefiles zipped together (21 files of .dbf, .prj, .sbn, .sbx, .shp, .shp.xml, .shx per file). For this bundle, the Safe converter converted them into 3 neatly packaged tde.'s just fine but the Alteryx converter wasn't able to pick them up until I separated them into 3 zipped files.


Tableau Polygon 101

To draw polygons in Tableau, you need to have at least the following fields:
  1. Longitude
  2. Latitude
  3. ID (sometimes called Name)
  4. Order (sometimes called Path)
After the converters gave me the Tableau-consumable files, I double clicked on longitude and latitude and got a single point on the map:


I then dragged "Name" to Detail, and got all the points that has a name. Which, in this case, are all the Calgary communities:


Because I wanted to draw polygons, I needed to tell Tableau the order in which the points should be connected. This is determined by the "Path", which appears in the Marks menu when I changed the graph type to Polygon. Without the order the points should be connected by, the graph defaults to this:


Drag "Path" to Path, and the communities appeared as intended:


I connected the map with the 2016 Calgary census data and the Calgary secondary suites data, did some more formatting, and this is what I got:


This map shows me which communities has more/less secondary suites, but I also want to be able to zoom into the specific communities and see the exact location of the suites. So I made a second graph using the secondary suite's longitude and latitude and some basic information I find useful (ie. address).

Here's the final result:



*Not sure if you noticed, but some communities (like Bridgeland/Riverside) don't have census information. This is because I used "Community Name" as the link between each data sources, so whatever community name was not an exact match will not have information about them.


BONUS - Dashboard Action: Hyperlinks

I added a "google search" hyperlink action in the Secondary Suite Location / Address graph so I can quickly look up information about the community I'm looking at.

This is done through dashboard actions → add action (URL).Simply type in any search term in google, then replace the term with the field in your dashboard that you want google to search by (ie. "community").