Tableau and R Data Visualization

Dr. Philip E Cannata

Oracle Data Scientist, Oracle Certified Professional, Adjunct Professor at the University of Texas Computer Science Department in Austin, Author, and Instructor for General Assembly

Chapter 1

Objective

The objective of this book is to communicate a Methodology for systematically analyzing data using Tableau's visualization techniques, and to show how to translate those techniques into R programs running in R Markdown Interactive Documents. Fundamental and advanced Tableau concepts are presented, and the reader is introduced to many useful visualizations along the way.

This book contains detailed, step-by-step instructions for reproducing each visualization in Tableau and R.

Methodology

The Methodology developed in this book for Exploring and Understanding data in a dataset is simple and can be summarized in the following five steps:

1. Perform Non-Aggregated Measures Analysis using Boxplots by starting from one non-aggregated green thing*.

2. Perform Aggregated Measures Analysis using Histograms by starting from one aggregated green thing.

3. Perform Aggregated Measures Analysis using Bar Charts by starting from one blue thing and one aggregated green thing.

4. Perform Non-Aggregated Measures Analysis using Scatter Plots by starting from two non-aggregated green things.

5. Perform Aggregated Measures Analysis using Crosstabs by starting from two blue things and one aggregated green thing.

Each step in this Methodology occupies one Chapter in this book (i.e., Chapters 2-6). Chapters 7 and 8 contain a variety of advanced topics. Chapters 1-6 also cover additional relevant topics.

* The notion of green things and blue things comes from Tableau terminology, where green things are continuous and blue things are discrete. For more information on blue and green things in Tableau, see the article at this link.
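As a rough orientation, the five steps above can be sketched in R. This is only an illustrative sketch under stated assumptions, not the step-by-step instructions given in later chapters: it uses the `mpg` dataset that ships with ggplot2 (not one of the book's data.world datasets), and the columns chosen (`hwy`, `displ`, `class`, `drv`) are arbitrary stand-ins for green and blue things.

```r
library(ggplot2)

# Assumption: the built-in ggplot2 `mpg` dataset stands in for the book's data.
# Green things (continuous): hwy, displ. Blue things (discrete): class, drv.

# Step 1. Boxplot from one non-aggregated green thing.
p_box <- ggplot(mpg, aes(x = "", y = hwy)) + geom_boxplot()

# Step 2. Histogram from one green thing; the bar heights are counts per bin.
p_hist <- ggplot(mpg, aes(x = hwy)) + geom_histogram(binwidth = 2)

# Step 3. Bar chart from one blue thing and one aggregated green thing.
avg_hwy <- aggregate(hwy ~ class, data = mpg, FUN = mean)
p_bar <- ggplot(avg_hwy, aes(x = class, y = hwy)) + geom_col()

# Step 4. Scatter plot from two non-aggregated green things.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Step 5. Crosstab from two blue things and one aggregated green thing.
# Combinations that do not occur in the data show up as NA.
xtab <- tapply(mpg$hwy, list(mpg$class, mpg$drv), mean)
```

Printing any of the plot objects (for example `print(p_box)`) draws the chart, and printing `xtab` shows the mean highway mileage for each class and drive type combination.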

Data Science Pipeline

This book addresses all of the boxes in the Data Science Pipeline shown below but mainly focuses on the Visualization box.

In this diagram, data enters the pipeline via the Input box in the upper left-hand corner. The data is Cleaned, Exported, and then Loaded into data.world. These steps correspond to the well-known Extract-Transform-Load (ETL) process. Data is then Input into either Tableau or R, optionally Reformatted and Transformed, and then Visualized. This process is repeated as many times as necessary to gain a thorough Understanding of the data and, ideally, to find some "Interesting" things about it. The results of this process are then Communicated to interested parties.

The technology used in this book for each of these steps is:

• Clean - R, R Regular Expressions (for more details, see Chapter 8), and data.world functions
• Export - the tidyverse readr::write_csv() function
• Load - data.world ingest facilities, see the data.world Section below
• Input 2 -
• Reformat -
  • Tableau functions
  • the tidyr::gather() and tidyr::spread() functions in the tidyverse tidyr package
• Transform -
  • Tableau facilities like Calculated Fields, Table Calculations, and Level of Detail Calculations
  • the functions in the tidyverse dplyr package
• Visualize -
  • Tableau
  • the ggplot2 package in R
• Communicate -
  • Tableau Dashboards and Stories
  • R Markdown Interactive Documents
• Model - see "Regression Analysis, Clustering, and Forecasting" in Chapter 7
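To make the Reformat and Transform steps on the R side more concrete, here is a minimal sketch using the tidyr and dplyr functions named in the list. The data frame `sales_wide` and its columns are hypothetical and invented only for illustration; they are not taken from the book's datasets.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide-format data: one column per quarter.
sales_wide <- data.frame(
  product = c("Coffee", "Tea"),
  q1 = c(10, 4),
  q2 = c(12, 6)
)

# Reformat: gather() stacks the quarter columns into key/value pairs (wide to long).
sales_long <- gather(sales_wide, key = "quarter", value = "sales", q1, q2)

# spread() is the inverse: it turns the key/value pairs back into columns.
sales_wide_again <- spread(sales_long, key = quarter, value = sales)

# Transform: dplyr verbs aggregate the long data, here total sales per product.
totals <- sales_long %>%
  group_by(product) %>%
  summarise(total_sales = sum(sales))
```

The long format produced by gather() is usually the more convenient shape for dplyr and ggplot2, since the quarter becomes an ordinary column that can be grouped on or mapped to an axis.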

Tableau

"Tableau is business intelligence software that helps people see and understand their data." [Tableau].

For a quick peek at Tableau, please review the following video:

This video can also be viewed at this link.

Tableau Public will be used in this book. The following video gives a quick overview of Tableau Public.

This video can also be viewed at this link.

A tour of Tableau can be found at this link.

For a tour of the Tableau Environment, play the video at this link.

Here are some useful links for the Tableau Environment:

The Tableau Workspace

The Tableau workspace is the user's interface to Tableau. Here is an annotated view of the Tableau workspace:

A. Workbook name - A workbook contains sheets. A sheet can be a worksheet, a dashboard, or a story.

B. Shelves and Cards - Drag fields to the cards and shelves in the workspace to add data to your view.

C. Toolbar - Use the toolbar to access commands and analysis and navigation tools.

D. Parts of the View - This is the workspace where you create your data visualizations.

E. Start page icon - Goes to the start page. For more information, see Start Page.

F. The Side Bar - The side bar provides two panes: the Data pane and the Analytics pane. For more information, see The Side Bar.

G. Data Source tab - Goes to the data source page. For more information, see Data Source Page.

H. Status Bar - Displays information about the current view.

I. Sheet tabs - Tabs represent each sheet in your workbook. This can include worksheets, dashboards and stories. For more information, see Sheets.

Build-It-Yourself Exercises

Tableau provides a set of "Build-It-Yourself" exercises listed below, which show how to use various Tableau features. Reviewing these exercises may help as a preview of what is to come in this book.

R and RStudio

R is a widely used free software environment for statistical computing and graphics. ggplot2 is a data visualization package for R that can be used to produce publication-quality graphics. This book will show readers how to use R and ggplot2 to produce publication-quality data visualizations.

To make using R easy, RStudio was used for developing all of the R code in this book. "RStudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management." Click here to see more RStudio features.

"RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux) or in a browser connected to RStudio Server or RStudio Server Pro (Debian/Ubuntu, RedHat/CentOS, and SUSE Linux)." [RStudio]

For more details on RStudio, click here to see a video.

This video can also be viewed at this link.

Interactive Documents using R Markdown

R Markdown is a high-level language that can be used to generate HTML documents with embedded R code. When the HTML is generated, the R code is executed and its output is added to the HTML document.
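To illustrate, here is a minimal sketch in R that writes a small R Markdown file to a temporary directory. The file name, title, and chunk contents are hypothetical, chosen only for illustration. The document consists of a YAML header, ordinary prose, and one R code chunk; when it is rendered, the chunk runs and its result appears in the generated HTML.

```r
# Three backticks, built with strrep() so this listing can itself sit inside a code fence.
fence <- strrep("`", 3)

# Lines of a hypothetical, minimal R Markdown document.
rmd_lines <- c(
  "---",
  "title: \"Minimal R Markdown example\"",
  "output: html_document",
  "---",
  "",
  "The average of the numbers 1 through 10 is computed when the document is rendered:",
  "",
  paste0(fence, "{r}"),
  "mean(1:10)",
  fence
)

# Write the document to a temporary location.
rmd_file <- file.path(tempdir(), "minimal_example.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering requires the rmarkdown package and pandoc; uncomment to produce the HTML.
# rmarkdown::render(rmd_file)
```

In RStudio, the same result is normally obtained by opening the .Rmd file and using the Knit button rather than calling the render function by hand.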

The following video gives a quick overview of R Markdown:

This video can also be viewed at this link.

R Markdown has recently been extended to allow for the creation of Interactive Documents. As the name implies, Interactive Documents are HTML documents that allow the reader to do things like choose items from a select list and see the results of that selection dynamically in the document.

R Markdown uses Shiny to produce Interactive Documents.
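As a hedged sketch of what such a document can look like, the R code below writes a minimal interactive R Markdown file. The file name, title, input id `column`, and the use of the built-in `mtcars` data are all hypothetical choices made for illustration. The `runtime: shiny` line in the header is what turns the document into an Interactive Document; the chunk then combines a Shiny input (`selectInput()`) with a reactive output (`renderPlot()`).

```r
# Three backticks, built with strrep() so this listing can itself sit inside a code fence.
fence <- strrep("`", 3)

# Lines of a hypothetical, minimal interactive R Markdown document.
interactive_lines <- c(
  "---",
  "title: \"Minimal interactive document\"",
  "output: html_document",
  "runtime: shiny",
  "---",
  "",
  "Choose a column of the built-in mtcars data to see its histogram:",
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "selectInput(\"column\", \"Column to plot:\", choices = names(mtcars))",
  "renderPlot(hist(mtcars[[input$column]], main = input$column))",
  fence
)

# Write the document to a temporary location.
interactive_file <- file.path(tempdir(), "interactive_example.Rmd")
writeLines(interactive_lines, interactive_file)

# Serving the document requires the rmarkdown and shiny packages; uncomment to run it.
# rmarkdown::run(interactive_file)
```

Unlike a static R Markdown document, an interactive one needs a running R session behind it, because the plot is recomputed each time the reader changes the selection.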

data.world

data.world is a cloud platform "designed for data and the people who work with data. From professional projects to open data, data.world helps you host and share your data, collaborate with your team, and capture context and conclusions as you work" [data.world]. data.world is a young startup based in Austin, Texas.

The founders of data.world

This video can also be viewed at this link.

All of the data used in this book is stored at data.world. To follow along with the examples in the book, readers should sign up for a data.world account here.

To learn about the "Open data community", click on the "How data.world works" link at this link.

Datasets

After creating an account, you can experiment with data.world by creating a new dataset. To do this, drag one or more CSV files (or other supported files) to the blue area shown in the image below.

The following video gives more details on creating a data.world Dataset:

This video can also be viewed at this link.

Projects

Datasets can be combined, and "Insights" into your data can be added in a data.world Project.

An example of a data.world Project can be found at this link. Here's how the workspace for this project looks:

In the upper left-hand corner, the Project files are the data files contained directly in the project, such as "CoffeeChain.csv", along with other Datasets that are linked into the project, such as "bryon/U.S.Zipcodes". Collecting all of this data into a Project makes it easy to write queries against your data. Details on building a query at data.world can be found in this video. Details on building more advanced queries in data.world can be found in this video.

Index of Topics by Chapter

Chapter 2

• Main Topic - Exploring Data with Boxplots (One Non-Aggregated Green Thing)
• Other Topics
  • Pages with Animation, Animation History; Date Levels; and CTRL-drag (Command-drag on a Mac) to quickly duplicate fields in Tableau

Chapter 4

• Main Topic - Exploring Data with Histograms (One Aggregated Green Thing)
• Other Topics
  • Analytics Tab, Formatting, Changing the Aggregation Function, and Dual Axis Plots in Tableau
  • The tidyverse dplyr Package, and Reference Lines in R

Chapter 5

• Main Topic - Exploring Data with Bar Charts (A Blue Thing and an Aggregated Green Thing)
• Other Topics
  • Table Calculations, Sets, Packed Bubble Charts, and Treemap Charts in Tableau
  • Table Calculations in R

Chapter 6

• Main Topic - Exploring Data with Scatter Plots (Two Non-Aggregated Green Things)
• Other Topics
  • Maps with Value Corrections, Actions, Dashboards, ANOVA Models, and Stories in Tableau
  • Maps in R

Chapter 7

• Level of Detail Calculations
• Statistical Learning - Regression Analysis, Groups, Clustering, and Forecasting
• More Charts

Chapter 8

• Regular Expressions
• Data Cleaning
• Dates and Times with lubridate