Home > Journalistic Visuals > Creating Interactive Charts with Flourish

Creating Interactive Charts with Flourish

Introduction

For this tutorial, we will be looking at the cumulative amount of COVID-19 cases for different countries over time. These data come from the European Centre for Disease Prevention and Control, which aggregates data from other national sources. (They even include some R code for how to pull those data automatically so your charts can automatically update. Do note that I appended some new variables to their dataset for this tutorial, so you should use my dataset below.)

By the end of the tutorial, we will have produced an animated bar chart of COVID-19 cases for 11 countries, just like the one found on this Boston Globe story. This is what that chart looks like this:

Your browser does not support this video.

For those who cannot wait, here’s what our final product will look like.

Data Processing

To create a data visualization, we need to provide tools like Flourish with data. CSV files are universally accepted by data visualization tools. We can use R to help us create such a file.

Loading the Data

The first step, as usual, is to read in the source data. Like before, we’ll use the readr::read_csv() function to read data from this CSV file.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.3     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter()     masks stats::filter()
## x dplyr::group_rows() masks kableExtra::group_rows()
## x dplyr::lag()        masks stats::lag()

corona_cases <- read_csv("https://dds.rodrigozamith.com/files/covid19_cases_by_country_20200326.csv")

## Rows: 6931 Columns: 13

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): countriesAndTerritories, geoId, countryCode
## dbl  (9): day, month, year, cases, deaths, popData2018, cumulativeCases, cum...
## date (1): dateRep

##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Here’s a sample of the data:

corona_cases %>%
  head(10)

dateRep	day	month	year	countriesAndTerritories	geoId	countryCode	popData2018	daysSince100
2019-12-31	31	12	2019	Afghanistan	AF	AFG	37172386	-1
2020-01-01	1	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-02	2	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-03	3	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-04	4	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-05	5	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-06	6	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-07	7	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-08	8	1	2020	Afghanistan	AF	AFG	37172386	-1
2020-01-09	9	1	2020	Afghanistan	AF	AFG	37172386	-1

Getting the Data We Need

The next thing we’ll want to do is extract only the information we need for creating the chart. While it can be harmless to include additional data, it can sometimes (a) confuse the software being used to create the chart; (b) make it unwieldy to select options using those software; and (c) exceed the dataset size limitations of the software, especially if you’re on a free tier.

In our case, we only need data from the following eleven countries: China, France, Germany, India, Iran, Italy, Japan, Singapore, South Korea, Spain and the United States. We also only want data after February 24, 2020.

We can thus use a combination of the dplyr::filter() and dplyr::select() functions.

corona_cases %>%
  filter(dateRep >= "2020-02-24" & countriesAndTerritories %in% c("China", "France", "Germany", "India", "Iran", "Italy", "Japan", "Singapore", "South Korea", "Spain", "United States of America")) %>%
  select(dateRep, countriesAndTerritories, cumulativeCases)

The %in% operator allows us to search for multiple strings within a single, simple filter() statement. It basically notes that the value for the variable name must appear in the vector we specify with the c() function. If there is a match with any element on that vector, the observation is included (filtered in). If there is no match, it is excluded (filtered out).

Here’s a sampling of the result:

dateRep	countriesAndTerritories	cumulativeCases
2020-02-24	China	77234
2020-02-25	China	77749
2020-02-26	China	78159
2020-02-27	China	78598
2020-02-28	China	78927
2020-02-29	China	79355

Cleaning Up the Data

A lot of the web-based tools have very limited functionality for cleaning up the data, so it is best to do that within R.

Here, I see two changes we need to make. First, we’ll want to abbreviate “United States of America” and then change our date format (using the lubridate package) to something that is easier for others to read. In both cases, we’ll use the mutate() function to change our variables.

First, let’s load the lubridate package.

library(lubridate)

##
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
##
##     date, intersect, setdiff, union

Make sure you have installed the lubridate package before loading it, or R will give you an error.

Then, let’s add the two mutate statements to our piped operation. (I am performing this as two separate operations for demonstrative purposes.)

corona_cases %>%
  filter(dateRep >= "2020-02-24" & countriesAndTerritories %in% c("China", "France", "Germany", "India", "Iran", "Italy", "Japan", "Singapore", "South Korea", "Spain", "United States of America")) %>%
  select(dateRep, countriesAndTerritories, cumulativeCases) %>%
  mutate(countriesAndTerritories=recode(countriesAndTerritories, "United States of America"="United States")) %>%
  mutate(dateRep=format(dateRep, format="%m/%d/%Y")) %>%

We are using the recode() function to shorten the country name for the United States, which will make our visualization look nicer. We use the format() function to change the way the date is presented to fit our cultural context, with %m being month, %d being day, and %Y being the year.

Here’s what our dataset now looks like. Notice the change in the date. (We would only see the change in the "United States" label by looking further down in the dataset.)

dateRep	countriesAndTerritories	cumulativeCases
02/24/2020	China	77234
02/25/2020	China	77749
02/26/2020	China	78159
02/27/2020	China	78598
02/28/2020	China	78927
02/29/2020	China	79355

Going from Long to Wide

Unlike ggplot, Flourish likes the data to be uploaded in wide format.

That is, each bar should be a separate row in the data, and each column represents a different data point for that bar. You’ll find that different tools have different expectations, which may require you to reshape your data to fit those expectations. It’s frustrating but, thankfully, easily solved with R.

We can quickly reshape our data with the dplyr::pivot_wider() function like so:

corona_cases %>%
  filter(dateRep >= "2020-02-24" & countriesAndTerritories %in% c("China", "France", "Germany", "India", "Iran", "Italy", "Japan", "Singapore", "South Korea", "Spain", "United States of America")) %>%
  select(dateRep, countriesAndTerritories, cumulativeCases) %>%
  mutate(countriesAndTerritories=recode(countriesAndTerritories, "United States of America"="United States")) %>%
  mutate(dateRep=format(dateRep, format="%m/%d/%Y")) %>%
  pivot_wider(id_col=countriesAndTerritories, names_from=dateRep, values_from=cumulativeCases)

What we’re doing with the pivot_wider() function is telling R to use the countriesAndTerritories variable to identify unique observations in our dataset (each year becomes a separate row), create variables (columns) based on the unique values in the dateRep variable, and populate the new cell values with the associated values stored in the cumulativeCases variable. Contrast this table with the one you saw above.

Here’s a sampling of our new data frame:

countriesAndTerritories	02/24/2020	02/25/2020	02/26/2020	02/27/2020	02/28/2020	02/29/2020	03/01/2020	03/02/2020	03/03/2020	03/04/2020	03/05/2020	03/06/2020	03/07/2020	03/08/2020	03/09/2020	03/10/2020	03/11/2020	03/12/2020	03/13/2020	03/14/2020	03/15/2020	03/16/2020	03/17/2020	03/18/2020	03/19/2020	03/20/2020	03/21/2020	03/22/2020	03/23/2020	03/24/2020	03/25/2020	03/26/2020
China	77234	77749	78159	78598	78927	79355	79929	80134	80261	80380	80497	80667	80768	80814	80859	80879	80908	80932	80954	80973	80995	81020	81130	81163	81238	81337	81416	81499	81649	81748	81847	81968
France	12	12	14	17	38	57	100	130	178	212	285	423	613	716	1126	1412	1784	2281	2876	3661	4499	5423	6633	7730	9134	10995	12612	14459	16018	19856	22302	25233
Germany	15	15	17	21	47	57	111	129	157	196	262	400	684	847	902	1139	1296	1567	2369	3062	3795	4838	6012	7156	8198	14138	18187	21463	24774	29212	31554	36508
India	3	3	3	3	3	3	3	3	5	6	28	29	31	34	NA	44	50	73	75	83	90	93	125	137	165	191	231	320	439	492	562	649
Iran	43	61	95	139	245	388	593	978	1501	2336	2922	3513	4747	5823	6566	7161	8042	9000	10075	11364	12729	13938	14991	16169	17361	18407	19644	20610	21638	23049	24811	27017
Italy	132	229	322	400	650	888	1128	1689	1835	2502	3089	3858	4636	5883	7375	9172	10149	12462	15113	17660	17750	23980	27980	31506	35713	41035	47021	53578	59138	63927	69176	74386
Japan	144	144	164	186	210	230	239	254	254	268	317	349	408	455	488	514	568	619	675	737	780	814	824	829	873	950	1007	1046	1089	1128	1193	1268
Singapore	89	90	91	93	96	98	102	106	108	110	112	117	130	138	150	160	166	178	187	200	214	226	243	266	313	345	385	432	455	509	558	568
South Korea	762	892	1146	1595	2022	2931	3526	4212	4812	5328	5766	6284	6767	7134	7382	7513	7755	7869	7979	8086	8162	8236	8320	8413	8565	8652	8799	8897	8961	9037	9137	9241
Spain	2	3	7	12	25	34	66	83	114	151	200	261	374	430	589	1204	1639	2140	3004	4231	5753	7753	9191	11178	13716	17147	19980	24926	28572	33089	39673	47610
United States	35	53	53	59	60	66	69	89	103	125	159	233	338	433	554	754	1025	1312	1663	2174	2951	3774	4661	6427	9415	14250	19624	26747	35206	46442	55231	69194

Producing a CSV File

If we want to get our data out of R, we’ll need to export it. The readr package (part of tidyverse) makes it easy for us to produce a properly formatted CSV file with its write_csv() function.

That function requires us to provide it just two arguments: the object (data frame) we’d like to export and the filename of the CSV file.

The CSV file will be saved in your working directory, unless you specify a different path.

corona_cases %>%
  filter(dateRep >= "2020-02-24" & countriesAndTerritories %in% c("China", "France", "Germany", "India", "Iran", "Italy", "Japan", "Singapore", "South Korea", "Spain", "United States of America")) %>%
  select(dateRep, countriesAndTerritories, cumulativeCases) %>%
  mutate(countriesAndTerritories=recode(countriesAndTerritories, "United States of America"="United States")) %>%
  mutate(dateRep=format(dateRep, format="%m/%d/%Y")) %>%
  pivot_wider(id_col=countriesAndTerritories, names_from=dateRep, values_from=cumulativeCases) %>%
  write_csv("covid_19_plot_data.csv")

Because we’re piping the information, the first argument (the data frame) is already filled in for us. Thus, we only need to specify the filename for where to save the data.

Our Final CSV File

You can download a copy of the CSV file we will be using below by clicking here.

Creating an Interactive Plot Online

A powerful tool for creating interactive data visualizations is Flourish. Flourish allows you to create a range of different charts and specify a lot of options—more than many of its competitors. Large newsrooms like The Boston Globe routinely make use of Flourish when creating data visualizations.

The first step is to create an account with Flourish. As is the case with many online visualization tools, Flourish provides you with a limited free tier and more feature-loaded paid tiers. The free tier will be good enough for our purposes.

After you create your account, you should be presented with a dashboard that looks like this:

Flourish’s dashboard interface

Click on the “New visualization” button to create a new chart. Then, since we want to create an animated line chart, select the “Simple” chart under the “Bar chart race” section.

Selecting the Bar chart race option in Flourish

A sample chart will appear, which should look similar to this:

Default options for the Bar chart race option in Flourish

First, let’s title our visualization by entering some text at the box near the top of the page (e.g., “Coronavirus Cases Race Bar Chart”).

Then, click the “Data” button in the middle of the gray row near the top of the screen to upload your dataset. Click the blue “Upload data” button and select the CSV file we just created (covid_19_plot_data.csv).

Data view in Flourish

Import it publicly. Set the Label column to A and the Values columns to cover the rest of the columns (or all possible remaining columns, like B-ZZ). Set the Categories and Image columns to blank ( ).

Data view in Flourish after importing data

When your interface looks similar to the above, hit “Preview.” Then, begin playing around with the different settings.

Boston Globe’s Settings

Here are the settings we can use to replicate the Boston Globe’s chart:

Bars
- No. bars: 11
Bar Colors
- Color mode: By bar
- Color overrides:
  - China: #63615f
  - France: #a5a29c
  - Germany: #7c51a8
  - India: #2e4764
  - Iran: #82bec8
  - Italy: #7099ac
  - Japan: #ffcf26
  - Singapore: #a6c180
  - South Korea: #84817d
  - Spain: #e39b46
  - United States: #ca7768
Labels
- Max font size: 0.8
Time counter & totalizer
- Current time counter, size (% of screen): 4
- Total: disable
Timeline & animation
- Loop timeline: disable
Header
- Title
  - Label: How coronavirus has spread around the world
  - Styling (change title styles)
    - Size: Bigger
- Subtitle
  - Label: Confirmed cases by country since Feb. 24
  - Styling (change subtitle styles)
    - Size: 1.3 (Custom, or ‘…’)
    - Line height: 1.3
Footer
- Source name: European Centre for Disease Prevention and Control (My modification, since that’s where I got the data from)
- Source url: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
- Size: 0.75

Publishing your Data Visualization

When you are satisfied with your data visualization, you can click the “Export & publish” button at the top-right corner of the screen. Then, select “Publish to share and embed”.

Preview of the final visualization in Flourish

Flourish will then provide you with a link as well as “embed code” that can be used to incorporate the visualization into your story on another platform.

If you used the options listed above, your visualization should look just like this one.