AE 13: Rectangling data from the PokéAPI
Suggested answers
Packages
We will use the following packages in this application exercise.
- tidyverse: For data import, wrangling, and visualization.
- jsonlite: For importing JSON files.
library(tidyverse)
library(jsonlite)
Gotta catch ’em all!
Pokémon (also known as Pocket Monsters) is a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media.[1] The PokéAPI contains detailed information about each Pokémon, including their name, type, and abilities. In this application exercise, we will use a set of JSON files containing API results from the PokéAPI to explore the Pokémon universe.
[1] Source: Wikipedia
Importing the data
The files data/pokemon/pokedex.json and data/pokemon/types.json contain information about each Pokémon and about the different Pokémon types, respectively. We will use read_json() from jsonlite to import these files.
pokemon <- read_json(path = "data/pokemon/pokedex.json")
types <- read_json(path = "data/pokemon/types.json")
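read_json() needs a file on disk, so as a self-contained sketch we can parse a small, made-up JSON string with jsonlite's parse_json(), the string counterpart of read_json(). The two records below are hypothetical and only assume a simplified version of the pokedex.json layout (nested name, type, base, and profile fields). With the default simplifyVector = FALSE, every JSON array and object comes back as an R list, which is why the imported objects need rectangling before analysis.

```r
library(jsonlite)

# Hypothetical two-record JSON string; the field names assume a simplified
# version of the pokedex.json layout, and all values are made up
toy_json <- '[
  {"id": 1, "name": {"english": "Alpha", "japanese": "Arufa"},
   "type": ["Grass", "Poison"],
   "base": {"HP": 40, "Attack": 50, "Speed": 40},
   "profile": {"egg": ["Monster", "Grass"]}},
  {"id": 2, "name": {"english": "Beta", "japanese": "Beta"},
   "type": ["Water"],
   "base": {"HP": 45, "Attack": 45, "Speed": 45},
   "profile": {"egg": ["Water 1"]}}
]'

# parse_json() mirrors read_json() but reads from a string; with the default
# simplifyVector = FALSE, JSON arrays and objects both become R lists
toy_pokemon <- parse_json(toy_json)

length(toy_pokemon)           # one list element per record
class(toy_pokemon[[1]]$type)  # an array of types is a list, not a vector
names(toy_pokemon[[1]]$base)  # an object keeps its keys as list names
```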
Your turn: Use View() to interactively explore each list object and identify its structure and the elements it contains.
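View() is interactive; in a script or rendered document, base R's str() gives a comparable overview, and its max.level argument limits how many levels of nesting are printed. The sketch below uses a hand-built, made-up record that only assumes the general shape of a pokedex entry, and also shows a quick programmatic way to list which fields are themselves nested lists.

```r
# Hypothetical record with nested lists, loosely modelled on one pokedex
# entry; all values are made up
toy_record <- list(
  id = 1,
  name = list(english = "Alpha", japanese = "Arufa"),
  type = list("Grass", "Poison"),
  base = list(HP = 40, Attack = 50, Speed = 40)
)

# Top level only: each field and whether it is itself a list
str(toy_record, max.level = 1)

# Two levels: also prints the elements inside the nested lists
str(toy_record, max.level = 2)

# Names of the fields that will need rectangling (the nested ones)
nested_fields <- names(Filter(is.list, toy_record))
nested_fields
```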
Unnesting for analysis
For each of the exercises below, use an appropriate rectangling procedure (one or more of the unnest_*() functions, or hoist()) to unnest the lists and extract the elements required for analysis.
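As a rough guide to choosing between the two main unnest_*() functions, here is a minimal sketch on a made-up tibble (the column names only assume the shape of the Pokémon records): unnest_wider() turns each named component of a list-column into its own column, while unnest_longer() turns each element into its own row and repeats the other columns.

```r
library(tibble)
library(tidyr)

# Made-up list-columns: `base` holds named components, `type` unnamed ones
toy <- tibble(
  id = c(1, 2),
  base = list(list(HP = 40, Speed = 40), list(HP = 45, Speed = 50)),
  type = list(list("Grass", "Poison"), list("Water"))
)

# unnest_wider(): each named component becomes its own column
wide <- toy |> unnest_wider(base)
wide

# unnest_longer(): each element becomes its own row, repeating other columns
long <- toy |> unnest_longer(type)
long
```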
How many Pokémon are there for each primary type?
Your turn: Use each Pokémon’s primary type to determine how many Pokémon there are of each type, then create a bar chart to visualize the distribution. The chart should label each Pokémon type in both English and Japanese.
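Both routes for extracting the primary type (plucking the first element with hoist(), or spreading all elements with unnest_wider() and keeping the first column) should give the same counts. A minimal sketch on a made-up tibble, which only assumes that `type` is a list-column whose first element is the primary type:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Made-up stand-in for tibble(pokemon) |> unnest_wider(pokemon): `type` is a
# list-column whose first element is the primary type
toy <- tibble(
  id = 1:4,
  type = list(list("Grass", "Poison"), list("Water"), list("Water", "Ice"), list("Fire"))
)

# Route 1: hoist() plucks only the first element of each `type` list
via_hoist <- toy |>
  hoist(.col = type, main_type = 1L) |>
  count(main_type)

# Route 2: unnest_wider() spreads every element, then we keep type_1
via_wider <- toy |>
  unnest_wider(type, names_sep = "_") |>
  rename(main_type = type_1) |>
  count(main_type)

via_hoist
```

hoist() is the more targeted choice when only one component is needed, since it does not create columns for secondary types that are immediately discarded.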
# extract the primary type from the pokemon list and generate a frequency count
## using hoist()
poke_types <- tibble(pokemon) |>
  unnest_wider(pokemon) |>
  hoist(.col = type, main_type = 1L) |>
  count(main_type)
## using unnest_wider() twice
poke_types <- tibble(pokemon) |>
  unnest_wider(pokemon) |>
  unnest_wider(type, names_sep = "_") |>
  rename(main_type = type_1) |>
  count(main_type)
# extract english and japanese names for types
types_df <- tibble(types) |>
  unnest_wider(types)
# combine poke_types with types_df and create a name column that includes both
# english and japanese
left_join(x = poke_types, y = types_df, by = join_by(main_type == english)) |>
  mutate(
    name = str_glue("{main_type} ({japanese})"),
    name = fct_reorder(.f = name, .x = n)
  ) |>
  ggplot(mapping = aes(x = n, y = name)) +
  geom_col() +
  labs(
    title = "Water-type Pokémon are the most common",
    x = "Number of Pokémon",
    y = NULL,
    caption = "Source: PokéAPI"
  ) +
  theme_minimal()
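The labelling step relies on two details: join_by(main_type == english) matches key columns that have different names in the two tables, and fct_reorder() sorts the labels by count so that, on a discrete y axis (which draws the first level at the bottom), the most common type ends up at the top. A minimal sketch with made-up counts and type records; the "japanese" values here are placeholders, not real translations, and the record shape for types.json is an assumption.

```r
library(tibble)
library(tidyr)
library(dplyr)
library(stringr)
library(forcats)

# Made-up counts per primary type (the role poke_types plays above)
toy_counts <- tibble(main_type = c("Fire", "Grass", "Water"), n = c(5L, 2L, 3L))

# Made-up type records in the shape assumed for types.json;
# the "japanese" values are placeholders
toy_types <- list(
  list(english = "Fire", japanese = "Fire-ja"),
  list(english = "Grass", japanese = "Grass-ja"),
  list(english = "Water", japanese = "Water-ja")
)
toy_types_df <- tibble(types = toy_types) |>
  unnest_wider(types)

# Match on the English name, build a bilingual label, and order it by count
labelled <- left_join(x = toy_counts, y = toy_types_df, by = join_by(main_type == english)) |>
  mutate(
    name = str_glue("{main_type} ({japanese})"),
    name = fct_reorder(.f = name, .x = n)
  )

levels(labelled$name)
```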
Which primary types of Pokémon are strongest based on total points?
Your turn: Use each Pokémon’s base stats to determine which primary types of Pokémon are strongest based on total points (the sum of a Pokémon’s base stats, HP through Speed). Create a boxplot to visualize the distribution of total points for each primary type.
# which primary type of pokemon are strongest based on total number of points
tibble(pokemon) |>
  unnest_wider(pokemon) |>
  # get base stats
  unnest_wider(base) |>
  # determine primary type
  unnest_wider(type, names_sep = "_") |>
  # calculate for each row the sum of HP:Speed
  rowwise() |>
  mutate(total = sum(c_across(cols = HP:Speed), na.rm = TRUE), .before = HP) |>
  ungroup() |>
  # exclude pokemon with total = 0 (no base stats available)
  filter(total != 0) |>
  # order the boxplots meaningfully
  mutate(type_1 = fct_reorder(.f = type_1, .x = total)) |>
  # generate the plot
  ggplot(mapping = aes(x = total, y = type_1)) +
  geom_boxplot() +
  labs(
    title = "Flying-type Pokémon are the most powerful on average",
    x = "Total points",
    y = NULL,
    caption = "Source: PokéAPI"
  ) +
  theme_minimal()
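The rowwise() plus c_across() pattern computes one sum per row. A vectorised alternative is rowSums() over the same columns selected with pick() (available in dplyr 1.1.0 and later), which is usually faster on large tables. The sketch below uses made-up stats, with the second row mimicking a Pokémon whose stats are missing; it also shows why filter(total != 0) is needed, since summing only NA values with na.rm = TRUE gives 0.

```r
library(tibble)
library(dplyr)

# Made-up base stats; row "B" mimics a Pokemon with no stats recorded
toy_base <- tibble(
  name = c("A", "B", "C"),
  HP = c(40, NA, 60),
  Attack = c(50, NA, 70),
  `Sp. Attack` = c(45, NA, 80),
  Speed = c(40, NA, 90)
)

# Pattern used above: rowwise() + c_across()
by_row <- toy_base |>
  rowwise() |>
  mutate(total = sum(c_across(cols = HP:Speed), na.rm = TRUE)) |>
  ungroup()

# Vectorised alternative: rowSums() over the columns chosen with pick()
vectorised <- toy_base |>
  mutate(total = rowSums(pick(HP:Speed), na.rm = TRUE))

# All-NA rows sum to 0, so drop them before plotting
with_stats <- filter(by_row, total != 0)
with_stats
```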
From what types of eggs do Pokémon hatch?
Generation II introduced breeding, in which two compatible Pokémon can produce an egg that hatches into a new Pokémon. Each Pokémon belongs to one or two egg groups, which determine which other Pokémon it can breed with.
Your turn: Use each Pokémon’s egg groups to determine from what types of eggs Pokémon hatch. Create a heatmap to visualize the distribution of egg groups for each primary type.
Consider using hoist() to extract the main type and egg group from each Pokémon.
tibble(pokemon) |>
  unnest_wider(pokemon) |>
  # extract the main type
  hoist(
    .col = type, main_type = 1L
  ) |>
  # extract the type of eggs from which the pokemon can hatch
  hoist(
    .col = profile, egg = "egg"
  ) |>
  select(main_type, egg) |>
  # some pokemon have more than one egg group, need to unnest longer
  unnest_longer(egg) |>
  # count the number of type-egg pairings
  count(main_type, egg) |>
  # draw the plot
  ggplot(mapping = aes(x = egg, y = main_type, fill = n)) +
  geom_tile() +
  geom_text(mapping = aes(label = n), color = "white") +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(
    title = "A few Pokémon types are more likely to hatch from certain eggs",
    x = "Egg group",
    y = "Main Pokémon type",
    caption = "Source: PokéAPI"
  ) +
  theme(
    legend.position = "none",
    axis.text.x = element_text(angle = 30, hjust = 1)
  )
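hoist() accepts either a position (1L) or a name ("egg") to pluck a component out of a list-column. Because some Pokémon have two egg groups, the plucked egg column stays a list-column, and unnest_longer() then produces one row per type-egg pairing before counting. A minimal sketch of that pipeline, without the plot, on made-up records that only assume `profile` is a named list holding an `egg` list:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Made-up records shaped like tibble(pokemon) |> unnest_wider(pokemon):
# `type` is a list of types and `profile` a named list that holds `egg`
toy <- tibble(
  id = 1:3,
  type = list(list("Grass", "Poison"), list("Water"), list("Water")),
  profile = list(
    list(height = "0.7 m", egg = list("Monster", "Grass")),
    list(height = "0.5 m", egg = list("Water 1")),
    list(height = "1.0 m", egg = list("Water 1", "Monster"))
  )
)

pairs <- toy |>
  # pluck by position: the first type is the main type
  hoist(.col = type, main_type = 1L) |>
  # pluck by name: the egg groups stored inside profile
  hoist(.col = profile, egg = "egg") |>
  select(main_type, egg) |>
  # one row per egg group, so two-group Pokemon count toward both
  unnest_longer(egg) |>
  count(main_type, egg)

pairs
```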
Acknowledgments
- JSON data files obtained from Purukitto/pokemon-data.json
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.1 (2023-06-16)
os macOS Ventura 13.5.2
system aarch64, darwin20
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2023-10-20
pandoc 3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
colorspace 2.1-0 2023-01-23 [1] CRAN (R 4.3.0)
digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.0)
dplyr * 1.1.3 2023-09-03 [1] CRAN (R 4.3.0)
evaluate 0.22 2023-09-29 [1] CRAN (R 4.3.1)
fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.3.0)
fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
forcats * 1.0.0 2023-01-29 [1] CRAN (R 4.3.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
ggplot2 * 3.4.2 2023-04-03 [1] CRAN (R 4.3.0)
glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
gtable 0.3.3 2023-03-21 [1] CRAN (R 4.3.0)
here 1.0.1 2020-12-13 [1] CRAN (R 4.3.0)
hms 1.1.3 2023-03-21 [1] CRAN (R 4.3.0)
htmltools 0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)
htmlwidgets 1.6.2 2023-03-17 [1] CRAN (R 4.3.0)
jsonlite * 1.8.7 2023-06-29 [1] CRAN (R 4.3.0)
knitr 1.44 2023-09-11 [1] CRAN (R 4.3.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.3.0)
lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
lubridate * 1.9.2 2023-02-10 [1] CRAN (R 4.3.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.3.0)
pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
purrr * 1.0.2 2023-08-10 [1] CRAN (R 4.3.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
ragg 1.2.5 2023-01-12 [1] CRAN (R 4.3.0)
readr * 2.1.4 2023-02-10 [1] CRAN (R 4.3.0)
rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.3.0)
rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
scales 1.2.1 2022-08-20 [1] CRAN (R 4.3.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
stringi 1.7.12 2023-01-11 [1] CRAN (R 4.3.0)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.3.0)
systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.3.0)
textshaping 0.3.6 2021-10-13 [1] CRAN (R 4.3.0)
tibble * 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
tidyr * 1.3.0 2023-01-24 [1] CRAN (R 4.3.0)
tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
tidyverse * 2.0.0 2023-02-22 [1] CRAN (R 4.3.0)
timechange 0.2.0 2023-01-11 [1] CRAN (R 4.3.0)
tzdb 0.4.0 2023-05-12 [1] CRAN (R 4.3.0)
utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)
viridisLite 0.4.2 2023-05-02 [1] CRAN (R 4.3.0)
withr 2.5.1 2023-09-26 [1] CRAN (R 4.3.1)
xfun 0.40 2023-08-09 [1] CRAN (R 4.3.0)
yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
[1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
──────────────────────────────────────────────────────────────────────────────