AE 13: Rectangling data from the PokéAPI

Suggested answers


Packages

We will use the following packages in this application exercise.

  • tidyverse: For data import, wrangling, and visualization.
  • jsonlite: For importing JSON files.
library(tidyverse)
library(jsonlite)

Gotta catch ’em all!

Pokémon (also known as Pocket Monsters) is a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media. [1] The PokéAPI contains detailed information about each Pokémon, including their name, type, and abilities. In this application exercise, we will use a set of JSON files containing API results from the PokéAPI to explore the Pokémon universe.

[1] Source: Wikipedia

Importing the data

data/pokemon/pokedex.json and data/pokemon/types.json contain information about each Pokémon and the different types of Pokémon, respectively. We will use read_json() to import these files.

pokemon <- read_json(path = "data/pokemon/pokedex.json")
types <- read_json(path = "data/pokemon/types.json")
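
To see what read_json() hands back before opening the real files, here is a minimal sketch on a temporary file. The two records and their fields are a made-up miniature for illustration, not an excerpt of the actual data.

```r
library(jsonlite)

# Hypothetical miniature of a Pokédex file (illustrative records only)
json_txt <- '[
  {"id": 1, "name": {"english": "Bulbasaur"}, "type": ["Grass", "Poison"]},
  {"id": 4, "name": {"english": "Charmander"}, "type": ["Fire"]}
]'
tmp <- tempfile(fileext = ".json")
writeLines(json_txt, tmp)

# By default read_json() does not simplify, so JSON arrays and objects
# both come back as nested R lists
toy <- read_json(path = tmp)

class(toy)          # "list"
length(toy)         # 2
toy[[1]]$type[[2]]  # "Poison"
```

Because nothing is simplified into vectors or data frames, every level of the JSON is still a list, which is why the exercises below need rectangling.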

Your turn: Use View() to interactively explore each list object to identify their structure and the elements contained within each object.
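
View() needs an interactive session. As a console alternative, the sketch below inspects a hypothetical one-record stand-in for the pokemon list with str(), names(), and lengths(); the record itself is made up for illustration.

```r
# Hypothetical one-element stand-in for the `pokemon` list
toy <- list(
  list(
    id = 1L,
    name = list(english = "Bulbasaur"),
    type = list("Grass", "Poison")
  )
)

str(toy, max.level = 2)  # print only the top two levels of nesting
names(toy[[1]])          # element names of the first record
lengths(toy[[1]])        # number of items held by each element
```

Elements whose length differs from record to record are the ones that need hoist() or unnest_longer() rather than a single unnest_wider().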

Unnesting for analysis

For each of the exercises below, use an appropriate rectangling procedure to unnest_*() one or more lists to extract the required elements for analysis.
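
As a rough guide to choosing a procedure, the sketch below applies unnest_wider(), unnest_longer(), and hoist() to the same hypothetical two-record list. The records are illustrative and much smaller than the real data.

```r
library(tidyr)
library(tibble)

# Hypothetical two-record list shaped like the parsed Pokédex
toy <- list(
  list(id = 1L, type = list("Grass", "Poison")),
  list(id = 4L, type = list("Fire"))
)

# unnest_wider(): one column per named element (id, type)
wide <- tibble(toy) |>
  unnest_wider(toy)

# unnest_longer(): one row per element of the list-column
long <- wide |>
  unnest_longer(type)

# hoist(): pull out a single component by position or name
first_type <- wide |>
  hoist(.col = type, main_type = 1L)

nrow(long)            # 3
first_type$main_type  # "Grass" "Fire"
```

hoist() is the lighter option when only one or two components are needed; the unnest_*() functions are more convenient when every component matters.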

How many Pokémon are there for each primary type?

Your turn: Use each Pokémon’s primary type to determine how many Pokémon there are for each type, then create a bar chart to visualize the distribution. The chart should label each Pokémon type in both English and Japanese.

# extract the primary type from the pokemon list and generate a frequency count
## using hoist()
poke_types <- tibble(pokemon) |>
  unnest_wider(pokemon) |>
  hoist(.col = type, main_type = 1L) |>
  count(main_type)

## using unnest_wider() twice
poke_types <- tibble(pokemon) |>
  unnest_wider(pokemon) |>
  unnest_wider(type, names_sep = "_") |>
  rename(main_type = type_1) |>
  count(main_type)

# extract english and japanese names for types
types_df <- tibble(types) |>
  unnest_wider(types)

# combine poke_types with types_df and create a name column that includes both
# english and japanese
left_join(x = poke_types, y = types_df, by = join_by(main_type == english)) |>
  mutate(
    name = str_glue("{main_type} ({japanese})"),
    name = fct_reorder(.f = name, .x = n)
    ) |>
  ggplot(mapping = aes(x = n, y = name)) +
  geom_col() +
  labs(
    title = "Water-type Pokémon are the most common",
    x = "Number of Pokémon",
    y = NULL,
    caption = "Source: PokéAPI"
  ) +
  theme_minimal()
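
The join and labelling step can be checked in isolation. This is a minimal sketch with made-up counts; poke_counts and type_names are hypothetical stand-ins for poke_types and types_df.

```r
library(dplyr)
library(stringr)
library(forcats)

# Hypothetical counts and translations (illustrative values only)
poke_counts <- tibble::tibble(main_type = c("Fire", "Water"), n = c(2L, 5L))
type_names <- tibble::tibble(
  english = c("Fire", "Water"),
  japanese = c("ほのお", "みず")
)

# join_by() matches keys that have different names in the two tables
labelled <- left_join(
  x = poke_counts,
  y = type_names,
  by = join_by(main_type == english)
) |>
  mutate(
    name = str_glue("{main_type} ({japanese})"),
    # order the factor levels by count so the bars sort by size
    name = fct_reorder(.f = name, .x = n)
  )

levels(labelled$name)
```

Since this is a left join, any type in the counts without a match in the translation table would be kept with a missing Japanese name rather than dropped.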

Which primary type of Pokémon is strongest based on total number of points?

Your turn: Use each Pokémon’s base stats to determine which primary type of Pokémon is strongest based on the total number of points. Create a boxplot to visualize the distribution of total points for each primary type.

# which primary type of pokemon are strongest based on total number of points
tibble(pokemon) |>
  unnest_wider(pokemon) |>
  # get base stats
  unnest_wider(base) |>
  # determine primary type
  unnest_wider(type, names_sep = "_") |>
  # calculate for each row the sum of HP:Speed
  rowwise() |>
  mutate(total = sum(c_across(cols = HP:Speed), na.rm = TRUE), .before = HP) |>
  ungroup() |>
  # exclude pokemon with total = 0 - means we don't have stats available
  filter(total != 0) |>
  # order the boxplots meaningfully
  mutate(type_1 = fct_reorder(.f = type_1, .x = total)) |>
  # generate the plot
  ggplot(mapping = aes(x = total, y = type_1)) +
  geom_boxplot() +
  labs(
    title = "Flying-type Pokémon are the most powerful on average",
    x = "Total points",
    y = NULL,
    caption = "Source: PokéAPI"
  ) +
  theme_minimal()
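
The row-wise total is the step most worth checking by hand. The sketch below uses hypothetical stats for two made-up Pokémon (only three stat columns) and compares the rowwise() approach used above with a vectorised alternative based on rowSums() and pick(), which is a swapped-in technique rather than part of the suggested answer.

```r
library(dplyr)

# Hypothetical base stats (illustrative values, three of the stat columns)
stats <- tibble::tibble(
  name = c("A", "B"),
  HP = c(45, 39),
  Attack = c(49, NA),
  Speed = c(45, 65)
)

# Approach used in the answer: sum across columns one row at a time
by_row <- stats |>
  rowwise() |>
  mutate(total = sum(c_across(cols = HP:Speed), na.rm = TRUE)) |>
  ungroup()

# Alternative: rowSums() over the same columns, without rowwise()
vectorised <- stats |>
  mutate(total = rowSums(pick(HP:Speed), na.rm = TRUE))

by_row$total      # 139 104
vectorised$total  # 139 104
```

With na.rm = TRUE, a row whose stats are all missing sums to 0, which is why the answer filters out total == 0.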

From what types of eggs do Pokémon hatch?

Generation II of Pokémon introduced breeding, in which Pokémon can produce offspring. The offspring arrive as Pokémon eggs, which can be hatched to produce a Pokémon.

Togepi was the first Pokémon to be introduced as an egg.

Use each Pokémon’s egg group to determine from what types of eggs Pokémon hatch. Create a heatmap like the one below to visualize the distribution of egg groups for each primary type.

Tip

Consider using hoist() to extract the main type and egg group from each Pokémon.

tibble(pokemon) |>
  unnest_wider(pokemon) |>
  # extract the main type
  hoist(
    .col = type, main_type = 1L
  ) |>
  # extract the type of eggs from which the pokemon can hatch
  hoist(
    .col = profile, egg = "egg"
  ) |>
  select(main_type, egg) |>
  # some pokemon have more than one egg group, need to unnest longer
  unnest_longer(egg) |>
  # count the number of type-egg pairings
  count(main_type, egg) |>
  # draw the plot
  ggplot(mapping = aes(x = egg, y = main_type, fill = n)) +
  geom_tile() +
  geom_text(mapping = aes(label = n), color = "white") +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(
    title = "A few Pokémon types are more likely to hatch from certain eggs",
    x = "Egg group",
    y = "Main Pokémon type",
    caption = "Source: PokéAPI"
  ) +
  theme(
    legend.position = "none",
    axis.text.x = element_text(angle = 30, hjust = 1)
  )
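
To see what the two hoist() calls and unnest_longer() produce before plotting, here is a minimal sketch on two hypothetical records. The egg groups shown are illustrative and not taken from the data files.

```r
library(tidyr)
library(dplyr)

# Hypothetical records with the same nesting as the parsed Pokédex
toy <- list(
  list(
    type = list("Grass", "Poison"),
    profile = list(egg = list("Monster", "Grass"))
  ),
  list(
    type = list("Fire"),
    profile = list(egg = list("Monster", "Dragon"))
  )
)

pairs <- tibble::tibble(toy) |>
  unnest_wider(toy) |>
  # first type by position, egg groups by name
  hoist(.col = type, main_type = 1L) |>
  hoist(.col = profile, egg = "egg") |>
  select(main_type, egg) |>
  # one row per Pokémon-egg group combination
  unnest_longer(egg) |>
  count(main_type, egg)

nrow(pairs)  # 4
```

Each Pokémon contributes one row per egg group, so the counts in the heatmap can add up to more than the number of Pokémon.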

Acknowledgments

sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       macOS Ventura 13.5.2
 system   aarch64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2023-10-20
 pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
 colorspace    2.1-0   2023-01-23 [1] CRAN (R 4.3.0)
 digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
 dplyr       * 1.1.3   2023-09-03 [1] CRAN (R 4.3.0)
 evaluate      0.22    2023-09-29 [1] CRAN (R 4.3.1)
 fansi         1.0.5   2023-10-08 [1] CRAN (R 4.3.1)
 farver        2.1.1   2022-07-06 [1] CRAN (R 4.3.0)
 fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [1] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [1] CRAN (R 4.3.0)
 ggplot2     * 3.4.2   2023-04-03 [1] CRAN (R 4.3.0)
 glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
 gtable        0.3.3   2023-03-21 [1] CRAN (R 4.3.0)
 here          1.0.1   2020-12-13 [1] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [1] CRAN (R 4.3.0)
 htmltools     0.5.6.1 2023-10-06 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.0)
 jsonlite    * 1.8.7   2023-06-29 [1] CRAN (R 4.3.0)
 knitr         1.44    2023-09-11 [1] CRAN (R 4.3.0)
 labeling      0.4.2   2020-10-20 [1] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
 lubridate   * 1.9.2   2023-02-10 [1] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [1] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
 purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
 ragg          1.2.5   2023-01-12 [1] CRAN (R 4.3.0)
 readr       * 2.1.4   2023-02-10 [1] CRAN (R 4.3.0)
 rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
 rmarkdown     2.25    2023-09-18 [1] CRAN (R 4.3.1)
 rprojroot     2.0.3   2022-04-02 [1] CRAN (R 4.3.0)
 rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
 scales        1.2.1   2022-08-20 [1] CRAN (R 4.3.0)
 sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
 stringi       1.7.12  2023-01-11 [1] CRAN (R 4.3.0)
 stringr     * 1.5.0   2022-12-02 [1] CRAN (R 4.3.0)
 systemfonts   1.0.4   2022-02-11 [1] CRAN (R 4.3.0)
 textshaping   0.3.6   2021-10-13 [1] CRAN (R 4.3.0)
 tibble      * 3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [1] CRAN (R 4.3.0)
 tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [1] CRAN (R 4.3.0)
 timechange    0.2.0   2023-01-11 [1] CRAN (R 4.3.0)
 tzdb          0.4.0   2023-05-12 [1] CRAN (R 4.3.0)
 utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
 vctrs         0.6.4   2023-10-12 [1] CRAN (R 4.3.1)
 viridisLite   0.4.2   2023-05-02 [1] CRAN (R 4.3.0)
 withr         2.5.1   2023-09-26 [1] CRAN (R 4.3.1)
 xfun          0.40    2023-08-09 [1] CRAN (R 4.3.0)
 yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library

──────────────────────────────────────────────────────────────────────────────