Jenny Bryan’s “Row-oriented Workflows” webinar filled me with the courage to dive head-first into nested tibbles and list-columns. If you haven’t had time to watch it yet, carve out 45 minutes and treat yourself.
The appeal of keeping row-wise workflows arranged within an orderly “data rectangle” (a term coined by Jenny) was immediately apparent to me, but I ran into a problem: how can objects of a type that has no dedicated map_* variant be pulled out of a list-column without losing their attributes?
This is one of those problems that feels like it has an obvious solution, but once you start poking around you realize it’s uglier than you thought. The problem is not specific to list-columns: any time objects are stored in a list, converting that list into a vector takes special care. Otherwise important attributes, such as the class of a Date, may be silently lost during the conversion.
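To make the attribute loss concrete, here is a minimal base R sketch. The `dates` list is a made-up example rather than anything from the webinar or the issue; it simply compares `unlist()` with a single `c()` call over the same list.

```r
# A made-up list of Date objects, standing in for any list of classed objects
dates <- list(as.Date("1999-01-01"), as.Date("1999-02-01"))

# unlist() flattens to the underlying numbers and drops the Date class
flattened <- unlist(dates)
class(flattened)

# Calling c() once on all elements dispatches to c.Date and keeps the class
combined <- do.call(c, dates)
class(combined)
```

The difference comes from method dispatch: `c()` picks a method based on its first argument, whereas `unlist()` works on the underlying storage.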
I did some sleuthing on GitHub and found an issue or two addressing this topic, so I posted my approach there and got some helpful feedback. This note summarizes the approach that I came up with, as well as the ideas and limitations that others contributed in the issue.
The Problem
library(gapminder, warn.conflicts = FALSE)
library(rbenchmark, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(tidyverse, warn.conflicts = FALSE)
gap_nested <- gapminder %>%
  rename_all(toupper) %>%
  transmute()
# Create a vector
dttm_range <- as_datetime(seq(as.Date("1999/1/1"), as.Date("1999/12/31"), "months"))
print(dttm_range)
## [1] "1999-01-01 UTC" "1999-02-01 UTC" "1999-03-01 UTC" "1999-04-01 UTC"
## [5] "1999-05-01 UTC" "1999-06-01 UTC" "1999-07-01 UTC" "1999-08-01 UTC"
## [9] "1999-09-01 UTC" "1999-10-01 UTC" "1999-11-01 UTC" "1999-12-01 UTC"
# Map a function over the vector,
# changing each object to a type that doesn't have its own map_* variant
date_list <- map(dttm_range, as.Date)
# Extract the values using the different methods
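chr_example <- map_chr(date_list, as.character) # inadequate: the Date class is lost
reduce_example <- reduce(date_list, c) # combines the elements pairwise with c()
invoke_example <- invoke(c, date_list) # a single c() call on all elements
docall_example <- do.call("c", date_list) # base R: a single c() call on all elements
# View the results; note that the first approach just converts the dates to character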
examples <- tibble(
  chr_example,
  reduce_example,
  invoke_example,
  docall_example
)
print(examples, n = 3)
## # A tibble: 12 x 4
##   chr_example reduce_example invoke_example docall_example
##   <chr>       <date>         <date>         <date>
## 1 1999-01-01  1999-01-01     1999-01-01     1999-01-01
## 2 1999-02-01  1999-02-01     1999-02-01     1999-02-01
## 3 1999-03-01  1999-03-01     1999-03-01     1999-03-01
## # ... with 9 more rows
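The same extraction can be applied to an actual list-column. The sketch below is only an illustration: the data frame `df` and its columns `when` and `when_vec` are hypothetical names, and a base `data.frame` is used instead of a tibble so the example runs without extra packages.

```r
# Hypothetical data frame with a list-column of Date objects
df <- data.frame(id = 1:3)
df$when <- lapply(c("1999-01-01", "1999-02-01", "1999-03-01"), as.Date)

# Pull the list-column out into an ordinary Date column with a single c() call
df$when_vec <- do.call(c, df$when)

class(df$when)     # the list-column itself is still a list
class(df$when_vec) # the extracted vector keeps its Date class
```

Inside a tidyverse pipeline the equivalent step would presumably sit in a `mutate()` call, but that is left out here to keep the sketch dependency-free.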
Different Approaches
# Benchmarks
benchmarks <-
  benchmark(
    chr_example = as.Date(map_chr(date_list, as.character)),
    reduce_example = reduce(date_list, c),
    invoke_example = invoke(c, date_list),
    docall_example = do.call("c", date_list),
    replications = 1e4,
    columns = c("test", "elapsed", "relative")
  ) %>%
  arrange(relative) %>%
  as_tibble() %>%
  rename_all(toupper)
print(benchmarks)
## # A tibble: 4 x 3
##   TEST           ELAPSED RELATIVE
##   <fct>            <dbl>    <dbl>
## 1 docall_example   0.260     1.00
## 2 invoke_example   0.350     1.35
## 3 reduce_example   4.37     16.8
## 4 chr_example      9.53     36.7
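One plausible contributor to the gap between the `reduce` and `do.call` timings is the number of times `c()` gets called. The sketch below counts those calls. It rests on two assumptions: base `Reduce()` stands in for `purrr::reduce()` (both fold pairwise, but their internals differ), and `counting_c()` is a hypothetical wrapper written only for this illustration.

```r
# Twelve Date objects, mirroring the shape of date_list above
date_list <- lapply(sprintf("1999-%02d-01", 1:12), as.Date)

# Hypothetical wrapper around c() that counts how often it is called
n_calls <- 0
counting_c <- function(...) {
  n_calls <<- n_calls + 1
  c(...)
}

# Pairwise fold: one call per element after the first
folded <- Reduce(counting_c, date_list)
fold_calls <- n_calls

# Single call with every element passed as an argument
n_calls <- 0
combined <- do.call(counting_c, date_list)
single_calls <- n_calls

c(fold = fold_calls, single = single_calls)
```

Both routes produce the same Date vector; the fold just builds it one element at a time, allocating an intermediate result at each step.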
Limitations
# Limitations: c() does not always preserve the class you expect
factor_vector <- c(factor("A"), factor("B"))
class(factor_vector) # we want "factor"; R versions before 4.1.0 return "integer"
## [1] "integer"
date_dttm_vector <- c(Sys.Date(), Sys.time())
class(date_dttm_vector) # inputs are a Date and a POSIXct, but c() dispatches on its first argument
## [1] "Date"
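One way to guard against these cases is to check the element classes before combining. The helper below, `combine_same_class()`, is a hypothetical name and a rough sketch, not something proposed in the issue. It only refuses mixed-class input and does nothing about factors, for which base `unlist()` is shown as an alternative.

```r
# Hypothetical helper: refuse to combine a list whose elements differ in class
combine_same_class <- function(x) {
  classes <- unique(lapply(x, class))
  if (length(classes) > 1) {
    stop("elements have different classes; not combining")
  }
  do.call(c, x)
}

# Same-class input is combined as before
same <- combine_same_class(list(as.Date("1999-01-01"), as.Date("1999-02-01")))
class(same)

# Mixed Date / POSIXct input raises an error instead of being coerced silently
mixed <- try(combine_same_class(list(Sys.Date(), Sys.time())), silent = TRUE)
class(mixed)

# For a list of factors, base unlist() returns a factor with the levels merged
factor_vector <- unlist(list(factor("A"), factor("B")))
class(factor_vector)
```

Failing loudly is a design choice: silent coercion to the first element's class is harder to notice downstream than an error at the point of extraction.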