本文主要是介绍purr map walk 学习教程 完整版教程学习,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
Function reference • purrrhttps://purrr.tidyverse.org/reference/index.htmlMap over multiple input simultaneously (in "parallel") — pmap • purrr
11 Other purrr functions | Functional Programming (stanford.edu)
关注微信:生信小博士
11.1 Map functions that output tibbles
Instead of creating an atomic vector or list, the map variants map_dfr()
and map_dfc()
create a tibble.
With these map functions, the assembly line worker creates a tibble for each input element, and the output conveyor belt ends up with a collection of tibbles.
The worker then combines all the small tibbles into a single, larger tibble. There are multiple ways to combine smaller tibbles into a larger tibble. map_dfr()
(r for rows) stacks the smaller tibbles on top of each other.
map_dfc()
(c for columns) stacks them side-by-side.
There are _dfr
and _dfc
variants of pmap()
and map2()
as well. In the following sections, we’ll cover map_dfr()
and map_dfc()
in more detail.
11.1.1 _dfr
map_dfr()
is useful when reading in data from multiple files. The following code reads in several very simple csv files, each of which contains the name of a different dinosaur genus.
read_csv("data/purrr-extras/file_001.csv")
#> # A tibble: 1 × 2
#> id genus
#> <dbl> <chr>
#> 1 1 Hoplitosaurusread_csv("data/purrr-extras/file_002.csv")
#> # A tibble: 1 × 2
#> id genus
#> <dbl> <chr>
#> 1 2 Herrerasaurusread_csv("data/purrr-extras/file_003.csv")
#> # A tibble: 1 × 2
#> id genus
#> <dbl> <chr>
#> 1 3 Coelophysis
read_csv()
produces a tibble, and so we can use map_dfr()
to map over all three file names and bind the resulting individual tibbles into a single tibble.
files <- str_glue("data/purrr-extras/file_00{1:3}.csv")
files
#> data/purrr-extras/file_001.csv
#> data/purrr-extras/file_002.csv
#> data/purrr-extras/file_003.csvfiles %>% map_dfr(read_csv)
#> # A tibble: 3 × 2
#> id genus
#> <dbl> <chr>
#> 1 1 Hoplitosaurus
#> 2 2 Herrerasaurus
#> 3 3 Coelophysis
The result is a tibble with three rows and two columns, because map_dfr()
aligns the columns of the individual tibbles by name.
The individual tibbles can have different numbers of rows or columns. map_dfr()
just creates a column for each unique column name. If some of the individual tibbles lack a column that others have, map_dfr()
fills in with NA
values.
read_csv("data/purrr-extras/file_004.csv")
#> # A tibble: 2 × 3
#> id genus start_period
#> <dbl> <chr> <chr>
#> 1 4 Dilophosaurus Sinemurian
#> 2 5 Segisaurus Pliensbachianc(files, "data/purrr-extras/file_004.csv") %>% map_dfr(read_csv)
#> # A tibble: 5 × 3
#> id genus start_period
#> <dbl> <chr> <chr>
#> 1 1 Hoplitosaurus <NA>
#> 2 2 Herrerasaurus <NA>
#> 3 3 Coelophysis <NA>
#> 4 4 Dilophosaurus Sinemurian
#> 5 5 Segisaurus Pliensbachian
11.1.2 _dfc
map_dfc()
is typically less useful than map_dfr()
because it relies on row position to stack the tibbles side-by-side. Row position is prone to error, and it will often be difficult to check if the data in each row is aligned correctly. However, if you have data with variables in different places and are positive the rows are aligned, map_dfc()
may be appropriate.
Unfortunately, even if the individual tibbles contain a unique identifier for each row, map_dfc()
doesn’t use the identifiers to verify that the rows are aligned correctly, nor does it combine identically named columns.
read_csv("data/purrr-extras/file_005.csv")
#> # A tibble: 1 × 3
#> id diet start_period
#> <dbl> <chr> <chr>
#> 1 1 herbivore Barremianc("data/purrr-extras/file_001.csv", "data/purrr-extras/file_005.csv") %>% map_dfc(read_csv)
#> # A tibble: 1 × 5
#> id...1 genus id...3 diet start_period
#> <dbl> <chr> <dbl> <chr> <chr>
#> 1 1 Hoplitosaurus 1 herbivore Barremian
Instead, you end up with a duplicated column (id...1
and id...3
).
If you have a unique identifier for each row, it is much better to join on that identifier.
left_join(read_csv("data/purrr-extras/file_001.csv"),read_csv("data/purrr-extras/file_005.csv"),by = "id"
)
#> # A tibble: 1 × 4
#> id genus diet start_period
#> <dbl> <chr> <chr> <chr>
#> 1 1 Hoplitosaurus herbivore Barremian
Also, because map_dfc()
combines tibbles by row position, the tibbles can have different numbers of columns, but they should have the same number of rows.
11.2 Walk
The walk functions work similarly to the map functions, but you use them when you’re interested in applying a function that performs an action instead of producing data (e.g., print()
).
The walk functions are useful for performing actions like writing files and printing plots. For example, say we used purrr to generate a list of plots.
set.seed(745)plot_rnorm <- function(sd) {tibble(x = rnorm(n = 5000, mean = 0, sd = sd)) %>% ggplot(aes(x)) +geom_histogram(bins = 40) +geom_vline(xintercept = 0, color = "blue")
}plots <-c(5, 1, 9) %>% map(plot_rnorm)
We can now use walk()
to print them out.
plots %>% walk(print)
这篇关于purr map walk 学习教程 完整版教程学习的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!