purrr map and walk: a complete tutorial

2023-10-12 02:52

This article introduces the purrr map and walk functions. We hope it serves as a useful reference for developers working through related programming problems.

Function reference • purrr: https://purrr.tidyverse.org/reference/index.html

Map over multiple input simultaneously (in "parallel") — pmap • purrr

11 Other purrr functions | Functional Programming (stanford.edu)


11.1 Map functions that output tibbles

Instead of creating an atomic vector or list, the map variants map_dfr() and map_dfc() create a tibble.

With these map functions, the assembly line worker creates a tibble for each input element, and the output conveyor belt ends up with a collection of tibbles.

The worker then combines all the small tibbles into a single, larger tibble. There are multiple ways to combine smaller tibbles into a larger tibble. map_dfr() (r for rows) stacks the smaller tibbles on top of each other.

map_dfc() (c for columns) places them side by side.

There are _dfr and _dfc variants of pmap() and map2() as well. In the following sections, we’ll cover map_dfr() and map_dfc() in more detail.
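To make the difference between the two concrete, here is a minimal sketch that uses small in-memory tibbles instead of files. The helper names make_row() and make_col() are hypothetical and exist only for this illustration. It assumes purrr, tibble, and dplyr are installed (map_dfr() and map_dfc() rely on dplyr for the binding step).

```r
library(purrr)
library(tibble)

# Hypothetical worker: builds a one-row tibble for each input value
make_row <- function(x) {
  tibble(value = x, squared = x^2)
}

# map_dfr() stacks the per-element tibbles on top of each other (3 rows)
by_rows <- map_dfr(1:3, make_row)

# Hypothetical worker: builds a two-row, one-column tibble whose
# column is named after the input string
make_col <- function(name) {
  tibble(!!name := seq_len(2))
}

# map_dfc() places the per-element tibbles side by side (3 columns)
by_cols <- map_dfc(c("a", "b", "c"), make_col)

print(by_rows)
print(by_cols)
```

Here by_rows has one row per input element, while by_cols has one column per input element.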

11.1.1 _dfr

map_dfr() is useful when reading in data from multiple files. The following code reads in several very simple csv files, each of which contains the name of a different dinosaur genus.

read_csv("data/purrr-extras/file_001.csv")
#> # A tibble: 1 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     1 Hoplitosaurus

read_csv("data/purrr-extras/file_002.csv")
#> # A tibble: 1 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     2 Herrerasaurus

read_csv("data/purrr-extras/file_003.csv")
#> # A tibble: 1 × 2
#>      id genus      
#>   <dbl> <chr>      
#> 1     3 Coelophysis

read_csv() produces a tibble, and so we can use map_dfr() to map over all three file names and bind the resulting individual tibbles into a single tibble.

files <- str_glue("data/purrr-extras/file_00{1:3}.csv")
files
#> data/purrr-extras/file_001.csv
#> data/purrr-extras/file_002.csv
#> data/purrr-extras/file_003.csv

files %>% 
  map_dfr(read_csv)
#> # A tibble: 3 × 2
#>      id genus        
#>   <dbl> <chr>        
#> 1     1 Hoplitosaurus
#> 2     2 Herrerasaurus
#> 3     3 Coelophysis

The result is a tibble with three rows and two columns, because map_dfr() aligns the columns of the individual tibbles by name.
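The files under data/purrr-extras may not be available on your machine, so the following sketch first writes three stand-in CSV files to a temporary directory and then reads them back with map_dfr(). The temporary directory and the .id column name "source" are choices made for this illustration, not part of the original example. It assumes readr 2.0 or later (for show_col_types) and that dplyr is installed.

```r
library(purrr)
library(readr)
library(tibble)

# Create stand-in files in a temporary directory (illustrative only)
tmp_dir <- tempfile("purrr-extras-")
dir.create(tmp_dir)
genera <- c("Hoplitosaurus", "Herrerasaurus", "Coelophysis")
files <- file.path(tmp_dir, sprintf("file_00%d.csv", 1:3))
walk2(files, seq_along(genera), function(path, i) {
  write_csv(tibble(id = i, genus = genera[[i]]), path)
})

# Name each path by its file name so that .id can record the source file
names(files) <- basename(files)

# Read every file and bind the resulting tibbles by row;
# extra arguments such as show_col_types are passed on to read_csv()
dinos <- map_dfr(files, read_csv, show_col_types = FALSE, .id = "source")
print(dinos)
```

Naming the input vector is optional. Without names, the .id column would hold the position of each element instead of the file name.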

The individual tibbles can have different numbers of rows or columns. map_dfr() just creates a column for each unique column name. If some of the individual tibbles lack a column that others have, map_dfr() fills in with NA values.

read_csv("data/purrr-extras/file_004.csv")
#> # A tibble: 2 × 3
#>      id genus         start_period 
#>   <dbl> <chr>         <chr>        
#> 1     4 Dilophosaurus Sinemurian   
#> 2     5 Segisaurus    Pliensbachian

c(files, "data/purrr-extras/file_004.csv") %>% 
  map_dfr(read_csv)
#> # A tibble: 5 × 3
#>      id genus         start_period 
#>   <dbl> <chr>         <chr>        
#> 1     1 Hoplitosaurus <NA>         
#> 2     2 Herrerasaurus <NA>         
#> 3     3 Coelophysis   <NA>         
#> 4     4 Dilophosaurus Sinemurian   
#> 5     5 Segisaurus    Pliensbachian
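The NA-filling behavior can also be checked without any files. The sketch below binds two in-memory tibbles with different columns. It also shows map() followed by list_rbind(), which purrr 1.0.0 and later recommend in place of map_dfr() (map_dfr() is marked as superseded there, but still works). The list name small is hypothetical.

```r
library(purrr)
library(tibble)

# Two tibbles with different numbers of rows and columns
small <- list(
  tibble(id = 1, genus = "Hoplitosaurus"),
  tibble(
    id = c(4, 5),
    genus = c("Dilophosaurus", "Segisaurus"),
    start_period = c("Sinemurian", "Pliensbachian")
  )
)

# identity() returns each tibble unchanged; map_dfr() binds them by column name
combined <- map_dfr(small, identity)

# Equivalent spelling for purrr >= 1.0.0
combined_new <- list_rbind(map(small, identity))

print(combined)
```

In both results, the row that came from the first tibble has NA in start_period.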

11.1.2 _dfc

map_dfc() is typically less useful than map_dfr() because it relies on row position to place the tibbles side by side. Matching by row position is error-prone, and it is often difficult to check whether the data in each row is aligned correctly. However, if your variables are spread across several tibbles and you are certain the rows are aligned, map_dfc() may be appropriate.

Unfortunately, even if the individual tibbles contain a unique identifier for each row, map_dfc() doesn’t use the identifiers to verify that the rows are aligned correctly, nor does it combine identically named columns.

read_csv("data/purrr-extras/file_005.csv")
#> # A tibble: 1 × 3
#>      id diet      start_period
#>   <dbl> <chr>     <chr>       
#> 1     1 herbivore Barremian

c("data/purrr-extras/file_001.csv", "data/purrr-extras/file_005.csv") %>% 
  map_dfc(read_csv)
#> # A tibble: 1 × 5
#>   id...1 genus         id...3 diet      start_period
#>    <dbl> <chr>          <dbl> <chr>     <chr>       
#> 1      1 Hoplitosaurus      1 herbivore Barremian

Instead, you end up with a duplicated column (id...1 and id...3).

If you have a unique identifier for each row, it is much better to join on that identifier.

left_join(
  read_csv("data/purrr-extras/file_001.csv"),
  read_csv("data/purrr-extras/file_005.csv"),
  by = "id"
)
#> # A tibble: 1 × 4
#>      id genus         diet      start_period
#>   <dbl> <chr>         <chr>     <chr>       
#> 1     1 Hoplitosaurus herbivore Barremian

Also, because map_dfc() combines tibbles by row position, the tibbles can have different numbers of columns, but they should have the same number of rows.
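The risk of binding by position is easiest to see when two tibbles hold the same ids in a different order. The following sketch uses made-up in-memory data (the diet values are illustrative) and compares map_dfc() with a join on id. Using reduce() to apply left_join() across a list is one possible way to extend the join to several tibbles. It is an assumption of this example, not something the text above prescribes.

```r
library(purrr)
library(dplyr)
library(tibble)

genus <- tibble(id = c(1, 2), genus = c("Hoplitosaurus", "Herrerasaurus"))

# Same ids, but the rows are stored in the opposite order
diet <- tibble(id = c(2, 1), diet = c("carnivore", "herbivore"))

# Binding by position pairs row 1 with row 1, regardless of id;
# suppressMessages() hides the name-repair message about the duplicated id column
by_position <- suppressMessages(map_dfc(list(genus, diet), identity))

# Joining on id pairs the rows correctly and keeps a single id column
by_id <- reduce(
  list(genus, diet),
  function(x, y) left_join(x, y, by = "id")
)

print(by_position)
print(by_id)
```

In by_position, the first row combines id 1 from one tibble with id 2 from the other, which is exactly the kind of silent misalignment the text warns about. In by_id, each genus is matched to the diet with the same id.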

11.2 Walk

The walk functions work similarly to the map functions, but you use them when you’re interested in applying a function that performs an action instead of producing data (e.g., print()).

The walk functions are useful for performing actions like writing files and printing plots. For example, say we used purrr to generate a list of plots.

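library(tidyverse)  # provides tibble(), ggplot(), map(), and walk()

set.seed(745)

# Draw 5000 normal samples with the given standard deviation and plot a histogram
plot_rnorm <- function(sd) {
  tibble(x = rnorm(n = 5000, mean = 0, sd = sd)) %>% 
    ggplot(aes(x)) +
    geom_histogram(bins = 40) +
    geom_vline(xintercept = 0, color = "blue")
}

plots <-
  c(5, 1, 9) %>% 
  map(plot_rnorm)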

We can now use walk() to print them out.

plots %>% walk(print)
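Writing files is the other common use mentioned above. The sketch below uses walk2() to write each element of a list to its own text file in a temporary directory. The data, the directory, and the file names are all made up for this illustration. It also shows that the walk functions return their first input invisibly, which is what lets them sit in the middle of a pipeline.

```r
library(purrr)

# Illustrative data: a named list of character vectors
notes <- list(a = c("first", "file"), b = "second file")

# Write into a temporary directory so nothing is left behind in the project
out_dir <- tempfile("walk-demo-")
dir.create(out_dir)
paths <- file.path(out_dir, paste0(names(notes), ".txt"))

# walk2() calls writeLines(notes[[i]], paths[[i]]) for each i,
# purely for the side effect of creating the files
returned <- walk2(notes, paths, writeLines)

# The return value is the first input, unchanged
print(identical(returned, notes))
print(file.exists(paths))
```

Using map2() here would also write the files, but it would collect and return the results of each writeLines() call, which are not useful. walk2() makes the intent (side effects only) explicit.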
