Getting Started with Rcpp, by Nick Ulle

2023-10-13 21:58

Mixed-language programming in R

      Mixing R with C and C++

1.Introduction

Compiled C and C++ routines can be called from R using the built-in .Call function.

R objects passed to these routines have type SEXP. A SEXP is a pointer to an encapsulated structure that holds the object’s type, value, and other attributes used by the R interpreter.

The R application programming interface (API) provides a limited set of macros and C routines for manipulating SEXPs and calling R functions.

The level of abstraction in the R API is low. Even simple tasks may require writing lengthy boilerplate code.

Using the R API from C++ is especially uncomfortable, because it doesn’t take advantage of any of C++’s features.

Rcpp is an R package that makes it easier to interface R and C++ code. Rcpp does this by providing a set of C++ wrapper classes for common R data types, as well as tools for automating the process of compiling and loading C++ routines for R.

2.An Example from the Article

Create a blank text file and enter the code:

#include <Rcpp.h>

// [[Rcpp::export]]
void hello()
{
    Rprintf("Hello, world!\n");
}

Save the file as hello.cpp.

Rprintf() is part of the R API. Its syntax is the same as printf.

Time to test the code! Start R and enter the commands:

library(Rcpp)

sourceCpp("hello.cpp")   # compile hello.cpp and make hello() available in R

hello()

You should see “Hello, world!” printed on the R console.

3.The Rcpp Interface


3.1 Data Structures

Most of Rcpp’s functionality is provided through a set of C++ classes that wrap R data structures. A few of them are:

        • IntegerVector, NumericVector, LogicalVector, CharacterVector

        • List, DataFrame

        • Named, Dimension

        • IntegerMatrix, NumericMatrix

        • Function

        • Environment

Memory management is handled automatically by the class constructors and destructors. These classes also have methods that mimic various R functions. A few of the most useful methods are:

        • isNULL

        • attributeNames, hasAttribute, attr

        • length, nrow, ncol

The vector and list classes have constructors that accept the number of elements as a parameter, similar to their counterparts in R. (Translator's note: in R, a list is itself a special kind of vector.)

The helper class Dimension can be used to create a multidimensional vector:

        // Create a 2-by-3-by-4 vector.

        NumericVector a = NumericVector( Dimension(2, 3, 4) );  // numeric vector a with dimensions (2, 3, 4)

The vector and list classes also have a static create method, for specifying the elements of the new vector. The helper class Named represents named vector elements. For instance,

        IntegerVector q1_days = IntegerVector::create(
                Named("January") = 31,    // element named "January" with value 31
                Named("February") = 28,
                Named("March") = 31
        );

creates an integer vector with 3 named elements.

3.3 Other Details

Rcpp converts R objects to and from C++ objects with the templated routines as and wrap, respectively: as converts an R object to a C++ type, and wrap converts a C++ object to an R object. It’s rarely necessary to call these routines explicitly, but since Rcpp makes frequent implicit use of them, it’s important to know what they do.

The clone routine makes a copy of an Rcpp object. Since Rcpp objects use reference semantics (assigning or copying one yields another handle to the same underlying R data), you must explicitly call clone when you want to make a copy.

/**

   Translator's note: a C++ reference is an alias for an existing variable. It refers to the same object rather than to a separate copy.

**/
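The need for clone can be illustrated without R at all. The sketch below is plain C++, not Rcpp. The hypothetical Handle struct stands in for an Rcpp vector by holding a shared pointer to its data, and the hypothetical deep_copy helper plays the role of clone. It assumes only that Rcpp objects behave like such handles; the names are made up for illustration.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for an Rcpp vector: a thin handle to shared data.
struct Handle {
    std::shared_ptr<std::vector<double>> data;

    explicit Handle(std::size_t n)
        : data(std::make_shared<std::vector<double>>(n, 0.0)) {}

    double& operator[](std::size_t i) { return (*data)[i]; }
};

// Hypothetical analogue of clone: allocate new storage and copy the values.
Handle deep_copy(const Handle& x) {
    Handle y(x.data->size());
    *y.data = *x.data;
    return y;
}

// Returns true when writing through an ordinary copy changes the original.
bool copy_shares_data() {
    Handle a(3);
    Handle b = a;      // copies the handle, not the data
    b[0] = 42.0;
    return a[0] == 42.0;
}

// Returns true when writing through a deep copy leaves the original alone.
bool deep_copy_is_independent() {
    Handle a(3);
    Handle c = deep_copy(a);
    c[0] = 42.0;
    return a[0] == 0.0;
}
```

Both functions return true. The first shows why an Rcpp routine that modifies its argument in place can surprise the caller, and the second shows the behavior that an explicit copy is meant to give.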

Missing values can be specified with the constants NA_INTEGER, NA_REAL, NA_LOGICAL, and NA_STRING. The special values NaN, Inf, and -Inf can be specified with the constants R_NaN, R_PosInf, and R_NegInf. These constants all come from the R API rather than Rcpp.
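Special values need care in compiled code because they do not behave like ordinary numbers. The following is a minimal sketch in plain C++, without the R headers. It assumes the usual encodings: an integer NA is the smallest int, and NaN, Inf, and -Inf are the standard floating-point values. The constant and function names are hypothetical.

```cpp
#include <climits>
#include <cmath>
#include <limits>
#include <vector>

// Assumption: mirrors how R encodes an integer NA (the smallest int).
const int kNaInt = INT_MIN;

// Plain C++ counterparts of R_NaN, R_PosInf and R_NegInf.
const double kNaN    = std::numeric_limits<double>::quiet_NaN();
const double kPosInf = std::numeric_limits<double>::infinity();
const double kNegInf = -std::numeric_limits<double>::infinity();

// Sum the values of x, skipping NaN entries (similar in spirit to
// sum(x, na.rm = TRUE) in R).
double sum_skip_nan(const std::vector<double>& x) {
    double total = 0.0;
    for (double v : x) {
        // v == kNaN is always false, so std::isnan is required here.
        if (!std::isnan(v)) total += v;
    }
    return total;
}

// Count the entries of x that are not the integer NA sentinel.
int count_present(const std::vector<int>& x) {
    int n = 0;
    for (int v : x) {
        if (v != kNaInt) ++n;
    }
    return n;
}
```

The main point is that a missing double cannot be detected with ==, while a missing integer is an ordinary sentinel value that arithmetic will happily consume unless the code checks for it first.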

4.Programming Strategy

Generally speaking, you should write most of your code in R, to take advantage of its high level of abstraction. Then you can profile your code to identify bottlenecks where R is unacceptably slow, and replace those sections with C++ code for a performance boost. The most straightforward way to do this is to rewrite an entire function. As long as your C++ routine has the same call signature as the R function it replaces, the change should be invisible to the rest of your application.

5.Example: Row Maximums

Suppose we want to compute the maximum element of each row in a matrix. To achieve this, we loop over each row of the matrix and use the sugar routine max:

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector row_max(NumericMatrix m)   // maximum of each row of m
{
    int nrow = m.nrow();        // number of rows
    NumericVector max(nrow);    // result vector with one element per row

    for (int i = 0; i < nrow; i++)
        // Get row i with m(i, _).
        max[i] = Rcpp::max( m(i, _) );   // the local vector is also named max, hence Rcpp::max

    return max;
}

Notice that the matrix classes in Rcpp use parentheses ( ) as the subset operator rather than square brackets [ ]. This is due to a limitation in C++: operator[] traditionally accepts only a single argument, so a two-index subset such as m(i, j) has to go through operator().
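To see what the row loop amounts to, here is a rough equivalent in plain C++ with no Rcpp classes. It assumes the matrix is stored column by column in a flat std::vector, which is how R lays out matrices. The function name row_max_plain and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Maximum of each row of an nrow-by-ncol matrix stored column by column,
// so that element (i, j) lives at values[i + j * nrow].
// Assumes ncol >= 1 and values.size() == nrow * ncol.
std::vector<double> row_max_plain(const std::vector<double>& values,
                                  std::size_t nrow, std::size_t ncol) {
    std::vector<double> result(nrow);
    for (std::size_t i = 0; i < nrow; ++i) {
        double best = values[i];               // element (i, 0)
        for (std::size_t j = 1; j < ncol; ++j) {
            best = std::max(best, values[i + j * nrow]);
        }
        result[i] = best;
    }
    return result;
}
```

For the 2-by-3 matrix with rows (1, 5, 3) and (4, 2, 6), the flat storage is {1, 4, 5, 2, 3, 6} and the result is {5, 6}. The inner loop strides through memory by nrow, which is the work that m(i, _) and max hide in the Rcpp version.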

6.Example: Box Packing

Suppose we want to simulate a discrete box-packing Markov chain. At each time step, an item with weight randomly distributed in {1,...,w} arrives for packing. Items are placed in the same box so long as the box weight does not exceed w. If an item would make the current box’s weight exceed w, a new box is started with that item. We might be interested in the weight of the current box at each time step, as well as the times at which a new box is started.

A simulation of the box-packing chain can be implemented in R, but suppose we want to run the simulation for a large number of time steps in order to estimate long-run statistics. In that case, the simulation might be unacceptably slow. We can use C++ and Rcpp to write a much faster version.

Implementation

Create a blank text file and enter the code skeleton:

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
List pack_boxes(int n, NumericVector p) {
    // ...
}

The pack_boxes routine will contain our simulation. It needs to sample item weights, add each item weight to the previous time step’s box weight, and then check whether the box is too heavy, starting a new box when necessary. The routine has parameters n, the number of steps to simulate, and p, the probabilities of the item weights. We don’t need to make w a parameter, since w can be inferred from the length of p. The routine has return type List. Rcpp implicitly converts between SEXP and these input/output types.

If we were implementing the simulation in R, we could sample the item weights with the sample function. The R API doesn’t have a corresponding C routine. Fortunately, Rcpp’s Function class makes calling R functions from C++ simple. The constructor takes the name of the desired function as parameter. After creating a Function object for sample, we can call it with the same parameters as the original R function. A word of caution: calling R functions from C++ code is at least as slow as calling them from R itself, so use them sparingly.

For the rest of the simulation, we need a vector weight of length n to hold the weight of the box at each time step, and another vector, first, to hold the first item times. We also need a variable n_boxes to keep track of how many boxes have been packed.

#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
List pack_boxes(int n, NumericVector p)   // p: probabilities of the item weights
{
    // Look up R's sample() function in the base package.
    Function sample = Environment("package:base")["sample"];

    // Sample item weights.
    int w = p.size();                            // w is inferred from the length of p
    IntegerVector item = sample(w, n, true, p);  // n item weights drawn from {1,...,w}

    // Initialize loop variables.
    IntegerVector weight(n);    // box weight at each time step
    weight[0] = item[0];
    IntegerVector first(n);     // times at which a new box is started
    first[0] = 1;
    int n_boxes = 1;            // number of boxes packed so far
    // ...

We don’t know how long first needs to be, but we can ensure it’s long enough by making it length n, as above. Alternatively, if we were concerned about memory usage, we could’ve used a data structure from C++’s standard template library and converted to a correctly-sized IntegerVector at the end of the simulation with Rcpp’s wrap routine.

The core of our simulation is a for loop. Unlike R, where for loops should be avoided in favor of vectorized code, there’s no penalty for using for loops in C++.

    for (int i = 1; i < n; i++)
    {
        int new_weight = weight[i - 1] + item[i];
        if (new_weight <= w) {
            // Continue with current box.
            weight[i] = new_weight;
        } else {
            // Start a new box.
            weight[i] = item[i];
            first[n_boxes++] = i + 1;
        }
    }
    // ...
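The standard-library alternative mentioned earlier for first can be sketched as follows. This is plain C++ rather than the article's Rcpp routine: it takes already-sampled item weights as input, and the names PackResult and pack_items are hypothetical. Here first is a std::vector that grows only when a new box starts, so no n_boxes counter or oversized allocation is needed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical result type: box weight at each step, plus the (1-based)
// time steps at which a new box was started.
struct PackResult {
    std::vector<int> weight;
    std::vector<int> first;
};

// Pack items (each weight in 1..w) into boxes of capacity w.
PackResult pack_items(const std::vector<int>& item, int w) {
    PackResult out;
    if (item.empty()) return out;

    out.weight.resize(item.size());
    out.weight[0] = item[0];
    out.first.push_back(1);          // the first box starts at time 1

    for (std::size_t i = 1; i < item.size(); ++i) {
        int new_weight = out.weight[i - 1] + item[i];
        if (new_weight <= w) {
            // Continue with current box.
            out.weight[i] = new_weight;
        } else {
            // Start a new box; first grows only when needed.
            out.weight[i] = item[i];
            out.first.push_back(static_cast<int>(i) + 1);
        }
    }
    return out;
}
```

With items {2, 1, 3, 1, 1} and w = 3, the box weights are {2, 3, 3, 1, 2} and new boxes start at times {1, 3, 4}. In an Rcpp routine, vectors like these could then be converted for R with wrap, as the text on memory usage suggests.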

