Rcpp - Capturing the results of sregex_token_iterator into a vector

Mar*_*ark 3 c++ boost r rcpp

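I'm an R user learning C++ so that I can use it through Rcpp. Recently I wrote a replacement for R's strsplit in Rcpp using string.h, but it isn't regex based (afaik). I've been reading about Boost and found sregex_token_iterator.

The website linked below has an example: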

std::string input("This is his face");
sregex re = sregex::compile(" "); // find white space

// iterate over all non-white space in the input. Note the -1 below:
sregex_token_iterator begin( input.begin(), input.end(), re, -1 ), end;

// write all the words to std::cout
std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
std::copy( begin, end, out_iter );

My Rcpp function runs fine:

#include <Rcpp.h>
#include <boost/xpressive/xpressive.hpp>
using namespace Rcpp;

// [[Rcpp::export]]
StringVector testMe(std::string input,std::string uregex) {
  boost::xpressive::sregex re = boost::xpressive::sregex::compile(uregex); // compile the user-supplied delimiter regex

  // iterate over the text between delimiter matches (note the -1)
  boost::xpressive::sregex_token_iterator begin( input.begin(), input.end(), re ,-1), end;

  // write all the words to std::cout
  std::ostream_iterator< std::string > out_iter( std::cout, "\n" );
  std::copy( begin, end, out_iter );
  return("Done");
}

/*** R
testMe("This is a funny sentence"," ")
*/

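But all it does is print out the tokens. I'm very new to C++, but I understand the idea of making a vector in Rcpp with StringVector res(10); (which creates a vector named res of length 10), which I can then index with res[1] = "blah".

My question is: how do I take the output of boost::xpressive::sregex_token_iterator begin( input.begin(), input.end(), re ,-1), end; and store it in a vector so that I can return it?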

http://www.boost.org/doc/libs/1_54_0/doc/html/xpressive/user_s_guide.html#boost_xpressive.user_s_guide.string_splitting_and_tokenization


Final Rcpp solution

I'm including this because my need was Rcpp-specific, so I had to make some minor changes to the solution provided.

#include <Rcpp.h>
#include <boost/xpressive/xpressive.hpp>

typedef std::vector<std::string> StringVector; 
using boost::xpressive::sregex; 
using boost::xpressive::sregex_token_iterator;
using Rcpp::List;

void tokenWorker(/*in*/      const std::string& input, 
                 /*in*/      const sregex& re,
                 /*inout*/   StringVector& v) 
{
  sregex_token_iterator begin( input.begin(), input.end(), re ,-1), end;

  // write all the words to v
  std::copy(begin, end, std::back_inserter(v));
}

//[[Rcpp::export]]
List tokenize(StringVector t, std::string tok = " "){
  List final_res(t.size());
  sregex re = sregex::compile(tok); 
  for(int z=0;z<t.size();z++){

    // t[z] is already a std::string, so no character-by-character copy is needed
    StringVector v;
    tokenWorker(t[z], re, v);
    final_res[z] = v;
  }
  return(final_res);
}

/*** R
tokenize("Please tokenize this sentence")
*/

dec*_*uto 5

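My question is: how do I take the output of boost::xpressive::sregex_token_iterator begin( input.begin(), input.end(), re ,-1), end; and store it in a vector so that I can return it?

You're already halfway there.

The missing link is just std::back_inserter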

#include <iostream>
#include <algorithm>
#include <iterator>   // std::back_inserter, std::ostream_iterator
#include <vector>
#include <string>
#include <boost/xpressive/xpressive.hpp>

typedef std::vector<std::string> StringVector; 
using boost::xpressive::sregex; 
using boost::xpressive::sregex_token_iterator; 


void testMe(/*in*/      const std::string& input, 
            /*in*/      const std::string& uregex,
            /*inout*/   StringVector& v) 
{
    sregex re = sregex::compile(uregex); 

    sregex_token_iterator begin( input.begin(), input.end(), re ,-1), end;

    // write all the words to v
    std::copy(begin, end, std::back_inserter(v));
}

int main() 
{

    std::string input("This is his face");
    std::string blank(" ");
    StringVector v;
     // find white space
    testMe(input, blank, v);

    std::copy(v.begin(), v.end(), 
              std::ostream_iterator<std::string>(std::cout, "|"));

    std::cout << std::endl;
    return 0;
}

Output:

This|is|his|face|
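
As for why std::back_inserter is needed: std::copy only assigns through the output iterator it is given and never grows the destination, so copying to the begin() of an empty vector would write past its end. std::back_inserter wraps the vector in an output iterator whose assignment calls push_back. Below is a minimal sketch that is independent of the regex code; appendAll is a hypothetical helper name used only for illustration.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of the answer's code): append every
// element of src to the end of dst.
void appendAll(const std::vector<std::string>& src,
               std::vector<std::string>& dst)
{
    // std::back_inserter(dst) yields an output iterator; each assignment
    // through it calls dst.push_back(value), so dst grows as needed.
    std::copy(src.begin(), src.end(), std::back_inserter(dst));
}
```

Elements already in dst are kept, and the new ones are appended after them in the same order as in src.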

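I used legacy C++ because you are using the regex lib from Boost rather than std <regex>; since you are learning C++ right now, it may be better to consider C++14 from the start. C++14 would shorten even this small snippet and make it more expressive.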
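
To give a rough idea of what that could look like, here is a minimal sketch using the standard <regex> header (available since C++11) instead of Boost.Xpressive. The function name split is a hypothetical choice for this example, and std::regex uses ECMAScript syntax by default, so more complex patterns may need adjusting compared with xpressive.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): split input on every
// match of pattern and return the pieces between the matches.
std::vector<std::string> split(const std::string& input,
                               const std::string& pattern)
{
    const std::regex re(pattern);

    // -1 selects the text between matches, as in the xpressive version.
    std::sregex_token_iterator first(input.begin(), input.end(), re, -1);
    std::sregex_token_iterator last;  // default-constructed = end of sequence

    // The vector's range constructor copies every token, so neither
    // std::copy nor std::back_inserter is needed here.
    return std::vector<std::string>(first, last);
}
```

Calling split("This is his face", " ") returns the four words as separate elements. Returning the vector by value instead of filling an in/out parameter is the main simplification.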