I am trying to read a memory-mapped file into a matrix. The file looks like this:
name;phone;city\n
Luigi Rossi;02341567;Milan\n
Mario Bianchi;06567890;Rome\n
....
And it is quite big. The code I wrote works properly, but it is not that fast:
#include <cstddef>
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <boost/iostreams/device/mapped_file.hpp>
using namespace std;
using boost::iostreams::mapped_file_source;
int main() {
    size_t j = 0; // current row
    size_t k = 0; // current column
    // One string per field, three fields per row
    vector< vector<string> > M(10000000, vector<string>(3));
    mapped_file_source file("file.csv");
    // Check if file was successfully opened
    if (file.is_open()) {
        // Get pointer to the data
        const char* c = file.data();
        const size_t size = file.size();
        // Stop at size: c[size] is past the end of the mapping
        for (size_t i = 0; i < size; i++) {
            if (c[i] == '\n') {
                j = j + 1;
                k = 0;
            } else if (c[i] == ';') {
                k = k + 1;
            } else {
                M[j][k] += c[i];
            }
        } // end for
    } // end if
    return 0;
}
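Is there a faster way? I have read about memcpy, but I don't know how to use it to speed up my code.
I have written a lot of examples doing this or something similar on SO.
Let me list the most relevant ones:
I have done many of these benchmarks. Yes, for sequential reading, read/scanf have a tiny edge (see e.g. scanf/iostreams and files vs. mapped, parsing floating point numbers, or read being slightly faster for sequential reads).
An interesting approach is to parse lazily (why copy the whole input into memory? What is the point of the memory map then?). The answer here shows this approach (modelling a multimap there):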
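To make the lazy idea concrete, here is a minimal sketch that assumes C++17 and uses std::string_view rather than anything from the linked answer. The helper names field and for_each_line are hypothetical, introduced only for illustration. Nothing is copied or stored: each record is visited in place and a field is located only when it is asked for.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: returns the k-th ';'-separated field (0-based) of a
// line, or an empty view if the line has fewer fields. No bytes are copied.
inline std::string_view field(std::string_view line, std::size_t k) {
    while (k-- > 0) {
        auto sep = line.find(';');
        if (sep == std::string_view::npos) return {};
        line.remove_prefix(sep + 1);
    }
    return line.substr(0, line.find(';'));
}

// Hypothetical helper: calls fn(line) for each '\n'-separated record in buf,
// one at a time, without building a container of lines.
template <typename Fn>
void for_each_line(std::string_view buf, Fn fn) {
    while (!buf.empty()) {
        auto nl = buf.find('\n');
        std::string_view line = buf.substr(0, nl);
        // Tolerate Windows line endings
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
        fn(line);
        if (nl == std::string_view::npos) break;
        buf.remove_prefix(nl + 1);
    }
}
```

With the mapped file from the question, the buffer could be passed as std::string_view(file.data(), file.size()). The views are only valid while the mapping stays open.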
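In all other cases, consider slapping a Spirit Qi job on it, potentially using boost::string_ref instead of vector<char> (unless the mapped file is not "const", of course).
The string_ref approach is also shown in the last answer linked before. Another interesting example (lazy conversion to unescaped string values) is here: How to parse mustache with Boost.Xpressive correctly?
Here is that Qi job slapped on it:
It parses a 994 MiB file of about 32 million lines in 2.9 s and stores the result in a vector of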
struct Line {
    boost::string_ref name, city;
    long id;
};
Note that we parse the number, and store the strings by referring to their location in the memory map plus a length (string_ref).
A string_ref is 16 bytes.
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/iostreams/device/mapped_file.hpp>
#include <boost/utility/string_ref.hpp>
#include <cstdlib>
#include <iostream>
#include <vector>

namespace qi = boost::spirit::qi;
using sref = boost::string_ref;

// Let Spirit build a string_ref straight from an iterator pair (no copy)
namespace boost { namespace spirit { namespace traits {
    template <typename It>
    struct assign_to_attribute_from_iterators<sref, It, void> {
        static void call(It f, It l, sref& attr) { attr = { f, size_t(std::distance(f,l)) }; }
    };
} } }

struct Line {
    sref name, city;
    long id;
};

BOOST_FUSION_ADAPT_STRUCT(Line, (sref,name)(long,id)(sref,city))

int main() {
    boost::iostreams::mapped_file_source mmap("input.txt");

    using namespace qi;

    std::vector<Line> parsed;
    parsed.reserve(32000000);
    if (phrase_parse(mmap.begin(), mmap.end(),
                omit[+graph] >> eol >> // skip the header line
                (raw[*~char_(";\r\n")] >> ';' >> long_ >> ';' >> raw[*~char_(";\r\n")]) % eol,
                qi::blank, parsed))
    {
        std::cout << "Parsed " << parsed.size() << " lines\n";
    } else {
        std::cout << "Failed after " << parsed.size() << " lines\n";
    }

    // Guard against an empty result: rand() % 0 would be undefined
    if (!parsed.empty()) {
        std::cout << "Printing 10 random items:\n";
        for(int i=0; i<10; ++i) {
            auto& line = parsed[rand() % parsed.size()];
            std::cout << "city: '" << line.city << "', id: " << line.id << ", name: '" << line.name << "'\n";
        }
    }
}
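If pulling in Spirit is not an option, the same zero-copy idea can be sketched with the standard library alone. This is not the Qi grammar above, only an approximation of it under stated assumptions: C++17, std::string_view standing in for boost::string_ref, std::from_chars for the number, and no skipping of blanks. Record and parse_records are hypothetical names.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical record type mirroring Line, with std::string_view in place
// of boost::string_ref.
struct Record {
    std::string_view name, city;
    long id = 0;
};

// Hypothetical helper: parses "name;id;city" records and skips the header
// line. Malformed lines are skipped. The views point into buf, so buf must
// outlive the returned vector.
inline std::vector<Record> parse_records(std::string_view buf) {
    constexpr auto npos = std::string_view::npos;
    std::vector<Record> out;
    bool header = true;
    while (!buf.empty()) {
        auto nl = buf.find('\n');
        std::string_view line = buf.substr(0, nl);
        buf.remove_prefix(nl == npos ? buf.size() : nl + 1);
        if (!line.empty() && line.back() == '\r') line.remove_suffix(1);
        if (header) { header = false; continue; }

        auto s1 = line.find(';');
        auto s2 = (s1 == npos) ? npos : line.find(';', s1 + 1);
        if (s2 == npos) continue; // fewer than three fields

        Record r;
        r.name = line.substr(0, s1);
        r.city = line.substr(s2 + 1);
        std::string_view num = line.substr(s1 + 1, s2 - s1 - 1);
        auto res = std::from_chars(num.data(), num.data() + num.size(), r.id);
        // Require the whole field to be a valid integer
        if (res.ec != std::errc() || res.ptr != num.data() + num.size()) continue;
        out.push_back(r);
    }
    return out;
}
```

As with the Qi version, only the number is converted. The name and city are just pointer and length pairs into the mapping, so the mapped file has to stay open for as long as the records are used.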
The input was generated with something like
do grep -v "'" /etc/dictionaries-common/words | sort -R | xargs -d\\n -n 3 | while read a b c; do echo "$a $b;$RANDOM;$c"; done
The output is e.g.
Parsed 31609499 lines
Printing 10 random items:
city: 'opted', id: 14614, name: 'baronets theosophy'
city: 'denominated', id: 24260, name: 'insignia ophthalmic'
city: 'mademoiselles', id: 10791, name: 'smelter orienting'
city: 'ducked', id: 32155, name: 'encircled flippantly'
city: 'garotte', id: 3080, name: 'keeling South'
city: 'emirs', id: 14511, name: 'Aztecs vindicators'
city: 'characteristically', id: 5473, name: 'constancy Troy'
city: 'savvy', id: 3921, name: 'deafer terrifically'
city: 'misfitted', id: 14617, name: 'Eliot chambray'
city: 'faceless', id: 24481, name: 'shade forwent'