Filtering CSV data with C++

Question

has*_*shb 6 c++ csv filter

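Sorry for asking a question that many may think has already been asked.

I have a long CSV data file (dat.csv) with 5 columns. I have another, short CSV file (filter.csv) with 1 column.

Now I just need to extract the rows of dat.csv whose column 1 matches column 1 of filter.csv.

I would normally do this in Bash with sed/awk. However, for other reasons I need to do it in a C++ program. Can you suggest an efficient way?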

Sample data:

data.csv

ID,Name,CountryCode,District,Population

3793,NewYork,USA,NewYork,8008278
3794,LosAngeles,USA,California,3694820
3795,Chicago,USA,Illinois,2896016
3796,Houston,USA,Texas,1953631
3797,Philadelphia,USA,Pennsylvania,1517550
3798,Phoenix,USA ,Arizona,1321045
3799,SanDiego,USA,California,1223400
3800,Dallas,USA,Texas,1188580
3801,SanAntonio,USA,Texas,1144646


filter.csv

3793
3797
3798

Answer 1

Ale*_*all 7

This .csv filtering library might help:

http://www.partow.net/programming/dsvfilter/index.html

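You could merge the columns of both tables into one larger table and then query the new table for matches (rows where column 1 of table A equals column 1 of table B). Or the library may have functions for comparing tables.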

Answer 2

0x4*_*2D2 1

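Here are some hints:

The stream you read from needs to ignore commas, so it should treat the comma character as whitespace. You can do that with a std::ctype<char> facet imbued in the stream's locale. Here is an example that modifies the classification table: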

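#include <locale>
#include <vector>

struct ctype : std::ctype<char>
{
private:
    static mask* get_table()
    {
        // Start from a copy of the classic "C" locale table.
        static std::vector<mask> v(classic_table(),
                                   classic_table() + table_size);

        // Mark ',' as whitespace so operator>> skips it.
        v[','] |= space;
        return &v[0];
    }
public:
    ctype() : std::ctype<char>(get_table()) { }
};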


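Read the first .csv file line by line (meaning std::getline()). Extract the first word of each line and compare it with the contents extracted from the second .csv file. Continue until you reach the end of the first file: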

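#include <algorithm>
#include <fstream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Manipulator: imbues the stream with a locale that uses the custom ctype.
std::istream& comma_whitespace(std::istream& is)
{
    is.imbue(std::locale(is.getloc(), new ctype));
    return is;
}

int main()
{
    std::ifstream in1("test1.csv");
    std::ifstream in2("test2.csv");

    typedef std::istream_iterator<std::string> It;

    in2 >> comma_whitespace;

    // Extra parentheses keep this from parsing as a function declaration.
    std::vector<std::string> in2_content((It(in2)), It());
    std::vector<std::string> matches;

    std::string line;
    while (std::getline(in1, line))
    {
        // Each line is parsed through its own stream, so that stream
        // (not in1) needs the custom locale.
        std::istringstream iss(line);
        iss >> comma_whitespace;
        It beg(iss);

        // Skip empty lines before dereferencing the iterator.
        if (beg != It() &&
            std::find(in2_content.begin(),
                      in2_content.end(), *beg) != in2_content.end())
        {
            matches.push_back(line);
        }
    }
}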

// After the above, the vector matches should hold all the rows that
// have the same ID number as in the second csv file


comma_whitespace is a manipulator that changes the stream's locale to one that uses the custom ctype facet defined above.

Disclaimer: I haven't tested this code.

Archived:	11 years, 8 months ago
Views:	978
Last activity:	9 years, 11 months ago