How can I parallelize reading lines from an input file when the lines are processed independently?

Question

Leg*_*end 6 c++ parallel-processing openmp

I am just getting started with OpenMP in C++. My serial C++ code looks like this:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if (inputfile.is_open()) {
        while (std::getline(inputfile, line)) {
            // Line gets processed and written into an output file
        }
    }
}

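Because each line is processed independently, I am trying to parallelize this with OpenMP, since the input file is gigabytes in size. My guess is that I first need to get the number of lines in the input file and then parallelize the code as follows. Can someone help me out?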

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if(inputfile.is_open()) {
        //Calculate number of lines in file?
        //Set an output filename and open an ofstream
        #pragma omp parallel num_threads(8)
        {
            #pragma omp for schedule(dynamic, 1000)
            for(int i = 0; i < lines_in_file; i++) {
                 //What do I do here? I cannot just read any line because it requires random access
            }
        }
    }
}

Edit:

Important things:

Each line is processed independently
The order of the results does not matter

Answer 1

Nik*_*sov 2

Not a direct OpenMP answer, but what you are probably looking for is the Map/Reduce approach. Take a look at Hadoop: it is written in Java, but there is at least some C++ API.
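To make the Map/Reduce idea concrete, here is a minimal single-process sketch of the pattern using an OpenMP reduction. It is not Hadoop and not its C++ API; `map_line` and `total_length` are hypothetical names, and summing line lengths is only an assumed stand-in for whatever the real per-line work produces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "map" step: turn one line into a partial result.
// Here the partial result is simply the length of the line.
std::size_t map_line(const std::string& line) {
    return line.size();
}

// "Reduce" step: combine the partial results by summing them.
// Each thread keeps a private copy of `total`; OpenMP adds the copies
// together when the loop ends.
std::size_t total_length(const std::vector<std::string>& lines) {
    long long total = 0;
    const long long n = static_cast<long long>(lines.size());

    #pragma omp parallel for reduction(+:total)
    for (long long i = 0; i < n; ++i) {
        total += static_cast<long long>(map_line(lines[static_cast<std::size_t>(i)]));
    }
    return static_cast<std::size_t>(total);
}
```

The map step touches only its own line, which matches the "each line is processed independently" condition in the question, and a sum does not depend on the order in which lines are visited.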

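In general, you want to process this much data on different machines rather than in multiple threads of the same process (virtual address space limits, lack of physical memory, swapping, and so on). Also, the kernel has to bring the disk file in sequentially anyway, which is what you want: otherwise the hard drive would have to do extra seeks for each of your threads.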
