Oka*_*ias 3 ruby csv data-integration
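I want to parse two CSV files from the MaxMind GeoIP2 database, join them on a column, and merge the result into a single output file.
I am using Ruby's standard CSV library, and it is very slow. I think it is trying to load each file into memory in full.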
require 'csv'

# Read each file fully into memory, then parse the whole string at once.
block_file = File.read(block_path)
block_csv = CSV.parse(block_file, :headers => true)
location_file = File.read(location_path)
location_csv = CSV.parse(location_file, :headers => true)

CSV.open(output_path, "wb",
    :write_headers => true,
    :headers => ["geoname_id", "Y", "Z"]) do |csv|
    block_csv.each do |block_row|
        puts "#{block_row['geoname_id']}"
        # Scan every location row for each block row (nested loop).
        location_csv.each do |location_row|
            if block_row['geoname_id'] == location_row['geoname_id']
                puts " match :"
                csv << [block_row['geoname_id'], block_row['Y'], block_row['Z']]
                break  # stop scanning locations after the first match
            end
        end
    end
end
Is there another Ruby library that supports processing in chunks?
block_csv is 800 MB and location_csv is 100 MB.
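Just use CSV.open(block_path, 'r', :headers => true).each do |line| instead of File.read and CSV.parse. It parses the file line by line.
In the current version, you explicitly tell it to read the whole file with File.read and then parse that entire string with CSV.parse. So it is doing exactly what you told it to do.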
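As a rough illustration of the line-by-line approach, here is a minimal sketch using CSV.foreach, which opens a file and yields one parsed row at a time, much like CSV.open(...).each. The helper name join_blocks_with_locations is hypothetical. Indexing the location IDs in a Set is an assumption added here, not something the answer proposes: it avoids rescanning the location file for every block row, and it presumes the IDs from the 100 MB location file fit comfortably in memory.

```ruby
require 'csv'
require 'set'

# Hypothetical helper (not from the original post): joins the two files on
# geoname_id while streaming the large block file one row at a time.
def join_blocks_with_locations(block_path, location_path, output_path)
  # Assumption: the IDs of the smaller location file fit in memory.
  location_ids = Set.new
  CSV.foreach(location_path, headers: true) do |row|
    location_ids << row['geoname_id']
  end

  CSV.open(output_path, 'wb',
           write_headers: true,
           headers: %w[geoname_id Y Z]) do |out|
    # Each block row is parsed, checked, and discarded before the next is read.
    CSV.foreach(block_path, headers: true) do |block_row|
      id = block_row['geoname_id']
      next unless location_ids.include?(id)
      out << [id, block_row['Y'], block_row['Z']]
    end
  end
end
```

Called as join_blocks_with_locations(block_path, location_path, output_path), this writes one output row per block row whose geoname_id appears in the location file, which matches what the nested loop in the question produces. If the Set is not wanted, the inner loop from the question can be kept, but the location file would then be rescanned for every block row.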