Jig*_*hel 22 ruby csv malformed
Ubuntu 12.04 LTS
Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]
Rails 3.2.9
Below is the content of the CSV file I receive:
"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"
However, when I try to parse the CSV file, I get this error:
1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 
1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
I then tried simplifying the data to:
"name","age","email"
"jignesh","30","jignesh@example.com"
But I still get the same error:
      1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
  CSV::MalformedCSVError: Illegal quoting in line 1.
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
      from (irb):23
      from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'
I simplified the data once more, like this:
name,age,email
jignesh,30,jignesh@example.com
That works. See the output below:
  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  jignesh@example.com
   => nil 
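But the CSV files I will receive contain quoted data, so removing the quotes is not really the solution I am looking for. I cannot figure out what causes the error CSV::MalformedCSVError: Illegal quoting in line 1. in my earlier examples.
I have verified that the CSV has no leading or trailing whitespace by enabling "Show whitespace characters" and "Show line endings" in my text editor. I also verified the encoding with the following: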
  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 
Note: I also tried CSV.read, but it fails with the same error.
Can anyone help me solve this and understand what is going wrong?
=====================
I just found the following post: http://www.ruby-forum.com/topic/448070 and tried the following:
  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)
  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end
which produced the following output:
   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]
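Because the default col_sep is the comma character, the data is not read correctly (the date field is split at its comma). So I tried the quote_char option like this: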
  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")
But that ended with the following error:
   CSV::MalformedCSVError (Illegal quoting in line 1.):
Thanks, Jignesh
Vad*_*rov 23
quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end
It is not perfect, but it works most of the time.
NB: CSV.parse takes the same parameters as CSV.read, so this works with either a file or in-memory data.
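The retry idea above can be sketched against in-memory data with CSV.parse. The helper name parse_with_fallback_quotes and the sample string are assumptions made for illustration; they are not part of the original answer.

```ruby
require "csv"

# Hypothetical helper: try each candidate quote character in turn and
# return the first successful parse. Re-raises if every candidate fails.
def parse_with_fallback_quotes(data, candidates = ['"', "|", "~", "^", "&", "*"])
  remaining = candidates.dup
  begin
    CSV.parse(data, headers: :first_row, quote_char: remaining.shift)
  rescue CSV::MalformedCSVError
    remaining.empty? ? raise : retry
  end
end

# Hypothetical input: a stray double quote inside an unquoted field is
# rejected when '"' is the quote character, so the helper moves on to "|".
data = "name,note\nbob,say \"hi\" now\n"
table = parse_with_fallback_quotes(data)
puts table[0]["note"]
```

Note the trade-off: once a fallback character is used, double quotes are treated as ordinary data, so they stay in the parsed values.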
the*_*ide 19
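Anand, thank you for the encoding suggestion. It solved my illegal quoting problem.
Note: if you want the iterator to skip the header row, add headers: :first_row, like this: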
CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row)
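The \xEF\xBB\xBF prefix visible in the asker's debug output is a UTF-8 byte order mark (BOM), which is why the bom|utf-8 encoding helps here. A minimal sketch, assuming a hypothetical temporary file and made-up sample rows:

```ruby
require "csv"
require "tmpdir"

# Hypothetical sample: a UTF-8 BOM followed by fully quoted CSV rows.
bom = "\xEF\xBB\xBF"
sample = bom + %("name","age","email"\n"jignesh","30","jignesh@example.com"\n)

rows = Dir.mktmpdir do |dir|
  path = File.join(dir, "bom_sample.csv")
  File.binwrite(path, sample)

  # A file whose first three bytes are EF BB BF carries a UTF-8 BOM.
  has_bom = File.binread(path, 3).bytes == [0xEF, 0xBB, 0xBF]
  puts "BOM present: #{has_bom}"

  # "bom|utf-8" asks Ruby to strip the BOM before CSV sees the first field.
  CSV.foreach(path, encoding: "bom|utf-8", headers: :first_row).map(&:to_h)
end

puts rows.inspect
```

Checking the first three bytes is a quick way to confirm the diagnosis, since many text editors do not display a BOM even with "show whitespace" enabled.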
mAr*_*5MB 13
Version for Rails 6, Ruby 2.4+:
CSV.foreach(file, liberal_parsing: true, headers: :first_row) do |row|
    # do whatever
end
https://ruby-doc.org/stdlib-2.4.0/libdoc/csv/rdoc/CSV.html
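To illustrate what liberal_parsing changes, here is a small sketch using a hypothetical line with quotes inside an unquoted field. Strict parsing rejects it, while liberal parsing keeps the quotes as literal characters:

```ruby
require "csv"

# Hypothetical line: double quotes appear in the middle of an unquoted field.
line = 'bob,say "hi" now'

# Strict parsing (the default) raises on this input.
strict_error = begin
  CSV.parse_line(line)
  nil
rescue CSV::MalformedCSVError => e
  e
end
puts "strict parsing failed: #{strict_error.class}" if strict_error

# Liberal parsing accepts the field and leaves the quotes in place.
fields = CSV.parse_line(line, liberal_parsing: true)
puts fields.inspect
```

Liberal parsing tolerates malformed quoting but does not clean the data, so the values may still need post-processing.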
小智 12
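I just ran into this problem and found that CSV does not like whitespace between the col_sep and the quote character. Once I removed it, everything went smoothly. So I had: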
12,  "N",  12, "Pacific/Majuro"
But once I stripped the spaces using
.gsub(/,\s+"/, ',"')
which results in
12,"N",  12,"Pacific/Majuro"
everything went smoothly.
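Put together, the whitespace cleanup might look like the sketch below. It reuses the sample line from this answer; the variable names are only illustrative.

```ruby
require "csv"

# The sample line from above: spaces sit between the comma and the opening quote.
raw = '12,  "N",  12, "Pacific/Majuro"'

# Remove whitespace that directly follows a comma and precedes a quote.
cleaned = raw.gsub(/,\s+"/, ',"')
puts cleaned

# The cleaned line parses; the unquoted "  12" keeps its leading spaces.
fields = CSV.parse_line(cleaned)
puts fields.inspect
```

The pattern only touches whitespace before an opening quote, so spaces in front of unquoted values survive and may need a separate strip.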
From this thread, pass the option :quote_char => "|":
CSV.read(filename, :quote_char => "|")
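One caveat worth seeing in action: switching quote_char to "|" makes double quotes ordinary characters, so quoted fields that contain commas are split, much like the output shown in the question. A short sketch with a hypothetical line:

```ruby
require "csv"

# Hypothetical line with a comma inside a double-quoted field.
line = %("Mar 1, 2013","Order","43.00")

# With "|" as the quote character, '"' is plain data and every comma splits.
fields = CSV.parse_line(line, quote_char: "|")
puts fields.inspect
puts "field count: #{fields.length}"
```

This option avoids the exception, but it is only safe when no quoted field contains the column separator.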