Ruby cannot parse CSV file: CSV::MalformedCSVError (Illegal quoting in line 1)

Jig*_*hel 22 ruby csv malformed

Ubuntu 12.04 LTS

Ruby: ruby 1.9.3dev (2011-09-23 revision 33323) [i686-linux]

Rails 3.2.9

Here is the content of the CSV file I received:

"date/time","settlement id","type","order id","sku","description","quantity","marketplace","fulfillment","order city","order state","order postal","product sales","shipping credits","gift wrap credits","promotional rebates","sales tax collected","selling fees","fba fees","other transaction fees","other","total"
"Mar 1, 2013 12:03:54 AM PST","5481545091","Order","108-0938567-7009852","ALS2GL36LED","Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor","1","amazon.com","Amazon","Pasadena","CA","91104-1056","43.00","3.25","0","-3.25","0","-6.45","-3.75","0","0","32.80"

However, when I try to parse the CSV file, I get an error:

1.9.3dev :016 > options = { col_sep: ",", quote_char:'"' }
=> {:col_sep=>",", :quote_char=>"\""} 

1.9.3dev :022 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):22
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

Then I tried simplifying the data, i.e.:

"name","age","email"
"jignesh","30","jignesh@example.com"

But I still get the same error:

1.9.3dev :023 > CSV.foreach("/tmp/my_data.csv", options) { |row| puts row }
CSV::MalformedCSVError: Illegal quoting in line 1.
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1925:in `block (2 levels) in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1887:in `block in shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `loop'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1849:in `shift'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1791:in `each'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1208:in `block in foreach'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1354:in `open'
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/lib/ruby/1.9.1/csv.rb:1207:in `foreach'
    from (irb):23
    from /home/jigneshgohel/.rvm/rubies/ruby-1.9.3-rc1/bin/irb:16:in `<main>'

I simplified the data again, like this:

name,age,email
jignesh,30,jignesh@example.com

That works. See the output below:

  1.9.3dev :024 > CSV.foreach("/tmp/my_data.csv") { |row| puts row }
  name
  age
  email
  jignesh
  30
  jignesh@example.com
   => nil 

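But I will be receiving CSV files with quoted data, so removing the quotes is not really the solution I'm looking for. I can't figure out what causes the error "CSV::MalformedCSVError: Illegal quoting in line 1." in my earlier examples.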

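I have verified that there is no leading/trailing whitespace in the CSV by enabling "Show whitespace characters" and "Show line endings" in my text editor. I have also checked the encoding with the following: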

  1.9.3dev :026 > File.open("/tmp/my_data.csv").read.encoding
  => #<Encoding:UTF-8> 
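Note that `String#encoding` only reports the encoding Ruby has tagged the string with; a UTF-8 file that begins with a byte-order mark (BOM, the bytes `EF BB BF`) is still reported as UTF-8. A minimal sketch that inspects the raw leading bytes instead; `starts_with_bom?` is a hypothetical helper, and temporary files stand in for `/tmp/my_data.csv`:

```ruby
require "tempfile"

# Hypothetical helper: true if the file begins with the UTF-8 BOM (EF BB BF).
def starts_with_bom?(path)
  # binread returns raw bytes (ASCII-8BIT), so nothing is transcoded or hidden.
  File.binread(path, 3) == "\xEF\xBB\xBF".b
end

# A file whose first character is U+FEFF, written out as EF BB BF.
with_bom = Tempfile.new(["with_bom", ".csv"])
with_bom.write("\uFEFF\"name\",\"age\"\n")
with_bom.close

# The same content without the BOM.
plain = Tempfile.new(["plain", ".csv"])
plain.write("\"name\",\"age\"\n")
plain.close

puts starts_with_bom?(with_bom.path) # => true
puts starts_with_bom?(plain.path)    # => false
```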

Note: I also tried CSV.read, but it fails with the same error.

Can anyone help me solve this and understand where I went wrong?

=====================

I just found the following post: http://www.ruby-forum.com/topic/448070 and tried this:

  file_data = file.read
  file_data.gsub!('"', "'")
  arr_of_arrs = CSV.parse(file_data)

  arr_of_arrs.each do |arr|
    Rails.logger.debug "=======#{arr}"
  end

and got the following output:

   =======["\xEF\xBB\xBF'date/time'", "'settlement id'", "'type'", "'order id'", "'sku'", "'description'", "'quantity'", "'marketplace'", "'fulfillment'", "'order city'", "'order state'", "'order postal'", "'product sales'", "'shipping credits'", "'gift wrap credits'", "'promotional rebates'", "'sales tax collected'", "'selling fees'", "'fba fees'", "'other transaction fees'", "'other'", "'total'"]
    =======["'Mar 1", " 2013 12:03:54 AM PST'", "'5481545091'", "'Order'", "'108-0938567-7009852'", "'ALS2GL36LED'", "'Solar Two Directional 36 Bright White LED Security Flood Light with Motion Activated Sensor'", "'1'", "'amazon.com'", "'Amazon'", "'Pasadena'", "'CA'", "'91104-1056'", "'43.00'", "'3.25'", "'0'", "'-3.25'", "'0'", "'-6.45'", "'-3.75'", "'0'", "'0'", "'32.80'"]

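Because the default col_sep is a comma, the comma inside the date value ('Mar 1, 2013 12:03:54 AM PST') now splits it into two fields, so the data is not read correctly. So I tried the quote_char option like this: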

  arr_of_arrs = CSV.parse(file_data, :quote_char => "'")

But it ended up with the following error:

   CSV::MalformedCSVError (Illegal quoting in line 1.):

Thanks, Jignesh
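The `\xEF\xBB\xBF` at the start of the first field in the gsub output is the UTF-8 encoding of a byte-order mark (U+FEFF). It sits in front of the opening quote of `"date/time"`, so the parser sees a field that starts unquoted and then meets a quote character, which is what "Illegal quoting in line 1" reports. The same BOM precedes `'date/time'` after the gsub, which would also explain why `:quote_char => "'"` fails. A minimal in-memory reproduction, assuming UTF-8 data; `strip_bom` is a hypothetical helper:

```ruby
require "csv"

# Hypothetical helper: remove one leading U+FEFF (UTF-8 bytes EF BB BF), if present.
def strip_bom(text)
  text.sub(/\A\uFEFF/, "")
end

# The simplified data from the question, prefixed with a BOM.
data = "\uFEFF\"name\",\"age\",\"email\"\n\"jignesh\",\"30\",\"jignesh@example.com\"\n"

begin
  CSV.parse(data)
rescue CSV::MalformedCSVError => e
  # The BOM makes the first field start unquoted, so the '"' after it is illegal.
  puts "with BOM: #{e.message}"
end

rows = CSV.parse(strip_bom(data))
p rows.first # => ["name", "age", "email"]
```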

Vad*_*rov 23

quote_chars = %w(" | ~ ^ & *)
begin
  @report = CSV.read(csv_file, headers: :first_row, quote_char: quote_chars.shift)
rescue CSV::MalformedCSVError
  quote_chars.empty? ? raise : retry 
end

It isn't perfect, but it works most of the time.

NB: CSV.parse takes the same options as CSV.read, so this works with data from a file or from memory.
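A sketch of the same retry idea applied to an in-memory string with `CSV.parse`. `parse_with_fallback` is a hypothetical wrapper, and the BOM-prefixed sample is an assumption modeled on the question's data. It also shows one way the approach is "not perfect": once a fallback such as `|` is used, the double quotes (and any BOM) stay inside the values.

```ruby
require "csv"

# Hypothetical wrapper: try each quote character in turn, as in the answer.
def parse_with_fallback(text, quote_chars = %w(" | ~ ^ & *))
  candidates = quote_chars.dup # don't consume the caller's array
  begin
    quote_char = candidates.shift
    [quote_char, CSV.parse(text, quote_char: quote_char)]
  rescue CSV::MalformedCSVError
    raise if candidates.empty? # out of options: re-raise the last error
    retry                      # re-run the begin block with the next candidate
  end
end

data = "\uFEFF\"name\",\"age\"\n\"jignesh\",\"30\"\n"
used, rows = parse_with_fallback(data)
puts "quote_char used: #{used}"
p rows # the values still contain their double quotes
```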


the*_*ide 19

Anand, thanks for the encoding suggestion. It solved my illegal quoting problem.

Note: if you want the iterator to skip the header row, add headers: :first_row, like this:

CSV.foreach("test.csv", encoding: "bom|utf-8", headers: :first_row) { |row| puts row }
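A self-contained sketch of that call, assuming a temporary file with a leading BOM in place of `test.csv`. Because `bom|utf-8` makes Ruby drop the BOM when the file is opened, the first header comes out as a clean `name`, so `row["name"]` works:

```ruby
require "csv"
require "tempfile"

# A small CSV that starts with a UTF-8 BOM, like the file in the question.
file = Tempfile.new(["report", ".csv"])
file.write("\uFEFF\"name\",\"age\",\"email\"\n\"jignesh\",\"30\",\"jignesh@example.com\"\n")
file.close

rows = []
# "bom|utf-8": strip a leading BOM if present, then read the rest as UTF-8.
# headers: :first_row: use line 1 as headers; rows are yielded as CSV::Row.
CSV.foreach(file.path, encoding: "bom|utf-8", headers: :first_row) do |row|
  rows << [row["name"], row["age"], row["email"]]
end

p rows # => [["jignesh", "30", "jignesh@example.com"]]
```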

  • Thanks! `encoding: "bom|utf-8"` solved my problem. (3 upvotes)
  • For those getting "ArgumentError: unknown encoding name - bom|utf-8" on Ruby 2.4+, make sure to update the `csv` gem to version 3 or later (`gem 'csv', '~> 3.0'` in your Gemfile). (2 upvotes)

mAr*_*5MB 13

Rails 6, Ruby 2.4+

CSV.foreach(file, liberal_parsing: true, headers: :first_row) do |row|
  # do whatever
end

https://ruby-doc.org/stdlib-2.4.0/libdoc/csv/rdoc/CSV.html
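A short sketch of what `liberal_parsing` changes, using the example string from the CSV documentation: a quote inside an unquoted field no longer raises, and the quotes are kept as ordinary characters. It tolerates malformed quoting rather than cleaning it up, so it is not obviously a fix for a leading BOM.

```ruby
require "csv"

line = 'is,this "three, or four",fields'

# Default (strict) parsing rejects a quote inside an unquoted field.
begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  puts "strict: #{e.message}"
end

# Liberal parsing keeps the stray quotes as part of the values.
p CSV.parse_line(line, liberal_parsing: true)
# => ["is", "this \"three", " or four\"", "fields"]
```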


Anonymous 12

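I just ran into this problem and found that CSV doesn't like whitespace between the column separator and the quote character. Once I removed it, everything went smoothly. I had: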

12,  "N",  12, "Pacific/Majuro"

but once I removed the spaces with

.gsub(/,\s+"/, ',"')

resulting in

12,"N",  12,"Pacific/Majuro"

everything worked.
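A small reproduction of this answer using the sample line above. It is a variant rather than the exact answer: it strips only spaces and tabs (`[ \t]+`) so that a line break after a trailing comma is not swallowed, which `\s+` would do.

```ruby
require "csv"

# Sample line from the answer: spaces between the commas and the opening quotes.
line = '12,  "N",  12, "Pacific/Majuro"'

begin
  CSV.parse_line(line)
rescue CSV::MalformedCSVError => e
  # '  "N"' does not start with a quote, so it is read as an unquoted
  # field, and the quote inside it is rejected.
  puts "as-is: #{e.message}"
end

# Remove spaces/tabs between a comma and an opening quote.
cleaned = line.gsub(/,[ \t]+"/, ',"')
p CSV.parse_line(cleaned) # => ["12", "N", "  12", "Pacific/Majuro"]
```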


Gil*_*Him 5

As suggested in this thread, pass the option :quote_char => "|":

CSV.read(filename, :quote_char => "|")


Anonymous -4

Try this tip:

  1. Open the CSV file in a text editor
  2. Select the whole file and copy it
  3. Open a new text file
  4. Paste the CSV data into the new file and save it
  5. Import the new CSV file
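One plausible explanation for why this works (an assumption; it depends on the editor's save settings) is that the new file is saved without the UTF-8 BOM the original had. If that is the cause, the same effect can be had in code by reading with `bom|utf-8` and writing the text back out. `rewrite_without_bom` is a hypothetical helper, and temporary files stand in for real paths:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: copy src to dest, dropping a leading UTF-8 BOM if present.
def rewrite_without_bom(src, dest)
  text = File.read(src, encoding: "bom|utf-8") # the BOM is consumed on read
  File.write(dest, text)                       # written back without a BOM
end

src = Tempfile.new(["original", ".csv"])
src.write("\uFEFF\"name\",\"age\"\n\"jignesh\",\"30\"\n")
src.close

dest = Tempfile.new(["clean", ".csv"])
dest.close

rewrite_without_bom(src.path, dest.path)
p CSV.read(dest.path) # => [["name", "age"], ["jignesh", "30"]]
```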