Ruby: How do I read a CSV file that has two header rows?

dav*_*mcd 5 ruby csv parsing ruby-on-rails

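I have a ".CSV" file that I'm trying to parse with Ruby's CSV. The file has two header rows, which I've never run into before, and I don't know how to handle it. Below is an example of the headers and a row.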

Line 1

"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"

Line 2

"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"

Line 3

"721","AirForce","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"

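The extra line breaks in the example above aren't in the actual file; I only added them to make it easier to read. Does CSV have a method for handling this structure, or do I have to write my own? Thanks!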

Ser*_*gov 8

It looks like your CSV file was generated from an Excel spreadsheet whose columns are grouped like this:

... |        Rushing        |         Passing         | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...

(Not sure I reconstructed the groups correctly.)

AFAIK, there is no standard tool for working with this kind of CSV file. You'll have to do it manually:

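  • Read the first line and treat it as the first header row.
  • Read the second line and treat it as the second header row.
  • Read the third line and treat it as the first data row.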
  • ...
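The steps above could be sketched with Ruby's standard csv library like this. It assumes, as in the spreadsheet layout above, that a blank cell in the first header row belongs to the group named to its left, and it joins group and sub-header into keys such as "Rushing Rushes"; the method name read_two_header_csv and that key format are illustrative choices, not part of any library.

```ruby
require 'csv'

# Hypothetical helper: parse CSV text whose first two rows are headers.
# Row 1 holds group names; a blank cell continues the previous group.
# Row 2 holds sub-headers; a blank (or missing) cell means "no sub-header".
def read_two_header_csv(text)
  rows = CSV.parse(text)
  group_row = rows.shift   # first header row
  sub_row   = rows.shift   # second header row

  current_group = nil
  headers = group_row.each_with_index.map do |cell, i|
    # A non-empty cell starts a new group; blank cells inherit the last one.
    current_group = cell.strip unless cell.to_s.strip.empty?
    sub = sub_row[i].to_s.strip
    [current_group, sub].compact.reject(&:empty?).join(' ')
  end

  # Every remaining row is data: pair each value with its combined header.
  rows.map { |row| headers.zip(row).to_h }
end

text = <<~TEXT
  "Institution ID","Last Name","Rushing","","Total TD"
  "","","Rushes","Gain",""
  "721","BASKA","3","12","1"
TEXT

p read_two_header_csv(text)
```

Because the group name is part of each key, columns such as "Rushing TD" and "Passing TD" stay distinct instead of overwriting each other in the resulting hashes.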


Til*_*ilo 5

I'd suggest using the smarter_csv gem and providing the correct headers manually:

 require 'smarter_csv'

 options = {
   :user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name",
                              # ... provide all headers here ...
                             ],
   :headers_in_file => false  # the file's two header lines are read as ordinary rows
 }
 data = SmarterCSV.process(filename, options)  # filename: path to your .csv file
 data.shift # to ignore the first header line
 data.shift # to ignore the second header line
 # data now contains an array of hashes with your data

See the GitHub page for options and examples: https://github.com/tilo/smarter_csv

One option you should use is :user_provided_headers; then just specify the headers you want in an array. That way you can handle situations like this one.

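You have to call data.shift twice to drop the file's two header lines, since they end up as the first two rows of data (data.pop would remove the last two rows instead).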
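If adding a gem isn't an option, Ruby's standard csv library supports a similar approach (this is the stdlib CSV class, not smarter_csv): passing an Array as the headers: option makes CSV use those names and treat every line of the file, including both header lines, as data. A minimal sketch, assuming a hypothetical three-column header list for a trimmed-down sample; the real file needs one unique name per column.

```ruby
require 'csv'

# Hypothetical headers for a trimmed three-column sample.
my_headers = ["Institution ID", "Rushing Rushes", "Rushing Gain"]

text = <<~TEXT
  "Institution ID","Rushing",""
  "","Rushes","Gain"
  "721","3","12"
TEXT

# With headers: set to an Array, no line is consumed as a header row,
# so the file's two header lines come back as the first two rows.
table = CSV.parse(text, headers: my_headers)
records = table.drop(2).map(&:to_h)   # discard the two header lines
p records
```

CSV::Table is Enumerable over its rows, so drop(2) skips the leading header lines the same way data.shift does in the smarter_csv version.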


Cod*_*lan 3

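You have to write your own logic. A CSV file is really just rows and columns; it has no inherent notion of what each column or row actually is, it's only raw data. So CSV has no concept or awareness that the file has two header rows. That's a human interpretation, so you need to build your own heuristics.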

Given that your data rows look like this:

"721","Air Force","09/01/12",

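When you start parsing the data, check whether the first column represents an integer: if you convert it to an int and it is > 0, then you know you're dealing with a valid data row rather than a header.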
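The heuristic above, as a minimal sketch with Ruby's standard csv library. It relies on String#to_i returning 0 for non-numeric text such as "Institution ID" or an empty cell (and nil.to_i is also 0), so both header rows fail the > 0 check. Note that to_i also accepts a numeric prefix ("12abc".to_i is 12), so this is a heuristic rather than strict validation; the method name data_rows is illustrative.

```ruby
require 'csv'

# Hypothetical helper: keep only rows whose first column converts to a
# positive integer. Header rows ("Institution ID", "") convert to 0.
def data_rows(text)
  CSV.parse(text).select { |row| row[0].to_i > 0 }
end

text = <<~TEXT
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
TEXT

p data_rows(text)
```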