dav*_*mcd 5 ruby csv parsing ruby-on-rails
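I have a ".CSV" file that I'm trying to parse with Ruby's CSV library. The file has two header rows, which I have never run into before, and I don't know how to handle it. Below is a sample of the headers and one data row.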
Row 1
"Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name","Rushing","","","","","Passing","","","","","","Total Off.","","Receiving","","","Pass Int","","","Fumble Ret","","","Punting","","Punt Ret","","","KO Ret","","","Total TD","Off xpts","","","","Def xpts","","","","FG","","Saf","Points"
Row 2
"","","","","","","Rushes","Gain","Loss","Net","TD","Att","Cmp","Int","Yards","TD","Conv","Plays","Yards","No.","Yards","TD","No.","Yards","TD","No.","Yards","TD","No.","Yards","No.","Yards","TD","No.","Yards","TD","","Kicks Att","Kicks Made","R/P Att","R/P Made","Kicks Att","Kicks Made","Int/Fum Att","Int/Fum Made","Att","Made"
Row 3
"721","AirForce","09/01/12","19","BASKA","DAVID","","","","","","","","","","","","0","0","","","","","","","","","","2","85","","","","","","","","","","","","","","","","","","","0"
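The line breaks in the sample above are not in the file; I only added them to make it easier to read. Does CSV have a method for handling this structure, or do I have to write my own? Thanks!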
It looks like your CSV file was generated from an Excel spreadsheet whose columns are grouped like this:
... | Rushing | Passing | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...
(Not sure I recovered the groups correctly.)
AFAIK there is no standard tool that handles this kind of CSV file. You have to do the work by hand.
I suggest using the smarter_csv gem and supplying the correct headers manually:
require 'smarter_csv'
options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name", ... provide all headers here ... ],
:headers_in_file => false}
data = SmarterCSV.process(filename, options)
data.shift # to ignore the first header line
data.shift # to ignore the second header line
# data now contains an array of hashes with your data
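See the GitHub page for options and examples: https://github.com/tilo/smarter_csv
One option you should use is :user_provided_headers; just specify the headers you want in an array. That lets you deal with a situation like this one.
You have to call data.shift to skip the header rows in the file, since they are read in as the first rows of data.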
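If typing every header out by hand is too tedious, the combined names could probably be derived from the two header rows instead. Below is a minimal sketch using only Ruby's standard CSV library; `merge_header_rows` is a hypothetical helper, not part of CSV or smarter_csv, and it assumes a blank cell in the first header row belongs to the nearest named group to its left. The sample is a shortened stand-in for the real file.

```ruby
require 'csv'

# Hypothetical helper: merges a group header row and a sub-header row
# into a single list of column names.
def merge_header_rows(groups, subs)
  current_group = nil
  groups.each_with_index.map do |group, i|
    group = group.to_s.strip
    sub   = subs[i].to_s.strip # the second row may be shorter, so nil becomes ""
    # A non-empty cell in the first row starts a new group; blank cells
    # to its right are assumed to belong to that same group.
    current_group = group unless group.empty?
    sub.empty? ? current_group : "#{current_group} #{sub}"
  end
end

# Shortened sample in the same shape as the file in the question.
sample = <<~SAMPLE
  "Institution ID","Institution","Rushing","","Passing",""
  "","","Rushes","Gain","Att","Cmp"
  "721","Air Force","","","",""
SAMPLE

groups, subs, *data = CSV.parse(sample)
headers = merge_header_rows(groups, subs)
records = data.map { |row| headers.zip(row).to_h }
```

The resulting `headers` array could also be passed as `:user_provided_headers`. Note that a column that is blank in both header rows would repeat the previous group name, so the merged names are worth checking for duplicates before relying on them.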
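You will have to write your own logic. CSV is really just rows and columns; it has no inherent notion of what each column or row actually is, it is just raw data. So CSV has no concept or awareness that the file has two header rows. That is a human convention, so you need to build your own heuristic.
Given that your data rows look like this: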
"721","Air Force","09/01/12",
As you parse the data, check whether the first column represents an integer: if you convert it to an int and the result is > 0, then you know you are dealing with a valid data row rather than a header.
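That heuristic could be sketched roughly as follows. `data_row?` is a hypothetical helper name, and the sample is a shortened stand-in for the real file. The sketch uses `Integer(..., exception: false)` (Ruby 2.6+) rather than `to_i`, which is a slightly stricter choice: a cell such as "7 Eleven" is rejected instead of being read as 7.

```ruby
require 'csv'

# Hypothetical helper: true when the first cell parses as a positive integer.
def data_row?(row)
  id = Integer(row.first.to_s, 10, exception: false)
  !id.nil? && id > 0
end

# Shortened sample: two header rows followed by one data row.
sample = <<~SAMPLE
  "Institution ID","Institution","Game Date"
  "","",""
  "721","Air Force","09/01/12"
SAMPLE

# Keep only the rows that pass the heuristic.
rows = CSV.parse(sample).select { |row| data_row?(row) }
```

This only works as long as every real data row starts with a positive numeric ID, which holds for the sample shown but is an assumption about the rest of the file.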