moo*_*emu 1 python bash perl awk
I have a CSV file like this:
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97
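The first 5 columns are always the same, and the last 5 columns always come in a different order. I need to keep the first 5 columns as they are and reorder the last 5 so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra.
If one of those strings is missing, add a comma (an empty field) in its place.
So the final result looks like this: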
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53
If they were always in the same order, I could do something like this:
awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'
If they were at least always in the same columns, I could do:
awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'
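But every line has a different order, and each field ends with a variable number, which is throwing me for a loop.
I would like to do this with AWK, but I am open to anything at this point.
Logically, I think I need to do something like: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*
and then awk {print $6j, $7i, $8k, $9l, $10m}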
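The idea above, giving each score field a name and then printing the names in a fixed order, can be sketched in Python with a dictionary. This is only an illustration under a few assumptions: the names `ORDER` and `reorder_line` are made up for this sketch, fields are split on bare commas, and every score field has the form "test name, space, number".

```python
# Fixed output order requested in the question.
ORDER = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]


def reorder_line(line):
    """Return one CSV line with its score fields placed in ORDER."""
    fields = line.rstrip("\n").split(",")
    head, scores = fields[:5], fields[5:]
    # Map each test name to its full field, e.g.
    # "Sentence Skills" -> "Sentence Skills 82".
    # rsplit(" ", 1) cuts off only the trailing number.
    by_name = {s.rsplit(" ", 1)[0]: s for s in scores if s}
    # Missing tests become empty fields, so every line ends up with 10 columns.
    return ",".join(head + [by_name.get(name, "") for name in ORDER])


print(reorder_line(
    "Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
    "Elementary Algebra 41,Sentence Skills 82"
))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
```

Keying on the test name rather than on the column position is what makes the input order irrelevant.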
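But my Google searches have turned up nothing useful. So even a comment saying "look here", "search for this", or "check out this answer" would be appreciated.
Note: I have done my best to make sure the input and output are correct. I have posted another question similar to this one, but in that case the columns were always in the same order, so this is a different requirement.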
Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with the names of your CSV files.
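import csv

# Target order for the score columns.
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open in text mode with newline='' as the csv module expects
# (on Python 2, use modes 'wb' and 'rb' instead).
with open('output.csv', 'w', newline='') as outfile, \
        open('input.csv', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)
    for row in reader:
        head = row[:5]  # the five fixed columns
        tail = []
        for label in labels:
            # First field starting with this label, or "" if it is absent.
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)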
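The `next(..., "")` call in the script above does the real work: it returns the first item the generator yields, or the second argument when the generator is empty. A small sketch of that idiom in isolation (the sample values are taken from the question; `row_tail`, `found` and `missing` are names made up for this illustration):

```python
row_tail = ["Elementary Algebra 41", "Sentence Skills 82"]

# First field that starts with the label, or "" when nothing matches.
found = next((f for f in row_tail if f.startswith("Sentence Skills")), "")
missing = next((f for f in row_tail if f.startswith("Arithmetic")), "")

print(repr(found))    # 'Sentence Skills 82'
print(repr(missing))  # ''
```

Without the default argument, `next` would raise `StopIteration` for a missing test instead of producing an empty field.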
Here is another, shorter solution that works with pipes:
#!/usr/bin/python
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()
If you save this code in a file, named reorder for example, and make that file executable, you can reformat the CSV file like this:
$ cat input.csv | ./reorder
The reformatted CSV content is then written to standard output.
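One caveat with the pipe version: it splits on bare commas, so a quoted field that itself contains a comma would be cut in two. If that matters for your data, a possible variant keeps the stream interface but parses with the csv module. This is a sketch, not a drop-in replacement: `reorder_stream` is a name made up here, and the quoted name in the demo row is a hypothetical value, not part of the sample data.

```python
import csv
import io

LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]


def reorder_stream(infile, outfile):
    """Copy CSV rows from infile to outfile with score fields in LABELS order."""
    # lineterminator="\n" replaces the csv module's default "\r\n" line endings.
    writer = csv.writer(outfile, lineterminator="\n")
    for row in csv.reader(infile):
        tail = [
            next((f for f in row[5:] if f.startswith(label)), "")
            for label in LABELS
        ]
        writer.writerow(row[:5] + tail)


# Demo on in-memory streams; the quoted first field is a hypothetical example.
sample = io.StringIO(
    '"Last, Jr.",First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51\n'
)
out = io.StringIO()
reorder_stream(sample, out)
print(out.getvalue(), end="")
# "Last, Jr.",First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
```

In a script used with pipes, the call would be `reorder_stream(sys.stdin, sys.stdout)` after `import sys`.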