Reorder columns by string variable

moo*_*emu 1 python bash perl awk

I have a CSV file like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97

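The first 5 columns are always the same; the last 5 columns always come in a different order. I need to keep the first 5 columns as they are and reorder the last 5 so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra.

If one of the strings is not present, add a comma in its place, leaving that field empty.

So the end result would look like this: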

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53

If they were always in the same order, I could do something like this:

awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'

If they were at least always in the same columns, I could do a

awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'

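But every row is in a different order, and each field has a varying number at the end, which is throwing me for a loop.

I would like to do this with AWK, but I am open to anything at this point.

Logically, I think I need to do something like: j=Reading*, i=Sentence*, k=Arithmetic*, l=College*, m=Elementary*

and then awk {print $6j, $7i, $8k, $9l, $10m}

But my Google searches have returned poor results. So even a comment along the lines of "look here", "search for this", or "see this answer" would be appreciated.

Note: I did my best to make sure the input and output are correct. I have posted another question similar to this one, but there the columns were always in the same order, so this is a different requirement.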

mii*_*lek 5

Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with the paths to your own CSV files.

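import csv

# Target order for the last five columns.
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

# Python 3: open CSV files in text mode with newline='' (Python 2 used 'wb'/'rb').
with open('output.csv', 'w', newline='') as outfile, \
     open('input.csv', newline='') as infile:
    writer = csv.writer(outfile)
    reader = csv.reader(infile)

    for row in reader:
        head = row[:5]  # the first five columns stay as they are
        tail = []
        for label in labels:
            # First field that starts with this label, or "" if the row has none.
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)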
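
To make the lookup step easier to follow, here is a minimal sketch that applies the same `next(generator, default)` idiom to a single row taken from the question. The function name `reorder_row` and the constant `LABELS` are introduced here for illustration only and are not part of the original answer.

```python
LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]

def reorder_row(row):
    """Return the first 5 fields unchanged, followed by one slot per label."""
    head, scores = row[:5], row[5:]
    # next(generator, "") gives the first matching field, or "" if none matches.
    tail = [next((s for s in scores if s.startswith(label)), "")
            for label in LABELS]
    return head + tail

row = ["Last", "First", "A00XXXXXX", "1492-12-03", "2015-06-23",
       "Reading Comprehension 70", "Elementary Algebra 23",
       "Arithmetic 42", "Sentence Skills 75"]
print(",".join(reorder_row(row)))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
```

The empty string stands in for the missing "College Level Math" score, which is what produces the doubled comma in the output.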

Here is another, shorter solution that works with a pipe:

#!/usr/bin/python
from sys import stdin, stdout

# Target order for the last five columns.
labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin:
    values = line.strip().split(',')
    # The first five columns are written unchanged.
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        # First field that starts with this label, or '' if the row has none.
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()

If you save this code in a file, called reorder for example, and make that file executable, you can reformat the CSV file like this:

$ cat input.csv | ./reorder

The reformatted CSV content is then written to standard output.
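
The question's own idea of binding each label to a fixed position can also be sketched with a dictionary, which visits each score field once instead of scanning the row for every label. This is only an illustrative variant under stated assumptions: the names `SLOT` and `reorder_by_slot` are hypothetical, and the sketch assumes every score field ends in a space followed by the number.

```python
LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]
# Map each label to its position among the last five columns.
SLOT = {label: index for index, label in enumerate(LABELS)}

def reorder_by_slot(row):
    """Place each score field into the slot that belongs to its label."""
    tail = [""] * len(LABELS)
    for field in row[5:]:
        # Split off the trailing score: "Sentence Skills 82" -> "Sentence Skills".
        label, _, _score = field.rpartition(" ")
        if label in SLOT:  # fields with an unknown label are dropped
            tail[SLOT[label]] = field
    return row[:5] + tail

line = "Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82"
print(",".join(reorder_by_slot(line.split(","))))
# Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
```

Like the solutions above, this silently discards any field whose label is not in the list.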