af.*_*.bj 0 awk text-processing csv
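I'm looking into automating some processes/calculations, but I will probably first need to reformat a slightly awkward set of CSV files. (I'm using bash for this, as required.)
The CSV files follow (roughly) this format: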
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
,Second Sitting,,,,,,,,1
RTHM_A8,First Sitting,,,1,,3,,,6
,Second Sitting,,,,,1,,,1
FFBJ_FA9,First Sitting,,,,8,6,,,25
,Second Sitting,,,,,11,,,12
UUYIOR_HJ9,First Sitting,,,1,3,6,,,17
IKRO_Lk8,First Sitting,,,,3,3,,,37
,Second Sitting,,,,6,11,,,34
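I'm trying to fill the empty fields in the CODE column with the content of the same field from the previous row (these empty fields generally appear next to the "Second Sitting" instances in column 2). So, for the example above, the result should be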
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
CLLK_J9,Second Sitting,,,,,,,,1
etc.
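I've started reading some awk documentation, since it seems to be a fairly powerful utility for this kind of task, but I haven't made any progress yet. Any ideas?
Thanks.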
With Miller ( https://github.com/johnkerl/miller ) this is very simple. Running
mlr --csv fill-down -f CODE input.csv >output.csv
you will get (shown here as a table, with - marking empty fields)
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
| CODE       | Sitting        | Jan | Feb | Mar | Apr | May | Jun | Jul | Totals |
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
| CLLK_J9    | First Sitting  | -   | -   | 2   | 5   | 2   | -   | -   | 10     |
| CLLK_J9    | Second Sitting | -   | -   | -   | -   | -   | -   | -   | 1      |
| RTHM_A8    | First Sitting  | -   | -   | 1   | -   | 3   | -   | -   | 6      |
| RTHM_A8    | Second Sitting | -   | -   | -   | -   | 1   | -   | -   | 1      |
| FFBJ_FA9   | First Sitting  | -   | -   | -   | 8   | 6   | -   | -   | 25     |
| FFBJ_FA9   | Second Sitting | -   | -   | -   | -   | 11  | -   | -   | 12     |
| UUYIOR_HJ9 | First Sitting  | -   | -   | 1   | 3   | 6   | -   | -   | 17     |
| IKRO_Lk8   | First Sitting  | -   | -   | -   | 3   | 3   | -   | -   | 37     |
| IKRO_Lk8   | Second Sitting | -   | -   | -   | 6   | 11  | -   | -   | 34     |
+------------+----------------+-----+-----+-----+-----+-----+-----+-----+--------+
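
Since the question mentions awk, here is a minimal sketch of the same fill-down idea in plain awk. It is not part of the Miller approach above. It assumes simple CSV with no quoted fields containing commas or newlines, and input.csv and output.csv are placeholder file names; the sample data is a shortened copy of the data in the question.

```shell
# Create a small sample file (a shortened copy of the data in the question).
cat > input.csv <<'EOF'
CODE,Sitting,Jan,Feb,Mar,Apr,May,Jun,Jul,Totals
CLLK_J9,First Sitting,,,2,5,2,,,10
,Second Sitting,,,,,,,,1
RTHM_A8,First Sitting,,,1,,3,,,6
,Second Sitting,,,,,1,,,1
EOF

# Fill an empty first field with the most recent non-empty one.
# Assumption: fields never contain quoted commas, so splitting on "," is safe.
awk 'BEGIN { FS = OFS = "," }
     NR == 1  { print; next }   # pass the header through unchanged
     $1 == "" { $1 = last }     # empty CODE: reuse the remembered value
              { last = $1; print }' input.csv > output.csv

cat output.csv
```

Assigning to $1 makes awk rebuild the line using OFS, which is why OFS is set to a comma as well. For files with quoted fields, a CSV-aware tool such as Miller is the safer choice.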