xor*_*eed 5 sed awk text-processing csv
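I have a lot of .csv files containing customer information. In all of these files, I want to add an additional column, FIRSTNAME, next to the FULLNAME column. The first name can be generated by grabbing the first word of FULLNAME. There are no two-word first names such as Jean Paul. In the last column, commas are used within the field text.
Input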
COMPANY,FULLNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Expected output
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,firstname.infix.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix infix2 Lastname,Firstname,firstname.infix12.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
How can I do this with awk, sed, or something else?
Using the CSV-aware utility Miller (mlr):
mlr --csv \
put '$FIRSTNAME = sub($FULLNAME," .*","")' then \
reorder -f COMPANY,FULLNAME,FIRSTNAME file
...which, given the data in the question, results in
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
This use of Miller first creates a new field, FIRSTNAME, through a regular-expression substitution that deletes the first space character in the FULLNAME field and everything after it.
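To see what the " .*" pattern does in isolation, the same substitution can be tried with sed on a single name. This is only an illustrative sketch; the helper name first_word is made up here and is not part of Miller or of the answer.

```shell
# Hypothetical helper (name invented for illustration).
# Same regex as the Miller sub() call: delete the first space and everything after it.
first_word() {
  printf '%s\n' "$1" | sed 's/ .*//'
}

first_word 'Firstname infix infix2 Lastname'   # prints: Firstname
```

A value without any space is left unchanged, since the pattern then matches nothing.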
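Since the new field is added last, the fields are then reordered so that the first three fields are COMPANY, FULLNAME, and FIRSTNAME, in that order. The remaining fields keep their original order.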
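For comparison, the two steps described above (derive the first word, then place it directly after FULLNAME) can be roughly sketched in plain awk. This is not CSV-aware: it assumes that FULLNAME is the second column and that neither COMPANY nor FULLNAME contains commas or quotes. The function name add_firstname is invented for this sketch.

```shell
# Hypothetical helper (name invented for illustration).
# Assumption: the first two fields contain no commas or quotes, so a plain comma
# split is safe for them; later quoted fields are split and rejoined unchanged.
add_firstname() {
  awk -F, -v OFS=, '
    NR == 1 { first = "FIRSTNAME" }                # header row gets the column name
    NR > 1  { first = $2; sub(/ .*/, "", first) }  # data rows get the first word
    { $2 = $2 OFS first; print }                   # rebuilding $0 puts it after FULLNAME
  '
}
```

It reads CSV on standard input, so it could be called as add_firstname <file. Miller remains the safer choice whenever quoted commas can occur in the leading columns.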
Instead of the put expression that uses sub(), you could use put with the splitnv() function to split the value of the FULLNAME field on spaces and pick out the first resulting string:
mlr --csv \
put '$FIRSTNAME = splitnv($FULLNAME," ")[1]' then \
reorder -f COMPANY,FULLNAME,FIRSTNAME file
The result is the same as before:
COMPANY,FULLNAME,FIRSTNAME,EMAIL,FUNCTION,CITY,INDUSTRY,COMMENT
Company name,Firstname Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"
Company name,Firstname infix Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, workhome, work"
Company name,Firstname infix infix2 Lastname,Firstname,firstname.lastname@example.com,Marketing Manager,New York,Health Care,"home, work"