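I am trying to remove rows from person.csv (below) for people who were born within the last year, so that only people born more than a year ago remain:
Dataset 1: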
"Index","User Id","First Name","Last Name","Date of birth","Job Title"
"1","9E39Bfc4fdcc44e","new, Diamond","Dudley","06 Dec 1945","Photographer"
"3","32C079F2Bad7e6F","Ethan","Hanson","08 Mar 2014","Actuary"
"2","aaaaaaa, bbbbbb","Grace","Huerta","21 Jan 2023","Visual merchandiser"
So the expected output looks like this (the last row is removed because its date of birth is less than a year ago):
"Index","User Id","First Name","Last Name","Date of birth","Job Title"
"1","9E39Bfc4fdcc44e","new, Diamond","Dudley","06 Dec 1945","Photographer"
"3","32C079F2Bad7e6F","Ethan","Hanson","08 Mar 2014","Actuary"
I tried to do this with awk:
awk -F , '{print $5 ....}' person.csv > output.csv
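However, I cannot figure out how to compare the date on each row with (today minus 1 year).
Dataset 2: sometimes a double-quoted field may itself contain double quotes, for example (line 1, field 4):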
"Index","User Id","First Name","Last Name","Date of birth","Job Title"
"1","9E39Bfc4fdcc44e","new, Diamond","Dudley (aka "dud")","03 Oct 2023","Photographer"
"3","32C079F2Bad7e6F","Ethan","Hanson","03 Dec 2022","Actuary"
"2","aaaaaaa, bbbbbb","Grace","Huerta","21 Jan 2023","Visual merchandiser"
If sed can do this, I am open to that as well. Any help is appreciated, thanks!
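
For reference, here is a minimal sketch of the comparison I have in mind, not a tested solution. It rests on several assumptions: every field is double-quoted and fields are separated by exactly `","`, the date of birth is always the second-to-last field, and months use English three-letter abbreviations. The function name `filter_recent_births` and its optional second argument (today's date as YYYYMMDD) are hypothetical names I made up for illustration. Each date is turned into a sortable YYYYMMDD number and compared with today minus 10000, which is the same calendar day one year earlier. A birth date exactly one year ago is kept.

```shell
#!/bin/sh
# filter_recent_births CSV [TODAY]
# Hypothetical helper: prints the header plus every row whose date of birth
# is at least one year before TODAY (YYYYMMDD, defaults to the current date).
filter_recent_births() {
    today=${2:-$(date +%Y%m%d)}
    # Same calendar day one year earlier, as a sortable YYYYMMDD integer.
    cutoff=$((today - 10000))
    # Assumption: fields are separated by the three characters ","
    # so commas or stray quotes inside a field do not split it.
    awk -F'","' -v cutoff="$cutoff" '
        BEGIN {
            # Map English month abbreviations to two-digit numbers.
            n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
            for (i = 1; i <= n; i++) month[names[i]] = sprintf("%02d", i)
        }
        NR == 1 { print; next }    # always keep the header line
        {
            # The date is the second-to-last field, e.g. 21 Jan 2023.
            split($(NF - 1), d, " ")
            key = d[3] month[d[2]] sprintf("%02d", d[1])
            # Keep the row only if the birth date is on or before the cutoff.
            if (key + 0 <= cutoff + 0) print
        }
    ' "$1"
}
```

It would be called as `filter_recent_births person.csv > output.csv`. Because the split is on `","` rather than on a bare comma, values such as `new, Diamond` or `Dudley (aka "dud")` stay in one field, but the sketch would break if a field ever contained the literal sequence `","`.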