Tom*_*456 1 regex unix bash shell awk
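I want to read the data line by line, and whenever I find an opening double quote I want to replace the newline characters with spaces until the closing double quote is reached. For example, given: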
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing
Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
In the data above, the second record opens a double quote on line 2 and closes it on line 3, so those two lines need to be merged into a single line, joined by a space, like this:
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
You can use this gnu-awk one-liner:
awk -v RS='"[^"]*"' -v ORS= '{gsub(/\n/, " ", RT); print $0 RT}' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
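-v RS='"[^"]*"' - sets the input record separator to the regex "[^"]*", so each quoted field acts as a record separator
-v ORS= - sets the output record separator to the empty string
gsub(/\n/, " ", RT) - replaces newlines with spaces in RT, the text that matched the input record separator

Here is a perl one-liner: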
perl -0pe 's/"[^\n"]*"(*SKIP)(*F)|("[^"\n]*)\n([^"]*")/$1 $2/g' file
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
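The awk one-liner above depends on RT and a regex RS, both of which are gawk extensions. Where only a POSIX awk is available, one possible alternative (not part of the original answers) is to count the double quotes seen so far and keep joining lines while that count is odd. The sketch below assumes quotes are always balanced, including doubled quotes ("") used as escapes; the file name sample.csv and the function name join_quoted are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample file reproducing the data from the question.
cat > sample.csv <<'EOF'
090033ec82b13639,CPDM Initiated,Logistical,"There corrected.",Gul Y Serbest,Urology
090033ec82ae0c07,Initiated,NA,"To local testing
Rohit 3 to 4.",Julienne B Orr,Oncology
090033ec82b35fd0,Externally Initiated,NA,regulatory agency requests,Kenneth A Lord,Oncology
EOF

# join_quoted FILE: print FILE with newlines inside double-quoted
# fields replaced by a single space (hypothetical helper name).
join_quoted() {
    awk '
    {
        # Append the current line to any unfinished record, joined by a space.
        line = (pending == "" ? $0 : pending " " $0)

        # Count the double quotes on a scratch copy; gsub returns the
        # number of substitutions it made.
        tmp = line
        quotes = gsub(/"/, "", tmp)

        if (quotes % 2) {
            # Odd count: a quoted field is still open, keep accumulating.
            pending = line
        } else {
            print line
            pending = ""
        }
    }
    # Flush whatever is left if the input ends inside an open quote.
    END { if (pending != "") print pending }
    ' "$1"
}

join_quoted sample.csv
```

Unlike the gawk version, this reads one line at a time and only buffers the record currently being joined, at the cost of always joining with exactly one space.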