我想使用一些 sed 命令修改 csv 文件,但仅限于与某些正则表达式匹配的行。
我有一个在脚本中运行良好的 grep 命令:
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
Run Code Online (Sandbox Code Playgroud)
以及所需的正确工作的 sed 命令:
#!/usr/bin/bash
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv
Run Code Online (Sandbox Code Playgroud)
当我将它们组合在一个脚本中时,grep 命令被省略,脚本仅运行 sed 命令:
#!/usr/bin/bash
egrep '^[A-Z][a-z]*,2018' happiness.csv
sed -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' happiness.csv
Run Code Online (Sandbox Code Playgroud)
样本数据:
Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,
Run Code Online (Sandbox Code Playgroud)
期望的输出:
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014"
AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL, ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,
Run Code Online (Sandbox Code Playgroud)
您可以使用相同的正则表达式进行搜索egrep,并确保对所有替换命令进行分组:
sed -nE '1p; /^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]+/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/gp
}' happiness.csv
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL
Run Code Online (Sandbox Code Playgroud)