I am trying to sort this data (2010 census block population densities for the whole US) by the last column.
12001,2,1009,Alachua FL,29.65612,-82.327274,0.0005131,0.013289229,12,902.9869232
censusBlockDensities.csv (moved here from the comments)
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
I am assuming a Unix shell (e.g. bash).
Read the manual page for the sort command:
man sort
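From the man page:
The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.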
export LC_ALL=C
sort -t , -k 10,10 -n censusBlockDensities.csv
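Explanation of the flags:
-t ,: use a comma as the field separator.
-k 10,10: sort on the 10th field only (start,stop); fields are numbered from 1, not 0.
KEYDEF is F[.C][OPTS][,F[.C][OPTS]] for start and stop position, where F is a field number and C a character position in the field; both are origin 1, and the stop position defaults to the line's end. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options [bdfgiMhnRrV], which override global ordering options for that key. If no key is given, use the entire line as the key.
-n: perform a numeric sort instead of the default alphanumeric sort (alternatively, append n to the -k argument, as suggested in the comments).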
censusBlockDensities.csv
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
Output:
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1041,Adams IL,39.94333,-91.345319,0.000358,0.0009236128,0,0480.4506562
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
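Edit: helpful comments pointed out that my answer was wrong. You also need the -n flag to perform a numeric sort (the default is alphanumeric). I have amended my answer to include it. You can check that it works by adding the -r flag to sort in reverse order. I also added the stop field index to the -k 10 argument (making it -k 10,10), as mentioned in another post.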
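The two variations mentioned here (the n modifier attached to the key itself, and -r to reverse the order) can be sketched as follows. This is only an illustration: sample.csv and sorted_desc.csv are made-up file names, the sample holds three of the well-formed rows from the data above, and LC_ALL=C is set for this one command instead of being exported.

```shell
# Hypothetical sample file built from three well-formed rows of the data above.
cat > sample.csv <<'EOF'
17001,1,1010,Adams IL,39.960197,-91.373363,0.08861,00.037495258,23,613.41090336
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
EOF

# The n (numeric) and r (reverse) options attached to the key apply to that key only,
# so this sorts the 10th field numerically, largest value first.
LC_ALL=C sort -t , -k 10,10nr sample.csv > sorted_desc.csv
cat sorted_desc.csv
```

Dropping the r gives the ascending order, which should match what -n together with -k 10,10 produces on the same rows.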
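Also, you should check the input file to make sure every line has the same number of fields. The following prints the number of commas on each line: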
awk '{print gsub(/,/,"")}' censusBlockDensities.csv
9
9
10 <-- the third record has an additional field
9
9
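A slightly more direct variant of this check counts fields with awk's NF instead of counting commas. The sketch below assumes 10 fields are expected per record; check.csv, bad_lines.txt and by_last.csv are made-up names, and the sample uses three rows from the data above, including the one with the extra field. Because that row makes field 10 differ from the last column, the second command shows one possible way to sort on the last field regardless of field count: prefix each line with its last field, sort on the prefix, then strip it again.

```shell
# Hypothetical sample: the second row is the record with the extra field.
cat > check.csv <<'EOF'
17001,1,1051,Adams IL,39.948201,-91.352052,0.213797,0.553731688,64,115.5794427
17001,1,1031,Adams IL,39.956978,-91.369,0.002268,0.005874093,0,0,22.8543955664
17001,1,1020,Adams IL,39.955861,-91.354113,0.19038,0.493081936,2,4.05612100686
EOF

# Report every record whose field count differs from the expected value (assumed: 10).
awk -F , -v expected=10 'NF != expected { print "line " NR ": " NF " fields" }' check.csv > bad_lines.txt
cat bad_lines.txt

# Sort by the last field whatever the field count:
# 1. awk prepends the last field ($NF) as a new first column,
# 2. sort orders numerically on that first column only,
# 3. cut removes the helper column again.
awk -F , '{ print $NF "," $0 }' check.csv \
  | LC_ALL=C sort -t , -k 1,1n \
  | cut -d , -f 2- > by_last.csv
cat by_last.csv
```

This workaround only orders the lines. Fixing the malformed records in the input is still the cleaner solution.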