Picking CSV fields by name with awk

mer*_*011 5 csv awk gawk

Suppose I have a CSV file with a header row, in the following format:

Field1,Field2
3,262000
4,449000
5,650000
6,853000
7,1061000
8,1263000
9,1473000
10,1683000
11,1893000

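I want to write an awk script that takes a comma-separated list of field names in a variable named target, splits it into an array, and then selects only the columns whose names I specified.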

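Here is what I have tried so far. I have verified that the head array contains the expected headers and that the targets array contains the targets passed in on the command line.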

BEGIN{
    FS=","
    split(target, targets, ",")

}

NR==1 {
    for (i = 1; i <= NF; i++) head[i] = $i
}

NR !=1{
    for (i = 1; i <= NF; i++) {
        if (head[i] in targets){
            print $i
        }
    }
}

When I invoke this script with the command

awk -v target=Field1 -f GetCol.awk Debug.csv

nothing is printed.

mer*_*011 10

I figured it out, and I am posting the answer in case anyone else runs into the same problem.

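It has to do with the in keyword I used to test array membership. This keyword only tests whether the left-hand operand is one of the indices of the array on the right, not whether it is one of the values. Because split() stores the field names as values under the numeric indices 1, 2, and so on, the test head[i] in targets never matched. The fix is to build a reverse-lookup array, as follows.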

BEGIN{
    OFS=FS=","
    split(target, t_targets, ",")
    for (i in t_targets)
        targets[t_targets[i]] = i
}
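The difference between indices and values can be seen in a small self-contained sketch. The function name demo_in and the array names arr and seen are made up for this illustration and are not part of the original script.

```shell
# Hypothetical demo: shows that "in" checks array indices, not values.
demo_in() {
    awk 'BEGIN {
        n = split("Field1,Field2", arr, ",")         # arr[1]="Field1", arr[2]="Field2"
        print (("Field1" in arr) ? "yes" : "no")     # "Field1" is a value, not an index
        print ((1 in arr) ? "yes" : "no")            # 1 is an index created by split()
        for (i = 1; i <= n; i++) seen[arr[i]] = i    # reverse lookup: name -> position
        print (("Field1" in seen) ? "yes" : "no")    # now "Field1" is an index
    }'
}

demo_in
```

The three lines printed are no, yes, yes: the name is only found once it has been turned into an index of the reverse-lookup array.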


Chr*_*our 6

My two cents:

BEGIN{
    OFS=FS=","
    split(target,fields,FS)            # FS was just set, so don't hard-code the comma here
    for (i in fields)                  # Distinct variable name to avoid headaches
        field_idx[fields[i]] = i       # Reverse lookup: field name -> position in target
}
NR==1 {                                # Process the header line
    for (i=1;i<=NF;i++)                # For each header field
        head[i] = $i                   # Store the name for comparison with the targets
    next                               # Skip to the next line
}
{                                      # No need to invert the condition (next was used above)
    sep=""                             # No separator before the first printed column
    for (i=1;i<=NF;i++)                # For each field
        if (head[i] in field_idx) {    # Is the current column one of the target fields?
            printf "%s%s",sep,$i       # Print the column if it matched
            sep=OFS                    # Later columns are preceded by OFS
        }
    printf "\n"                        # End the output line
}
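The script above prints the matching columns in the order they appear in the file and drops the header line. If the columns should instead come out in the order given in target, with the header kept, one possible variant is sketched below. This is not the answerer's method; the function name pick_cols and the arrays want, col and pick are assumptions made for this example, and the input is read from standard input rather than from Debug.csv.

```shell
# Hypothetical helper: pick_cols TARGETS  (reads the CSV from standard input)
pick_cols() {
    awk -v target="$1" '
    BEGIN {
        OFS = FS = ","
        n = split(target, want, FS)               # want[1..n]: names in requested order
    }
    NR == 1 {
        for (i = 1; i <= NF; i++) col[$i] = i     # header name -> column number
        for (j = 1; j <= n; j++)
            if (want[j] in col)                   # ignore names missing from the header
                pick[++m] = col[want[j]]          # pick[1..m]: column numbers to print
    }
    {                                             # runs for the header line as well
        for (k = 1; k <= m; k++)
            printf "%s%s", $(pick[k]), (k < m ? OFS : ORS)
    }'
}

printf 'Field1,Field2\n3,262000\n4,449000\n' | pick_cols Field2,Field1
```

With the three sample lines above, this prints Field2,Field1 followed by 262000,3 and 449000,4. The header is resolved to column numbers once, so each data line only needs a loop over the selected columns.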