How can I use AWK to join the data in these 2 tables?

s20*_*016 -2 perl awk filemaker

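I have 2 data tables, as shown in the image (both are tab-delimited files). I'm trying to populate the Country column of Table 2 with the corresponding country from Table 1. The "join" has to be driven by the information in Table 2's Firstname field.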

2 input tables and the desired result

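Given how messy the data in Table 2's Firstname column is, what is the best approach here? Are other Mac tools, such as Excel formulas, Perl, or FileMaker, better suited to this than AWK?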

Table 1 (input):

city_ascii  country iso2
Mavinga Angola  AO
Menongue    Angola  AO
Mucusso Angola  AO
Guines  Cuba    CU
Havana  Cuba    CU
Holguin Cuba    CU
Las Tunas   Cuba    CU
Manzanillo  Cuba    CU
Matanzas    Cuba    CU
Moron   Cuba    CU
Santa Clara Cuba    CU
Varadero    Cuba    CU

Table 2 (input):

Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)

Table 2 (result):

Firstname   Country
Fred, Havana  Cuba
James, (Varadero, Cuba) Cuba
Jack (Cuba) Cuba
Harry Varadero, Cuba    Cuba
Josh Cuba   Cuba
Gary, Mavinga & Other, Angola   Angola
Jamie, (Angola) Angola

============ Below is the debugging output requested in Ed's questions:

awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table3.txt | cat -v

    1<city_ascii  country iso2><><>
    1<Mavinga Angola  AO><><>
    1<Menongue    Angola  AO><><>
    1<Mucusso Angola  AO><><>
    1<Guines  Cuba    CU><><>
    1<Havana  Cuba    CU><><>
    1<Holguin Cuba    CU><><>
    1<Las Tunas   Cuba    CU><><>
    1<Manzanillo  Cuba    CU><><>
    1<Matanzas    Cuba    CU><><>
    1<Moron   Cuba    CU><><>
    1<Santa Clara Cuba    CU><><>
    1<Varadero    Cuba    CU><><>

    ==============
    awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table4.txt | cat -v

    1<Firstname><><>
    1<Fred, Havana><><>
    1<James, (Varadero, Cuba)><><>
    1<Jack (Cuba)><><>
    1<Harry Varadero, Cuba><><>
    1<Josh Cuba><><>
    1<Gary, Mavinga & Other, Angola><><>
    1<Jamie, (Angola)><><>

    ===============
    cat -v tst.awk

    BEGIN { FS=OFS="\t" }
    NR==FNR {
        map[$1] = $2
        map[$2] = $2
        next
    }
    FNR==1 {
        print
        FS=" "
        next
    }
    {
        orig = $0
        country = ""
        gsub(/[^[:alpha:]]/," ")
        for (i=NF; i>0; i--) {
            if ($i in map) {
                country = map[$i]
                break
            }
        }
        print orig, country
    }

    ===============
    awk -f tst.awk Table3.txt Table4.txt >output.txt

    Firstname
    Fred, Havana    
    James, (Varadero, Cuba) 
    Jack (Cuba) 
    Harry Varadero, Cuba    
    Josh Cuba   
    Gary, Mavinga & Other, Angola   
    Jamie, (Angola) 

    ================
    awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' output.txt | cat -v

    1<Firstname><><>
    2<Fred, Havana><><>
    2<James, (Varadero, Cuba)><><>
    2<Jack (Cuba)><><>
    2<Harry Varadero, Cuba><><>
    2<Josh Cuba><><>
    2<Gary, Mavinga & Other, Angola><><>
    2<Jamie, (Angola)><><>
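Every Table3.txt line in the diagnostics above is reported with a field count of 1 under `-F'\t'`, which suggests the file contains no real tab characters as awk sees it. In that case `map[$1] = $2` in tst.awk would store empty values, which would explain the empty Country column. The following is a minimal sketch of one way to rebuild a truly tab-separated Table 1. It rests on assumptions that are not confirmed by the post: the columns are separated by runs of spaces, `iso2` is the last whitespace-delimited token on each line, and the country is the single token before it. `space_to_tsv` is a hypothetical helper name.

```shell
#!/bin/sh
# space_to_tsv: hypothetical helper, not part of the original post.
# Assumptions: the last whitespace-delimited token is iso2, the token before
# it is a one-word country, and everything earlier belongs to the city name.
space_to_tsv() {
    awk '
    NF >= 3 {
        city = $1
        # Rejoin multi-word city names such as "Las Tunas".
        for (i = 2; i <= NF - 2; i++) city = city " " $i
        print city "\t" $(NF - 1) "\t" $NF
    }'
}

# Demo on three rows from Table 1, followed by the same field-count check
# that was used for the diagnostics.
space_to_tsv <<'EOF' | awk -F'\t' '{ print NF "<" $1 "><" $2 "><" $3 ">" }'
city_ascii  country iso2
Las Tunas   Cuba    CU
Mavinga Angola  AO
EOF
```

On the demo rows the check should now report three fields per line, for example `3<Las Tunas><Cuba><CU>`. A multi-word country name would break the "one token before iso2" assumption, so this is only suitable for data shaped like the sample.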

dax*_*xim 6

use strict;
use warnings;
use DBI qw();
require DBD::CSV;
use List::Util 1.45 qw(uniq);

chdir '/tmp'; # location of csv files
# Treat each tab-separated .csv file in the current directory as a table.
my $dbh = DBI->connect("dbi:CSV:", undef, undef, {
    f_ext => '.csv',
    csv_sep_char => "\t",
    RaiseError => 1,
}) or die "Cannot connect: $DBI::errstr";

# For every distinct country in table1, set Country on the table2 rows
# whose Firstname contains that country name.
for my $country (
    uniq map { $_->[0] }
    # sql distinct not implemented
    $dbh->selectall_array('select country from table1')
) {
    $dbh->do(
        'update table2 set Country = ? where Firstname like ' .
            $dbh->quote("%$country%"),
        {},
        $country
    );
}
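The loop above amounts to a substring match on each country name (`Firstname like '%Cuba%'`), so a row such as `Fred, Havana`, which names only a city, is not matched by it. For comparison, here is a rough awk sketch of the same substring idea with a fallback to city names. It is not the answer's method, and it assumes both inputs really are tab-separated with a header row. `join_country` and the temporary file names are hypothetical.

```shell
#!/bin/sh
# join_country: hypothetical helper. $1 = Table 1 (city, country, iso2),
# $2 = Table 2 (Firstname). Assumes real tab separators and a header row.
join_country() {
    awk '
    BEGIN { FS = OFS = "\t" }
    NR == FNR {                        # first file: remember countries and cities
        if (FNR > 1) { country[$2] = $2; city[$1] = $2 }
        next
    }
    FNR == 1 { print $1, "Country"; next }
    {
        found = ""
        # Prefer a country name that appears anywhere in Firstname ...
        for (name in country) if (index($1, name)) { found = name; break }
        # ... otherwise fall back to a city name and use the country of that city.
        if (found == "")
            for (name in city) if (index($1, name)) { found = city[name]; break }
        print $1, found
    }' "$1" "$2"
}

# Demo with a few rows taken from the question.
tmpdir=$(mktemp -d)
printf 'city_ascii\tcountry\tiso2\nHavana\tCuba\tCU\nMavinga\tAngola\tAO\n' > "$tmpdir/table1.tsv"
printf 'Firstname\nFred, Havana\nJack (Cuba)\nGary, Mavinga & Other, Angola\n' > "$tmpdir/table2.tsv"
join_country "$tmpdir/table1.tsv" "$tmpdir/table2.tsv"
rm -r "$tmpdir"
```

Plain substring matching is a blunt tool: a short name can match inside a longer word, and awk's `for (name in ...)` order is unspecified, so a row that mentions two different countries gets an arbitrary one. For the sample rows neither case arises.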

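  • I've replaced it with List::Util, which you hopefully already have installed. (2 upvotes)
  • @s2016, DBI::db is the namespace DBI uses internally for database handles (`$dbh`). Try upgrading DBI (`cpan DBI`). Maybe your version doesn't have `$dbh->selectall_array`? (2 upvotes)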