s20*_*016 -2 perl awk filemaker
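I have 2 data tables as shown below (they are 2 tab-delimited files). I am trying to populate the Country column of Table 2 with the corresponding country from Table 1. The "join" has to be made from the information in the Firstname field of Table 2.
Given the complexity of the data in the Table 2 Firstname column, what is the best approach here? Are other Mac tools better suited than AWK, e.g. an Excel formula, Perl, FileMaker, etc.?
Table 1 (input):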
city_ascii   country  iso2
Mavinga      Angola   AO
Menongue     Angola   AO
Mucusso      Angola   AO
Guines       Cuba     CU
Havana       Cuba     CU
Holguin      Cuba     CU
Las Tunas    Cuba     CU
Manzanillo   Cuba     CU
Matanzas     Cuba     CU
Moron        Cuba     CU
Santa Clara  Cuba     CU
Varadero     Cuba     CU
Table 2 (input):
Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)
Table 2 (result):
Firstname                      Country
Fred, Havana                   Cuba
James, (Varadero, Cuba)        Cuba
Jack (Cuba)                    Cuba
Harry Varadero, Cuba           Cuba
Josh Cuba                      Cuba
Gary, Mavinga & Other, Angola  Angola
Jamie, (Angola)                Angola
============ Debugging info in response to Ed's questions:
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table3.txt | cat -v
1<city_ascii country iso2><><>
1<Mavinga Angola AO><><>
1<Menongue Angola AO><><>
1<Mucusso Angola AO><><>
1<Guines Cuba CU><><>
1<Havana Cuba CU><><>
1<Holguin Cuba CU><><>
1<Las Tunas Cuba CU><><>
1<Manzanillo Cuba CU><><>
1<Matanzas Cuba CU><><>
1<Moron Cuba CU><><>
1<Santa Clara Cuba CU><><>
1<Varadero Cuba CU><><>
==============
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' Table4.txt | cat -v
1<Firstname><><>
1<Fred, Havana><><>
1<James, (Varadero, Cuba)><><>
1<Jack (Cuba)><><>
1<Harry Varadero, Cuba><><>
1<Josh Cuba><><>
1<Gary, Mavinga & Other, Angola><><>
1<Jamie, (Angola)><><>
===============
cat -v tst.awk
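BEGIN { FS=OFS="\t" }
# First file: map both the city and the country name to the country
NR==FNR {
    map[$1] = $2
    map[$2] = $2
    next
}
# Header of the second file: print it, then split later records on blanks
FNR==1 {
    print
    FS=" "
    next
}
{
    orig = $0
    country = ""
    # Turn every non-letter into a space, then scan the words right to left
    gsub(/[^[:alpha:]]/," ")
    for (i=NF; i>0; i--) {
        if ($i in map) {
            country = map[$i]
            break
        }
    }
    print orig, country
}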
===============
awk -f tst.awk Table3.txt Table4.txt >output.txt
Firstname
Fred, Havana
James, (Varadero, Cuba)
Jack (Cuba)
Harry Varadero, Cuba
Josh Cuba
Gary, Mavinga & Other, Angola
Jamie, (Angola)
================
awk -F'\t' '{print NF"<"$1"><"$2"><"$3">"}' output.txt | cat -v
1<Firstname><><>
2<Fred, Havana><><>
2<James, (Varadero, Cuba)><><>
2<Jack (Cuba)><><>
2<Harry Varadero, Cuba><><>
2<Josh Cuba><><>
2<Gary, Mavinga & Other, Angola><><>
2<Jamie, (Angola)><><>
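A note on reading the debug output above: with `-F'\t'`, every line of Table3.txt reports NF=1, so those lines contain no tab characters and `$2` is empty when tst.awk builds `map`. The output lines only report NF=2 because `print orig, country` inserts the tab itself. The following is a minimal sketch, not a definitive fix, that runs the same tst.awk logic against small samples written with real tabs. The file names `table1.tsv` and `table2.tsv` and the temporary directory are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: rebuild small samples with real tab characters.
# table1.tsv / table2.tsv are hypothetical names, not the asker's files.
tmp=$(mktemp -d)
printf 'city_ascii\tcountry\tiso2\n'  > "$tmp/table1.tsv"
printf 'Mavinga\tAngola\tAO\n'       >> "$tmp/table1.tsv"
printf 'Havana\tCuba\tCU\n'          >> "$tmp/table1.tsv"
printf 'Varadero\tCuba\tCU\n'        >> "$tmp/table1.tsv"
printf '%s\n' 'Firstname' 'Fred, Havana' 'Jack (Cuba)' \
    'Gary, Mavinga & Other, Angola' > "$tmp/table2.tsv"

# Same logic as tst.awk, inlined so the sketch is self-contained.
result=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { map[$1] = $2; map[$2] = $2; next }
FNR == 1  { print; FS = " "; next }
{
    orig = $0; country = ""
    gsub(/[^[:alpha:]]/, " ")
    for (i = NF; i > 0; i--) if ($i in map) { country = map[$i]; break }
    print orig, country
}' "$tmp/table1.tsv" "$tmp/table2.tsv")

# Same kind of check as in the debug session: field count and fields per line.
printf '%s\n' "$result" | awk -F'\t' '{ print NF "<" $1 "><" $2 ">" }'
rm -rf "$tmp"
```

With tab-separated input the second field should come out populated for the data rows. If the real files turn out to be space-separated, the lookup table has to be built differently, because multi-word cities such as "Las Tunas" make the column boundaries ambiguous.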
# Fill table2.Country by matching each country name from table1 against Firstname (DBI + DBD::CSV).
use strict;
use warnings;
use DBI qw();
require DBD::CSV;
use List::Util 1.45 qw(uniq);

chdir '/tmp'; # location of csv files
my $dbh = DBI->connect("dbi:CSV:", undef, undef, {
    f_ext        => '.csv',
    csv_sep_char => "\t",
    RaiseError   => 1,
}) or die "Cannot connect: $DBI::errstr";

for my $country (
    uniq map { $_->[0] }
        # sql distinct not implemented
        $dbh->selectall_array('select country from table1')
) {
    # One update per distinct country: rows whose Firstname contains the name
    $dbh->do(
        'update table2 set Country = ? where Firstname like ' .
            $dbh->quote("%$country%"),
        {},
        $country
    );
}
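One thing to note about the Perl loop: it selects only the `country` column from table1 and matches it with `like '%country%'`, so a row that names only a city, such as "Fred, Havana", would keep an empty Country. For comparison, here is a rough awk sketch of the same substring-match idea. It assumes genuinely tab-separated input. The file names are hypothetical, and `index()` is used as a case-sensitive stand-in for the SQL `like`.

```shell
#!/bin/sh
# Sketch of country-name-only matching; file names are hypothetical.
tmp=$(mktemp -d)
printf 'city_ascii\tcountry\tiso2\nMavinga\tAngola\tAO\nHavana\tCuba\tCU\n' \
    > "$tmp/table1.tsv"
printf '%s\n' 'Firstname' 'Fred, Havana' 'Jack (Cuba)' 'Jamie, (Angola)' \
    > "$tmp/table2.tsv"

by_country=$(awk -F'\t' -v OFS='\t' '
NR == FNR { if (FNR > 1) countries[$2]; next }   # distinct country names
FNR == 1  { print $1, "Country"; next }          # header row
{
    found = ""
    # Substring test, comparable to: Firstname like "%country%"
    for (c in countries) if (index($0, c)) { found = c; break }
    print $1, found
}' "$tmp/table1.tsv" "$tmp/table2.tsv")

printf '%s\n' "$by_country"
rm -rf "$tmp"
```

In this sketch the "Fred, Havana" row ends up with an empty Country, which is why the awk script in the question also maps each city to its country.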
Run Code Online (Sandbox Code Playgroud)