In awk, how can I use a file containing multiple format strings with printf?

Question

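I have a case where I want to use input from a file as the format for printf() in awk. My formatting works when I set it in a string in the code, but it doesn't work when I load it from input.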

Here's a tiny example of the problem:

$ # putting the format in a variable works just fine:
$ echo "" | awk -vs="hello:\t%s\n\tfoo" '{printf(s "bar\n", "world");}'
hello:  world
        foobar
$ # But getting the format from an input file does not.
$ echo "hello:\t%s\n\tfoo" | awk '{s=$0; printf(s "bar\n", "world");}'
hello:\tworld\n\tfoobar
$

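So... the format substitution ("%s") works, but special characters like tab and newline don't. Any idea why this happens? Is there a way to "do something" to the input data so that it can be used as a format string?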

Update #1:

As a further example, consider the following, using a bash here-string:

[me@here ~]$ awk -vs="hello: %s\nworld: %s\n" '{printf(s, "foo", "bar");}' <<<""
hello: foo
world: bar
[me@here ~]$ awk '{s=$0; printf(s, "foo", "bar");}' <<<"hello: %s\nworld: %s\n"
hello: foo\nworld: bar\n[me@here ~]$

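As far as I can tell, the same thing happens with several different awk interpreters, and I haven't been able to find any documentation that explains why.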

Update #2:

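The code I'm trying to replace currently looks like this, with nested loops in the shell. At present awk is used only for its printf, and could be replaced with the shell's printf: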

#!/bin/sh

while read -r fmtid fmt; do
  while read cid name addy; do
    awk -vfmt="$fmt" -vcid="$cid" -vname="$name" -vaddy="$addy" \
      'BEGIN{printf(fmt,cid,name,addy)}' > /path/$fmtid/$cid
  done < /path/to/sampledata
done < /path/to/fmtstrings

Sample input would be:

## fmtstrings:
1 ID:%04d Name:%s\nAddress: %s\n\n
2 CustomerID:\t%-4d\t\tName: %s\n\t\t\t\tAddress: %s\n
3 Customer: %d / %s (%s)\n

## sampledata:
5 Companyname 123 Somewhere Street
12 Othercompany 234 Elsewhere
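Since the shell's own printf expands escapes such as \t and \n in its format argument, the inner awk call in the loop above could simply be dropped. A minimal sketch, assuming the same file layout; the function name render_with_shell_printf and its three arguments (format file, data file, output root) are hypothetical, and the per-format output directories are assumed to exist already:

```shell
# Hypothetical helper: render_with_shell_printf FMTFILE DATAFILE OUTDIR
# Writes OUTDIR/<fmtid>/<cid> for every (format, customer) pair.
render_with_shell_printf() {
  fmtfile=$1 datafile=$2 outdir=$3
  # read -r keeps the backslashes in the format line intact.
  while read -r fmtid fmt; do
    while read -r cid name addy; do
      # The format is deliberately a variable: the shell printf expands
      # \t, \n, ... in its format argument, which is what is wanted here.
      printf "$fmt" "$cid" "$name" "$addy" > "$outdir/$fmtid/$cid"
    done < "$datafile"
  done < "$fmtfile"
}
```

Usage would be along the lines of `render_with_shell_printf /path/to/fmtstrings /path/to/sampledata /path`. It still loops in the shell, but no longer starts one awk process per output file.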

I'm hoping I can do the whole thing with a single call to awk, by constructing something like this, instead of using nested loops in the shell:

awk '

  NR==FNR { fmts[$1]=$2; next; }

  {
    for(fmtid in fmts) {
      outputfile=sprintf("/path/%d/%d", fmtid, $1);
      printf(fmts[fmtid], $1, $2) > outputfile;
    }
  }

' /path/to/fmtstrings /path/to/sampledata

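Obviously this doesn't work, both because of the actual subject of this question and because I haven't yet figured out how to elegantly make awk join $2..$n into a single variable. (But that's the subject of a possible future question.)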

FWIW, I'm using the awk built into FreeBSD 9.2, but I'm open to using gawk if that's where a solution can be found.

Answer 1

Ed *_*ton 4

Why is your example so long and complicated? This demonstrates the problem:

$ echo "" | awk '{s="a\t%s"; printf s"\n","b"}'
a       b

$ echo "a\t%s" | awk '{s=$0; printf s"\n","b"}'
a\tb

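In the first case the string "a\t%s" is a string literal, so it is effectively interpreted twice: once when awk reads the script, at which point the \t escape is expanded into a literal tab character, and again when the script executes, at which point printf interprets only the %s conversion. So by the time printf runs, awk has a literal tab in the format string.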

In the second case, the format string still contains the two characters backslash and t when printf runs, hence the different behavior.
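A small way to see where awk does and does not expand escapes: string literals and -v assignments are processed (POSIX has a -v value interpreted as if it were a double-quoted string literal, which is also why the question's -v examples work), while input records and ENVIRON values are not. The function name show_escape_sources is hypothetical; it prints the length of "a\tb" as obtained from each source, so 3 means \t became a single TAB and 4 means the backslash and t were kept.

```shell
# Hypothetical demo: length of "a\tb" from four sources (literal, -v, $0, ENVIRON).
show_escape_sources() {
  printf '%s\n' 'a\tb' |
    S='a\tb' awk -v v='a\tb' '{
      lit = "a\tb"   # string literal: \t becomes a TAB when the script is parsed
      # v came from -v, so its escapes were processed too; $0 and ENVIRON were not
      printf "%d %d %d %d\n", length(lit), length(v), length($0), length(ENVIRON["S"])
    }'
}
```

Running `show_escape_sources` should print `3 3 4 4`.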

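You need something to interpret those escape characters. One way is to call the shell's printf and read the result (corrected per @EtanReiser's excellent observation: I had used double quotes where I should have used single quotes, implemented here with \047, to avoid shell expansion):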

$ echo 'a\t%s' | awk '{"printf \047" $0 "\047 " "b" | getline s; print s}'
a       b

If you don't need the result in a variable, you can just call system().
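With system(), the shell printf writes straight to standard output. A minimal sketch; the function name print_via_system is hypothetical, and "b" stands in for the real argument. As with the getline version, the input line is pasted into a shell command, so a line containing a single quote breaks it and hostile input could run commands.

```shell
# Hypothetical helper: run each input line through the shell printf with "b" as its argument.
print_via_system() {
  awk '{
    # Builds the shell command  printf QLINE\nQ b  where Q is a single quote (\047),
    # so the shell printf expands the escapes in LINE. Unsafe for untrusted input.
    system("printf \047" $0 "\\n\047 b")
  }'
}
```

For example, `printf '%s\n' 'a\t%s' | print_via_system` prints `a`, a tab, then `b`.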

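If you only want to expand the escape characters, so that you don't need to supply arguments for the %s in the shell printf call, you'd just need to escape every % (watching out for ones that are already escaped).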
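A minimal sketch of that idea, assuming the shell printf is used only to expand escapes: every % is doubled so the shell prints it back as a single %, and the result is then used as an ordinary awk format. The function name is hypothetical, "foo" and "bar" mirror the question's example, and a "." sentinel is appended because reading the output line by line with getline would otherwise lose a trailing newline. As with the getline example above, the line is pasted into a shell command, so input containing a single quote breaks it.

```shell
# Hypothetical helper: expand escapes via the shell printf, then use the result as an awk format.
expand_via_shell_printf() {
  awk '{
    f = $0
    gsub(/%/, "%%", f)                 # shell printf turns each %% back into %
    # Append a "." sentinel so a trailing newline survives the getline loop.
    # NOTE: breaks (and is unsafe) if the line contains a single quote.
    cmd = "printf \047" f ".\047"
    s = ""; sep = ""
    while ((cmd | getline line) > 0) { s = s sep line; sep = "\n" }
    close(cmd)
    s = substr(s, 1, length(s) - 1)    # drop the sentinel
    printf s, "foo", "bar"             # s is now an ordinary awk format string
  }'
}
```

For example, `printf '%s\n' 'hello: %s\nworld: %s\n' | expand_via_shell_printf` prints `hello: foo` and `world: bar` on separate lines.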

You could call awk instead of the shell printf if you prefer.
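Calling awk instead can rely on the fact that awk processes escape sequences in -v assignments. A minimal sketch; the function name is hypothetical, and because the inner awk prints the value with "%s", no % doubling is needed. The same single-quote caveat applies, since the line is still pasted into a shell command.

```shell
# Hypothetical helper: expand escapes by passing the line to a second awk via -v.
expand_via_awk_v() {
  awk '{
    # The inner awk receives the line as a -v assignment, and awk processes
    # escape sequences in -v values, so \t, \n, ... become real characters.
    # The inner program prints a trailing "." so trailing newlines survive getline.
    # NOTE: breaks (and is unsafe) if the line contains a single quote.
    cmd = "awk -v s=\047" $0 "\047 \047BEGIN { printf \"%s.\", s }\047"
    s = ""; sep = ""
    while ((cmd | getline line) > 0) { s = s sep line; sep = "\n" }
    close(cmd)
    s = substr(s, 1, length(s) - 1)    # drop the "." sentinel
    printf s, "foo", "bar"
  }'
}
```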

Note that this approach, while clumsy, is much safer than calling eval, which might simply execute an input line like rm -rf /*.*!

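With help from Arnold Robbins (the gawk maintainer) and Manuel Collado (another well-known awk expert), here is a script that expands single-character escape sequences. It relies on the four-argument form of split(), which is a gawk extension: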

$ cat tst2.awk
function expandEscapes(old,     segs, segNr, escs, idx, new) {
    split(old,segs,/\\./,escs)
    for (segNr=1; segNr in segs; segNr++) {
        if ( idx = index( "abfnrtv", substr(escs[segNr],2,1) ) )
            escs[segNr] = substr("\a\b\f\n\r\t\v", idx, 1)
        new = new segs[segNr] escs[segNr]
    }
    return new
}

{
    s = expandEscapes($0)
    printf s, "foo", "bar"
}

$ awk -f tst2.awk <<<"hello: %s\nworld: %s\n"
hello: foo
world: bar

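Alternatively, this should be functionally equivalent for the escapes it recognizes, but is not gawk-specific. (For an unrecognized escape such as \q it keeps only the q, whereas the split() version above keeps the backslash as well.)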

function expandEscapes(tail,   head, esc, idx) {
    head = ""
    while ( match(tail, /\\./) ) {
        esc  = substr( tail, RSTART + 1, 1 )
        head = head substr( tail, 1, RSTART-1 )
        tail = substr( tail, RSTART + 2 )
        idx  = index( "abfnrtv", esc )
        if ( idx )
             esc = substr( "\a\b\f\n\r\t\v", idx, 1 )
        head = head esc
    }

    return (head tail)
}
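Combining this portable function with the single-awk layout from Update #2 gives something like the following sketch. The function name render_all, its three arguments and the outdir variable are hypothetical. It assumes the output directories already exist and that each customer ID appears only once in the data file (each output file is opened with > and closed right after it is written). It takes the format as everything after the first field and the address as everything after the second, which also covers the joining of $2..$n mentioned in the question.

```shell
# Hypothetical helper: render_all FMTFILE DATAFILE OUTDIR, one awk process in total.
render_all() {
  # outdir is passed with -v, so any backslashes in it would be treated as escapes.
  awk -v outdir="$3" '
    # Portable escape expander (the non-gawk expandEscapes from the answer).
    function expandEscapes(tail,   head, esc, idx) {
      head = ""
      while (match(tail, /\\./)) {
        esc  = substr(tail, RSTART + 1, 1)
        head = head substr(tail, 1, RSTART - 1)
        tail = substr(tail, RSTART + 2)
        idx  = index("abfnrtv", esc)
        if (idx)
          esc = substr("\a\b\f\n\r\t\v", idx, 1)
        head = head esc
      }
      return head tail
    }

    # First file, "fmtid format...": the format is everything after field 1.
    NR == FNR {
      fmt = $0
      sub(/^[ \t]*[^ \t]+[ \t]+/, "", fmt)
      fmts[$1] = expandEscapes(fmt)
      next
    }

    # Second file, "cid name address...": the address is everything after field 2.
    {
      addy = $0
      sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+/, "", addy)
      for (fmtid in fmts) {
        out = outdir "/" fmtid "/" $1
        printf fmts[fmtid], $1, $2, addy > out
        close(out)    # keep the number of open files small
      }
    }
  ' "$1" "$2"
}
```

Usage would be along the lines of `render_all /path/to/fmtstrings /path/to/sampledata /path`.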

If you like, you can change the split() regex to the following to also match hex and octal escapes:

/\\(x[0-9a-fA-F]*|[0-7]{1,3}|.)/

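For hex values after the \\ (strtonum() is gawk-specific; rest_of_str stands for the hex digits that follow the x):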

c = sprintf("%c", strtonum("0x" rest_of_str))

For octal values (rest_of_str stands for the octal digits):

c = sprintf("%c", strtonum("0" rest_of_str))

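If gawk's strtonum() is not available, the digits can be decoded by hand. A minimal portable sketch, built on the match() loop from the non-gawk version rather than on split(); the function names are hypothetical. It accepts up to three octal digits and, as a choice made here (unlike the unbounded x[0-9a-fA-F]* above), at most two hex digits, and it prints each expanded line.

```shell
# Hypothetical helper: expand \a \b \f \n \r \t \v, \ooo (octal) and \xHH (hex)
# escapes in each input line without the gawk-only strtonum().
expand_escapes() {
  awk '
    # Value of one octal or hex digit (0-15).
    function digitval(c) { return index("0123456789abcdef", tolower(c)) - 1 }

    function expandEscapes(tail,   head, esc, idx, n, k) {
      head = ""
      while (match(tail, /\\./)) {
        head = head substr(tail, 1, RSTART - 1)
        esc  = substr(tail, RSTART + 1, 1)
        tail = substr(tail, RSTART + 2)
        if (esc ~ /[0-7]/) {
          # Octal: esc is the first digit; take up to two more.
          n = digitval(esc)
          for (k = 0; k < 2 && substr(tail, 1, 1) ~ /[0-7]/; k++) {
            n = n * 8 + digitval(substr(tail, 1, 1))
            tail = substr(tail, 2)
          }
          esc = sprintf("%c", n)
        } else if (esc == "x" && substr(tail, 1, 1) ~ /[0-9a-fA-F]/) {
          # Hex: take one or two digits after the x.
          n = 0
          for (k = 0; k < 2 && substr(tail, 1, 1) ~ /[0-9a-fA-F]/; k++) {
            n = n * 16 + digitval(substr(tail, 1, 1))
            tail = substr(tail, 2)
          }
          esc = sprintf("%c", n)
        } else if ((idx = index("abfnrtv", esc)) > 0) {
          esc = substr("\a\b\f\n\r\t\v", idx, 1)
        }
        # Anything else (for example \q) is kept as just the character.
        head = head esc
      }
      return head tail
    }

    { print expandEscapes($0) }
  '
}
```

For example, `printf '%s\n' '\101\x42\tC' | expand_escapes` prints `AB`, a tab, then `C`. Values above 127 may come out as multibyte characters in gawk under a UTF-8 locale; running with LC_ALL=C keeps %c byte-oriented.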

@EdMorton, if you think your code is safer than eval, think again: `echo '$(rm -rf /)' | awk '{"printf \"" $0 "\" " "b" | getline s; print s}'` Please don't run this on your system :) (3 upvotes)
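A far less scary option than the one recommended is to write an awk function such as `function map_escapes(s, t) { t = s; gsub(/\\n/, "\n", t); gsub(/\\t/, "\t", t); ...; return t; }` and use it to transform the format strings read from the file. You can extend it as needed to handle other escape sequences. (2 upvotes)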
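A runnable version of that map_escapes() suggestion, handling just \n and \t; the wrapper name map_escapes_demo is hypothetical. Because each gsub() runs independently, a literal backslash written as \\ directly before n or t would be misread as an escape, so this suits format files that do not need literal backslashes.

```shell
# Hypothetical wrapper around the map_escapes() idea from the comment.
map_escapes_demo() {
  awk '
    # One gsub() per supported escape; extend with more as needed.
    function map_escapes(s,   t) {
      t = s
      gsub(/\\n/, "\n", t)
      gsub(/\\t/, "\t", t)
      return t
    }
    { printf map_escapes($0), "foo", "bar" }
  '
}
```

For example, `printf '%s\n' 'hello: %s\nworld: %s\n' | map_escapes_demo` prints `hello: foo` and `world: bar` on separate lines.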

Archived:	11 years, 4 months ago
Views:	940
Last activity:	11 years, 4 months ago