Decoding Base64-encoded text with POSIX awk

Fra*_*ona 1 awk base64 posix openldap

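I need to decode a lot of base64-encoded text strings in awk; because I don't want to fork the non-portable base64 binary over and over, I wrote an awk function to do the decoding: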

function base64_decode(str,    out,i,n,v) {
    out = ""
    if ( ! ("A" in _BASE64_DECODE_c2i) )
        for (i = 1; i <= 64; i++)
            _BASE64_DECODE_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
    i = 0
    n = length(str)
    while (i <= n) {
        v = _BASE64_DECODE_c2i[substr(str,++i,1)] * 262144 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)] * 4096 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)] * 64 + \
            _BASE64_DECODE_c2i[substr(str,++i,1)]
        out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
    }
    return out
}

It works fine:

printf '%s\n' SmFuZQ== amRvZQ== |
LANG=C command -p awk '
    { print base64_decode($0) }
    function base64_decode(...) {...} # placeholder for the real function
'

Jane
jdoe
Problem in context

I'm trying to get the givenName of the users that are members of GroupCode=025496 from the output of ldapsearch -LLL -o ldif-wrap=no ... '(|(uid=*)(GroupCode=*))' uid givenName GroupCode memberUid, for example:

dn: uid=jsmith,ou=users,dc=example,dc=com
givenName: John
uid: jsmith

dn: uid=jdoe,ou=users,dc=example,dc=com
uid: jdoe
givenName:: SmFuZQ==

dn: cn=group1,ou=groups,dc=example,dc=com
GroupCode: 025496
memberUid:: amRvZQ==
memberUid: jsmith


Here is the awk that does it:

LANG=C command -p awk -F '\n' -v RS='' -v GroupCode=025496 '
    {
        delete attrs
        for (i = 2; i <= NF; i++) {
            match($i,/::? /)
            key = substr($i,1,RSTART-1)
            val = substr($i,RSTART+RLENGTH)
            if (RLENGTH == 3)
                val = base64_decode(val)
            attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
        }
        if ( /\nuid:/ )
            givenName[ attrs["uid"] ] = attrs["givenName"]
        else
            memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
    }
    END {
        n = split(memberUid[GroupCode],uid,SUBSEP)
        for ( i = 1; i <= n; i++ )
            print givenName[ uid[i] ]
    }

    function base64_decode(...) { ... } # placeholder for the real function
'

However, the result on Linux is different from the result on BSD and Solaris.

I don't understand where the problem is. Is there something wrong with the base64_decode function and/or the code that uses it?

M. *_*din 5

The function generates NUL bytes when its argument (the encoded string) ends with padding characters (=s). Here is a corrected version of the while loop:

while (i < n) {
    v = _BASE64_DECODE_c2i[substr(str,1+i,1)] * 262144 + \
        _BASE64_DECODE_c2i[substr(str,2+i,1)] * 4096 + \
        _BASE64_DECODE_c2i[substr(str,3+i,1)] * 64 + \
        _BASE64_DECODE_c2i[substr(str,4+i,1)]
    i += 4
    if (v%256 != 0)
        out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
    else if (int(v/256)%256 != 0)
        out = out sprintf("%c%c", int(v/65536), int(v/256))
    else
        out = out sprintf("%c", int(v/65536))
}

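Note that this approach may not work correctly if the decoded bytes contain embedded NULs, because it infers the padding from the decoded values being zero.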
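A possible variation, offered only as a sketch and not as the method used above, is to count the "=" characters in each 4-character group instead of testing the decoded bytes for zero. That way a byte that legitimately decodes to zero is not mistaken for padding. The names b64decode and _B64 are hypothetical. The sketch assumes each input line is a complete, padded RFC 4648 string with no whitespace or invalid characters, and it does not make NUL bytes storable in awk implementations that truncate strings at NUL.

```shell
#!/bin/sh
# b64decode (hypothetical helper): reads one padded Base64 string per line
# on stdin and prints the decoded text, one line per input line.
b64decode() {
    # LC_ALL=C so that %c emits single bytes, not multibyte characters.
    LC_ALL=C awk '
    function base64_decode(str,    out, i, j, n, v, pad) {
        # Build the character-to-value table once; "=" counts as 0.
        if (!("A" in _B64)) {
            for (i = 1; i <= 64; i++)
                _B64[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", i, 1)] = i - 1
            _B64["="] = 0
        }
        out = ""
        n = length(str)
        # Process complete 4-character groups only (assumes padded input).
        for (i = 0; i + 4 <= n; i += 4) {
            # Number of "=" in this group: 0, 1 or 2.
            pad = (substr(str, i + 3, 1) == "=") + (substr(str, i + 4, 1) == "=")
            # Pack four 6-bit values into one 24-bit number.
            v = 0
            for (j = 1; j <= 4; j++)
                v = v * 64 + _B64[substr(str, i + j, 1)]
            # Emit 3 bytes minus one per padding character.
            out = out sprintf("%c", int(v / 65536))
            if (pad < 2) out = out sprintf("%c", int(v / 256) % 256)
            if (pad < 1) out = out sprintf("%c", v % 256)
        }
        return out
    }
    { print base64_decode($0) }
    '
}
```

For example, printf '%s\n' SmFuZQ== amRvZQ== | b64decode should print Jane and jdoe, and piping the result through od -c is one way to check that no \0 bytes follow the names.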