Strange characters when filling a PDF with PDFtk

ser*_*gio 3 php pdf ubuntu encoding pdftk

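I'm using PHP and PDFtk on Ubuntu. When I fill a PDF with data, I get strange characters in place of accented letters such as á, ó and í. I'm using UTF-8 encoding: I checked with echo mb_check_encoding($var, 'UTF-8'), which outputs 1 (TRUE). Any idea what I can do?

I also tried converting to ISO with utf8_decode, but still no luck.

Thanks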

use*_*228 9

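You're right that utf8_decode() works for characters in the range U+0000-U+00FF (its target encoding is ISO-8859-1, which Windows-1252 closely resembles).

However, it won't work for characters outside that range.

You can always encode the characters in UTF-16BE instead, though. You can do this for individual fields only, for example to encode the word "özil":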
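To find out up front whether a given value stays within U+0000-U+00FF, a check along these lines could be used. This is only a minimal sketch: fitsInLatin1 is a hypothetical helper name, not something PHP or pdftk provides, and the input is assumed to be a UTF-8 string.

```php
<?php
// Hypothetical helper (not part of PHP or pdftk): reports whether every
// character of a UTF-8 string lies in the range U+0000-U+00FF.
function fitsInLatin1(string $utf8): bool
{
    // The /u modifier makes PCRE read the subject as UTF-8 code points.
    // Invalid UTF-8 makes preg_match() return false, so the result is false.
    return preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) === 1;
}

var_dump(fitsInLatin1("\xC3\xB6zil"));               // "özil": ö is U+00F6
var_dump(fitsInLatin1("\xC5\x81\xC3\xB3d\xC5\xBA")); // "Łódź": Ł is U+0141
```

The first call reports true and the second false, so the second value would need one of the UTF-16 forms.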

<<
/V (þÿ^@ö^@z^@i^@l)
/T (name)
>>

(Here "^@" stands for the NUL character (U+0000). This is what it looks like in my editor (vim) if the file is encoded in Windows-1252 (latin1).)

Note that you need to use a byte order mark (which will appear as "þÿ" if your file is encoded in Windows-1252) and you'll need to encode the entire string (between the two parentheses) in UTF-16.

If you're generating the FDF in a PHP script you can do something like this:

<<
/V (<?php echo chr(0xfe) . chr(0xff) . str_replace(array('\\', '(', ')'), array('\\\\', '\(', '\)'), mb_convert_encoding("özil", 'UTF-16BE', 'UTF-8')); ?>)
/T (name)
>>
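If several fields need this treatment, the same steps could be wrapped in a small function. The following is a sketch only: fdfLiteralString is a hypothetical name, the input is assumed to be UTF-8, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper: builds an FDF literal string "(...)" that holds
// $utf8 as UTF-16BE, prefixed with the byte order mark FE FF.
function fdfLiteralString(string $utf8): string
{
    $utf16 = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Escape at the byte level: backslashes first, then both parentheses.
    $escaped = str_replace(['\\', '(', ')'], ['\\\\', '\\(', '\\)'], $utf16);
    return '(' . $escaped . ')';
}

// bin2hex() makes the otherwise unprintable bytes visible.
echo bin2hex(fdfLiteralString("\xC3\xB6zil")), "\n"; // prints 28feff00f6007a0069006c29
```

The escaping runs after the UTF-16 conversion on purpose: the parser reads the literal string byte by byte, so any byte that happens to equal a backslash or a parenthesis has to be escaped, whichever character it belongs to.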

You can also write out the hex codes like this (i.e. enclosed in angle brackets rather than parentheses):

<<
/V <FEFF00F6007A0069006C>
/T (name)
>>

This has exactly the same result (the string "özil"). It's less efficient in terms of characters, but it actually seems to be more reliable in pdftk, which has some bugs I've found (in version 2.02).
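In PHP, the hex form could be produced roughly as follows. This is a sketch under the same assumptions as before (UTF-8 input, mbstring available), and fdfHexString is a hypothetical helper name.

```php
<?php
// Hypothetical helper: builds an FDF hex string "<...>" that holds
// $utf8 as UTF-16BE, prefixed with the byte order mark FE FF.
function fdfHexString(string $utf8): string
{
    $utf16 = "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Hex digits never need escaping, so no str_replace() step is required.
    return '<' . strtoupper(bin2hex($utf16)) . '>';
}

// Emits the same field dictionary as the hand-written example.
echo "<<\n/V ", fdfHexString("\xC3\xB6zil"), "\n/T (name)\n>>\n";
```

Once the complete FDF file has been written, it would be passed to pdftk in the usual way, for example `pdftk form.pdf fill_form data.fdf output filled.pdf` (the file names here are placeholders).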

Finally, you can also write out the Unicode code point for any character in octal notation (\ddd). For example, ö has codepoint U+00F6, which in octal is 366, so you can write:

<<
/V (\366zil)
/T (name)
>>

However, this only works up to U+00FF (octal 377). Beyond that, you'd have to use UTF-16.
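A sketch of the octal approach in PHP might look like this. fdfOctalString is a hypothetical helper name; it assumes UTF-8 input and the mbstring extension, and it refuses anything beyond U+00FF rather than silently producing a wrong value.

```php
<?php
// Hypothetical helper: builds an FDF literal string using \ddd octal
// escapes. Only usable for characters up to U+00FF.
function fdfOctalString(string $utf8): string
{
    if (preg_match('/\A[\x{00}-\x{FF}]*\z/u', $utf8) !== 1) {
        throw new InvalidArgumentException('Contains characters beyond U+00FF');
    }
    // Within this range, ISO-8859-1 byte values equal the code points.
    $bytes = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $byte = $bytes[$i];
        $code = ord($byte);
        // Printable ASCII passes through, except backslash and parentheses.
        $plain = $code >= 0x20 && $code < 0x7F && strpos('\\()', $byte) === false;
        $out .= $plain ? $byte : sprintf('\\%03o', $code);
    }
    return '(' . $out . ')';
}

echo fdfOctalString("\xC3\xB6zil"), "\n"; // prints (\366zil)
```

The resulting file is plain ASCII, so it no longer matters which encoding an editor or script uses to save it.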

The PDF standard allows you to set the encoding to UTF-8 for the whole FDF document. I tried this and it didn't work with pdftk, but in theory it would be done like this:

%FDF-1.2
1 0 obj
<<
/Version /1.3
/Encoding /utf_8
/FDF

(You would presumably have to set the FDF version to 1.3 (or more) in the header too, according to the standard.)

You can also do this at the field level:

<<
/V (özil)
/T (name)
/Encoding /utf_8
>>

But as I said, I didn't manage to get any of this to work. pdftk just seems to ignore it.