How do I decode quoted-printable text (from quoted-printable to chars)?

Ska*_*rab 5 java encoding

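I have a quoted-printable text. Here is an example of such text (from the Wikipedia article):

If you believe that truth=3Dbeauty, then surely=20=
mathematics is the most beautiful branch of philosophy.

I am looking for a Java class that decodes the encoded form to chars, e.g. =20 to a space.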

Update: Thanks to The Elite Gentleman, I now know that I need to use QuotedPrintableCodec:

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.net.QuotedPrintableCodec;
import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20=mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws DecoderException
    {
        QuotedPrintableCodec.decodeQuotedPrintable( TXT.getBytes() );
    }
}

But I keep getting the following exception:

org.apache.commons.codec.DecoderException: Invalid URL encoding: not a valid digit (radix 16): 109
    at org.apache.commons.codec.net.Utils.digit16(Utils.java:44)
    at org.apache.commons.codec.net.QuotedPrintableCodec.decodeQuotedPrintable(QuotedPrintableCodec.java:186)

What am I doing wrong?

Update 2: I found this question here on SO and learned about MimeUtility:

import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;

import javax.mail.MessagingException;
import javax.mail.internet.MimeUtility;

import org.junit.Test;

public class QuotedPrintableCodecTest {
    private static final String TXT = "If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.";

    @Test
    public void processSimpleText() throws MessagingException, IOException
    {
        InputStream is = new ByteArrayInputStream(TXT.getBytes());

        BufferedReader br = new BufferedReader(new InputStreamReader(MimeUtility.decode(is, "quoted-printable")));
        StringWriter writer = new StringWriter();

        String line;
        while ((line = br.readLine()) != null)
        {
            writer.append(line);
        }
        System.out.println("INPUT:  " + TXT);
        System.out.println("OUTPUT: " + writer.toString());
    }
}

But the output is still not right; it contains a '=':

INPUT:  If you believe that truth=3Dbeauty, then surely=20= mathematics is the most beautiful branch of philosophy.
OUTPUT: If you believe that truth=beauty, then surely = mathematics is the most beautiful branch of philosophy.

What am I doing wrong now?

Buh*_*ndi 9

The Apache Commons Codec QuotedPrintableCodec class is indeed an implementation of the Quoted-Printable section of RFC 1521.


Update: your quoted-printable string is wrong, because the example on Wikipedia uses a soft line break.
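
As for the 109 in the exception message: after an '=' the decoder expects two hexadecimal digits, but in the string from the question the second '=' is followed directly by 'm', whose character code is 109. Here is a small JDK-only illustration of that check. The class name HexDigitCheck and the shortened test string are made up for this sketch, and it is not the library's actual code:

```java
class HexDigitCheck {
    // Returns true if c is a valid hexadecimal digit (0-9, a-f, A-F).
    static boolean isHexDigit(char c) {
        return Character.digit(c, 16) != -1;
    }

    public static void main(String[] args) {
        // Shortened form of the string from the question (no line break after the last '=').
        String broken = "surely=20=mathematics";
        int eq = broken.lastIndexOf('=');
        char next = broken.charAt(eq + 1);
        System.out.println("char after '=': " + next + " (code " + (int) next + ")");
        System.out.println("valid hex digit? " + isHexDigit(next));
    }
}
```

So the trailing '=' is being read as the start of an "=XX" escape instead of as a soft line break.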

Soft line breaks:

Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
      that encoded lines be no more than 76 characters long. If longer
      lines are to be encoded with the Quoted-Printable encoding, 'soft'
      line breaks must be used. An equal sign as the last character on a
      encoded line indicates such a non-significant ('soft') line break
      in the encoded text. Thus if the "raw" form of the line is a
      single unencoded line that says:

          Now's the time for all folk to come to the aid of
          their country.

      This can be represented, in the Quoted-Printable encoding, as

          Now's the time =
          for all folk to come=
           to the aid of their country.

      This provides a mechanism with which long lines are encoded in
      such a way as to be restored by the user agent.  The 76 character
      limit does not count the trailing CRLF, but counts all other
      characters, including any equal signs.
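
To make Rule #5 a bit more concrete, here is a rough sketch of how an encoder could insert soft line breaks. It assumes the input is a single line that is already quoted-printable escaped and contains no line breaks of its own. The class and method names are made up for illustration and do not come from any library:

```java
class SoftLineBreaks {
    static final int MAX_LINE = 76;

    // Splits one already-encoded line into pieces of at most 76 characters,
    // ending every piece except the last with "=" followed by CRLF.
    static String wrap(String encoded) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (encoded.length() - pos > MAX_LINE) {
            // Leave room for the trailing "=" that marks the soft break.
            int end = pos + MAX_LINE - 1;
            // Do not cut an "=XX" escape in half: move the cut before the '='.
            if (encoded.charAt(end - 1) == '=') {
                end -= 1;
            } else if (encoded.charAt(end - 2) == '=') {
                end -= 2;
            }
            out.append(encoded, pos, end).append("=\r\n");
            pos = end;
        }
        return out.append(encoded.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String line = "If you believe that truth=3Dbeauty, then surely mathematics "
                + "is the most beautiful branch of philosophy.";
        System.out.println(wrap(line));
    }
}
```

Each piece, including its trailing '=', stays within the 76 character limit, and the CRLF is not counted, as the rule says.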

So your text should look like this:

private static final String CRLF = "\r\n";
private static final String S = "If you believe that truth=3Dbeauty, then surely=20=" + CRLF + "mathematics is the most beautiful branch of philosophy.";
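
To show what a decoder has to do with such a string, here is a minimal JDK-only sketch that handles both "=XX" escapes and soft line breaks. It is not the Commons Codec or JavaMail implementation. The class name is invented, the decoded bytes are assumed to be UTF-8, and accepting a bare LF after '=' is a leniency of this sketch that the RFC does not require:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class SoftBreakAwareDecoder {
    static String decode(String qp) {
        byte[] in = qp.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < in.length) {
            if (in[i] != '=') {
                out.write(in[i]);          // literal character, copy as-is
                i++;
            } else if (i + 2 < in.length && in[i + 1] == '\r' && in[i + 2] == '\n') {
                i += 3;                    // soft line break "=CRLF": emit nothing
            } else if (i + 1 < in.length && in[i + 1] == '\n') {
                i += 2;                    // lenient: soft break with a bare LF
            } else if (i + 2 < in.length) {
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid escape at index " + i);
                }
                out.write((hi << 4) | lo); // "=XX" becomes one byte
                i += 3;
            } else {
                throw new IllegalArgumentException("Truncated escape at index " + i);
            }
        }
        // Assumption: the decoded bytes are UTF-8 text.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                + "mathematics is the most beautiful branch of philosophy.";
        System.out.println(decode(s));
    }
}
```

With the corrected string, "=3D" turns into '=', "=20" into a space, and the "=" plus CRLF pair disappears, so the sentence comes out on one line.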

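The Javadoc clearly states:

Rules #3, #4, and #5 of the quoted-printable spec are not implemented yet because the complete quoted-printable spec does not lend itself well into the byte[] oriented codec framework. Complete the codec once the streamable codec framework is ready. The motivation behind providing the codec in a partial form is that it can already come in handy for those applications that do not require quoted-printable line formatting (rules #3, #4, #5), for instance Q codec.

And there is a bug logged against Apache QuotedPrintableCodec because it does not support soft line breaks.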
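
Until that is addressed, one possible workaround is to strip the soft line breaks yourself before handing the text to the codec. The sketch below uses only the JDK, and the helper name stripSoftLineBreaks is made up. It assumes the input is otherwise valid quoted-printable, where a literal '=' is always written as "=3D", so an '=' directly before a line terminator can only be a soft break:

```java
class SoftBreakStripper {
    // Removes every "=" that is immediately followed by CRLF (or a bare LF),
    // together with that line terminator.
    static String stripSoftLineBreaks(String qp) {
        return qp.replaceAll("=\\r?\\n", "");
    }

    public static void main(String[] args) {
        String s = "If you believe that truth=3Dbeauty, then surely=20=" + "\r\n"
                + "mathematics is the most beautiful branch of philosophy.";
        String cleaned = stripSoftLineBreaks(s);
        System.out.println(cleaned);
        // With Commons Codec on the classpath, the cleaned text could then be
        // passed on, for example:
        // byte[] decoded = QuotedPrintableCodec.decodeQuotedPrintable(cleaned.getBytes());
    }
}
```

After this step only "=XX" escapes remain, which is the part of the spec the codec says it does implement.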