cru*_*sam 5 java xml escaping unmarshalling
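Imagine the following situation: we receive an XML file from some external tool. Lately, this XML can contain escaped characters in node names or in their richcontent tags, as in the following (simplified) example: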
<map>
  <node TEXT="Project">
    <node TEXT="ää">
      <richcontent TYPE="NOTE"><html>
        <head>
        </head>
        <body>
          <p>
            I am a Note for Node ää!
          </p>
        </body>
      </html>
      </richcontent>
    </node>
  </node>
</map>
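After unmarshalling the file with JAXB, those escaped characters come out unescaped. Unfortunately, I need them to stay as they are, that is, escaped. Is there any way to avoid unescaping these characters while unmarshalling?
While researching, I found a lot of questions about marshalling XML files, where the opposite problem occurs, but those didn't help me either.
Is it possible to achieve this with JAXB, or do we have to consider switching to a different XML reader API?
Thanks in advance, ymene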
You just have to replace &# with &amp;# and then call
unmarshaller.unmarshal(new AmpersandingStream(new FileInputStream(...)));
with
import java.io.IOException;
import java.io.InputStream;

/**
 * Replaces numerical entities with their notation as text by rewriting
 * every "&#" in the stream to "&amp;#". Works on raw bytes, so it assumes
 * an ASCII-compatible encoding such as UTF-8 or ISO-8859-1.
 */
public class AmpersandingStream extends InputStream {

    private final InputStream in;
    private boolean justReadAmpersand;
    private String lookAhead = "";

    public AmpersandingStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        // Hand out pending replacement characters first.
        if (!lookAhead.isEmpty()) {
            int c = lookAhead.charAt(0);
            lookAhead = lookAhead.substring(1);
            return c;
        }
        int c = in.read();
        if (c == '#' && justReadAmpersand) {
            // The '&' has already been returned; emit "amp;#" after it.
            c = 'a';
            lookAhead = "mp;#";
        }
        justReadAmpersand = c == '&';
        return c;
    }

    // read(byte[]), read(byte[], int, int) and skip(long) are intentionally
    // not overridden: the InputStream defaults go through read(), so bulk
    // reads by the parser are filtered as well. Delegating them straight to
    // the wrapped stream would bypass the replacement. mark/reset stay
    // unsupported (the InputStream default), because the filter state
    // could not be restored.

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
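To see why the replacement keeps the escapes, here is a minimal sketch that uses the JDK's DOM parser as a stand-in for JAXB (any conforming XML parser resolves character references the same way). The class name EscapePreservingDemo and the helper protect are made up for this illustration, and the replacement is done on a String instead of a stream just to keep the example short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EscapePreservingDemo {

    /** Hypothetical helper: rewrites "&#" to "&amp;#" before parsing. */
    public static String protect(String xml) {
        return xml.replace("&#", "&amp;#");
    }

    /** Parses the XML and returns the TEXT attribute of the root element. */
    public static String textAttribute(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("TEXT");
    }

    public static void main(String[] args) throws Exception {
        String xml = "<node TEXT=\"&#228;&#228;\"/>";
        // Unmodified input: the parser resolves the character references.
        System.out.println(textAttribute(xml));
        // Protected input: the parser only resolves &amp;, leaving the reference text.
        System.out.println(textAttribute(protect(xml))); // prints &#228;&#228;
    }
}
```

Keep in mind that the unmarshalled value is then the literal text &#228;, so writing it back out with an XML writer would escape the ampersand again unless that is handled separately.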