我有一个InputStream,它将html文件作为输入参数.我必须从输入流中获取字节.
我有一个字符串:"XYZ".我想将此字符串转换为字节格式,并检查我从InputStream获取的字节序列中是否存在匹配项.如果有,那么我必须将匹配替换为其他字符串的再见序列.
有没有人可以帮我这个?我使用正则表达式来查找和替换.然而,我发现并替换字节流,我不知道.
以前,我使用jsoup来解析html并替换字符串,但是由于一些utf编码问题,当我这样做时,文件似乎已经损坏了.
TL; DR:我的问题是:
是一种在Java中的原始InputStream中以字节格式查找和替换字符串的方法吗?
aio*_*obe 29
不确定您是否选择了解决问题的最佳方法.
也就是说,我不喜欢(并且有政策不要)用"不要"回答问题所以这里......
从文档:
FilterInputStream包含一些其他输入流,它将其用作其基本数据源,可能会沿途转换数据或提供其他功能.
写一下这是一个有趣的练习.这是一个完整的例子:
import java.io.*;
import java.util.*;
class ReplacingInputStream extends FilterInputStream {
LinkedList<Integer> inQueue = new LinkedList<Integer>();
LinkedList<Integer> outQueue = new LinkedList<Integer>();
final byte[] search, replacement;
protected ReplacingInputStream(InputStream in,
byte[] search,
byte[] replacement) {
super(in);
this.search = search;
this.replacement = replacement;
}
private boolean isMatchFound() {
Iterator<Integer> inIter = inQueue.iterator();
for (int i = 0; i < search.length; i++)
if (!inIter.hasNext() || search[i] != inIter.next())
return false;
return true;
}
private void readAhead() throws IOException {
// Work up some look-ahead.
while (inQueue.size() < search.length) {
int next = super.read();
inQueue.offer(next);
if (next == -1)
break;
}
}
@Override
public int read() throws IOException {
// Next byte already determined.
if (outQueue.isEmpty()) {
readAhead();
if (isMatchFound()) {
for (int i = 0; i < search.length; i++)
inQueue.remove();
for (byte b : replacement)
outQueue.offer((int) b);
} else
outQueue.add(inQueue.remove());
}
return outQueue.remove();
}
// TODO: Override the other read methods.
}
Run Code Online (Sandbox Code Playgroud)
class Test {
public static void main(String[] args) throws Exception {
byte[] bytes = "hello xyz world.".getBytes("UTF-8");
ByteArrayInputStream bis = new ByteArrayInputStream(bytes);
byte[] search = "xyz".getBytes("UTF-8");
byte[] replacement = "abc".getBytes("UTF-8");
InputStream ris = new ReplacingInputStream(bis, search, replacement);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
int b;
while (-1 != (b = ris.read()))
bos.write(b);
System.out.println(new String(bos.toByteArray()));
}
}
Run Code Online (Sandbox Code Playgroud)
给出"Hello xyz world"它打印的字符串的字节数:
Hello abc world
Run Code Online (Sandbox Code Playgroud)
我也需要这样的东西,并决定推出我自己的解决方案,而不是使用@aioobe 上面的例子。看看代码。您可以从 maven central 中提取库,或者只是复制源代码。
这就是你如何使用它。在这种情况下,我使用嵌套实例来替换两个固定 dos 和 mac 行结尾的模式。
new ReplacingInputStream(new ReplacingInputStream(is, "\n\r", "\n"), "\r", "\n");
这是完整的源代码:
/**
* Simple FilterInputStream that can replace occurrances of bytes with something else.
*/
public class ReplacingInputStream extends FilterInputStream {
// while matching, this is where the bytes go.
int[] buf=null;
int matchedIndex=0;
int unbufferIndex=0;
int replacedIndex=0;
private final byte[] pattern;
private final byte[] replacement;
private State state=State.NOT_MATCHED;
// simple state machine for keeping track of what we are doing
private enum State {
NOT_MATCHED,
MATCHING,
REPLACING,
UNBUFFER
}
/**
* @param is input
* @return nested replacing stream that replaces \n\r (DOS) and \r (MAC) line endings with UNIX ones "\n".
*/
public static InputStream newLineNormalizingInputStream(InputStream is) {
return new ReplacingInputStream(new ReplacingInputStream(is, "\n\r", "\n"), "\r", "\n");
}
/**
* Replace occurances of pattern in the input. Note: input is assumed to be UTF-8 encoded. If not the case use byte[] based pattern and replacement.
* @param in input
* @param pattern pattern to replace.
* @param replacement the replacement or null
*/
public ReplacingInputStream(InputStream in, String pattern, String replacement) {
this(in,pattern.getBytes(StandardCharsets.UTF_8), replacement==null ? null : replacement.getBytes(StandardCharsets.UTF_8));
}
/**
* Replace occurances of pattern in the input.
* @param in input
* @param pattern pattern to replace
* @param replacement the replacement or null
*/
public ReplacingInputStream(InputStream in, byte[] pattern, byte[] replacement) {
super(in);
Validate.notNull(pattern);
Validate.isTrue(pattern.length>0, "pattern length should be > 0", pattern.length);
this.pattern = pattern;
this.replacement = replacement;
// we will never match more than the pattern length
buf = new int[pattern.length];
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
// copy of parent logic; we need to call our own read() instead of super.read(), which delegates instead of calling our read
if (b == null) {
throw new NullPointerException();
} else if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
} else if (len == 0) {
return 0;
}
int c = read();
if (c == -1) {
return -1;
}
b[off] = (byte)c;
int i = 1;
try {
for (; i < len ; i++) {
c = read();
if (c == -1) {
break;
}
b[off + i] = (byte)c;
}
} catch (IOException ee) {
}
return i;
}
@Override
public int read(byte[] b) throws IOException {
// call our own read
return read(b, 0, b.length);
}
@Override
public int read() throws IOException {
// use a simple state machine to figure out what we are doing
int next;
switch (state) {
case NOT_MATCHED:
// we are not currently matching, replacing, or unbuffering
next=super.read();
if(pattern[0] == next) {
// clear whatever was there
buf=new int[pattern.length]; // clear whatever was there
// make sure we start at 0
matchedIndex=0;
buf[matchedIndex++]=next;
if(pattern.length == 1) {
// edgecase when the pattern length is 1 we go straight to replacing
state=State.REPLACING;
// reset replace counter
replacedIndex=0;
} else {
// pattern of length 1
state=State.MATCHING;
}
// recurse to continue matching
return read();
} else {
return next;
}
case MATCHING:
// the previous bytes matched part of the pattern
next=super.read();
if(pattern[matchedIndex]==next) {
buf[matchedIndex++]=next;
if(matchedIndex==pattern.length) {
// we've found a full match!
if(replacement==null || replacement.length==0) {
// the replacement is empty, go straight to NOT_MATCHED
state=State.NOT_MATCHED;
matchedIndex=0;
} else {
// start replacing
state=State.REPLACING;
replacedIndex=0;
}
}
} else {
// mismatch -> unbuffer
buf[matchedIndex++]=next;
state=State.UNBUFFER;
unbufferIndex=0;
}
return read();
case REPLACING:
// we've fully matched the pattern and are returning bytes from the replacement
next=replacement[replacedIndex++];
if(replacedIndex==replacement.length) {
state=State.NOT_MATCHED;
replacedIndex=0;
}
return next;
case UNBUFFER:
// we partially matched the pattern before encountering a non matching byte
// we need to serve up the buffered bytes before we go back to NOT_MATCHED
next=buf[unbufferIndex++];
if(unbufferIndex==matchedIndex) {
state=State.NOT_MATCHED;
matchedIndex=0;
}
return next;
default:
throw new IllegalStateException("no such state " + state);
}
}
@Override
public String toString() {
return state.name() + " " + matchedIndex + " " + replacedIndex + " " + unbufferIndex;
}
}
Run Code Online (Sandbox Code Playgroud)