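Suppose I'm running a service where users can submit a regular expression to search through a large amount of data. If a user submits a regex that is very slow (i.e. Matcher.find() takes minutes to return), I want a way to cancel that match. The only way I can think of to do this is to have another thread monitor how long the match is taking and, if necessary, cancel it with Thread.stop().
Member variables: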
long REGEX_TIMEOUT = 30000L;
Object lock = new Object();
boolean finished = false;
Thread matcherThread;
Matcher thread:
try {
    matcherThread = Thread.currentThread();
    // imagine code to start monitor thread is here
    try {
        matched = matcher.find();
    } finally {
        synchronized (lock) {
            finished = true;
            lock.notifyAll();
        }
    }
} catch (ThreadDeath td) {
    // send angry message to client
    // handle error without rethrowing td
}
Monitor thread:
synchronized (lock) {
    while (! finished) {
        try {
            lock.wait(REGEX_TIMEOUT);
            if (! finished) {
                matcherThread.stop();
            }
        } catch (InterruptedException ex) {
            // ignore, top level method in dedicated thread, etc..
        }
    }
}
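I've read java.sun.com/j2se/1.4.2/docs/guide/misc/threadPrimitiveDeprecation.html and I think this usage is safe: I control where ThreadDeath is thrown via synchronization, I handle it, and the only objects that could be damaged are my Pattern and Matcher instances, which get discarded anyway. I think this breaks the contract of Thread.stop() because I'm not rethrowing the error, but I don't want the thread to die, just to abort the find() method.
I've managed to avoid these deprecated API components so far, but Matcher.find() does not seem to be interruptible and can take a very long time to return. Is there a better way to do this?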
Kri*_*ris 42
From Heritrix (crawler.archive.org):
/**
 * CharSequence that noticed thread interrupts -- as might be necessary
 * to recover from a loose regex on unexpected challenging input.
 *
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    CharSequence inner;
    // public long counter = 0;

    public InterruptibleCharSequence(CharSequence inner) {
        super();
        this.inner = inner;
    }

    public char charAt(int index) {
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    public int length() {
        return inner.length();
    }

    public CharSequence subSequence(int start, int end) {
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}
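Wrap your CharSequence with this one and thread interrupts will work: the regex engine reads its input through charAt(), so a pending interrupt is noticed on the next character access and aborts the match with an exception.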
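One way to deliver that interrupt, as a minimal sketch rather than anything taken from Heritrix: run find() on a worker thread through an ExecutorService and cancel the Future once a timeout elapses. The class name InterruptibleFindDemo and the helper findWithTimeout are hypothetical, and the wrapper class is repeated here only so the block compiles on its own.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

public class InterruptibleFindDemo {

    // Same idea as the Heritrix class above, repeated so this sketch is self-contained.
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            if (Thread.interrupted()) { // clears the flag if it was set
                throw new RuntimeException(new InterruptedException());
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /**
     * Hypothetical helper: returns the result of find(), or null if the match
     * was abandoned because the timeout elapsed or the caller was interrupted.
     */
    public static Boolean findWithTimeout(Pattern pattern, CharSequence input, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleCharSequence(input)).find();
        Future<Boolean> result = executor.submit(task);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; its next charAt() call throws
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            result.cancel(true);
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        String input = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
        // A pattern prone to heavy backtracking on this input, then a cheap one.
        System.out.println(findWithTimeout(Pattern.compile("(x+x+)+y"), input, 500));
        System.out.println(findWithTimeout(Pattern.compile("x+"), input, 500));
    }
}
```

A real service would presumably share one executor across requests instead of creating one per call; the per-call executor here just keeps the sketch short.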
With a slight variation, you can avoid using an additional thread for this:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegularExpressionUtils {
    // demonstrates behavior for regular expression running into catastrophic backtracking for given input
    public static void main(String[] args) {
        Matcher matcher = createMatcherWithTimeout(
                "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y", 2000);
        System.out.println(matcher.matches());
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, String regularExpression, int timeoutMillis) {
        Pattern pattern = Pattern.compile(regularExpression);
        return createMatcherWithTimeout(stringToMatch, pattern, timeoutMillis);
    }

    public static Matcher createMatcherWithTimeout(String stringToMatch, Pattern regularExpressionPattern, int timeoutMillis) {
        CharSequence charSequence = new TimeoutRegexCharSequence(stringToMatch, timeoutMillis, stringToMatch,
                regularExpressionPattern.pattern());
        return regularExpressionPattern.matcher(charSequence);
    }

    private static class TimeoutRegexCharSequence implements CharSequence {
        private final CharSequence inner;
        private final int timeoutMillis;
        private final long timeoutTime;
        private final String stringToMatch;
        private final String regularExpression;

        public TimeoutRegexCharSequence(CharSequence inner, int timeoutMillis, String stringToMatch, String regularExpression) {
            super();
            this.inner = inner;
            this.timeoutMillis = timeoutMillis;
            this.stringToMatch = stringToMatch;
            this.regularExpression = regularExpression;
            // the clock starts when the wrapper is created, not when matching begins
            timeoutTime = System.currentTimeMillis() + timeoutMillis;
        }

        // the regex engine reads its input through charAt(), so the deadline is checked on every character access
        public char charAt(int index) {
            if (System.currentTimeMillis() > timeoutTime) {
                throw new RuntimeException("Timeout occurred after " + timeoutMillis + "ms while processing regular expression '"
                        + regularExpression + "' on input '" + stringToMatch + "'!");
            }
            return inner.charAt(index);
        }

        public int length() {
            return inner.length();
        }

        // note: the returned wrapper computes a fresh deadline in its constructor
        public CharSequence subSequence(int start, int end) {
            return new TimeoutRegexCharSequence(inner.subSequence(start, end), timeoutMillis, stringToMatch, regularExpression);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }
}
Many thanks to dawce for pointing me to this solution in answer to an unnecessarily complicated question!
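For callers of the deadline-based variant, it may help to tell a timeout apart from other runtime failures. The following is only a sketch of that idea under my own assumptions: RegexTimeoutException, DeadlineCharSequence, DeadlineMatchDemo and findWithDeadline are hypothetical names, not part of the answer above. It also uses System.nanoTime() instead of System.currentTimeMillis() so that wall-clock adjustments do not affect the deadline, and it keeps the original deadline in subSequence().

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

public class DeadlineMatchDemo {

    /** Hypothetical exception type so callers can catch timeouts specifically. */
    static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** Deadline-checking wrapper; an assumption-laden variant of the class above. */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Subtraction keeps the comparison valid even if nanoTime() overflows.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException("regular expression exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            // Reuse the same deadline instead of starting a new one.
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Returns the find() result, or Optional.empty() if the deadline passed first. */
    public static Optional<Boolean> findWithDeadline(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            return Optional.of(pattern.matcher(new DeadlineCharSequence(input, deadline)).find());
        } catch (RegexTimeoutException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String input = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
        System.out.println(findWithDeadline(Pattern.compile("(x+x+)+y"), input, 2000));
        System.out.println(findWithDeadline(Pattern.compile("x+"), input, 2000));
    }
}
```

As with the original, the check only fires when the engine reads a character, so the timeout is approximate rather than exact.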