Tika 1.13 RuntimeException

Zai*_*mir 1 java exception apache-tika

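I recently updated my existing Tika project to use Tika 1.13 instead of 1.10. The only thing I changed was the dependency version, from 1.10 to 1.13. The project builds successfully. However, whenever I try to run the application, I get the following exception: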

java.lang.RuntimeException: Unable to parse the default media type registry
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:580)
    at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:69)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:218)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:341)
    at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:51)
    at com.app.tikamanager.MetaParser.<init>(MetaParser.java:54)
    at com.app.services.MyService.HandleItemInThread(IntelligentDocumentsService.java:260)
    at com.app.intelligentservicebase.ItemHandlerThread.run(ItemHandlerThread.java:41)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.tika.mime.MimeTypeException: Invalid type configuration
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:126)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:64)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:93)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:170)
    at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:577)
    ... 10 more
Caused by: org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
    at org.apache.xerces.parsers.AbstractSAXParser.setFeature(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.setFeatures(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParserImpl(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserFactoryImpl.setFeature(Unknown Source)
    at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:119)
    ... 14 more

The exception is thrown from the constructor of my MetaParser class, which does nothing except initialize an AutoDetectParser:

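import org.apache.tika.parser.AutoDetectParser;

public class MetaParser {
    private final AutoDetectParser _tikaExtractor;

    public MetaParser() {
        // Throws the RuntimeException above when Tika loads its default MIME type registry
        _tikaExtractor = new AutoDetectParser();
    }
}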

I'm running the application on Ubuntu 14.04 with Oracle JDK 1.8.0_91-b14.

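I searched online and this exception has come up a few times. One suggested fix was to install OpenJDK, but that was for an older version of Tika, and since the older version used to work fine on the same JDK, I don't think that's the problem.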

Is there something I need to do or initialize before calling the AutoDetectParser constructor?

Gag*_*arr 6

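Promoting a comment to an answer: you have a very old version of Xerces on your classpath. Your JVM is picking it up as the default XML parser, so when Tika says "Hi JVM, can I have a secure XML parser please?", it fails.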

(Between 1.10 and 1.13, Tika made improvements to how XML parsing is done, including setting safer defaults, which is why this has started happening.)
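To confirm the diagnosis, here is a minimal sketch that asks the JVM's default SAXParserFactory for the same secure-processing feature that appears in the stack trace (XMLConstants.FEATURE_SECURE_PROCESSING) and prints which factory class was chosen. The class and method names are hypothetical. On a stock Oracle/OpenJDK 8 the factory is usually the JDK's internal com.sun.org.apache.xerces.internal... class, whereas an org.apache.xerces... name suggests a jar on the classpath is winning.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

public class XmlParserCheck {

    /** Fully qualified name of the SAXParserFactory the JVM picks by default. */
    static String defaultFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    /** True if the default factory accepts the secure-processing feature seen in the trace. */
    static boolean supportsSecureProcessing() {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            // Feature URI: http://javax.xml.XMLConstants/feature/secure-processing
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (Exception e) {
            // An old Xerces typically throws SAXNotRecognizedException here
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Default SAXParserFactory: " + defaultFactoryName());
        System.out.println("secure-processing supported: " + supportsSecureProcessing());
    }
}
```

Run it with the same classpath as the failing application; if it reports that secure processing is not supported, an old parser is being picked up.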

You need to either remove the old Xerces jars, so that the XML parser provided by the JVM is used instead, or replace them with a newer version of Xerces.
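If the jar can't be removed right away, one stopgap (an assumption on my part, not what this answer prescribes) is to set the standard javax.xml.parsers.SAXParserFactory system property to the JDK's built-in implementation before Tika initializes. SAXParserFactory.newInstance() checks that property before looking at META-INF/services entries on the classpath. The class name below is JDK-internal (as used by Oracle/OpenJDK 8), not a public API, and ForceJdkSaxParser is a hypothetical name. This only affects SAX; other JAXP factories such as DocumentBuilderFactory have their own properties, so removing or upgrading the old Xerces remains the real fix.

```java
import javax.xml.parsers.SAXParserFactory;

public class ForceJdkSaxParser {

    // JDK-internal SAX factory class in Oracle/OpenJDK 8 (assumption: verify on your JVM)
    static final String JDK_SAX_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl";

    /** Makes SAXParserFactory.newInstance() return the JDK's parser instead of a classpath one. */
    static void useJdkSaxParser() {
        // The system property takes precedence over service files found in jars on the classpath
        System.setProperty("javax.xml.parsers.SAXParserFactory", JDK_SAX_FACTORY);
    }

    public static void main(String[] args) {
        // Must run before Tika is initialized, e.g. before new AutoDetectParser()
        useJdkSaxParser();
        System.out.println(SAXParserFactory.newInstance().getClass().getName());
    }
}
```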

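You may also find some of the suggestions in "Error unmarshalling XML in Java 8 "secure-processing org.xml.sax.SAXNotRecognizedException"" helpful, especially if you're struggling to find the pesky old Xerces jars in your build!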
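To help track down the offending jar, here is a minimal sketch that lists every META-INF/services/javax.xml.parsers.SAXParserFactory entry visible to the class loader. That service file is how a Xerces jar registers itself as the JAXP SAX parser. FindSaxProviders is a hypothetical name, and the output only reflects the classpath the program runs with, so launch it with the same classpath as the failing application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class FindSaxProviders {

    // JAXP service-provider file a jar uses to register a SAXParserFactory implementation
    static final String SAX_SERVICE = "META-INF/services/javax.xml.parsers.SAXParserFactory";

    /** Returns the URL of every SAXParserFactory service file on the classpath. */
    static List<URL> findSaxFactoryProviders() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = ClassLoader.getSystemClassLoader();
        }
        try {
            return Collections.list(cl.getResources(SAX_SERVICE));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> providers = findSaxFactoryProviders();
        if (providers.isEmpty()) {
            System.out.println("No jar on the classpath registers a SAXParserFactory.");
        }
        for (URL url : providers) {
            // A jar:file:/.../xercesImpl-*.jar!/META-INF/... URL points at the jar to remove or upgrade
            System.out.println(url);
        }
    }
}
```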