相关疑难解决方法(0)

Apache Tika能够提取中文,日文等外语吗?

Apache Tika能够提取中文,日文等外语吗?

我有以下代码:

    Detector detector = new DefaultDetector();
    Parser parser = new AutoDetectParser(detector);
    InputStream stream = new ByteArrayInputStream(bytes);
    OutputStream outputstream = new ByteArrayOutputStream();
    ContentHandler textHandler = new BodyContentHandler(outputstream);
    Metadata metadata = new Metadata();
    // Set<String> langs = LanguageIdentifier.getSupportedLanguages();
    // metadata.set(Metadata.CONTENT_LANGUAGE, lang);
    // metadata.set(Metadata.FORMAT, hint);
    ParseContext context = new ParseContext();
    try {
        parser.parse(stream, textHandler, metadata, context);
        String extractedText = outputstream.toString();
        return extractedText;
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SAXException e) {
        e.printStackTrace();
    } catch (TikaException e) {
        e.printStackTrace();
    } …
Run Code Online (Sandbox Code Playgroud)

apache apache-tika

5
推荐指数
1
解决办法
2787
查看次数

标签 统计

apache ×1

apache-tika ×1