Tag: stanford-nlp

public class lemmafirst {

    protected StanfordCoreNLP pipeline;

    public lemmafirst() {
        // Configure the StanfordCoreNLP pipeline: tokenization, sentence
        // splitting, POS tagging (required for lemmatization) and lemmatization
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, pos, lemma");

        /*
         * This is a pipeline that takes in a string and returns various analyzed linguistic forms. 
         * The String is tokenized via a tokenizer (such as PTBTokenizerAnnotator), 
         * and then other sequence model style …

java nlp jar stanford-nlp maven

Loh*_*que

5 votes

2 answers

10k views

Any tips on how to train the nn dependency parser on a new corpus?

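We want to train the Stanford neural network dependency parser on a Russian corpus. Are there any tips on how to do this? The hyperparameters are described in the paper, but it would help to understand how to prepare the training data (the annotations, and in particular how to create the word2vec annotations). Any help or pointers to documentation would be greatly appreciated!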

Thanks!
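For orientation, the nndep trainer is usually described as reading CoNLL-X-style files: one token per line, ten tab-separated columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), with a blank line between sentences. Below is a minimal sketch that assumes that format and writes one token row. The class name, the Russian sentence and the UD-style labels are illustrative assumptions. Word embeddings are supplied to the trainer separately and are not covered here.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: formats one token as a 10-column, tab-separated
 * CoNLL-X line. Columns that are not available are written as "_".
 */
final class ConllX {

    // Formats a single token; head = 0 marks the root of the sentence.
    static String row(int id, String form, String lemma, String cpos, String pos,
                      int head, String deprel) {
        StringJoiner cols = new StringJoiner("\t");
        cols.add(Integer.toString(id))
            .add(form)
            .add(orBlank(lemma))
            .add(orBlank(cpos))
            .add(orBlank(pos))
            .add("_")                        // FEATS: not used in this sketch
            .add(Integer.toString(head))
            .add(deprel)
            .add("_")                        // PHEAD
            .add("_");                       // PDEPREL
        return cols.toString();
    }

    // CoNLL-X uses "_" for a field with no value.
    private static String orBlank(String s) {
        return (s == null || s.isEmpty()) ? "_" : s;
    }

    public static void main(String[] args) {
        // "Мама мыла раму" ("Mom washed the frame"): the verb is the root.
        List<String> sentence = List.of(
            row(1, "Мама", "мама", "NOUN", "NOUN", 2, "nsubj"),
            row(2, "мыла", "мыть", "VERB", "VERB", 0, "root"),
            row(3, "раму", "рама", "NOUN", "NOUN", 2, "obj"));
        sentence.forEach(System.out::println);
        System.out.println(); // a blank line ends the sentence
    }
}
```

Each sentence in a training file is a block of such rows. The HEAD column points at the ID of the governing token.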

stanford-nlp

ran*_*123

5 votes

1 answer

1093 views

"Too small initial heap" error - Stanford Parser

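I am trying to use the Stanford dependency parser. I tried to run the parser from the Windows command line with the following command to extract dependencies: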

java -mx100m -cp "stanford-parser.jar" edu.stanford.nlp.trees.EnglishGrammaticalStructure -sentFile english-onesent.txt -collapsedTree -CCprocessed -parserFile englishPCFG.ser.gz

I get the following error:

Error occurred during initialization of VM  
Too small initial heap

I changed the memory size to -mx1024, -mx2048 and -mx4096. Nothing changed, and the error persists.

What am I missing?
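For reference, the standard JVM heap options (-Xmx, and the older -mx spelling used in the command, which behaves like it) read a number with no k, m or g suffix as bytes. A value such as -mx1024 therefore asks for a 1 KB heap, not 1 GB. The sketch below prints the heap limits the running JVM actually received, so you can confirm how a flag was interpreted. The class name is arbitrary.

```java
/**
 * Prints the heap limits the running JVM actually received, to confirm
 * that a -mx / -Xmx flag was interpreted as intended.
 */
final class HeapCheck {

    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the upper bound the heap may grow to.
        System.out.println("max heap:   " + toMiB(rt.maxMemory()) + " MiB");
        // totalMemory(): the heap currently reserved by the JVM.
        System.out.println("total heap: " + toMiB(rt.totalMemory()) + " MiB");
    }
}
```

Running it with `java -mx2g HeapCheck` should report a maximum close to 2048 MiB. The exact figure varies slightly between JVMs and garbage collectors.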

stanford-nlp

Vip*_*per

5 votes

2 answers

20k views

Running the Stanford CoreNLP server with French models

I am trying to analyze some French text with the Stanford CoreNLP tools (this is my first attempt at using any Stanford NLP software).

To do this, I downloaded the v3.6.0 jar and the corresponding French models.

Then I start the server:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer

As described in this answer, I call the API like this:

wget --post-data 'Bonjour le monde.' 'localhost:9000/?properties={"parse.model":"edu/stanford/nlp/models/parser/nndep/UD_French.gz", "annotators": "parse", "outputFormat": "json"}' -O -

But I get the following log and error:

 [pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
 [pool-1-thread-1] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
 [pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
 [pool-1-thread-1] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
 [pool-1-thread-1] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/parser/nndep/UD_French.gz ...

 edu.stanford.nlp.io.RuntimeIOException: java.io.StreamCorruptedException: invalid stream header: 64696374
    at edu.stanford.nlp.parser.common.ParserGrammar.loadModel(ParserGrammar.java:188)
    at …
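The four bytes in the exception message can be decoded to see what the loader actually read. A Java serialization stream starts with the magic bytes AC ED, but 64 69 63 74 is the ASCII text `dict`. That suggests the model file is plain text rather than the serialized Java object that ParserGrammar.loadModel expects. In CoreNLP's configuration, models under the nndep path are normally referenced through the depparse annotator (`depparse.model`) rather than `parse.model`. The sketch below only performs the decoding. The class name is arbitrary.

```java
import java.nio.charset.StandardCharsets;

/**
 * Decodes the hex "stream header" reported by java.io.StreamCorruptedException
 * into characters. A genuine Java-serialized stream begins with AC ED, so
 * readable text here means the file is not a serialized object.
 */
final class StreamHeader {

    // Parses pairs of hex digits (e.g. "64696374") into raw bytes.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex);
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    // Interprets the header bytes as US-ASCII text.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(decode("64696374")); // prints: dict
    }
}
```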

stanford-nlp stanford-nlp-server

ste*_*sia

2017-05-23

5 votes

1 answer

1643 views

Word2Vec - adding constraints to the vector representations

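I am trying to adapt the pre-trained Google News word2vec model to my specific domain. In my domain, certain words are known to be similar to each other, so ideally their Word2Vec representations should reflect that. I know I can continue training the pre-trained model on a corpus of domain-specific data to update the vectors.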

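However, if I know for certain that some words are very similar and should be placed close together, is there a way to incorporate that constraint into the word2vec model? Mathematically, I would like to add a term to the word2vec loss function that imposes a penalty when two words I know to be similar are not close to each other in the vector space. Does anyone have suggestions on how to implement this? Would this require me to unpack the word2vec model, or is there a way to add the extra term to the loss function?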
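The constraint described above can be written as an extra term added to the usual word2vec objective: L_total = L_w2v + λ · Σ over known-similar pairs (a, b) of ||v_a − v_b||². The gradient of this term is 2λ(v_a − v_b) with respect to v_a, and its negative with respect to v_b. The sketch below computes only this penalty and a plain gradient step on it. It does not reimplement word2vec. The names, λ and the learning rate are illustrative assumptions. In practice, such steps would be interleaved with the ordinary word2vec updates.

```java
/**
 * Sketch of an extra loss term that pulls known-similar word vectors together:
 *   penalty = lambda * sum over pairs (a, b) of ||v_a - v_b||^2
 */
final class SimilarityPenalty {

    // Squared Euclidean distance; assumes u and v have equal length.
    static double squaredDistance(double[] u, double[] v) {
        double sum = 0.0;
        for (int k = 0; k < u.length; k++) {
            double d = u[k] - v[k];
            sum += d * d;
        }
        return sum;
    }

    // Total penalty over all constrained pairs; pairs[i] = {indexA, indexB}.
    static double penalty(double[][] vectors, int[][] pairs, double lambda) {
        double total = 0.0;
        for (int[] p : pairs) {
            total += squaredDistance(vectors[p[0]], vectors[p[1]]);
        }
        return lambda * total;
    }

    // Gradient step on the penalty only, applied in place pair by pair (SGD-style).
    static void step(double[][] vectors, int[][] pairs, double lambda, double learningRate) {
        for (int[] p : pairs) {
            double[] a = vectors[p[0]];
            double[] b = vectors[p[1]];
            for (int k = 0; k < a.length; k++) {
                double g = 2.0 * lambda * (a[k] - b[k]); // d(penalty)/d(a[k])
                a[k] -= learningRate * g;
                b[k] += learningRate * g;                // d(penalty)/d(b[k]) is -g
            }
        }
    }

    public static void main(String[] args) {
        double[][] vectors = { {1.0, 0.0}, {0.0, 1.0} };
        int[][] pairs = { {0, 1} };
        System.out.println(penalty(vectors, pairs, 0.5)); // prints 1.0
        step(vectors, pairs, 0.5, 0.25);
        System.out.println(penalty(vectors, pairs, 0.5)); // prints 0.25
    }
}
```

Each step moves both vectors of a pair toward each other, and λ controls how strongly the constraint competes with the original objective.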

nlp stanford-nlp word2vec

Ali*_*Ali

5 votes

1 answer

509 views

Stanford NER: lowercase entities

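I am having trouble detecting named entities that start with a lowercase letter. If I train the model on lowercase words only, the accuracy is reasonable. However, when the model is trained on fully uppercase tokens, or even on a mix of lowercase and uppercase, the results are very poor. I tried some of the features provided by the Stanford NLP Group's NERFeatureFactory class, as well as various sentences, but could not get the expected results. Here is an example of the problem I am facing: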

"ali studied at university of michigan and now he works for us navy."

I would like the model to recognize the entities as follows: