I am trying to install the YamCha tool for NLP tasks such as NER, POS tagging, and chunking.
Following the installation instructions, I ran:
% ./configure
% make
% make check
% su
# make install
I get the following error message:
param.cpp: In member function 'bool YamCha::Param::open(int, char**, const YamCha::Option*)':
param.cpp:102:42: error: 'strlen' was not declared in this scope
   size_t nlen = strlen(opts[i].name);
                                     ^
param.cpp:103:68: error: 'strncmp' was not declared in this scope
   if (nlen == len && strncmp(&argv[ind][2], opts[i].name, len) == 0) {
                                                                    ^
param.cpp: In member function 'bool YamCha::Param::open(const char*, const YamCha::Option*)':
param.cpp:182:28: error: 'strncpy' was not declared in this scope
   strncpy(str, arg, 1024);
                          ^
param.cpp:185:12: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
make all-recursive
make[1]: Entering directory '/home/hamada/Documents/YamCha/yamcha-0.33'
Making all in src
make[2]: Entering directory '/home/hamada/Documents/YamCha/yamcha-0.33/src'
/bin/bash ../libtool --mode=compile --tag= …
I have a list of lists of sentences that were first tokenized into words and then POS-tagged, so the result is a list whose elements look like this:
[(w1,pos_tag1),(w2,pos_tag2)]
[(w3,pos_tag3),(w4,pos_tag4),(w5,pos_tag5)]
[(w6,pos_tag6),(w7,pos_tag7)]
I just need a single list of the pos_tags, in the order they appear across all the sentences. What I tried was iterating over the list:
tags = [x[1] for x in element in lists]
but this does not work. How do I get all the tags contained in these lists?
Thanks.
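A minimal sketch of one way to do this with a nested list comprehension; the name sentences below is hypothetical and stands for the tagged list of lists described above:

# Sketch: `sentences` is a hypothetical name for the tagged list of lists.
sentences = [
    [("w1", "pos_tag1"), ("w2", "pos_tag2")],
    [("w3", "pos_tag3"), ("w4", "pos_tag4"), ("w5", "pos_tag5")],
    [("w6", "pos_tag6"), ("w7", "pos_tag7")],
]

# Nested comprehension: outer loop over sentences, inner loop over (word, tag) pairs.
tags = [tag for sentence in sentences for (word, tag) in sentence]
print(tags)
# ['pos_tag1', 'pos_tag2', 'pos_tag3', 'pos_tag4', 'pos_tag5', 'pos_tag6', 'pos_tag7']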
I am trying to do supervised classification with a perceptron, namely POS tagging of sentences. For now I assume that each word's tag is independent of the others (i.e. I only use the word itself as a feature). I am fairly new to machine learning algorithms, so I cannot figure out how to represent the feature function for each word.
I have a training set of 100 sentences, where every word is given a particular tag (say N, V, J (adjective), and so on). For example:
Jack (N) and (&) Jill (N) went to (PR) Peru (N)
where the tags are in parentheses. Say there are 10 possible tags in total. Now my question is: what does the feature vector for the word Jack look like?
I am very keen to implement it as a vector, since my code will then match the notation better. Once I figure out what the feature function looks like, I will be able to implement the perceptron algorithm!
Also, I would like to add features such as (a) whether the first letter is capitalized, (b) whether the word is hyphenated, and so on. How do I incorporate these into my feature vector?
Intuitively I can see that the vector should only need binary values, but I cannot get beyond that.
If possible, please explain with a concrete example!
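Not from the original post, just a minimal sketch of one common representation under the stated assumptions: a one-hot block for the word identity plus extra binary indicator features. The vocabulary and feature choices below are illustrative only:

# Sketch of a binary feature vector for a single word.
# `vocabulary` is hypothetical; in practice it is built from the training set.
vocabulary = ["jack", "and", "jill", "went", "to", "peru"]

def word_features(word):
    # One-hot block: a 1 marking the word's identity (all zeros if unknown).
    features = [1 if word.lower() == v else 0 for v in vocabulary]
    # Extra binary indicator features.
    features.append(1 if word[0].isupper() else 0)   # (a) starts with a capital letter
    features.append(1 if "-" in word else 0)         # (b) contains a hyphen
    return features

print(word_features("Jack"))   # [1, 0, 0, 0, 0, 0, 1, 0]

With 10 possible tags, a standard multiclass perceptron would keep one weight vector of this length per tag and predict the tag whose weights give the highest dot product with the word's feature vector.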
When I use the Brill tagger, I get this error:
TypeError: '_sre.SRE_Pattern' object is not iterable
WARNING:root:2016-04-05 00:05:37.503718 is when this event was logged.
ERROR:root:'_sre.SRE_Pattern' object is not iterable
Traceback (most recent call last):
File "D:\Dropbox\VCL\MyWrapper.py", line 137, in run_alg
CLC_POS.tag_file(input_utf8, path_out + '.pos', file_encoding, CLC_POS.load_tagger('pos_tbl_86943.model'), '')
File "D:\Dropbox\VCL\CLC_POS.py", line 277, in tag_file
token_tag = tagger.tag(word_list)
File "C:\Python34\lib\site-packages\nltk\tag\brill.py", line 264, in tag
tagged_tokens = self._initial_tagger.tag(tokens)
File "C:\Python34\lib\site-packages\nltk\tag\sequential.py", line 61, in tag
tags.append(self.tag_one(tokens, i, tags))
File "C:\Python34\lib\site-packages\nltk\tag\sequential.py", line 81, in tag_one
tag = tagger.choose_tag(tokens, index, history)
File …
The spaCy part-of-speech tagger is normally used on whole sentences. Is there a way to efficiently apply unigram POS tagging to a single word (or to a list of single words)?
Something like this:
words = ["apple", "eat", "good"]
tags = get_tags(words)
print(tags)
> ["NNP", "VB", "JJ"]
Thanks.
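One possible sketch, assuming the small English model en_core_web_sm is installed (python -m spacy download en_core_web_sm); get_tags here is just a hypothetical helper, not a spaCy API:

import spacy

nlp = spacy.load("en_core_web_sm")

def get_tags(words):
    # Tag each word in isolation; with no sentence context this amounts
    # to a unigram decision by the statistical tagger.
    return [nlp(word)[0].tag_ for word in words]

print(get_tags(["apple", "eat", "good"]))
# e.g. ['NN', 'VB', 'JJ'] -- the exact tags depend on the model version

For long word lists, nlp.pipe(words) processes the texts in batches and is noticeably faster than calling nlp() once per word.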
I have a text and I want to count the number of ADJs, PRONs, VERBs, NOUNs, etc. in it. I know there is the .pos_tag()
function, but it gives me a different tag set, and I want the results as 'ADJ', 'PRON', 'VERB', 'NOUN'. This is my code:
import nltk
from nltk.corpus import state_union, brown
from nltk.corpus import stopwords
from nltk import ne_chunk
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tokenize import word_tokenize
from nltk.tokenize import RegexpTokenizer
from nltk.stem import WordNetLemmatizer
from collections import Counter
sample = "this is my sample text that I want to analyze with programming language"
# tokenizing text (make list with every word)
sample_tokenization = word_tokenize(sample)
print("THIS IS TOKENIZED SAMPLE TEXT, LIST OF WORDS:\n\n", sample_tokenization)
print()
# tagging words …
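The snippet above is cut off at the tagging step; below is a minimal sketch of one way to get the coarse counts, assuming the NLTK data packages punkt, averaged_perceptron_tagger, and universal_tagset have been downloaded. Passing tagset='universal' to nltk.pos_tag maps the default Penn Treebank tags to ADJ, PRON, VERB, NOUN, and so on:

import nltk
from nltk.tokenize import word_tokenize
from collections import Counter

sample = "this is my sample text that I want to analyze with programming language"
tokens = word_tokenize(sample)

# tagset='universal' converts Penn Treebank tags (NN, JJ, ...) to the
# universal tagset (NOUN, ADJ, VERB, PRON, ...).
tagged = nltk.pos_tag(tokens, tagset='universal')
tag_counts = Counter(tag for word, tag in tagged)
print(tag_counts)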
Can someone help me with the syntax for tagging a corpus in NLTK with hunpos?
What do I need to import for the hunpos.HunPosTagger module?
And how do I HunPos-tag my corpus? See the code below.
import nltk
from nltk.corpus import PlaintextCorpusReader
from nltk.corpus.util import LazyCorpusLoader
corpus_root = './'
reader = PlaintextCorpusReader (corpus_root, '.*')
ntuen = LazyCorpusLoader ('ntumultien', PlaintextCorpusReader, reader)
ntuen.fileids()
isinstance (ntuen, PlaintextCorpusReader)
# So how do I hunpos tag `ntuen`? I can't get the following code to work.
# please help me to correct my python syntax errors, I'm new to python
# but i really need this to work. sorry
##from nltk.tag import hunpos.HunPosTagger
ht = HunPosTagger('english.model') …
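Not from the original post, just a minimal sketch of how this is usually wired up, assuming the hunpos-tag binary and an English model file (here called en_wsj.model as an example) are available on the system; adjust both paths to your installation:

import nltk
from nltk.corpus import PlaintextCorpusReader
from nltk.tag.hunpos import HunposTagger   # note the class name: HunposTagger

corpus_root = './'
reader = PlaintextCorpusReader(corpus_root, '.*')

ht = HunposTagger('en_wsj.model')          # path to the hunpos model file
for fileid in reader.fileids():
    tokens = reader.words(fileid)          # tokenized words of one corpus file
    tagged = ht.tag(tokens)                # list of (word, tag) tuples
    print(tagged[:10])
ht.close()                                 # shut down the hunpos subprocess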