I have a DataFrame with a review column:
train_review = train['review']
train_review
It looks like this:
0 With all this stuff going down at the moment w...
1 \The Classic War of the Worlds\" by Timothy Hi...
2 The film starts with a manager (Nicholas Bell)...
3 It must be assumed that those who praised this...
4 Superbly trashy and wondrously unpretentious 8...
I append the tokens to a string:
train_review = train['review']
train_token = ''
for i in train['review']:
    train_token += i
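If the aim is one long string, str.join is a common alternative to repeated +=. A minimal sketch, assuming the reviews are available as a plain Python list (the reviews list below is hypothetical sample data, including one deliberately non-string entry); map(str, ...) converts every entry before joining, and the space separator keeps the end of one review from running into the start of the next:

```python
# Hypothetical sample data standing in for train['review'];
# the last entry is deliberately not a string.
reviews = ["With all this stuff going down", "The film starts with a manager", 42]

# map(str, ...) converts every entry to a plain string before joining;
# the " " separator keeps adjacent reviews from being glued together.
train_token = " ".join(map(str, reviews))
print(train_token)  # With all this stuff going down The film starts with a manager 42
```

With pandas, the same idea would be " ".join(train['review'].astype(str)).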
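What I want is to tokenize the reviews with spaCy. This is what I tried, but I get the following error:
Argument 'string' has incorrect type (expected str, got spacy.tokens.doc.Doc)
How can I fix this? Thanks in advance!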
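In your for loop, you take spacy.tokens objects (spaCy Doc objects) from the dataframe and append them to a string, so you should convert each one to str first, like this: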
train_review = train['review']
train_token = ''
for i in train_review:
    # convert each item (e.g. a spaCy Doc) to a plain string before appending
    train_token += str(i)
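Since the goal in the question is to tokenize the reviews rather than only to build one string, here is a minimal sketch of tokenizing each review separately. It assumes spaCy v3 is installed; spacy.blank("en") is used as a tokenizer-only English pipeline so no trained model has to be downloaded, and the reviews list is hypothetical sample data built from the visible prefixes of the rows shown above:

```python
import spacy

# Assumption: spaCy v3 is installed. spacy.blank("en") creates an English
# pipeline that only tokenizes, so no trained model download is needed.
nlp = spacy.blank("en")

# Hypothetical sample data standing in for train['review'].
reviews = [
    "With all this stuff going down at the moment",
    "The film starts with a manager (Nicholas Bell)",
]

# The tokenizer only accepts str, so convert every entry with str() first;
# for a spaCy Doc, str(doc) returns its text.
docs = list(nlp.pipe(str(r) for r in reviews))

# One list of token strings per review.
tokens = [[t.text for t in doc] for doc in docs]
for review_tokens in tokens:
    print(review_tokens)
```

Keeping one Doc per review, instead of one concatenated string, preserves the boundaries between reviews. With pandas, the input to nlp.pipe could be train['review'].astype(str).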