I am looking for a sentiment analysis script / source code, preferably in PHP. Do you know of any such scripts? Thanks, Sameer
I am using the Stanford CoreNLP library for sentiment analysis. The code below returns the sentiment class for an example, but how can I also get the score? For example, -0.3 for negative, and so on.
private int getScore(String line) {
    boolean isrun = false;
    StanfordCoreNLP pipeline = null;
    if (!isrun) {
        Properties props = getProperties();
        pipeline = new StanfordCoreNLP(props);
        isrun = true;
    }
    Annotation annotation;
    int sentiment = -1;
    if (line != null && line.length() > 0) {
        annotation = pipeline.process(line);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree tree = sentence.get(SentimentCoreAnnotations.AnnotatedTree.class);
            sentiment = RNNCoreAnnotations.getPredictedClass(tree);
        }
    }
    return sentiment;
}
EDIT
In the online demo, when the mouse hovers over the root of the tree, we can see that the example is 72% negative. How can I get that number?
I am facing this AttributeError and am stuck on how to handle float values when they appear in the tweets. The streamed tweets have to be lowercased and tokenized, so I used the split function.
Can someone help me deal with it? Any workaround or solution?
This is the error I am getting:
AttributeError Traceback (most recent call last)
<ipython-input-28-fa278f6c3171> in <module>()
1 stop_words = []
----> 2 negfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'neg') for f in l]
3 posfeats = [(word_feats(x for x in p_test.SentimentText[f].lower().split() if x not in stop_words), 'pos') for f in p]
4
5 trainfeats = negfeats+ posfeats
AttributeError: 'float' object has no attribute 'lower'
Here is my code:
p_test = pd.read_csv('TrainSA.csv')
stop_words = [ ]
def word_feats(words):
    return dict([(word, …
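A minimal sketch of one way to guard against the float values, assuming they are the NaN entries pandas produces for empty SentimentText cells in TrainSA.csv (that cause is an assumption here, not something stated in the question):

import pandas as pd

p_test = pd.read_csv('TrainSA.csv')

# Coerce every value to text so .lower()/.split() never sees a float;
# pandas reads empty cells as NaN, which is a float.
p_test['SentimentText'] = p_test['SentimentText'].fillna('').astype(str)

# Alternative: drop the offending rows instead of coercing them.
# p_test = p_test[p_test['SentimentText'].apply(lambda v: isinstance(v, str))]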
I am testing a sentiment analysis model with NLTK. I need to add a confusion matrix to the classifier results and, if possible, Precision, Recall and F-Measure values as well. So far I only have accuracy. The movie_reviews data has pos and neg labels. However, the "featuresets" I use to train the classifier have a format that differs from the usual (sentence, label) structure. After training the classifier on "featuresets", I am not sure whether I can use confusion_matrix from sklearn.

import nltk
import random
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())

all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]
training_set = featuresets[:1900]
testing_set = featuresets[1900:]
classifier = nltk.NaiveBayesClassifier.train(training_set)
print("Naive Bayes Algo accuracy percent:", (nltk.classify.accuracy(classifier, …Run Code Online (Sandbox Code Playgroud) 我正在使用 tm 包来清理 Twitter 语料库。但是,该软件包无法清理表情符号。
I am using the tm package to clean a Twitter corpus. However, the package cannot clean emojis.

Here is the code to reproduce it:
July4th_clean <- tm_map(July4th_clean, content_transformer(tolower))
Error in FUN(content(x), ...) : invalid input 'RT ElleJohnson Love of country is encircling the globes \xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd july4thweekend July4th FourthOfJuly IndependenceDay NotAvailableOnIn' in 'utf8towcs'

Can someone point me in the right direction for removing emojis using the tm package?
Thank you,

Luis
My text comes from a social network, so you can imagine its nature; I think the text is about as clean and minimal as I could get it after performing the following sanitization:

I believe the running time is linear, and I do not intend to do any parallelization because changing the available code would take a lot of effort. For example, for about 1000 texts ranging from ~50 kB to ~150 kB, it takes about 10 minutes to run on my machine.

Is there a better way to feed the algorithm to speed up the processing time? The code is as simple as letting SentimentIntensityAnalyzer do its work; this is the main part:
sid = SentimentIntensityAnalyzer()
c.execute("select body, creation_date, group_id from posts where (substring(lower(body) from (%s))=(%s)) and language='en' order by creation_date DESC", (s, s,))
conn.commit()
if c.rowcount > 0:
    dump_fetched = c.fetchall()

textsSql = pd.DataFrame(dump_fetched, columns=['body', 'created_at', 'group_id'])
del dump_fetched
gc.collect()
texts = textsSql['body'].values
# here, some data manipulation: steps listed above
polarity_ = [sid.polarity_scores(s)['compound'] for s in texts]
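If parallelization is ever reconsidered, a minimal sketch of what it might look like with multiprocessing, assuming NLTK's VADER with the vader_lexicon already downloaded; the texts list here is a hypothetical stand-in for the texts array above:

from multiprocessing import Pool

from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Built once at module level so each worker process reuses its own analyzer.
sid = SentimentIntensityAnalyzer()

def compound_score(text):
    return sid.polarity_scores(text)['compound']

if __name__ == '__main__':
    texts = ["the food was great", "the service was terrible"]  # hypothetical stand-in
    with Pool() as pool:
        polarity_ = pool.map(compound_score, texts)
    print(polarity_)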
I am working on an R project. The dataset I am using is available from the following link: https://www.kaggle.com/ranjitha1/hotel-reviews-city-chennai/data

The code I am using is:
df1 = read.csv("chennai.csv", header = TRUE)
library(tidytext)
tidy_books <- df1 %>% unnest_tokens(word,Review_Text)
Here Review_Text is the text column. However, I get the following error:
Error in check_input(x) :
Input must be a character vector of any length or a list of character
vectors, each of which has a length of 1.
I have been replicating Julia Silge's code from her YouTube sentiment analysis video on Animal Crossing user reviews with tidymodels (https://www.youtube.com/watch?v=whE85O1XCkg&t=1300s). At minute 25 she uses tune_grid(), and when I try to use it in my script I get the following warning/error: Warning message: All models failed in tune_grid(). See the .notes column.
In .notes, this appears 25 times:
[[1]]
# A tibble: 1 x 1
  .notes
  <chr>
1 "recipe: Error in UseMethod(\"prep\"): no applicable method for 'prep' applied~
How can I fix this? The code I am using is the same as the one Julia uses. My entire code is as follows:
library(tidyverse)
user_reviews <- read_tsv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/user_reviews.tsv")
user_reviews %>%
  count(grade) %>%
  ggplot(aes(grade, n)) +
  geom_col()
user_reviews %>%
  filter(grade > 0) %>%
  sample_n(5) %>%
  pull(text)
reviews_parsed <- user_reviews %>%
  mutate(text = str_remove(text, "Expand"),
         rating = case_when(grade …
I am using the GetOldTweets3 library to scrape coronavirus outbreak information. I am getting this error:
error : C:\Users\Vilius\anaconda3\python.exe C:/Users/Vilius/PycharmProjects/Sentiment-Analysis2/twitter_analysis.py
An error occured during an HTTP request: HTTP Error 404: Not Found
Try to open in browser: https://twitter.com/search?q=CoronaOutbreak%20since%3A2020-01-01%20until%3A2020-04-01&src=typd
Even though the link works, what could be the problem? https://github.com/attreyabhatt/Sentiment-Analysis <- this is the code I am using
I have a DataFrame in Python Pandas like below:
sentence
------------
I like it
+1
One :-) :)
hah
I need to select only the rows that contain emoticons or emojis, so I need something like below:
sentence
------------
+1
One :-) :)
How can I do this in Python?
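A minimal sketch of one way this could be done with str.contains and regular expressions; the emoticon pattern and emoji codepoint range below are rough assumptions that only cover common cases, and +1 is added as an explicit literal token because it appears in the expected output:

import pandas as pd

df = pd.DataFrame({'sentence': ['I like it', '+1', 'One :-) :)', 'hah']})

# Assumed patterns: a handful of ASCII emoticons, the literal token +1,
# and a rough range of emoji codepoints.
emoticon_pattern = r"[:;=8][\-o\*']?[\)\(\]\[dDpP/\\]|\+1"
emoji_pattern = r"[\U0001F300-\U0001FAFF\u2600-\u27BF]"

mask = (df['sentence'].str.contains(emoticon_pattern, regex=True)
        | df['sentence'].str.contains(emoji_pattern, regex=True))
print(df[mask])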