Posts by Piy*_*iya

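Is it possible to use a word2vec model for sentiment analysis of unlabeled text?

I have some text data that I need to classify by sentiment. The data has no positive or negative labels (it is unlabeled). I want to use a Gensim word2vec model for the sentiment classification.
Is this possible? So far I have not found anything that does it. Every blog post and article uses some kind of labeled dataset (such as the IMDB dataset) to train and test the word2vec model. None of them go on to predict sentiment for their own unlabeled data.

Can someone tell me whether this is possible (at least in theory)?

Thanks in advance!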
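One theoretical possibility, not taken from the post, is to score each text by how close its averaged word vector lies to a few positive seed words versus a few negative ones. The sketch below illustrates only that idea. The 2-d vectors in `toy_vectors` are made-up stand-ins for vectors a trained Word2Vec model would supply, and `seed_sentiment` and the seed lists are hypothetical names chosen for illustration.

```python
import math

# Made-up 2-d vectors standing in for trained word2vec vectors (assumption).
toy_vectors = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [0.1, 0.9],
    "awful": [0.2, 0.8],
    "movie": [0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(tokens, vectors):
    """Average the vectors of the tokens that are in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def seed_sentiment(tokens, vectors, pos_seeds=("good",), neg_seeds=("bad",)):
    """Positive score: closer to the positive seeds; negative: closer to the negative seeds."""
    doc = mean_vector(tokens, vectors)
    pos = mean_vector(pos_seeds, vectors)
    neg = mean_vector(neg_seeds, vectors)
    if doc is None or pos is None or neg is None:
        return 0.0  # nothing to compare against
    return cosine(doc, pos) - cosine(doc, neg)


print(seed_sentiment(["great", "movie"], toy_vectors))
print(seed_sentiment(["awful", "movie"], toy_vectors))
```

Whether such scores are usable on real data depends on the corpus and the seed words, so a small hand-labeled sample would still be needed to check them.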

sentiment-analysis gensim word2vec python-3.7

Piy*_*iya

lucky-day

7
votes

1
answer

9464
views

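AttributeError: 'str' object has no attribute 'parameters' due to new version of sklearn

I am using sklearn for topic modeling. When I try to get the log-likelihoods from the grid search output, I get the following error:

AttributeError: 'str' object has no attribute 'parameters'

I think I understand the problem: 'parameters' was used in older versions, while I am using a newer version of sklearn (0.22), which raises the error. I also searched for the term used in the new version but could not find it. Here is the code: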

# Get Log Likelyhoods from Grid Search Output
n_components = [10, 15, 20, 25, 30]
log_likelyhoods_5 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.5]
log_likelyhoods_7 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.7]
log_likelyhoods_9 = [round(gscore.mean_validation_score) for gscore in model.cv_results_ if gscore.parameters['learning_decay']==0.9]

# Show graph
plt.figure(figsize=(12, 8))
plt.plot(n_components, log_likelyhoods_5, label='0.5')
plt.plot(n_components, log_likelyhoods_7, label='0.7')
plt.plot(n_components, log_likelyhoods_9, label='0.9')
plt.title("Choosing Optimal LDA Model")
plt.xlabel("Num Topics")
plt.ylabel("Log Likelyhood Scores")
plt.legend(title='Learning decay', loc='best')
plt.show()


Thanks in advance!
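In recent scikit-learn releases, `cv_results_` is a dict of columns rather than a list of score objects, so iterating over it yields its string keys, which is consistent with the error above. A minimal sketch of reading the same scores from the `params` and `mean_test_score` entries follows. `fake_cv_results` and `scores_for_decay` are hypothetical names, and the score values are made up to stand in for a fitted `model.cv_results_`.

```python
def scores_for_decay(cv_results, decay, param_name="learning_decay"):
    """Collect rounded mean test scores for one value of a hyperparameter."""
    return [
        round(score)
        for params, score in zip(cv_results["params"], cv_results["mean_test_score"])
        if params[param_name] == decay
    ]


# Hand-built stand-in shaped like GridSearchCV.cv_results_ (values are made up).
decays = [0.5, 0.7, 0.9]
n_components = [10, 15, 20, 25, 30]
fake_cv_results = {
    "params": [
        {"learning_decay": d, "n_components": n} for d in decays for n in n_components
    ],
    "mean_test_score": [-(1000.0 + 100 * d + n) for d in decays for n in n_components],
}

# With a real search, pass model.cv_results_ instead of fake_cv_results.
log_likelihoods_5 = scores_for_decay(fake_cv_results, 0.5)
print(log_likelihoods_5)
```

The sketch relies on the scores for each decay value appearing in the same order as `n_components`; if that is not guaranteed, sort the filtered entries by `params["n_components"]` before plotting.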

scikit-learn python-3.7 log-likelihood gridsearchcv

Piy*_*iya

lucky-day

3
votes

1
answer

1660
views

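Scraping web articles from WSJ using Beautifulsoup in Python 3.7?

I am trying to scrape articles from The Wall Street Journal using Beautifulsoup in Python. However, the code I am running executes without any error (exit code 0) but produces no results. I don't understand what is happening. Why doesn't this code give the expected results?

I have even paid for a subscription.

I know something is wrong, but I cannot find the problem.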

import time

import requests

from bs4 import BeautifulSoup

url = 'https://www.wsj.com/search/term.html?KEYWORDS=cybersecurity&min-date=2018/04/01&max-date=2019/03/31' \
  '&isAdvanced=true&daysback=90d&andor=AND&sort=date-desc&source=wsjarticle,wsjpro&page={}'

pages = 32
for page in range(1, pages+1):
    res = requests.get(url.format(page))
    soup = BeautifulSoup(res.text,"lxml")
    for item in soup.select(".items.hedSumm li > a"):
        resp = requests.get(item.get("href"))
        _href = item.get("href")

        try:
            resp = requests.get(_href)
        except Exception as e:
            try:
                resp = requests.get("https://www.wsj.com" + _href)
            except Exception as e:
                continue
        sauce = BeautifulSoup(resp.text,"lxml")
        date = sauce.select("time.timestamp.article__timestamp.flexbox__flex--1")
        date = date[0].text
        tag = sauce.select("li.article-breadCrumb span").text …
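The code above retries with the site prefix when the first request raises. A possible alternative, not from the original post, is to resolve every `href` against the page URL with the standard library's `urllib.parse.urljoin`, which leaves absolute links unchanged and completes relative ones. `resolve_href` and the article paths below are hypothetical, used only for illustration.

```python
from urllib.parse import urljoin

# Search page URL used as the base for resolving links (query shortened here).
BASE_URL = "https://www.wsj.com/search/term.html?KEYWORDS=cybersecurity"


def resolve_href(base_url, href):
    """Return an absolute URL for href, whether it is relative or already absolute."""
    return urljoin(base_url, href)


# Hypothetical hrefs of the two kinds a result list might contain.
print(resolve_href(BASE_URL, "/articles/example-article-1"))
print(resolve_href(BASE_URL, "https://www.wsj.com/articles/example-article-2"))
```

This only normalizes the links. It does not explain why the selector returns nothing, which could be checked separately by printing `len(soup.select(...))` for each page.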


python beautifulsoup web-scraping

Piy*_*iya

2021 02-05

2
votes

1
answer

1671
views

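TypeError when finding the dominant topic in each sentence in Gensim

I am doing topic modeling with gensim (in a Jupyter notebook). I successfully created a model and visualized it. Here is the code: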

import time
start_time = time.time()
import re
import spacy
import nltk
import pyLDAvis
import pyLDAvis.gensim
import gensim
import gensim.corpora as corpora
from gensim.utils import simple_preprocess
from gensim.models import CoherenceModel
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.ERROR)
import warnings
warnings.filterwarnings("ignore",category=DeprecationWarning)
# nlp = spacy.load('en')
stop_word_list = nltk.corpus.stopwords.words('english')
stop_word_list.extend(['from', 'subject', 're', 'edu', 'use'])
df = pd.read_csv('Topic_modeling.csv')
data = df.Articles.values.tolist()

# Remove Emails …


python typeerror gensim topic-modeling python-3.7

Piy*_*iya

2023 05-17

1
vote

1
answer

1309
views