小编bol*_*wer的帖子

scikit-learn中的TfidfVectorizer:ValueError:np.nan是一个无效的文档

我正在使用scikit中的TfidfVectorizer学习从文本数据中提取一些特征.我有一个带有分数的CSV文件(可以是+1或-1)和一个评论(文本).我将这些数据导入DataFrame,以便运行Vectorizer.

这是我的代码:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.read_csv("train_new.csv",
             names = ['Score', 'Review'], sep=',')

# x = df['Review'] == np.nan
#
# print x.to_csv(path='FindNaN.csv', sep=',', na_rep = 'string', index=True)
#
# print df.isnull().values.any()

v = TfidfVectorizer(decode_error='replace', encoding='utf-8')
x = v.fit_transform(df['Review'])
Run Code Online (Sandbox Code Playgroud)

这是我得到的错误的追溯:

Traceback (most recent call last):
  File "/home/PycharmProjects/Review/src/feature_extraction.py", line 16, in <module>
x = v.fit_transform(df['Review'])
 File "/home/b/hw1/local/lib/python2.7/site-   packages/sklearn/feature_extraction/text.py", line 1305, in fit_transform
   X = super(TfidfVectorizer, self).fit_transform(raw_documents)
 File "/home/b/work/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform …
Run Code Online (Sandbox Code Playgroud)

python machine-learning tf-idf pandas scikit-learn

36
推荐指数
3
解决办法
3万
查看次数

计算PyTorch中Conv2d的输入和输出大小以进行图像分类

我想在这里运行CIFAR10图像分类PyTorch教程- http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py

我做了一个小的更改,并且我使用了另一个数据集。我有Wikiart数据集中要按艺术家分类的图像(标签=艺术家名称)。

这是网络的代码-

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16*5*5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
Run Code Online (Sandbox Code Playgroud)

然后在代码的这一部分中,我开始训练网络。

for epoch in range(2):
     running_loss = 0.0

     for i, data in enumerate(wiki_train_dataloader, 0):
        inputs, labels = data['image'], data['class']
        print(inputs.shape) …
Run Code Online (Sandbox Code Playgroud)

python image convolution pytorch tensor

1
推荐指数
2
解决办法
8099
查看次数