我正在使用scikit中的TfidfVectorizer学习从文本数据中提取一些特征.我有一个带有分数的CSV文件(可以是+1或-1)和一个评论(文本).我将这些数据导入DataFrame,以便运行Vectorizer.
这是我的代码:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
df = pd.read_csv("train_new.csv",
names = ['Score', 'Review'], sep=',')
# x = df['Review'] == np.nan
#
# print x.to_csv(path='FindNaN.csv', sep=',', na_rep = 'string', index=True)
#
# print df.isnull().values.any()
v = TfidfVectorizer(decode_error='replace', encoding='utf-8')
x = v.fit_transform(df['Review'])
Run Code Online (Sandbox Code Playgroud)
这是我得到的错误的追溯:
Traceback (most recent call last):
File "/home/PycharmProjects/Review/src/feature_extraction.py", line 16, in <module>
x = v.fit_transform(df['Review'])
File "/home/b/hw1/local/lib/python2.7/site- packages/sklearn/feature_extraction/text.py", line 1305, in fit_transform
X = super(TfidfVectorizer, self).fit_transform(raw_documents)
File "/home/b/work/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.py", line 817, in fit_transform …Run Code Online (Sandbox Code Playgroud) 我想在这里运行CIFAR10图像分类PyTorch教程- http://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py
我做了一个小的更改,并且我使用了另一个数据集。我有Wikiart数据集中要按艺术家分类的图像(标签=艺术家名称)。
这是网络的代码-
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16*5*5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 10)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 16*5*5)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
Run Code Online (Sandbox Code Playgroud)
然后在代码的这一部分中,我开始训练网络。
for epoch in range(2):
running_loss = 0.0
for i, data in enumerate(wiki_train_dataloader, 0):
inputs, labels = data['image'], data['class']
print(inputs.shape) …Run Code Online (Sandbox Code Playgroud)