I am using the code below and need to check the accuracy on X_train and X_test.
The code works for my multi-label classification problem.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
X_train = np.array(["new york is a hell of a town",
"new york was originally dutch",
"the big apple is great",
"new york is also called the big apple",
"nyc is nice",
"people abbreviate new york city as nyc",
"the capital of great britain is london",
"london is in the uk",
"london is in england", …
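Since the snippet above is cut off, here is a minimal sketch of one way to measure accuracy on both sets. The shortened training texts, the label lists, the test sentences and the variable names `y_train_labels` / `y_test_labels` are assumptions made for illustration; they are not taken from the original code. For multi-label targets, `accuracy_score` reports subset accuracy, meaning a sample counts as correct only if every one of its labels is predicted correctly.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics import accuracy_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: a shortened training set with assumed labels.
X_train = np.array([
    "new york is a hell of a town",
    "the big apple is great",
    "nyc is nice",
    "london is in the uk",
    "london is in england",
    "the capital of great britain is london",
])
y_train_labels = [["new york"], ["new york"], ["new york"],
                  ["london"], ["london"], ["london"]]
X_test = np.array(["nice day in nyc", "welcome to london"])
y_test_labels = [["new york"], ["london"]]

# Turn the label lists into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train_labels)
Y_test = mlb.transform(y_test_labels)

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(X_train, Y_train)

# Subset accuracy on the training data and on the held-out data.
train_accuracy = accuracy_score(Y_train, classifier.predict(X_train))
test_accuracy = accuracy_score(Y_test, classifier.predict(X_test))
print("train accuracy:", train_accuracy)
print("test accuracy:", test_accuracy)
```

If subset accuracy is too strict for the problem, per-label metrics such as `sklearn.metrics.f1_score` with an `average` argument may be a better fit.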
I have to convert a list of values into a list of one-element arrays, as shown below:
list_train_data = [u'Class 1',
u'Class 2',
u'Class 3',
u'Class 4',
u'Class 5']
I need to put these values into a nested list like this:
train_set = [['Class 1'],['Class 2'],['Class 3'],['Class 4'],['Class 5']]
Preferably without using a for loop.
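One possible way to do this without writing an explicit loop is to combine `zip` and `map`, both Python built-ins. This is only a sketch; `zip` wraps each element in a 1-tuple and `map(list, ...)` turns each tuple into a list.

```python
list_train_data = ['Class 1',
                   'Class 2',
                   'Class 3',
                   'Class 4',
                   'Class 5']

# zip() over a single list yields 1-tuples such as ('Class 1',);
# map(list, ...) converts each tuple into a one-element list.
train_set = list(map(list, zip(list_train_data)))
print(train_set)
```

If NumPy is already in use, `np.array(list_train_data).reshape(-1, 1).tolist()` should give the same nested list.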
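I have the following two data frames, df_sales and df_supply.
I want to merge supply into sales so that my df_sales table gets DATE_SUPPLY and QNT_SUPPLY from df_supply under the following condition:
*Condition: DATE_SUPPLY should be the most recent DATE_SUPPLY, on or before the sale date, of the matching "ITEM" in the matching "STORE", i.e. DATE_SALE <- max(df_supply[df_supply$DATE_SUPPLY <= df_sales$DATE_SALE & df_supply$STORE == df_sales$STORE & df_supply$ITEM == df_sales$ITEM,]$DATE_SUPPLY)*
This could be done with a row-wise apply function or simply by writing a loop, but my dataset is huge, so I want to avoid looping.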
df_sales <- data.frame("STORE"=c(1001,1001,1001,1001,1001,1002,1002,1002,1002,1002),"ITEM"=c(13048, 13057, 13082, 13048, 13057, 13145, 13166, 13229, 13057, 13048),"DATE_SALE"=as.Date(c("1/1/2014","1/1/2014","1/2/2014","1/2/2014","1/2/2014","1/2/2014","1/3/2014","1/3/2014","1/3/2014","1/4/2014"),"%m/%d/%Y"),"QNT_SALE"=c(1,1,1,1,1,1,1,1,1,1))
df_sales
STORE ITEM DATE_SALE QNT_SALE
1 1001 13048 2014-01-01 1
2 1001 13057 2014-01-01 1
3 1001 13082 2014-01-02 1
4 1001 13048 2014-01-02 1
5 1001 13057 2014-01-02 1
6 1002 13145 2014-01-02 1
7 1002 13166 2014-01-03 1
8 1002 13229 2014-01-03 1
9 …
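The question is about R, but the same "latest supply on or before each sale" join can be sketched in Python with `pandas.merge_asof`, which avoids row-wise loops. The `df_sales` values mirror the R definition above. The `df_supply` rows are hypothetical, because the original supply table is not shown here.

```python
import pandas as pd

# Sales data, mirroring the R data frame above.
df_sales = pd.DataFrame({
    "STORE": [1001] * 5 + [1002] * 5,
    "ITEM": [13048, 13057, 13082, 13048, 13057,
             13145, 13166, 13229, 13057, 13048],
    "DATE_SALE": pd.to_datetime([
        "2014-01-01", "2014-01-01", "2014-01-02", "2014-01-02", "2014-01-02",
        "2014-01-02", "2014-01-03", "2014-01-03", "2014-01-03", "2014-01-04",
    ]),
    "QNT_SALE": [1] * 10,
})

# Hypothetical supply data (assumed for illustration only).
df_supply = pd.DataFrame({
    "STORE": [1001, 1001, 1001, 1002],
    "ITEM": [13048, 13048, 13057, 13048],
    "DATE_SUPPLY": pd.to_datetime(
        ["2013-12-30", "2014-01-02", "2013-12-31", "2014-01-03"]),
    "QNT_SUPPLY": [10, 5, 8, 6],
})

# merge_asof needs both frames sorted by their date keys.
# direction="backward" picks the last supply row whose DATE_SUPPLY
# is on or before DATE_SALE, within the same STORE and ITEM.
merged = pd.merge_asof(
    df_sales.sort_values("DATE_SALE"),
    df_supply.sort_values("DATE_SUPPLY"),
    left_on="DATE_SALE",
    right_on="DATE_SUPPLY",
    by=["STORE", "ITEM"],
    direction="backward",
)
print(merged)
```

Sales with no earlier supply for the same store and item keep their row and get missing values in DATE_SUPPLY and QNT_SUPPLY. In R, a rolling join in data.table is the usual counterpart of this approach.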