Source DF:
In [204]: df
Out[204]:
     Country
0      Italy
1  Indonesia
2     Canada
3      Italy
We can use pd.get_dummies():
In [205]: pd.get_dummies(df.Country)
Out[205]:
   Canada  Indonesia  Italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1
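As a self-contained sketch of the same idea: the DataFrame construction below is an assumption that mirrors the source DF shown above, and `dtype=int` is added because recent pandas versions (2.x) return boolean columns from `pd.get_dummies()` by default rather than the 0/1 integers shown in the output above.

```python
import pandas as pd

# Assumed reconstruction of the source DF from the answer
df = pd.DataFrame({"Country": ["Italy", "Indonesia", "Canada", "Italy"]})

# dtype=int keeps 0/1 integer indicators instead of True/False
dummies = pd.get_dummies(df["Country"], dtype=int)

# Optionally attach the indicator columns to the original frame
result = df.join(dummies)
print(result)
```

Joining back onto `df` is optional; it is shown only because the indicator columns are usually wanted next to the original data.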
Or sklearn.feature_extraction.text.CountVectorizer:
In [211]: from sklearn.feature_extraction.text import CountVectorizer
In [212]: cv = CountVectorizer()
In [213]: r = pd.SparseDataFrame(cv.fit_transform(df.Country),
     ...:                        columns=cv.get_feature_names(),
     ...:                        index=df.index,
     ...:                        default_fill_value=0)
In [214]: r
Out[214]:
   canada  indonesia  italy
0       0          0      1
1       0          1      0
2       1          0      0
3       0          0      1