wat*_*ake 1 python machine-learning dataframe pandas scikit-learn
I have a DataFrame that I am formatting for SciKit Learn PCA. It looks like this:
datetime | mood | activities | notes
8/27/2017 | "good" | ["friends", "party", "gaming"] | NaN
8/28/2017 | "meh" | ["work", "friends", "good food"] | "Stuff stuff"
8/29/2017 | "bad" | ["work", "travel"] | "Fell off my bike"
...and so on.
I want to change it to this, which I think will work better for machine learning:
datetime | mood | friends | party | gaming | work | good food | travel | notes
8/27/2017 | "good" | True | True | True | False | False | False | NaN
8/28/2017 | "meh" | True | False | False | True | True | False | "Stuff stuff"
8/29/2017 | "bad" | False | False | False | True | False | True | "Fell off my bike"
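I have tried the approach outlined here, but it just gives me a left-aligned matrix of all the activities, so the columns have no meaning. If I try to pass columns to the DataFrame constructor, I get the error "26 columns passed, passed data had 9 columns". I believe this is because, even though I have 26 distinct activities, the most I ever did on a single day is 9. Is there a way to have it fill in 0/False when a column isn't found in a particular row? Thanks.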
You can simply use get_dummies.
Let's assume this DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range('2017-08-27', '2017-08-29'),
                   'mood': ['good', 'meh', 'bad'],
                   'activities': [['friends', 'party', 'gaming'],
                                  ['work', 'friends', 'good food'],
                                  ['work', 'travel']],
                   'notes': [np.nan, 'stuff stuff', 'fell off my bike']})
df.set_index(['datetime'], inplace=True)
            mood                  activities             notes
datetime
2017-08-27  good    [friends, party, gaming]               NaN
2017-08-28   meh  [work, friends, good food]       stuff stuff
2017-08-29   bad              [work, travel]  fell off my bike
Just use concat and get_dummies:
df2 = pd.concat([df[['mood','notes']], pd.get_dummies(df['activities'].apply(pd.Series),
prefix='activity')], axis=1)
            mood             notes  activity_friends  activity_work  activity_friends  activity_party  activity_travel  activity_gaming  activity_good food
datetime
2017-08-27  good               NaN                 1              0                 0               1                0                1                   0
2017-08-28   meh       stuff stuff                 0              1                 1               0                0                0                   1
2017-08-29   bad  fell off my bike                 0              1                 0               0                1                0                   0
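Note that the output above contains activity_friends twice, because apply(pd.Series) spreads each list across positional columns and get_dummies then encodes each position separately. If one column per activity is wanted, as in the question, a possible variant is sketched below. It is not part of the original answer: it assumes pandas 0.25 or later (for Series.explode), and the names exploded, activity_flags and df3 are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'datetime': pd.date_range('2017-08-27', '2017-08-29'),
    'mood': ['good', 'meh', 'bad'],
    'activities': [['friends', 'party', 'gaming'],
                   ['work', 'friends', 'good food'],
                   ['work', 'travel']],
    'notes': [np.nan, 'stuff stuff', 'fell off my bike'],
}).set_index('datetime')

# One row per (day, activity) pair; the datetime index is repeated.
exploded = df['activities'].explode()

# One-hot encode every activity, then collapse the rows of each day
# back into a single row (max acts as a logical OR on 0/1 flags).
activity_flags = (pd.get_dummies(exploded, prefix='activity')
                  .groupby(level=0).max()
                  .astype(bool))

# Hypothetical result frame: original columns plus one flag per activity.
df3 = pd.concat([df[['mood', 'notes']], activity_flags], axis=1)
print(df3)
```

Activities that never occur on a given day come out as False, which is the fill behaviour the question asks about. The columns are ordered alphabetically rather than by first appearance.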
If you want, you can change them to booleans using loc:
df2.loc[:,df2.columns[2:]] = df2.loc[:,df2.columns[2:]].astype(bool)
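Depending on the pandas version, assigning through .loc may write the values into the existing columns without changing their dtype, so it can be worth checking df2.dtypes afterwards. A minimal sketch of an alternative that replaces the columns outright is shown below. It uses a small hypothetical frame (flags) with unique column names and selects the indicator columns by their activity_ prefix; neither is taken from the original answer.

```python
import pandas as pd

# Hypothetical frame with unique 0/1 indicator columns.
flags = pd.DataFrame({'mood': ['good', 'meh'],
                      'activity_friends': [1, 1],
                      'activity_work': [0, 1]})

# Pick the indicator columns by prefix instead of by position.
flag_cols = [c for c in flags.columns if c.startswith('activity_')]

# astype with a dict converts only the listed columns and returns a new frame.
flags = flags.astype({c: bool for c in flag_cols})
print(flags.dtypes)
```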