Aad*_*Ura · 16 · tags: python, dataframe, python-3.x, pandas
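I am trying to concatenate multiple pandas DataFrame columns, putting a different marker between each pair of columns.
For example, my dataset looks like this: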
import pandas as pd

dataframe = pd.DataFrame({'col_1' : ['aaa','bbb','ccc','ddd'],
                          'col_2' : ['name_aaa','name_bbb','name_ccc','name_ddd'],
                          'col_3' : ['job_aaa','job_bbb','job_ccc','job_ddd']})
I want output like this:
features
0 aaa <0> name_aaa <1> job_aaa
1 bbb <0> name_bbb <1> job_bbb
2 ccc <0> name_ccc <1> job_ccc
3 ddd <0> name_ddd <1> job_ddd
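Explanation:
Join the columns with "<{}>" between each adjacent pair, where {} is a counter that starts at 0 and increases by one for each separator.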
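As a plain-Python sketch of this rule for a single row (join_with_markers is a hypothetical helper name, not from the question, and it assumes the row has at least one value):

```python
def join_with_markers(values):
    """Join values left to right, putting <0>, <1>, ... between neighbours (no trailing marker)."""
    values = [str(v) for v in values]
    parts = [values[0]]  # assumes at least one value
    for i, v in enumerate(values[1:]):
        parts.append(f'<{i}>')  # marker between value i and value i + 1
        parts.append(v)
    return ' '.join(parts)


print(join_with_markers(['aaa', 'name_aaa', 'job_aaa']))  # aaa <0> name_aaa <1> job_aaa
```

Applied row-wise, for example dataframe.apply(join_with_markers, axis=1), it would produce the features column shown above.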
What I have tried so far:
I don't want to modify the original dataframe, so I created two new dataframes:
features_df = pd.DataFrame()
final_df = pd.DataFrame()
for iters in range(len(dataframe.columns)):
    # appends ' <i>' to every column, including the last one
    features_df[dataframe.columns[iters]] = dataframe[dataframe.columns[iters]] + ' ' + "<{}>".format(iters)
final_df['features'] = features_df[features_df.columns].agg(' '.join, axis=1)
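The problem is that this also appends <2> at the end, whereas I want the output shown above. It also doesn't seem like the pandas way to do this task. How can I make it more efficient?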
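The trailing <2> comes from the loop attaching a marker to every column, including the last one. A minimal sketch of a fix that keeps the loop-based approach, assuming the only goal is to drop that trailing marker: attach ' <i>' to every column except the last, then join with spaces. The sample data is repeated so the block runs on its own, and dataframe itself is left unchanged.

```python
import pandas as pd

dataframe = pd.DataFrame({'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
                          'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
                          'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd']})

cols = list(dataframe.columns)
last = len(cols) - 1
features_df = pd.DataFrame(index=dataframe.index)
for i, col in enumerate(cols):
    # Only columns that have a successor get a marker, so no trailing <2>.
    features_df[col] = dataframe[col] if i == last else dataframe[col] + f' <{i}>'

# Join each row's pieces with single spaces into one string.
final_df = pd.DataFrame({'features': features_df.apply(' '.join, axis=1)})
print(final_df)
```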
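One answer uses itertools.chain:
from itertools import chain

# interleave each value with ' <i> ', then drop the trailing marker with [:-1]
dataframe['features'] = dataframe.apply(lambda x: ''.join([*chain.from_iterable((v, f' <{i}> ') for i, v in enumerate(x))][:-1]), axis=1)
print(dataframe)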
Prints:
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa <0> name_aaa <1> job_aaa
1 bbb name_bbb job_bbb bbb <0> name_bbb <1> job_bbb
2 ccc name_ccc job_ccc ccc <0> name_ccc <1> job_ccc
3 ddd name_ddd job_ddd ddd <0> name_ddd <1> job_ddd
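To see what the chain-based expression does, here is a trace on a single row (the list row stands in for one row of dataframe):

```python
from itertools import chain

row = ['aaa', 'name_aaa', 'job_aaa']

# Each value is paired with the marker that follows it, then the pairs are flattened.
pieces = [*chain.from_iterable((v, f' <{i}> ') for i, v in enumerate(row))]
print(pieces)  # ['aaa', ' <0> ', 'name_aaa', ' <1> ', 'job_aaa', ' <2> ']

# [:-1] drops the last piece, which is the unwanted ' <2> ' marker.
print(''.join(pieces[:-1]))  # aaa <0> name_aaa <1> job_aaa
```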
Another answer: you can use df.agg with the optional argument axis=1 to concatenate the DataFrame's columns row by row:
# join with a literal ' <{}> ' placeholder, then fill the placeholders with 0, 1, ...
df['features'] = df.agg(
    lambda s: r' <{}> '.join(s).format(*range(s.size)), axis=1)
Output:
# print(df)
col_1 col_2 col_3 features
0 aaa name_aaa job_aaa aaa <0> name_aaa <1> job_aaa
1 bbb name_bbb job_bbb bbb <0> name_bbb <1> job_bbb
2 ccc name_ccc job_ccc ccc <0> name_ccc <1> job_ccc
3 ddd name_ddd job_ddd ddd <0> name_ddd <1> job_ddd
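The agg lambda relies on a template trick. Below is a single-row trace of it, plus one caveat that follows from how str.format works: the column values become part of the format string, so a value containing a literal brace (the 'a}b' example is made up for illustration) raises ValueError. The chain-based answer does not have this limitation.

```python
# How the lambda works on a single row:
row = ['aaa', 'name_aaa', 'job_aaa']
template = ' <{}> '.join(row)               # 'aaa <{}> name_aaa <{}> job_aaa'
filled = template.format(*range(len(row)))  # 3 args for 2 placeholders; the extra one is ignored
print(filled)  # aaa <0> name_aaa <1> job_aaa

# Caveat: a single '}' (or '{') in the data is parsed as format syntax.
bad_row = ['a}b', 'name_x']
try:
    ' <{}> '.join(bad_row).format(*range(len(bad_row)))
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```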
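Both answers call a Python function once per row. On the question of efficiency, an alternative sketch that is not taken from either answer builds the string column by column with vectorized Series string addition, so the Python-level loop runs over the columns rather than the rows. It may be faster on frames with many rows, but that is an assumption that has not been benchmarked here; concat_with_markers is a hypothetical helper name, and the original dataframe is left unmodified.

```python
import pandas as pd

dataframe = pd.DataFrame({'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
                          'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
                          'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd']})


def concat_with_markers(df):
    """Concatenate all columns left to right, inserting ' <0> ', ' <1> ', ... between them."""
    cols = list(df.columns)
    out = df[cols[0]].astype(str)
    for i, col in enumerate(cols[1:]):
        # One element-wise string addition per column, not one Python call per row.
        out = out + f' <{i}> ' + df[col].astype(str)
    return out


final_df = concat_with_markers(dataframe).to_frame('features')
print(final_df)
```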