How to concatenate multiple Pandas DataFrame columns with different token separators?

Aad*_*Ura 16 python dataframe python-3.x pandas

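I am trying to concatenate multiple Pandas DataFrame columns, using a different token between each pair of columns.

For example, my dataset looks like this: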

import pandas as pd

dataframe = pd.DataFrame({'col_1' : ['aaa','bbb','ccc','ddd'],
                          'col_2' : ['name_aaa','name_bbb','name_ccc','name_ddd'],
                          'col_3' : ['job_aaa','job_bbb','job_ccc','job_ddd']})

I want output like this:

    features
0   aaa <0> name_aaa <1> job_aaa
1   bbb <0> name_bbb <1> job_bbb
2   ccc <0> name_ccc <1> job_ccc
3   ddd <0> name_ddd <1> job_ddd

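Explanation:

Join the columns with "<{}>" between them, where {} is a number that increases with each separator.

What I have tried so far:

I don't want to modify the original dataframe, so I created two new dataframes: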

features_df = pd.DataFrame()
final_df    = pd.DataFrame()
for iters in range(len(dataframe.columns)):
    features_df[dataframe.columns[iters]] = dataframe[dataframe.columns[iters]] + ' ' + "<{}>".format(iters)
final_df['features'] = features_df[features_df.columns].agg(' '.join, axis=1)

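The problem is that this appends <2> at the end, whereas I want the output shown above. It also isn't the pandas way of doing this task. How can I make it more efficient?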

And*_*ely 8

from itertools import chain

dataframe['features'] = dataframe.apply(lambda x: ''.join([*chain.from_iterable((v, f' <{i}> ') for i, v in enumerate(x))][:-1]), axis=1)

print(dataframe)

Prints:

  col_1     col_2    col_3                      features
0   aaa  name_aaa  job_aaa  aaa <0> name_aaa <1> job_aaa
1   bbb  name_bbb  job_bbb  bbb <0> name_bbb <1> job_bbb
2   ccc  name_ccc  job_ccc  ccc <0> name_ccc <1> job_ccc
3   ddd  name_ddd  job_ddd  ddd <0> name_ddd <1> job_ddd

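For readers who find the one-liner above dense, here is a minimal sketch of the same idea written out step by step. The helper name interleave_tokens is hypothetical, and the result is kept in a separate variable so the original dataframe is left unmodified, as the question asks.

```python
from itertools import chain

import pandas as pd


def interleave_tokens(row):
    """Join the values of one row with ' <0> ', ' <1> ', ... between them.

    Hypothetical helper name, used here only for illustration.
    """
    # Pair each value with the numbered token that would follow it, e.g.
    # ('aaa', ' <0> '), ('name_aaa', ' <1> '), ('job_aaa', ' <2> ')
    pairs = ((str(value), f" <{i}> ") for i, value in enumerate(row))
    # Flatten the pairs into one list, then drop the last element,
    # which is the token that would otherwise trail the final value.
    pieces = list(chain.from_iterable(pairs))[:-1]
    return "".join(pieces)


dataframe = pd.DataFrame({'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
                          'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
                          'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd']})

# Apply row by row (axis=1); the original dataframe is not changed.
features = dataframe.apply(interleave_tokens, axis=1)
print(features)
```

Dropping the last flattened element is what removes the unwanted trailing <2> from the attempt in the question.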

Shu*_*rma 8

You can use df.agg with the optional parameter axis=1 to join the columns of the dataframe. Use:

df['features'] = df.agg(
    lambda s: r' <{}> '.join(s).format(*range(s.size)), axis=1)

Output:

# print(df)
  col_1     col_2    col_3                      features
0   aaa  name_aaa  job_aaa  aaa <0> name_aaa <1> job_aaa
1   bbb  name_bbb  job_bbb  bbb <0> name_bbb <1> job_bbb
2   ccc  name_ccc  job_ccc  ccc <0> name_ccc <1> job_ccc
3   ddd  name_ddd  job_ddd  ddd <0> name_ddd <1> job_ddd

  • This is a clever solution. (2 upvotes)
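
One caveat worth noting about the str.format approach: because the joined row is used as a format string, cell values that themselves contain { or } would be read as format fields and could raise an error or produce the wrong text. The sketch below is an alternative, not part of either answer, that builds the string column by column with plain concatenation instead. The function name join_with_numbered_tokens is hypothetical, and it assumes the frame has at least one column.

```python
import pandas as pd


def join_with_numbered_tokens(frame):
    """Return a Series joining all columns with ' <0> ', ' <1> ', ... between them.

    Hypothetical helper; assumes `frame` has at least one column.
    """
    cols = list(frame.columns)
    # Start from the first column, converted to strings.
    result = frame[cols[0]].astype(str)
    # Prepend a numbered token to every later column, so no token
    # is ever added after the last one.
    for i, col in enumerate(cols[1:]):
        result = result + f" <{i}> " + frame[col].astype(str)
    return result


dataframe = pd.DataFrame({'col_1': ['aaa', 'bbb', 'ccc', 'ddd'],
                          'col_2': ['name_aaa', 'name_bbb', 'name_ccc', 'name_ddd'],
                          'col_3': ['job_aaa', 'job_bbb', 'job_ccc', 'job_ddd']})

# Build a new one-column frame; the original dataframe stays untouched.
final_df = pd.DataFrame({'features': join_with_numbered_tokens(dataframe)})
print(final_df)
```

This works on whole columns at a time rather than row by row, so it loops once per column instead of once per row.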