Concatenate specific columns from multiple rows into a single row in pandas

Bio*_*151 1 dataframe pandas

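I have an instrument that gives me one row per minute. Each row holds the average for that minute plus the raw time series, one column per second (60 columns). To keep things simple, assume 3 seconds (columns) here.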

  Filename  Rec_number  Average  1s  2s  3s
0    type1           1        3   2   3   4
1    type1           2        2   1   2   3
2    type2           1        1   1   1   1
3    type2           2        5   4   5   6
4    type2           3        4   3   4   5

I would like to merge the time series so that each file has a single timeline, like this:

  Filename  1s  2s  3s  4s  5s  6s   7s   8s   9s
0    type1   2   3   4   1   2   3  NaN  NaN  NaN
1    type2   1   1   1   4   5   6    3    4    5

What would be a good way to do this? Thanks in advance for your help!

Hen*_*ker 5

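We can use set_index and stack to go to long format, then create enumerated groups relative to Filename with groupby cumcount, and finally pivot back to wide format. Some additional cleanup with add_suffix, reset_index, and rename_axis gets us to the exact expected output: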

new_df = (
    df.set_index('Filename').loc[:, '1s':].stack()
        .droplevel(1)
        .reset_index(name='values')
        .assign(cols=lambda s: s.groupby('Filename').cumcount() + 1)
        .pivot(index='Filename', columns='cols', values='values')
        .add_suffix('s')
        .reset_index()
        .rename_axis(columns=None)
)

new_df

  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0

*Inline comments explaining the steps:

new_df = (
    df.set_index('Filename')  # Maintain Filename
        .loc[:, '1s':]  # Slice all columns from `1s` to the end of the frame
        .stack()  # Go to long format
        .droplevel(1)  # Remove old column headers
        .reset_index(name='values')  # Create DataFrame
        # Create new Column headers
        .assign(cols=lambda s: s.groupby('Filename').cumcount() + 1)
        # Pivot back to wide format
        .pivot(index='Filename', columns='cols', values='values')
        # Add s to the end of column headers
        .add_suffix('s')
        # Restore default RangeIndex and make Filename a column
        .reset_index()
        # Remove Axis label from columns (Created by pivoting)
        .rename_axis(columns=None)
)
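To see why groupby cumcount produces the right column positions, it can help to stop the chain before the pivot and look at the long-format frame. This is only a sketch built on the sample data from the question; the names long_df and cols are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3],
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3],
    '2s': [3, 2, 1, 5, 4],
    '3s': [4, 3, 1, 6, 5],
})

# Long format: one row per (Filename, sample), kept in the original row order
long_df = (
    df.set_index('Filename').loc[:, '1s':].stack()
      .droplevel(1)
      .reset_index(name='values')
)

# Running position of each sample within its file, starting at 1
long_df['cols'] = long_df.groupby('Filename').cumcount() + 1

print(long_df)
```

Each file gets its own counter (1 to 6 for type1, 1 to 9 for type2), and that counter is what pivot later turns into the new column headers.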

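Alternatively, we can set_index with groupby cumcount to add the enumerated groups, then unstack and sort_index to get the values in the correct order. Finally, overwrite the column labels and reset_index: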

new_df = (
    # Enumerate groups
    df.set_index(['Filename', df.groupby('Filename').cumcount()])
        .loc[:, '1s':]  # Slice from '1s' to the end of the frame
        .unstack()  # Go to wider format
        .sort_index(level=1, axis=1, sort_remaining=False,
                    kind='mergesort')  # stable sort level 1
)
# Overwrite column labels
new_df.columns = [f'{i}s' for i in range(1, 1 + new_df.columns.size)]
# Restore range index and Filename column
new_df = new_df.reset_index()

new_df

  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0
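The sort_index step is the least obvious part of this variant, so here is a small sketch that prints the column MultiIndex before and after it, again using the question's sample data. The names wide and ordered are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3],
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3],
    '2s': [3, 2, 1, 5, 4],
    '3s': [4, 3, 1, 6, 5],
})

wide = (
    df.set_index(['Filename', df.groupby('Filename').cumcount()])
      .loc[:, '1s':]
      .unstack()
)
# After unstack the columns are grouped by the old label first:
# ('1s', 0), ('1s', 1), ('1s', 2), ('2s', 0), ...
print(list(wide.columns))

# Sorting only on level 1 (the record counter) groups the columns by record,
# while leaving '1s', '2s', '3s' in their original order inside each record
ordered = wide.sort_index(level=1, axis=1, sort_remaining=False,
                          kind='mergesort')
print(list(ordered.columns))
```

With sort_remaining=False the original left-to-right order of the sample columns is preserved within each record, which is what makes the concatenated timeline come out in chronological order.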

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3], 
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3], 
    '2s': [3, 2, 1, 5, 4], 
    '3s': [4, 3, 1, 6, 5]
})
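For the real 60-columns-per-minute case, the stack and pivot approach could be wrapped in a small function. The sketch below is only an illustration: flatten_timeseries, its parameters, and the synthetic raw frame are hypothetical names and data, and it assumes the sample columns are the last columns of the frame, starting at '1s'.

```python
import numpy as np
import pandas as pd


def flatten_timeseries(df, key='Filename', first_sample='1s'):
    """Hypothetical helper: one row per `key`, samples joined in row order."""
    long_df = (
        df.set_index(key).loc[:, first_sample:].stack()
          .droplevel(1)
          .reset_index(name='values')
    )
    # Running position of each sample within its file, starting at 1
    long_df['cols'] = long_df.groupby(key).cumcount() + 1
    return (
        long_df.pivot(index=key, columns='cols', values='values')
               .add_suffix('s')
               .reset_index()
               .rename_axis(columns=None)
    )


# Synthetic data: 2 minutes for file 'a', 3 minutes for file 'b',
# 60 one-second samples per minute
sample_cols = [f'{i}s' for i in range(1, 61)]
raw = pd.DataFrame(np.arange(5 * 60).reshape(5, 60), columns=sample_cols)
raw.insert(0, 'Average', raw.mean(axis=1))
raw.insert(0, 'Rec_number', [1, 2, 1, 2, 3])
raw.insert(0, 'Filename', ['a', 'a', 'b', 'b', 'b'])

result = flatten_timeseries(raw)
print(result.shape)
```

File 'a' only has 120 samples, so its columns from 121s onward are NaN, just like type1 in the small example.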