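I have an instrument that gives me one row of averaged data per minute, along with the raw time series as 60 columns, one per second. To keep things simple, assume 3 seconds (columns) here.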
  Filename  Rec_number  Average  1s  2s  3s
0    type1           1        3   2   3   4
1    type1           2        2   1   2   3
2    type2           1        1   1   1   1
3    type2           2        5   4   5   6
4    type2           3        4   3   4   5
I would like to combine the time series so that each file has a single timeline, like this:
  Filename  1s  2s  3s  4s  5s  6s   7s   8s   9s
0    type1   2   3   4   1   2   3  Nan  Nan  Nan
1    type2   1   1   1   4   5   6    3    4    5
What would be a good way to do this? Thanks in advance for your help!
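We can use set_index and stack to go to long format, then create an enumeration within each Filename group using groupby cumcount, and finally pivot to return to wide format. Some additional cleanup with add_suffix, reset_index, and rename_axis gets us to the exact expected output: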
new_df = (
    df.set_index('Filename').loc[:, '1s':].stack()
      .droplevel(1)
      .reset_index(name='values')
      .assign(cols=lambda s: s.groupby('Filename').cumcount() + 1)
      .pivot(index='Filename', columns='cols', values='values')
      .add_suffix('s')
      .reset_index()
      .rename_axis(columns=None)
)
new_df:
  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0
*Inline comments explaining the steps:
new_df = (
    df.set_index('Filename')       # Maintain Filename
      .loc[:, '1s':]               # Slice all columns from `1s` to the end of the frame
      .stack()                     # Go to long format
      .droplevel(1)                # Remove old column headers
      .reset_index(name='values')  # Create DataFrame
      # Create new column headers
      .assign(cols=lambda s: s.groupby('Filename').cumcount() + 1)
      # Pivot back to wide format
      .pivot(index='Filename', columns='cols', values='values')
      # Add s to the end of column headers
      .add_suffix('s')
      # Restore default RangeIndex and make Filename a column
      .reset_index()
      # Remove axis label from columns (created by pivoting)
      .rename_axis(columns=None)
)
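To see what the groupby cumcount step contributes, it may help to stop the chain halfway and look at the long-format frame. The following is only an illustrative sketch: the name long_df is not part of the answer, and the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3],
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3],
    '2s': [3, 2, 1, 5, 4],
    '3s': [4, 3, 1, 6, 5]
})

# Long format: one row per sample, kept in original row order
# (long_df is a hypothetical name used only for this illustration)
long_df = (
    df.set_index('Filename').loc[:, '1s':].stack()
      .droplevel(1)
      .reset_index(name='values')
)

# Running position of each sample within its file, starting at 1;
# these numbers become the new column headers after the pivot
long_df['cols'] = long_df.groupby('Filename').cumcount() + 1

print(long_df)
```

Each file gets its own counter, so type1 is numbered 1 through 6 and type2 is numbered 1 through 9. That is why the pivot leaves the last three columns empty for type1.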
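Alternatively, we can use set_index with groupby cumcount to add the enumeration within each group, then unstack and sort_index to get the values in the correct order. Finally, overwrite the column labels and reset_index: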
new_df = (
    # Enumerate groups
    df.set_index(['Filename', df.groupby('Filename').cumcount()])
      .loc[:, '1s':]  # Slice from '1s' to the end of the frame
      .unstack()      # Go to wider format
      .sort_index(level=1, axis=1, sort_remaining=False,
                  kind='mergesort')  # stable sort level 1
)
# Overwrite column labels
new_df.columns = [f'{i}s' for i in range(1, 1 + new_df.columns.size)]
# Restore range index and Filename column
new_df = new_df.reset_index()
new_df:
  Filename   1s   2s   3s   4s   5s   6s   7s   8s   9s
0    type1  2.0  3.0  4.0  1.0  2.0  3.0  NaN  NaN  NaN
1    type2  1.0  1.0  1.0  4.0  5.0  6.0  3.0  4.0  5.0
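The role of sort_index in this second approach is easier to see by comparing the column labels before and after the sort. This is a rough sketch rather than part of the answer: the names wide and ordered are made up for the illustration, and the frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3],
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3],
    '2s': [3, 2, 1, 5, 4],
    '3s': [4, 3, 1, 6, 5]
})

# After unstack the columns are a MultiIndex of (second, record position),
# grouped by second: all of '1s' first, then '2s', then '3s'
wide = (
    df.set_index(['Filename', df.groupby('Filename').cumcount()])
      .loc[:, '1s':]
      .unstack()
)
print(wide.columns.tolist())

# Sorting only on level 1 (the record position) regroups the columns by
# record, while '1s', '2s', '3s' keep their relative order within a record
ordered = wide.sort_index(level=1, axis=1, sort_remaining=False,
                          kind='mergesort')
print(ordered.columns.tolist())
```

With the columns grouped by record and ordered by second inside each record, relabeling them 1s through 9s from left to right gives one continuous timeline per file.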
Setup and imports:
import pandas as pd

df = pd.DataFrame({
    'Filename': ['type1', 'type1', 'type2', 'type2', 'type2'],
    'Rec_number': [1, 2, 1, 2, 3],
    'Average': [3, 2, 1, 5, 4],
    '1s': [2, 1, 1, 4, 3],
    '2s': [3, 2, 1, 5, 4],
    '3s': [4, 3, 1, 6, 5]
})