I have a dataset like this:
import pandas as pd

data = {'weight': ['NaN', 2, 3, 4, 'NaN', 6, 7, 8, 9, 'NaN', 11, 12, 13, 14, 15],
        'MI': ['NaN', 21, 19, 18, 'NaN', 16, 15, 14, 13, 'NaN', 11, 10, 9, 8, 7]}
df = pd.DataFrame(data, index=['group1', 'gene1', 'gene2', 'gene3',
                               'group2', 'gene1', 'gene21', 'gene4', 'gene7', 'group3',
                               'gene2', 'gene10', 'gene3', 'gene43', 'gene1'])
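I need to reshape it into a DataFrame of MI values, with one row per gene and one column per group. If a gene has no value in a particular group, the imputed value should be 0.1. The 'weight' column should be dropped.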
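You can use:
# True for gene rows, False for the 'groupN' header rows
m = df['weight'].ne('NaN')
(df[m]
   # count the header rows to number the groups, then keep only the gene rows
   .set_index((~m).cumsum()[m], append=True)['MI']
   .unstack('weight', fill_value=0.1)
   .add_prefix('group')
)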
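To see why the mask and the cumulative sum produce a group number, it can help to print the intermediate pieces. The sketch below rebuilds the sample data from the question; the name `group_id` is introduced here only for illustration.

```python
import pandas as pd

data = {'weight': ['NaN', 2, 3, 4, 'NaN', 6, 7, 8, 9, 'NaN', 11, 12, 13, 14, 15],
        'MI': ['NaN', 21, 19, 18, 'NaN', 16, 15, 14, 13, 'NaN', 11, 10, 9, 8, 7]}
df = pd.DataFrame(data, index=['group1', 'gene1', 'gene2', 'gene3',
                               'group2', 'gene1', 'gene21', 'gene4', 'gene7',
                               'group3', 'gene2', 'gene10', 'gene3', 'gene43', 'gene1'])

# True for gene rows, False for the 'groupN' header rows
m = df['weight'].ne('NaN')

# Every header row bumps the counter by one, so each gene row
# inherits the number of the header row above it
group_id = (~m).cumsum()[m]

print(group_id.tolist())
# [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

Because `group_id` keeps the name of the column it was derived from, the new index level is called 'weight', which is why `unstack('weight', ...)` can refer to it by that name.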
A variant using pivot:
m = df['weight'].ne('NaN')
(df.assign(col=(~m).cumsum())   # running group number for every row
   .loc[m]                      # drop the 'groupN' header rows
   .pivot(columns='col', values='MI')
   .fillna(0.1)
   .add_prefix('group')
)
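Both snippets rely on the header rows holding the literal string 'NaN', as in the sample data. If the real data contained actual missing values instead (an assumption, since the question only shows strings), a sketch of the same pivot would build the mask with `notna()`. The small DataFrame below is hypothetical and only mimics the shape of the original.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with real missing values
df = pd.DataFrame(
    {'weight': [np.nan, 2, 3, np.nan, 6],
     'MI': [np.nan, 21, 19, np.nan, 16]},
    index=['group1', 'gene1', 'gene2', 'group2', 'gene1'])

m = df['weight'].notna()              # True for gene rows
out = (df.assign(col=(~m).cumsum())   # running group number
         .loc[m]                      # drop the header rows
         .pivot(columns='col', values='MI')
         .fillna(0.1)
         .add_prefix('group'))
print(out)
```

With real NaN values the MI column is numeric rather than object, so the result comes out as floats.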
Output:
weight  group1  group2  group3
gene1       21      16       7
gene10     0.1     0.1      10
gene2       19     0.1      11
gene21     0.1      15     0.1
gene3       18     0.1       9
gene4      0.1      14     0.1
gene43     0.1     0.1       8
gene7      0.1      13     0.1
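To take the group names directly from the index and sort the genes in natural order, using natsort:
from natsort import natsorted

m = df['weight'].ne('NaN')
# keep only the 'groupN' labels and forward-fill them onto the gene rows below
grp = df.index.to_series().mask(m).ffill()[m]
out = (df[m]
   .set_index(grp, append=True)['MI']
   .unstack(-1, fill_value=0.1)
   .loc[natsorted(df.index[m].unique())]
)
print(out)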
Output:
        group1  group2  group3
gene1       21      16       7
gene2       19     0.1      11
gene3       18     0.1       9
gene4      0.1      14     0.1
gene7      0.1      13     0.1
gene10     0.1     0.1      10
gene21     0.1      15     0.1
gene43     0.1     0.1       8
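If installing natsort is not an option, a similar ordering can probably be had with plain pandas by sorting on the number embedded in each label. This is only a sketch: it assumes every row label contains exactly one run of digits (as in 'gene10'), and the tiny `out` frame below is a hypothetical stand-in for the real result.

```python
import pandas as pd

# Hypothetical small result whose labels are in lexicographic order
out = pd.DataFrame({'group1': [21, 0.1, 19]},
                   index=['gene1', 'gene10', 'gene2'])

# Sort the rows by the integer found in each label
out_sorted = out.sort_index(
    key=lambda idx: idx.str.extract(r'(\d+)', expand=False).astype(int))

print(out_sorted.index.tolist())
# ['gene1', 'gene2', 'gene10']
```

The `key` argument of `sort_index` requires pandas 1.1 or later. Unlike natsort, this approach ignores the text prefix entirely, so it would not separate labels such as 'gene2' and 'probe2'.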