How to explode a pandas DataFrame based on the numeric value in a specific column

Jus*_* CR 4 python dataframe pandas

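I am trying to explode an existing DataFrame based on the numeric value in a column. For example, if the column holds 3, I want 3 copies of that row, and so on.

Suppose we start with this DataFrame: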

inventory_partner inventory_partner2  calc
0              A1                 aa     1
1              A2                 bb     2
2              A3                 cc     5
3              A4                 dd     4
4              A5                 ee     5
5              A6                 ff     3

How can we get this DataFrame?

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3

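I have achieved this with the code below, but I would like to know whether there is a simpler way to do it, without manually building a comma-separated list to feed into the explode method.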

import pandas as pd

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'], 'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'], 'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)

print(df1) #print original dataframe

#create my_comma_list column based on number values in calc column
df1.insert(3, 'my_comma_list', '')
df1.loc[df1['calc'] == 1, 'my_comma_list'] = '1'
df1.loc[df1['calc'] == 2, 'my_comma_list'] = '1, 2'
df1.loc[df1['calc'] == 3, 'my_comma_list'] = '1, 2, 3'
df1.loc[df1['calc'] == 4, 'my_comma_list'] = '1, 2, 3, 4'
df1.loc[df1['calc'] == 5, 'my_comma_list'] = '1, 2, 3, 4, 5'

print(df1) #print before row explosion

#explode the rows using the my_comma_list column to get desired number of rows
df1 = df1.assign(my_comma_list=df1['my_comma_list'].str.split(',')).explode('my_comma_list')
#drop the my_comma_list column since we no longer need it
del df1['my_comma_list']

print(df1) #print after row explosion

Joe*_*ndz 5

You can use Index.repeat together with DataFrame.loc to repeat the rows.

import pandas as pd

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
     'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
     'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['calc'])]
print (df1)

The output is:

Original DataFrame:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
2                A3                 cc     5
3                A4                 dd     4
4                A5                 ee     5
5                A6                 ff     3

Updated DataFrame with repeated rows:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3
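
As a small follow-up to the Index.repeat approach, here is a minimal sketch of two details that are easy to miss. It assumes the original index is unique (otherwise .loc would match more rows than intended), and the zero in the calc column is an illustrative value that is not part of the question's data. A count of 0 drops the row, and reset_index(drop=True) replaces the repeated labels with a fresh 0..n-1 index if that is preferred.

```python
import pandas as pd

# Illustrative data (not the question's exact frame); note the zero count.
df1 = pd.DataFrame({
    'inventory_partner': ['A1', 'A2', 'A3'],
    'inventory_partner2': ['aa', 'bb', 'cc'],
    'calc': [1, 0, 3],
})

# Repeat each index label 'calc' times, then select those labels with .loc.
# A label repeated 0 times is simply absent from the result.
repeated = df1.loc[df1.index.repeat(df1['calc'])]

# Optional: discard the duplicated labels and renumber the rows from 0.
repeated = repeated.reset_index(drop=True)

print(repeated)
```

Whether to reset the index is a matter of taste: keeping the repeated labels preserves a link back to the source row, while a fresh index avoids duplicate labels in later .loc lookups.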

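If you want to repeat rows based on a column value via a lookup, you can create a dictionary that specifies how many times each value should be repeated, then use map to pass those counts.

Suppose you want to repeat rows based on the values in inventory_partner. Then you can do this: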

import pandas as pd

inv_partner_dict = {'A1':1, 'A2':2, 'A3':5, 'A4':4,'A5':5,'A6':3}

#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
     'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
     'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)


print (df1)
# map each inventory_partner value to its repeat count (the dict keys are A1..A6)
df1 = df1.loc[df1.index.repeat(df1['inventory_partner'].map(inv_partner_dict))]
print (df1)

This does the same thing.

Its output will be:

Original DataFrame:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
2                A3                 cc     5
3                A4                 dd     4
4                A5                 ee     5
5                A6                 ff     3

Updated DataFrame with repeated rows:

  inventory_partner inventory_partner2  calc
0                A1                 aa     1
1                A2                 bb     2
1                A2                 bb     2
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
2                A3                 cc     5
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
3                A4                 dd     4
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
4                A5                 ee     5
5                A6                 ff     3
5                A6                 ff     3
5                A6                 ff     3
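
One caveat with the dictionary lookup is that map returns NaN for any value missing from the dictionary, and Index.repeat needs integer counts. The sketch below shows one possible way to handle that, by treating missing keys as zero repeats. The name repeat_counts and the omission of 'A3' are assumptions made for illustration only, and dropping unmapped rows may not be the right policy for every dataset.

```python
import pandas as pd

# Hypothetical lookup table; 'A3' is deliberately missing.
repeat_counts = {'A1': 1, 'A2': 2}

df1 = pd.DataFrame({
    'inventory_partner': ['A1', 'A2', 'A3'],
    'inventory_partner2': ['aa', 'bb', 'cc'],
})

# map() yields NaN for keys absent from the dict. Treat those as zero
# repeats and cast back to int, because the counts must be integers.
counts = df1['inventory_partner'].map(repeat_counts).fillna(0).astype(int)

result = df1.loc[df1.index.repeat(counts)]
print(result)
```

If an unmapped value should keep its row once instead of being dropped, fillna(1) could be used in place of fillna(0).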