Jus*_* CR 4 python dataframe pandas
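I'm trying to explode an existing DataFrame based on the numeric value in a column. For example, if the column's value is 3, I want 3 copies of that row, and so on.
Suppose we start with this DataFrame: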
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
How can we get this DataFrame?
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
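I've achieved this with the code below, but I'd like to know whether there is a simpler way to do it without manually creating comma-separated lists to feed into the explode method.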
import pandas as pd
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'], 'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'], 'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print(df1) #print original dataframe
#create my_comma_list column based on number values in calc column
df1.insert(3, 'my_comma_list', '')
df1.loc[df1['calc'] == 1, 'my_comma_list'] = '1'
df1.loc[df1['calc'] == 2, 'my_comma_list'] = '1, 2'
df1.loc[df1['calc'] == 3, 'my_comma_list'] = '1, 2, 3'
df1.loc[df1['calc'] == 4, 'my_comma_list'] = '1, 2, 3, 4'
df1.loc[df1['calc'] == 5, 'my_comma_list'] = '1, 2, 3, 4, 5'
print(df1) #print before row explosion
#explode the rows using the my_comma_list column to get desired number of rows
df1 = df1.assign(my_comma_list=df1['my_comma_list'].str.split(',')).explode('my_comma_list')
#drop the my_comma_list column since we no longer need it
del df1['my_comma_list']
print(df1) #print after row explosion
You can use Index.repeat together with DataFrame.loc to repeat the rows.
import pandas as pd
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['calc'])]
print (df1)
The output is:
Original DataFrame:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
Updated DataFrame with repeated rows:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
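Note that the repeated rows keep their original index labels, as the output above shows. If you would rather have a fresh 0..n-1 index, one option is to chain reset_index(drop=True). Here is a minimal sketch on a shortened three-row frame (the shortened data and the names repeated and repeated_reset are just for illustration):

```python
import pandas as pd

# Shortened illustrative data, not the full frame from the question
df = pd.DataFrame({
    'inventory_partner': ['A1', 'A2', 'A3'],
    'inventory_partner2': ['aa', 'bb', 'cc'],
    'calc': [1, 2, 3],
})

# Repeat each row `calc` times; the original index labels are duplicated
repeated = df.loc[df.index.repeat(df['calc'])]

# Optional: replace the duplicated labels with a fresh 0..n-1 index
repeated_reset = repeated.reset_index(drop=True)
print(repeated_reset)
```

Whether you reset the index depends on what you do next: the duplicated labels can be handy for tracing each row back to its source row.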
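If you want to repeat rows based on a column's value via a lookup, you can create a dictionary that specifies how many times each value should be repeated, then use map to pass those counts along.
Suppose you want to repeat based on the values in inventory_partner. Then you can do this: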
import pandas as pd
inv_partner_dict = {'A1':1, 'A2':2, 'A3':5, 'A4':4,'A5':5,'A6':3}
#create dataframe
d = {'inventory_partner': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
'inventory_partner2': ['aa', 'bb', 'cc', 'dd', 'ee', 'ff'],
'calc': [1, 2, 5, 4, 5, 3]}
df1 = pd.DataFrame(data=d)
print (df1)
df1 = df1.loc[df1.index.repeat(df1['inventory_partner'].map(inv_partner_dict))]
print (df1)
This does the same thing.
Its output will be:
Original DataFrame:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
2 A3 cc 5
3 A4 dd 4
4 A5 ee 5
5 A6 ff 3
Updated DataFrame with repeated rows:
inventory_partner inventory_partner2 calc
0 A1 aa 1
1 A2 bb 2
1 A2 bb 2
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
2 A3 cc 5
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
3 A4 dd 4
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
4 A5 ee 5
5 A6 ff 3
5 A6 ff 3
5 A6 ff 3
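One thing to watch with the dictionary lookup: map returns NaN for any value that has no key in the dictionary, and Index.repeat needs integer counts. A possible way to handle that, assuming you want unmatched rows dropped (repeated zero times), is sketched below. The repeat_counts dictionary and the shortened frame are hypothetical, made up for this example:

```python
import pandas as pd

# Hypothetical lookup; 'A3' is deliberately missing
repeat_counts = {'A1': 2, 'A2': 1}

df = pd.DataFrame({
    'inventory_partner': ['A1', 'A2', 'A3'],
    'calc': [1, 2, 5],
})

# map yields NaN for 'A3'; treat it as 0 repeats and cast back to int
counts = df['inventory_partner'].map(repeat_counts).fillna(0).astype(int)

# Rows with a count of 0 do not appear in the result
result = df.loc[df.index.repeat(counts)]
print(result)
```

If you would prefer to keep unmatched rows once instead of dropping them, fillna(1) is the analogous choice.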