Pandas explode on multiple columns

nos*_*mos 8 python explode dataframe pandas

Using Pandas 0.25.3, I am trying to explode several columns.

The data looks like:

d1 = {'user':['user1','user2','user3','user4'],
      'paid':['Y','Y','N','N'],
      'last_active':['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018'],
      'col4':'data'}

I load this into a DataFrame with df = pd.DataFrame([d1], columns=d1.keys()), which looks like this:

user                              paid              last_active                                                col4               
['user1','user2','user3','user4'] ['Y','Y','N','N'] ['11 Jul 2019','23 Sep 2018','08 Dec 2019','03 Mar 2018']  'data'

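There are other columns that hold a single value each, {'A':'B'} type of thing, but I'm not worried about those.

When I do df.explode('user') it works for that one column, and likewise for each of the others, but when I try df.explode(column=('user','paid','last_active')) I get the following error: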

KeyError: ('user','paid','last_active')

So what I'd like to know is: how can I use the explode function across multiple columns to get the following df:

user     paid  last_active    col4
'user1'  'Y'   '11 Jul 2019'  'data'
'user2'  'Y'   '23 Sep 2018'  NaN
'user3'  'N'   '08 Dec 2019'  NaN
'user4'  'N'   '03 Mar 2018'  NaN

All*_*hvk 6

Pandas 0.25.x has no built-in multi-column explode, but there are workarounds. One simple approach might be:

import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a','b'], ['c','d']],
        'C': [['z','y'], ['x','w']]
    }
)
print(df)

--------------
A    B     C
--------------
1 [a, b] [z, y]
2 [c, d] [x, w]

##Let us say list_cols are the columns to be exploded
list_cols = ['B', 'C']

other_cols = list(set(df.columns) - set(list_cols))
##other_cols now contains all the remaining column names in the df
##we temporarily convert to set() to easily get the differences in 2 lists

##now explode the list_cols using a loop
exploded = [df[col].explode() for col in list_cols]
##now we have long list of exploded values. Print to see the format

##This statement creates pairs of the exploded cols
##zip command is used to create the pairs
##dict puts it in an appropriate format from which a dataframe can be created
##Please print the individual outputs of each command to understand the flow
df2 = pd.DataFrame(dict(zip(list_cols, exploded)))

##Now merge back the other_cols as well
df2 = df[other_cols].merge(df2, how="right", left_index=True, right_index=True)

##lastly, re-create the original column order
df2 = df2.loc[:, df.columns]

print(df2)

------
A B C
------
1 a z
1 b y
2 c x
2 d w
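For readers on a newer pandas than the 0.25.3 used in the question: since pandas 1.3.0, `DataFrame.explode` accepts a list of column names, so the loop-and-merge workaround may not be needed. The sketch below reuses the same toy DataFrame and assumes that every row holds lists of equal length in the exploded columns (pandas raises an error otherwise).

```python
import pandas as pd

df = pd.DataFrame(
    {
        'A': [1, 2],
        'B': [['a', 'b'], ['c', 'd']],
        'C': [['z', 'y'], ['x', 'w']],
    }
)

# Requires pandas >= 1.3.0: a list of columns is exploded in lockstep.
# ignore_index=True renumbers the resulting rows 0..n-1.
df2 = df.explode(['B', 'C'], ignore_index=True)

print(df2)
```

Note that the columns are passed as a list. A tuple such as `('B', 'C')` is still treated as a single column label, which is why the call in the question raises a KeyError.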


ank*_*_91 4

I think you need the following (note the difference in the data: col4 holds None rather than the NaN the OP mentioned):

pd.DataFrame([[i] if not isinstance(i,list) else i 
             for i in d1.values()],index=d1.keys()).T
    user paid  last_active  col4
0  user1    Y  11 Jul 2019  data
1  user2    Y  23 Sep 2018  None
2  user3    N  08 Dec 2019  None
3  user4    N  03 Mar 2018  None

  • @anky_91 Nice! +1 (2 upvotes)
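If the data already lives in the single-row DataFrame from the question rather than in the dict, a similar result with NaN (instead of None) in col4 might be obtained as sketched below. It uses only `Series.explode`, which is available from pandas 0.25. The name `list_cols` and the assumption that the DataFrame has exactly one row and that all list columns have the same length are mine, not the OP's.

```python
import pandas as pd

d1 = {'user': ['user1', 'user2', 'user3', 'user4'],
      'paid': ['Y', 'Y', 'N', 'N'],
      'last_active': ['11 Jul 2019', '23 Sep 2018', '08 Dec 2019', '03 Mar 2018'],
      'col4': 'data'}
df = pd.DataFrame([d1], columns=list(d1))

# Hypothetical name: the columns whose cells hold lists.
list_cols = ['user', 'paid', 'last_active']

# Explode each list column on its own, then renumber the rows so that
# the pieces line up positionally (valid here because df has one row).
parts = [df[col].explode().reset_index(drop=True) for col in list_cols]
out = pd.concat(parts, axis=1)

# col4 is a scalar in row 0; reindexing to the longer index leaves
# NaN in every row after the first.
out['col4'] = df['col4'].reindex(out.index)

print(out)
```

For a DataFrame with several rows, the positional renumbering above would no longer be enough, and an index-preserving approach like the merge shown in the first answer is the safer choice.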