Pandas DataFrame - Split a string into multiple columns

zea*_*ous 3 python pandas

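I am new to pandas. I have searched a lot for a solution to my problem but did not find much help online.

I have a string column like the one shown below, and I want to convert it into separate columns. I tried splitting it, but the output is not in the form I need.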

*-----------------------------------------------------------------------------*
|  Total Visitor                                                              |
*-----------------------------------------------------------------------------*
|  2x Adult, 1x Adult + Audio Guide                                           |
|  2x Adult, 2x Youth, 1x Children                                            | 
|  5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide |
*-----------------------------------------------------------------------------*

This is the code I used to split the string, but it does not give me the expected output.

df = data["Total Visitor"].str.split(",", n = 1, expand = True)

After splitting the string, my expected output should look like the table below:

*-------------------------------------------------------------------------------------------------------------------*
|  Adult    | Adult + Audio Guide    | Youth    | Children    | Children + AG             | Senior + AG             |
*-------------------------------------------------------------------------------------------------------------------*
|  2x Adult | 1x Adult + Audio Guide | -        | -           | -                         | -                       |
|  2x Adult | -                      | 2x Youth | 1x Children | -                         | -                       |
|  -        | 5x Adult + Audio Guide | -        | -           | 1x Children + Audio Guide | 1x Senior + Audio Guide |
*-------------------------------------------------------------------------------------------------------------------*

How can I achieve this? Any help or guidance would be great.

jez*_*ael 6

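The idea is to create a list of dictionaries, one per row, whose keys are the items with the leading number and x removed by the regex ^\d+x\s+ (^ is the start of the string, \d+ is one or more digits, \s+ is one or more whitespace characters), and then pass that list to the DataFrame constructor: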

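import re
import pandas as pd

# One dict per row: key = label without the "Nx " prefix, value = the original item
L = [dict([(re.sub(r'^\d+x\s+', "", y), y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
print(df)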
      Adult     Adult + Audio Guide     Youth     Children  \
0  2x Adult  1x Adult + Audio Guide         -            -   
1  2x Adult                       -  2x Youth  1x Children   
2         -  5x Adult + Audio Guide         -            -   

      Children + Audio Guide     Senior + Audio Guide  
0                          -                        -  
1                          -                        -  
2  1x Children + Audio Guide  1x Senior + Audio Guide  
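
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same regex idea. The sample frame is rebuilt from the table in the question; the names data, prefix, rows and result are my own choices for illustration, and a dict comprehension stands in for dict([...]).

```python
import re

import pandas as pd

# Sample data copied from the question's "Total Visitor" column.
data = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})

# Matches a leading quantity prefix such as "2x " (raw string keeps \d and \s literal).
prefix = re.compile(r"^\d+x\s+")

# One dict per row: key is the label without the prefix, value is the original item.
rows = [
    {prefix.sub("", item): item for item in cell.split(", ")}
    for cell in data["Total Visitor"]
]

# Labels missing from a row become NaN in the constructor, then "-" via fillna.
result = pd.DataFrame(rows).fillna("-")
print(result)
```

Each distinct label becomes one column, so the number of columns depends on the labels that actually occur in the data rather than on a fixed split count.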

Another similar idea is to split each item on 'x ' and use the part after it as the dict key, which becomes the column name:

L = [dict([(y.split('x ')[1], y) for y in x.split(', ')]) for x in df['Total Visitor']]

df = pd.DataFrame(L).fillna('-')
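
One caveat with splitting on 'x ': if a label itself contained 'x ', taking element [1] would cut the label short. The sketch below limits the split to the first occurrence. The helper label_of and the ticket name "Box seat" are hypothetical and only serve to show the difference; they are not from the question.

```python
import pandas as pd


def label_of(item: str) -> str:
    """Return the text after the first 'x ' (hypothetical helper)."""
    # maxsplit=1 keeps any later "x " inside the label intact.
    return item.split("x ", 1)[1]


# Sample data copied from the question's "Total Visitor" column.
data = pd.DataFrame({
    "Total Visitor": [
        "2x Adult, 1x Adult + Audio Guide",
        "2x Adult, 2x Youth, 1x Children",
        "5x Adult + Audio Guide, 1x Children + Audio Guide, 1x Senior + Audio Guide",
    ]
})

rows = [
    {label_of(item): item for item in cell.split(", ")}
    for cell in data["Total Visitor"]
]
result = pd.DataFrame(rows).fillna("-")

# An unlimited split truncates a label that contains "x " itself.
print("2x Box seat".split("x ")[1])
print(label_of("2x Box seat"))
```

For the labels in the question both variants give the same columns; the limit only matters for labels that contain 'x ' themselves.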