Pandas Series of lists to one Series

Max*_*Max 22 python string list series pandas

I have a Pandas Series of lists of strings:

0                           [slim, waist, man]
1                                [slim, waistline]
2                                     [santa]

As you can see, the lists vary in length. I want an efficient way to collapse this into a single Series:

0 slim
1 waist
2 man
3 slim
4 waistline
5 santa

I know I can break up the list using

series_name.split(' ')

but I am having a hard time putting those strings back into one list.

Thanks!

mcw*_*itt 35

Here is a simple method that uses only pandas functions:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then

s.apply(pd.Series).stack().reset_index(drop=True)

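gives the desired output. In some cases you might want to keep the original index and add a second level that indexes the nested elements, e.g.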

0  0         slim
   1        waist
   2          man
1  0         slim
   1    waistline
2  0        santa

If this is what you want, just omit .reset_index(drop=True) from the chain.

  • Also keep in mind that if the input is empty, apply returns a Series, and a Series object has no stack method... (2 upvotes)
  • Too slow for me. (2 upvotes)
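
One way to handle the empty-input case raised in the comments is to check for it before calling stack. The sketch below is only an illustration: flatten_with_stack is a hypothetical helper name, not part of pandas or of the answer above, and the extra dropna() is a defensive assumption so that any NaN padding left by stack() is removed regardless of pandas version.

```python
import pandas as pd

def flatten_with_stack(s):
    # Hypothetical helper: guard the empty case first, because apply() on an
    # empty Series gives back a Series, and Series has no stack() method.
    if s.empty:
        return pd.Series([], dtype=object)
    wide = s.apply(pd.Series)  # one column per list position, NaN-padded
    # dropna() removes any padding NaN that stack() may leave in the result
    return wide.stack().dropna().reset_index(drop=True)

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
flat = flatten_with_stack(s)
empty = flatten_with_stack(pd.Series([], dtype=object))
```

This does nothing about the speed complaint; apply(pd.Series) still builds one Series per row.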

Rom*_*tov 31

pandas 0.25.0 introduced a new method, explode, for Series and DataFrames. Older versions do not have it.

It helps to build the result you need.

For example, suppose you have this Series:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

Then you can use

s.explode()

to get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
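
Note that explode repeats the original row label (0, 0, 0, 1, 1, 2), whereas the question asks for a consecutive index (0 to 5). A minimal sketch of closing that gap, assuming the same example Series, is to reset the index afterwards:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# explode() repeats the original row label for every element of that row
exploded = s.explode()

# Dropping the old index gives consecutive labels 0..5, as in the question
flat = exploded.reset_index(drop=True)
```

In pandas 1.1.0 and later, s.explode(ignore_index=True) should give the same result in a single call.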

For a DataFrame:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})

You will have this DataFrame:

                    s  a
0  [slim, waist, man]  1
1   [slim, waistline]  1
2             [santa]  1

Applying explode to the s column:

df.explode('s')

will give you this result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1

If your Series contains empty lists:

import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa'],
    []
])

Then running explode will introduce NaN values for empty lists, like this:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
3          NaN

If this is not desired, you can chain a dropna call:

s.explode().dropna()

To get this result:

0         slim
0        waist
0          man
1         slim
1    waistline
2        santa
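
One caveat: dropna also removes missing values that were genuinely inside a list, not only the NaN produced by empty lists. If that distinction matters, a possible alternative (a sketch, not part of the original answer) is to filter out the empty lists before exploding:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa'], []])

# Keep only the rows whose list has at least one element
non_empty = s[s.map(len) > 0]

# Exploding now introduces no NaN, so no dropna() is needed
flat = non_empty.explode()
```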

DataFrames also have a dropna method:

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa'],
        []
    ]),
    'a': 1
})

Running explode without dropna:

df.explode('s')

results in:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1
3        NaN  1

With dropna:

df.explode('s').dropna(subset=['s'])

Result:

           s  a
0       slim  1
0      waist  1
0        man  1
1       slim  1
1  waistline  1
2      santa  1


小智 12

You are basically just trying to flatten a nested list here.

You should be able to simply iterate over the elements of the series:

slist = []
for x in series:
    slist.extend(x)

Or a more concise (but harder to read) list comprehension:

slist = [st for row in s for st in row]
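
Both snippets produce a plain Python list, while the question asks for a Series. A minimal sketch of the last step, assuming the example data from the question, is to pass the flattened list to pd.Series:

```python
import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# Flatten with a list comprehension, then wrap the result in a new Series,
# which gets a fresh consecutive index
flat = pd.Series([item for row in s for item in row])
```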


Tad*_*jna 7

series_name.sum()

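does what you need. Make sure it really is a Series of lists; otherwise your values will be concatenated (if they are strings) or added (if they are ints).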
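
A small sketch of that behaviour, using the example data from the question plus a made-up integer Series for contrast:

```python
import pandas as pd

lists = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# sum() applies + between elements, so lists are concatenated into one list
combined = lists.sum()
flat = pd.Series(combined)

# With integers, the same call adds the values instead of collecting them
numbers = pd.Series([1, 2, 3])
total = numbers.sum()
```

Because each + builds a new, longer list, this approach is likely to become slow on Series with many rows.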


pet*_*lds 5

You can try using itertools.chain to flatten the lists:

In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]: 
0    [slim, waist, man]
1     [slim, waistline]
2               [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]: 
0         slim
1        waist
2          man
3         slim
4    waistline
5        santa
dtype: object
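
A possible variant, sketched here as a plain script rather than an IPython session, is chain.from_iterable, which takes the Series directly instead of unpacking every list as a separate argument:

```python
from itertools import chain

import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# from_iterable walks the Series lazily, one inner list at a time
new_s = pd.Series(list(chain.from_iterable(s)))
```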