Max*_*Max 22 python string list series pandas
I have a Pandas Series of lists of strings:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
As you can see, the lists vary in length. I want an efficient way to collapse them into a single Series:
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
I know I can break the strings up into lists using
series_name.str.split(' ')
but I'm having a hard time putting those strings back into a single list.
Thanks!
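The question does not show where the lists come from. Assuming, hypothetically, that they were produced from space-separated strings, a minimal sketch of that splitting step looks like this (the raw strings are made up for illustration):

```python
import pandas as pd

# Hypothetical raw data: one space-separated string per row (an assumption,
# the question does not show the original data).
raw = pd.Series(["slim waist man", "slim waistline", "santa"])

# Series.str.split applies str.split to every element, giving a Series of lists.
series_name = raw.str.split(" ")
print(series_name)
```

Note that a plain Series has no split method of its own; the .str accessor is what makes the string method available element-wise.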
mcw*_*itt 35
Here is a simple approach that uses only pandas functions:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then
s.apply(pd.Series).stack().reset_index(drop=True)
gives the desired output. In some cases you may want to keep the original index and add a second level that indexes the nested elements, e.g.
0 0 slim
1 waist
2 man
1 0 slim
1 waistline
2 0 santa
If that is what you want, simply omit .reset_index(drop=True) from the chain.
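A self-contained sketch of both variants (flat index and two-level index). One hedge: recent pandas releases have been changing how stack() treats missing values, such as the NaN padding that apply(pd.Series) creates for the shorter lists, so this sketch adds an explicit .dropna() instead of relying on stack() to drop them.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# apply(pd.Series) turns each list into a row; shorter lists are padded with NaN.
wide = s.apply(pd.Series)

# stack() moves the columns into a second index level; dropna() removes the
# padding in case this pandas version's stack() keeps it.
nested = wide.stack().dropna()

# Dropping the two-level index gives a flat 0..n-1 index, as in the question.
flat = nested.reset_index(drop=True)

print(nested)
print(flat)
```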
Rom*_*tov 31
pandas 0.25.0 added a new method, explode, for Series and DataFrames; older versions do not have it.
It produces exactly the result you need.
For example, given this Series:
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']])
Then you can use
s.explode()
to get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
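Note that explode repeats the original index label for every element. To get the fresh 0..n-1 index shown in the question, reset it afterwards; pandas 1.1 and later also accept ignore_index=True directly. A minimal sketch:

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

exploded = s.explode()                    # index labels: 0, 0, 0, 1, 1, 2
flat = exploded.reset_index(drop=True)    # index 0..5, works since pandas 0.25
flat_alt = s.explode(ignore_index=True)   # same result, needs pandas >= 1.1

print(flat)
```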
For a DataFrame:
df = pd.DataFrame({
's': pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa']
]),
'a': 1
})
This gives the following DataFrame:
s a
0 [slim, waist, man] 1
1 [slim, waistline] 1
2 [santa] 1
Applying explode to the s column:
df.explode('s')
gives this result:
s a
0 slim 1
0 waist 1
0 man 1
1 slim 1
1 waistline 1
2 santa 1
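In the DataFrame case the other columns are copied onto every exploded row, and the index labels are repeated just as with a Series. A short sketch that checks this and then resets the index (the reset step is an optional addition, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    's': pd.Series([
        ['slim', 'waist', 'man'],
        ['slim', 'waistline'],
        ['santa']
    ]),
    'a': 1
})

# Each list element gets its own row; the scalar column 'a' is repeated.
out = df.explode('s')
print(out)

# Replace the repeated labels 0, 0, 0, 1, 1, 2 with a fresh 0..5 index.
flat = out.reset_index(drop=True)
print(flat)
```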
If your Series contains empty lists,
import pandas as pd
s = pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa'],
[]
])
then explode produces a NaN for each empty list, like this:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
3 NaN
If this is not desired, add a dropna call:
s.explode().dropna()
To get this result:
0 slim
0 waist
0 man
1 slim
1 waistline
2 santa
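One caveat: dropna() removes every missing value, not only the NaN produced by empty lists, so a genuine None inside a list disappears too. If that matters, one alternative is to filter out the empty lists before exploding. A sketch, using made-up sample data that contains a None:

```python
import pandas as pd

# Hypothetical data: the first list holds a real None, the second is empty.
s = pd.Series([['slim', None], [], ['santa']])

# dropna() drops both the empty-list NaN and the None from the first list.
via_dropna = s.explode().dropna()

# Keeping only non-empty lists first keeps the missing value from the first list.
via_filter = s[s.map(len) > 0].explode()

print(via_dropna.tolist())
print(via_filter.tolist())
```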
DataFrames also have a dropna method:
df = pd.DataFrame({
's': pd.Series([
['slim', 'waist', 'man'],
['slim', 'waistline'],
['santa'],
[]
]),
'a': 1
})
Running explode without dropna:
df.explode('s')
results in:
s a
0 slim 1
0 waist 1
0 man 1
1 slim 1
1 waistline 1
2 santa 1
3 NaN 1
With dropna:
df.explode('s').dropna(subset=['s'])
Result:
s a
0 slim 1
0 waist 1
0 man 1
1 slim 1
1 waistline 1
2 santa 1
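The subset=['s'] argument matters: a plain dropna() would also drop rows that happen to have a missing value in some other column. A small sketch with a hypothetical extra column b shows the difference:

```python
import pandas as pd

df = pd.DataFrame({
    's': pd.Series([['slim', 'waist'], ['santa'], []]),
    'b': [1.0, None, 2.0],  # hypothetical column with a missing value
})

exploded = df.explode('s')

# Only the row whose 's' is missing (from the empty list) is removed.
kept = exploded.dropna(subset=['s'])

# Without subset, the 'santa' row is lost as well because its 'b' is NaN.
too_many_dropped = exploded.dropna()

print(kept)
print(too_many_dropped)
```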
小智 12
You basically just want to flatten a nested list here.
You should be able to simply iterate over the elements of the Series:
slist = []
for x in s:
    slist.extend(x)
or, more concisely (but harder to read), with a list comprehension:
slist = [st for row in s for st in row]
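Both snippets produce a plain Python list; to get the Series the question asks for, wrap the result in pd.Series. A self-contained sketch that also checks the loop and the comprehension agree:

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# Loop version: extend one list with each row's list.
slist = []
for x in s:
    slist.extend(x)

# Comprehension version: outer loop over rows, inner loop over their items.
slist2 = [st for row in s for st in row]

# pd.Series assigns a fresh 0..n-1 index to the flattened values.
flat = pd.Series(slist)
print(flat)
```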
series_name.sum()
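does exactly what you need. Make sure it is a Series of lists; otherwise your values will be concatenated (if they are strings) or added (if they are ints).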
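A short sketch of both behaviours. One hedge on performance: summing lists concatenates them pairwise, building a new, longer list at each step, so the cost grows roughly quadratically with the total length; it is best kept for small Series.

```python
import pandas as pd

s = pd.Series([
    ['slim', 'waist', 'man'],
    ['slim', 'waistline'],
    ['santa']])

# On a Series of lists, + means list concatenation, so sum() flattens them.
flat = pd.Series(s.sum())
print(flat)

# On an object Series of strings, the same call concatenates the strings instead.
joined = pd.Series(['slim', 'santa'], dtype=object).sum()
print(joined)  # slimsanta
```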
You can try using itertools.chain to flatten the lists:
In [70]: from itertools import chain
In [71]: import pandas as pnd
In [72]: s = pnd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])
In [73]: s
Out[73]:
0 [slim, waist, man]
1 [slim, waistline]
2 [santa]
dtype: object
In [74]: new_s = pnd.Series(list(chain(*s.values)))
In [75]: new_s
Out[75]:
0 slim
1 waist
2 man
3 slim
4 waistline
5 santa
dtype: object
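chain(*s.values) unpacks every row into a separate positional argument. itertools.chain.from_iterable takes the iterable of rows directly and yields the same flattened sequence, which is the more idiomatic spelling. A sketch:

```python
from itertools import chain

import pandas as pd

s = pd.Series([['slim', 'waist', 'man'], ['slim', 'waistline'], ['santa']])

# from_iterable walks the rows one at a time instead of unpacking them
# all as arguments up front.
new_s = pd.Series(list(chain.from_iterable(s)))
print(new_s)
```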