Flattening a pandas Series whose elements are lists

Question

met*_*eto 5 python series python-3.x pandas

I have a Series of the form:

from pandas import Series
s = Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])

which looks like

0       [a, a, b]
1    [b, b, c, d]
2              []
3       [a, b, e]
dtype: object

I want to count how many elements I have in total. My naive attempts, such as

s.values.hist()

or

s.values.flatten()

did not work. What am I doing wrong?

Answer 1

hei*_*ala 5

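If we stick with a pandas Series as in the original question, a neat option from pandas 0.25.0 onwards is the Series.explode() method. It expands each list into separate rows, one per element, and repeats the original index value for each of those rows.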

The original Series from the question:

import pandas as pd
s = pd.Series([['a','a','b'],['b','b','c','d'],[],['a','b','e']])

Exploding it gives a Series in which the index values repeat. Each index value identifies the original list the element came from.

>>> s.explode()
Out:
0      a
0      a
0      b
1      b
1      b
1      c
1      d
2    NaN
3      a
3      b
3      e
dtype: object

>>> type(s.explode())
Out:
pandas.core.series.Series
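
If a fully flat Series with a fresh 0..n-1 index is wanted instead of the repeated original index, one possible sketch is to drop the NaN placeholder left by the empty list and then reset the index. The variable name `flat` is only illustrative and not part of the original answer.

```python
import pandas as pd

s = pd.Series([['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']])

# explode() leaves a NaN placeholder for the empty list at index 2;
# dropna() removes it, and reset_index(drop=True) replaces the repeated
# index with a fresh 0..n-1 range.
flat = s.explode().dropna().reset_index(drop=True)

print(flat.tolist())
print(len(flat))
```

Note that dropna() would also discard any genuine NaN values stored inside the lists, so this only suits data where NaN is not a meaningful element.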

To count how many times each element occurs, we can now use Series.value_counts():

>>> s.explode().value_counts()
Out:
b    4
a    3
d    1
c    1
e    1
dtype: int64

To include NaN values as well:

>>> s.explode().value_counts(dropna=False)
Out:
b      4
a      3
d      1
c      1
e      1
NaN    1
dtype: int64

Finally, plot the histogram with Series.plot():

>>> s.explode().value_counts(dropna=False).plot(kind = 'bar')
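
On pandas versions older than 0.25.0, where Series.explode() is not available, roughly the same counts can be obtained with the standard library. This is a different technique from the one above, shown only as a hedged sketch; the names `counts` and `counts_series` are illustrative.

```python
from collections import Counter
from itertools import chain

import pandas as pd

s = pd.Series([['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']])

# chain.from_iterable() concatenates the lists lazily; the empty list
# contributes nothing, so no NaN placeholder appears in the result.
counts = Counter(chain.from_iterable(s))

# Wrap the counts in a Series to get output similar to value_counts().
counts_series = pd.Series(dict(counts)).sort_values(ascending=False)

print(counts)
print(counts_series)
```

Unlike explode(), this approach does not record that one of the original lists was empty.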

Answer 2

Mar*_*ius 2

s.map(len).sum()

will do it. s.map(len) applies len() to each element and returns a Series of all the lengths, and you can then simply call sum() on that Series.
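
As a self-contained sketch of this approach on the Series from the question (the names `lengths` and `total` are illustrative):

```python
import pandas as pd

s = pd.Series([['a', 'a', 'b'], ['b', 'b', 'c', 'd'], [], ['a', 'b', 'e']])

lengths = s.map(len)   # one length per list; the empty list gives 0
total = lengths.sum()  # total number of elements across all lists

print(lengths.tolist())
print(total)
```

This gives the overall element count without flattening anything, so it is the cheaper choice when per-value counts are not needed.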
