pandas str.count

Question


Consider the following DataFrame. I want to count the number of '$' characters in each string. I am using the pandas str.count function (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html).

>>> import pandas as pd
>>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
>>> df['A'].str.count('$')
0    1
1    1
2    1
Name: A, dtype: int64


I expected the result [2, 2, 1]. What am I doing wrong?

In plain Python, the count method of the built-in str type returns the correct result.

>>> a = "$$$$abcd"
>>> a.count('$')
4
>>> a = '$abcd$dsf$'
>>> a.count('$')
3


Answer 1

Max*_*axU 6

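$ has a special meaning in regular expressions: it matches the end of the line. pandas str.count treats its argument as a regex, so escape the character: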

In [21]: df.A.str.count(r'\$')
Out[21]:
0    2
1    2
2    1
Name: A, dtype: int64
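
To see why the unescaped pattern counts 1 for every row, here is a minimal sketch using only the standard-library re module. It assumes pandas applies the pattern with ordinary Python regex semantics, which the str.count documentation suggests but this snippet does not itself verify. The variable names are illustrative.

```python
import re

text = '$$a'

# Unescaped '$' is an anchor: it matches the empty position at the end of
# the string, so there is exactly one match no matter how many '$'
# characters the string contains.
anchor_matches = re.findall('$', text)

# Escaped '\$' matches the literal dollar character.
literal_matches = re.findall(r'\$', text)

# re.escape builds the escaped pattern for an arbitrary literal string,
# which is handy when the text to count is not known in advance.
escaped_pattern = re.escape('$')

print(len(anchor_matches))   # 1
print(len(literal_matches))  # 2
print(escaped_pattern)       # \$
```

Passing re.escape('$') instead of a hand-written r'\$' to df.A.str.count should give the same counts and avoids forgetting the backslash for other special characters such as '.' or '+'.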


Answer 2

fug*_*ede 6

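As the other answer points out, the problem here is that $ denotes the end of the line. If you do not need regular expressions, you may find that using str.count (that is, the method of the built-in type str) is faster than its pandas counterpart: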

In [39]: df['A'].apply(lambda x: x.count('$'))
Out[39]: 
0    2
1    2
2    1
Name: A, dtype: int64

In [40]: %timeit df['A'].str.count(r'\$')
1000 loops, best of 3: 243 µs per loop

In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
1000 loops, best of 3: 202 µs per loop
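
Timings on a three-row Series are dominated by fixed overhead, so a comparison on a larger input is more informative. The following is a rough sketch, not a rigorous benchmark: the Series size (30,000 rows) and the repetition count are arbitrary assumptions, and the measured times will vary by machine and pandas version.

```python
import re
import timeit

import pandas as pd

# Hypothetical larger Series: the three sample values repeated many times.
s = pd.Series(['$$a', '$$b', '$c'] * 10_000)

# Regex-based count; re.escape turns '$' into the literal pattern '\$'.
regex_counts = s.str.count(re.escape('$'))

# Plain-Python count via the built-in str.count method.
plain_counts = s.apply(lambda x: x.count('$'))

# Both approaches should agree before their speed is compared.
assert regex_counts.tolist() == plain_counts.tolist()

# Time each approach over the same number of repetitions.
regex_time = timeit.timeit(lambda: s.str.count(re.escape('$')), number=10)
plain_time = timeit.timeit(lambda: s.apply(lambda x: x.count('$')), number=10)

print(f"str.count with regex: {regex_time:.4f} s")
print(f"apply + str.count:    {plain_time:.4f} s")
```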


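I don't think timing such a small Series makes much sense. That said, the difference becomes more pronounced as it grows! Edit: really, count should have a regex=False flag like the other str methods. (2 upvotes)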
