Use str.split() to set the value of a column in a dataframe, but only for certain rows

abi*_*tio 2 python dataframe pandas

I have a dataframe, for example:

id some_string  
1. blah,count=1,blah
2. blah,blah
3  blah,count=4,blah
4. blah,blah
5  blah,count=4,blah
6. blah,count=3,blah

I want to use split to set a separate column containing the count value, to get:

id some_string        count
1  blah,count=1,blah   1
2  blah,blah           0
3  blah,count=4,blah   4
4  blah,blah           0 
5  blah,count=4,blah   4
6  blah,count=3,blah   3

I tried:

df['count'].str.split('[count=|,]',expand=True)[3]

but it rightly complains:

 Length of values (4) does not match length of index (6)

Is there an obvious way to do this, short of looping over the dataframe entries?
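Separately from the length error, the pattern in the attempt does not mean what it appears to. pandas treats a `pat` longer than one character as a regular expression by default, and inside square brackets `[count=|,]` is a character class: it matches any *single* one of `c`, `o`, `u`, `n`, `t`, `=`, `|` or `,`, not the word `count=`. The sketch below first shows the fragments this produces (using Python's `re` module on one sample string), then a split-only variant that splits on the literal `count=` and then on `,`. It is a sketch, not code from the question or the answers, and assumes the column is named `some_string` with integer `id` values.

```python
import re

import pandas as pd

# The bracketed pattern is a character class: it splits on each of the
# single characters c, o, u, n, t, =, | and , separately.
pieces = re.split(r'[count=|,]', 'blah,count=1,blah')
print(pieces)  # ['blah', '', '', '', '', '', '', '1', 'blah']

# Sample data shaped like the question's table (integer ids assumed)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'some_string': ['blah,count=1,blah', 'blah,blah', 'blah,count=4,blah',
                    'blah,blah', 'blah,count=4,blah', 'blah,count=3,blah'],
})

# Split-only variant: keep the text after "count=" (NaN when absent),
# cut it at the first comma, default missing counts to "0", cast to int.
after = df['some_string'].str.split('count=').str[1]
df['count'] = after.str.split(',').str[0].fillna('0').astype(int)
print(df['count'].tolist())  # [1, 0, 4, 0, 4, 3]
```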

moz*_*way 7

Don't `split`, use `extract`:

df['count'] = (df['some_string'].str.extract(r'count=(\d+)', expand=False)
               .fillna(0).astype(int)
              )
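A self-contained version of this approach, with the sample frame rebuilt by hand (the `id` values are assumed to be plain integers, since the question's table mixes `1.` and `3`). Because the pattern has a single capturing group, `expand=False` makes `extract` return a Series; rows without `count=` come back as `NaN`, which `fillna(0)` replaces before the cast to `int`.

```python
import pandas as pd

# Sample data mirroring the question's table (integer ids assumed)
df = pd.DataFrame({
    'id': [1, 2, 3, 4, 5, 6],
    'some_string': ['blah,count=1,blah', 'blah,blah', 'blah,count=4,blah',
                    'blah,blah', 'blah,count=4,blah', 'blah,count=3,blah'],
})

# Capture the digits after "count="; rows without a match become NaN
extracted = df['some_string'].str.extract(r'count=(\d+)', expand=False)
print(extracted.tolist())  # ['1', nan, '4', nan, '4', '3']

# Missing counts default to 0, then the digit strings are cast to integers
df['count'] = extracted.fillna(0).astype(int)
print(df)
```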

Output:

    id        some_string  count
0  1.0  blah,count=1,blah      1
1  2.0          blah,blah      0
2  3.0  blah,count=4,blah      4
3  4.0          blah,blah      0
4  5.0  blah,count=4,blah      4
5  6.0  blah,count=3,blah      3

Regex:

count=    # match literal "count="
(         # start capturing group
\d+       # capture one or more (+) digits (\d)
)         # end capturing group
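The same pattern can be checked outside pandas with Python's `re` module, where the value `str.extract` returns for each row corresponds to `group(1)` of a match. A minimal sketch, written in `re.VERBOSE` form so the annotated breakdown above can be kept inline; the string `'blah,count=12,blah'` is a hypothetical extra input added to show that `\d+` captures more than one digit.

```python
import re

# Same pattern as above; VERBOSE ignores the whitespace and # comments
PATTERN = re.compile(r"""
    count=    # match literal "count="
    (         # start capturing group
    \d+       # capture one or more (+) digits (\d)
    )         # end capturing group
""", re.VERBOSE)

results = {}
for s in ['blah,count=1,blah', 'blah,blah', 'blah,count=12,blah']:
    m = PATTERN.search(s)
    # group(1) holds the captured digits; no match means "count=" is absent
    results[s] = m.group(1) if m else None
    print(s, '->', results[s])
```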

Regex demo


Cor*_*ien 5

(@mozway answered faster than me, so it's fair to accept his answer rather than mine.)

You can use `str.extract`:

df['count'] = df['some_string'].str.extract(r'count=(\d+)').fillna(0).astype(int)
print(df)

# Output
   id        some_string count
0   1  blah,count=1,blah     1
1   2          blah,blah     0
2   3  blah,count=4,blah     4
3   4          blah,blah     0
4   5  blah,count=4,blah     4
5   6  blah,count=3,blah     3
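The two answers differ mainly in `expand`. Without `expand=False`, `str.extract` returns a DataFrame with one column per capturing group (named `0` for an unnamed group) rather than a Series; assigning that one-column frame to `df['count']` still works, as the output above shows, but `expand=False` makes the Series result explicit. Writing the pattern as a raw string (`r'count=(\d+)'`) also keeps recent Python versions from warning about `\d` as an invalid string escape. A quick check on two of the sample strings:

```python
import pandas as pd

s = pd.Series(['blah,count=1,blah', 'blah,blah'])

# Default expand=True: a DataFrame with one column per capturing group
as_frame = s.str.extract(r'count=(\d+)')
# expand=False with a single capturing group: a plain Series
as_series = s.str.extract(r'count=(\d+)', expand=False)

print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame [0]
print(type(as_series).__name__)                          # Series
```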