abi*_*tio 2 python dataframe pandas
I have a dataframe, for example:
id some_string  
1. blah,count=1,blah
2. blah,blah
3  blah,count=4,blah
4. blah,blah
5  blah,count=4,blah
6. blah,count=3,blah
I want to use split to populate a separate column holding the count value:
id some_string        count
1  blah,count=1,blah   1
2  blah,blah           0
3  blah,count=4,blah   4
4  blah,blah           0 
5  blah,count=4,blah   4
6  blah,count=3,blah   3
I tried:
df['count'].str.split('[count=|,]',expand=True)[3]
But it rightly complains:
 Length of values (4) does not match length of index (6)
Is there an obvious way to do this short of looping over the dataframe entries?
Don't use split; use extract:
df['count'] = (df['some_string'].str.extract(r'count=(\d+)', expand=False)
               .fillna(0).astype(int)
              )
Output:
    id        some_string  count
0  1.0  blah,count=1,blah      1
1  2.0          blah,blah      0
2  3.0  blah,count=4,blah      4
3  4.0          blah,blah      0
4  5.0  blah,count=4,blah      4
5  6.0  blah,count=3,blah      3
The regex:
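For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The sample frame is my reconstruction of the question's data, with the ids assumed to be plain integers. It also fills missing matches with the string "0" instead of the integer 0. That is an adjustment of mine, since I believe pandas' dedicated string dtype can reject a non-string fill value; with an object column either form should work.

```python
import pandas as pd

# Sample data reconstructed from the question (ids assumed to be integers).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "some_string": [
        "blah,count=1,blah",
        "blah,blah",
        "blah,count=4,blah",
        "blah,blah",
        "blah,count=4,blah",
        "blah,count=3,blah",
    ],
})

# Rows without "count=<digits>" come back as missing values;
# fill them with "0" (a string) and then convert the whole column to int.
df["count"] = (
    df["some_string"]
    .str.extract(r"count=(\d+)", expand=False)
    .fillna("0")
    .astype(int)
)

print(df)
```

Unlike split, extract always returns one value per row, so the result lines up with the index no matter how many comma-separated fields a row has.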
count=    # match literal "count="
(         # start capturing group
\d+       # capture one or more (+) digits (\d)
)         # end capturing group
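To see the capturing group in isolation, the same pattern can be tried with the standard library's re module. This is only an illustration: `extract_count` is a hypothetical helper name, not something from pandas or from the answer.

```python
import re

# Same pattern as in the answer, compiled once for reuse.
pattern = re.compile(r"count=(\d+)")


def extract_count(text):
    """Return the digits after 'count=' as an int, or 0 when absent.

    Hypothetical helper for illustration only.
    """
    match = pattern.search(text)
    # group(1) is the text captured by the parentheses, i.e. the digits.
    return int(match.group(1)) if match else 0


print(extract_count("blah,count=4,blah"))
print(extract_count("blah,blah"))
```

The first call finds the group and returns 4; the second has no match and falls back to 0, which mirrors what the fillna step does in the pandas version.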
        (@mozway answered faster than I did, so it is only fair to accept his answer rather than mine.)
You can use str.extract:
df['count'] = df['some_string'].str.extract(r'count=(\d+)').fillna(0).astype(int)
print(df)
# Output
   id        some_string count
0   1  blah,count=1,blah     1
1   2          blah,blah     0
2   3  blah,count=4,blah     4
3   4          blah,blah     0
4   5  blah,count=4,blah     4
5   6  blah,count=3,blah     3
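One detail worth knowing: this version leaves `expand` at its default of True, so str.extract returns a one-column DataFrame, whereas `expand=False` with a single capturing group returns a Series. As far as I know both can be assigned to a single column, so the results match here. A small sketch of the difference, where the variable names are mine:

```python
import pandas as pd

s = pd.Series(["blah,count=1,blah", "blah,blah"])

# Default expand=True: one column per capturing group, as a DataFrame.
as_frame = s.str.extract(r"count=(\d+)")

# expand=False with a single group: a plain Series.
as_series = s.str.extract(r"count=(\d+)", expand=False)

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

If you later chain Series-only methods onto the result, `expand=False` is the safer choice.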