abi*_*tio 2 python dataframe pandas
I have a dataframe, for example:
```
id  some_string
1.  blah,count=1,blah
2.  blah,blah
3   blah,count=4,blah
4.  blah,blah
5   blah,count=4,blah
6.  blah,count=3,blah
```
I would like to use split to create a separate column holding the count value, with 0 where there is none:
```
id  some_string        count
1   blah,count=1,blah  1
2   blah,blah          0
3   blah,count=4,blah  4
4   blah,blah          0
5   blah,count=4,blah  4
6   blah,count=3,blah  3
```
I tried:
```python
df['count'].str.split('[count=|,]',expand=True)[3]
```
but it rightly complains:
```
Length of values (4) does not match length of index (6)
```
Is there an obvious way to do this short of looping over the dataframe entries?
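Why the split attempt goes wrong: in a regex, `[count=|,]` is a character class, so it splits on every single occurrence of `c`, `o`, `u`, `n`, `t`, `=`, `|` or `,` rather than on the literal text `count=`. A minimal sketch with Python's built-in `re.split`, assuming it mirrors what pandas does when it treats a multi-character `pat` as a regular expression:

```python
import re

row = "blah,count=1,blah"

# Character class: splits on EACH of c, o, u, n, t, =, | and ","
pieces = re.split(r"[count=|,]", row)
print(pieces)     # ['blah', '', '', '', '', '', '', '1', 'blah']
print(pieces[3])  # prints an empty line: index 3 is '' here

# A row without "count=" splits into a different number of pieces
no_count = re.split(r"[count=|,]", "blah,blah")
print(no_count)   # ['blah', 'blah']
```

The number lands at index 7, not 3, and a row without `count=` yields only two pieces, so the index you need depends on the exact layout of each string.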
Don't use split, use extract:
```python
df['count'] = (df['some_string'].str.extract(r'count=(\d+)', expand=False)
                 .fillna(0).astype(int)
              )
```
Output:
```
    id        some_string  count
0  1.0  blah,count=1,blah      1
1  2.0          blah,blah      0
2  3.0  blah,count=4,blah      4
3  4.0          blah,blah      0
4  5.0  blah,count=4,blah      4
5  6.0  blah,count=3,blah      3
```
Regular expression:
```
count=     # match literal "count="
(          # start capturing group
  \d+      # capture one or more (+) digits (\d)
)          # end capturing group
```
(@mozway answered faster than I did, so it's normal to accept his answer rather than mine.)
You can use `str.extract`:
```python
# Raw string so that \d is passed to the regex engine unchanged
df['count'] = df['some_string'].str.extract(r'count=(\d+)').fillna(0).astype(int)
print(df)

# Output
   id        some_string  count
0   1  blah,count=1,blah      1
1   2          blah,blah      0
2   3  blah,count=4,blah      4
3   4          blah,blah      0
4   5  blah,count=4,blah      4
5   6  blah,count=3,blah      3
```
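A self-contained version for trying the extract approach end to end. Assumptions: the sample data is rebuilt by hand with the ids simplified to the integers 1 to 6, and the last step converts with `pd.to_numeric` before `fillna(0)`, a small deviation from both answers so that the fill is applied to a numeric Series rather than a column of strings. It also shows the one real difference between the two answers: with the default `expand=True`, `str.extract` returns a one-column DataFrame, while `expand=False` with a single group returns a Series.

```python
import pandas as pd

# Sample data rebuilt from the question (ids simplified to 1..6)
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "some_string": [
        "blah,count=1,blah", "blah,blah", "blah,count=4,blah",
        "blah,blah", "blah,count=4,blah", "blah,count=3,blah",
    ],
})

# Default expand=True: a one-column DataFrame (column label 0)
as_frame = df["some_string"].str.extract(r"count=(\d+)")
print(type(as_frame).__name__, as_frame.shape)  # DataFrame (6, 1)

# expand=False with a single group: a Series of digit strings / NaN
as_series = df["some_string"].str.extract(r"count=(\d+)", expand=False)

# Convert to numbers first, then fill the missing rows with 0
df["count"] = pd.to_numeric(as_series).fillna(0).astype(int)
print(df["count"].tolist())  # [1, 0, 4, 0, 4, 3]
```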