Validating strings with regular expressions in pandas

Question

kcE*_*ike 2 python regex string dataframe pandas

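I need a bit of help.

I'm fairly new to Python (I'm using version 3.0, bundled with Anaconda), and I'd like to use a regular expression to validate/return a list containing only the valid numbers that match a criterion (for example, \d{11} for 11 digits). I'm using Pandas to get the list: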

import pandas as pd

df = pd.DataFrame(columns=['phoneNumber','count'], data=[
    ['08034303939',11],
    ['08034382919',11],
    ['0802329292',10],
    ['09039292921',11]])


When I print all the items using

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(row[1][0])


it returns every item, with no regex validation. But when I try to validate with this

for row in df.iterrows(): # dataframe.iterrows() returns tuple
    print(re.compile(r"\d{11}").search(row[1][0]).group())


it raises an AttributeError (because the return value for a non-matching value is None).

How can I work around this, or is there a simpler way?

Answer 1

cs9*_*s95 5

If you want to validate, you can use str.match and convert the result to a boolean mask with astype(bool):

x = df['phoneNumber'].str.match(r'\d{11}').astype(bool)
x

0     True
1     True
2    False
3     True
Name: phoneNumber, dtype: bool

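One caveat worth noting: str.match anchors only at the start of each string, so \d{11} would also accept a value with 12 or more digits. Below is a minimal sketch of a stricter check, assuming that exactly 11 digits is the rule and that the installed pandas provides Series.str.fullmatch (1.1 or later). The extra sample values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data: the 12-digit value and the missing value
# are illustrative additions, not part of the original question.
s = pd.Series(['08034303939', '0802329292', '080343039391', None])

# str.match anchors at the start only, so the 12-digit value passes.
loose = s.str.match(r'\d{11}', na=False)

# str.fullmatch requires the whole string to match the pattern.
strict = s.str.fullmatch(r'\d{11}', na=False)

print(loose.tolist())   # [True, False, True, False]
print(strict.tolist())  # [True, False, False, False]
```

Passing na=False treats missing values as invalid rather than leaving them as NaN in the mask.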

You can then use boolean indexing to return only the rows with valid phone numbers.

df[x]

   phoneNumber  count
0  08034303939     11
1  08034382919     11
3  09039292921     11

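For completeness, the AttributeError in the original loop comes from calling .group() on None, which re.search returns when nothing matches. If an explicit loop is preferred, a sketch along these lines checks the result before using it. It uses re.fullmatch on the assumption that exactly 11 digits is required, and the names pattern and valid are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame(columns=['phoneNumber', 'count'], data=[
    ['08034303939', 11],
    ['08034382919', 11],
    ['0802329292', 10],
    ['09039292921', 11]])

# Compile the pattern once, outside the loop.
pattern = re.compile(r"\d{11}")

valid = []
for number in df['phoneNumber']:
    m = pattern.fullmatch(number)
    # fullmatch returns None when the string does not match,
    # so only call .group() on an actual match object.
    if m is not None:
        valid.append(m.group())

print(valid)  # ['08034303939', '08034382919', '09039292921']
```

Iterating over the column directly avoids unpacking the (index, row) tuples that iterrows() yields.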
