Efficient way to search for invalid characters in Python

Question

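I'm building a forum application in Django, and I want to make sure users don't enter certain characters in their forum posts. I need an efficient way to scan the entire post for invalid characters. So far, the following is all I've come up with, but it doesn't work correctly, and I don't think the approach is very efficient anyway.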

def clean_topic_message(self):
    topic_message = self.cleaned_data['topic_message']
    words = topic_message.split()
    if (topic_message == ""):
        raise forms.ValidationError(_(u'Please provide a message for your topic'))
    for word in words:
        if (re.match(r'[^<>/\{}[]~`]$',topic_message)):
            raise forms.ValidationError(_(u'Topic message cannot contain the following: <>/\{}[]~`'))
    return topic_message

Thanks for your help.

Answer 1

rid*_*ner 5

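For a regex solution, there are two approaches:

Option 1: find a single invalid character anywhere in the string.
Option 2: validate every character in the string.

Here is a script that implements both: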

import re
topic_message = 'This topic is a-ok'

# Option 1: Look for any single invalid char in the string.
re1 = re.compile(r"[<>/{}[\]~`]")
if re1.search(topic_message):
    print("RE1: Invalid char detected.")
else:
    print("RE1: No invalid char detected.")

# Option 2: Validate all chars in the string.
re2 = re.compile(r"^[^<>/{}[\]~`]*$")
if re2.match(topic_message):
    print("RE2: All chars are valid.")
else:
    print("RE2: Not all chars are valid.")

Take your pick.
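As a minimal sketch of how Option 1 could slot into the question's clean_topic_message(), here is a plain-Python stand-in. It is not the original author's code and assumes nothing from Django: ValueError stands in for forms.ValidationError and the _() translation wrapper is dropped, so the messages are illustrative only. The whole message is searched once, with no need to split it into words.

```python
import re

# Option 1 pattern from the answer: any single forbidden character.
# Note: backslash is NOT in this set (the question's error text lists it,
# but neither regex actually forbids it), so the message below omits it.
INVALID_CHARS_RE = re.compile(r"[<>/{}[\]~`]")


def clean_topic_message(topic_message):
    """Plain-Python stand-in for the Django form's clean_topic_message().

    Assumption: ValueError replaces django.forms.ValidationError so this
    runs without Django installed.
    """
    if topic_message == "":
        raise ValueError("Please provide a message for your topic")
    # One search over the whole message; splitting into words is unnecessary.
    if INVALID_CHARS_RE.search(topic_message):
        raise ValueError("Topic message cannot contain the following: <>/{}[]~`")
    return topic_message
```

In the real form, the two raise statements would use forms.ValidationError(_(...)) exactly as in the question.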

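Note: the regex in the question has an unescaped right square bracket (]) inside the character class. That bracket ends the class early, so it must be escaped as \].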
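A short check, written for illustration and not part of the original answer, shows what the question's pattern actually does. Because the first ] closes the character class, the pattern means "one character not in <>/{}[, followed by the literal text ~`] at the end of the string". Since re.match() anchors at the start, ordinary messages never match, so the ValidationError is never raised.

```python
import re

# The question's pattern: the unescaped ']' right after '[' closes the class,
# so the following '~`]' is matched as literal text.
broken = re.compile(r'[^<>/\{}[]~`]$')

# The answer's Option 1 pattern, with ']' escaped inside the class.
fixed = re.compile(r'[<>/{}[\]~`]')

message = 'This topic is <not>-ok.'

print(broken.match(message))              # None: the invalid message slips through
print(broken.match('x~`]') is not None)   # True: only odd 4-char strings like this match
print(fixed.search(message) is not None)  # True: '<' is detected
```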

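Benchmarks: after seeing gnibbler's interesting set() solution, I was curious which method is actually fastest, so I decided to measure them. Below are the test data, the statements measured, and the resulting timeit values: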

Test data:

r"""
TEST topic_message STRINGS:
ok:  'This topic is A-ok.     This topic is     A-ok.'
bad: 'This topic is <not>-ok. This topic is {not}-ok.'

MEASURED PYTHON STATEMENTS:
Method 1: 're1.search(topic_message)'
Method 2: 're2.match(topic_message)'
Method 3: 'set(invalid_chars).intersection(topic_message)'
"""

Results:

r"""
Seconds to perform 1000000 Ok-match/Bad-no-match loops:
Method  Ok-time  Bad-time
1        1.054    1.190
2        1.830    1.636
3        4.364    4.577
"""

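The benchmarks show that Option 1 is faster than Option 2 (roughly 1.4 to 1.7 times on these strings), and both are much faster than the set().intersection() method (roughly 2.4 to 4.1 times). This holds for both the matching and the non-matching string.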
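To repeat the measurement on a current Python, here is a minimal timeit sketch. It assumes that invalid_chars (from gnibbler's answer, not shown here) is the same string of forbidden characters used in the regexes, and it defaults to 100,000 iterations rather than the 1,000,000 used above. Absolute times will differ from the table depending on machine and Python version.

```python
import re
import timeit

# Patterns from the answer above.
re1 = re.compile(r"[<>/{}[\]~`]")      # Option 1: find any invalid char
re2 = re.compile(r"^[^<>/{}[\]~`]*$")  # Option 2: every char must be valid

# Assumption: the set() approach used the same forbidden characters.
invalid_chars = "<>/{}[]~`"

TEST_STRINGS = {
    "ok": "This topic is A-ok.     This topic is     A-ok.",
    "bad": "This topic is <not>-ok. This topic is {not}-ok.",
}

STATEMENTS = {
    1: "re1.search(topic_message)",
    2: "re2.match(topic_message)",
    3: "set(invalid_chars).intersection(topic_message)",
}


def run_benchmark(number=100_000):
    """Return {(method, label): seconds} for `number` runs of each statement."""
    results = {}
    for method, stmt in STATEMENTS.items():
        for label, message in TEST_STRINGS.items():
            # Fresh namespace per run so each statement sees its own test string.
            namespace = {"re1": re1, "re2": re2,
                         "invalid_chars": invalid_chars,
                         "topic_message": message}
            results[(method, label)] = timeit.timeit(
                stmt, globals=namespace, number=number)
    return results


if __name__ == "__main__":
    timings = run_benchmark()
    print("Method  Ok-time  Bad-time")
    for method in STATEMENTS:
        print(f"{method:<7} {timings[(method, 'ok')]:7.3f}  "
              f"{timings[(method, 'bad')]:8.3f}")
```

Passing globals= (available since Python 3.5) avoids building a setup string for each statement.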

Archived:	14 years, 10 months ago
Views:	8046
Last activity:	14 years, 1 month ago