Python regex to remove comments

cap*_*oke 2 python regex

How would I write a regex that removes all comments that start with a # and stop at the end of the line, while excluding the first two lines:

#!/usr/bin/python 

#-*- coding: utf-8 -*-

unu*_*tbu 5

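You can remove the comments by parsing the Python code with tokenize.generate_tokens. The following is a slightly modified version of an example from the documentation: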

import tokenize
import io
import sys
if sys.version_info[0] == 3:
    StringIO = io.StringIO
else:
    StringIO = io.BytesIO

def nocomment(s):
    result = []
    g = tokenize.generate_tokens(StringIO(s).readline)  
    for toknum, tokval, _, _, _  in g:
        # print(toknum,tokval)
        if toknum != tokenize.COMMENT:
            result.append((toknum, tokval))
    return tokenize.untokenize(result)

with open('script.py','r') as f:
    content=f.read()

print(nocomment(content))

For example, if script.py contains

def foo(): # Remove this comment
    ''' But do not remove this #1 docstring 
    '''
    # Another comment
    pass

then the output of nocomment is

def foo ():
    ''' But do not remove this #1 docstring 
    '''

    pass 
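Note that the shebang and the coding declaration are themselves COMMENT tokens, so nocomment as written strips them too. The question asks to keep them, so here is a minimal sketch of one way to do that under the same tokenize approach. The function name nocomment_keep_header and the CODING_RE pattern (modeled on the PEP 263 declaration format) are illustrative assumptions, not part of the original answer, and the sketch targets Python 3 only.

```python
import io
import re
import tokenize

# Assumed pattern for an encoding declaration, modeled on PEP 263.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def nocomment_keep_header(source):
    """Drop comments, but keep a shebang or coding line found in the first two lines."""
    result = []
    readline = io.StringIO(source).readline
    for toknum, tokval, (row, _), _, _ in tokenize.generate_tokens(readline):
        if toknum == tokenize.COMMENT:
            # Only comments on line 1 or 2 can qualify as header lines.
            is_header = row <= 2 and (
                tokval.startswith("#!") or CODING_RE.match(tokval)
            )
            if not is_header:
                continue  # ordinary comment: skip it
        result.append((toknum, tokval))
    # Same two-element form as the answer above, so spacing is normalized the same way.
    return tokenize.untokenize(result)
```

Calling nocomment_keep_header(content) in place of nocomment(content) should leave #!/usr/bin/python and #-*- coding: utf-8 -*- in place while removing the other comments. A # inside a string or docstring is untouched, because the tokenizer reports it as part of a STRING token rather than a COMMENT.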

  • @PiPeep: For an example of how tokenize handles whitespace, see [reindent.py](http://svn.python.org/projects/python/trunk/Tools/scripts/reindent.py). (2 upvotes)