Regex to match all comments in a T-SQL script

Eri*_*ham 9 regex sql t-sql

I need a regular expression to capture all comments in a block of T-SQL. The expression needs to work with the .NET Regex class.

Say I have the following T-SQL:

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo = 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

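I need to capture all of the comments, including the multi-line ones, so that I can strip them out.

Edit: An expression that takes everything except the comments would serve the same purpose.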

Jer*_*emy 18

This should work:

(--.*)|(((/\*)+?[\w\W]+?(\*/)+))

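  • If you have strings that contain SQL comment markers, this will unfortunately match inside those strings as well, e.g.: `SELECT * FROM foo WHERE a = 'This has -- dashes'` (3 upvotes)
  • No, it doesn't. It does not support nested comments as the OP stated. (2 upvotes)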
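
Both objections are easy to reproduce. The following is a minimal Python sketch (Python's `re`, not the .NET Regex class the question asks about, although this particular pattern uses nothing engine-specific). The helper name `strip_comments_naive` and the sample strings are made up for illustration.

```python
import re

# The pattern from the answer above, unchanged.
SIMPLE_COMMENT = re.compile(r"(--.*)|(((/\*)+?[\w\W]+?(\*/)+))")


def strip_comments_naive(sql):
    """Delete everything the simple pattern matches (hypothetical helper)."""
    return SIMPLE_COMMENT.sub("", sql)


# Plain line and block comments are removed as intended.
sample = "-- This is Comment 1\nSELECT Foo FROM Bar /* inline */\n"
cleaned_sample = strip_comments_naive(sample)

# False positive: the "--" inside the string literal is treated as a comment,
# so the rest of the literal (including its closing quote) is lost.
literal = "SELECT * FROM foo WHERE a = 'This has -- dashes'"
cleaned_literal = strip_comments_naive(literal)

# Nested block comment: matching stops at the first "*/",
# leaving the tail of the outer comment behind.
nested = "/* outer /* inner */ still outer */ SELECT 1"
cleaned_nested = strip_comments_naive(nested)

print(repr(cleaned_sample))
print(repr(cleaned_literal))
print(repr(cleaned_nested))
```

So the pattern is fine for scripts without comment markers inside literals and without nested block comments, and should probably not be trusted beyond that.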

Adr*_*rat 9

In PHP, I use this code to strip comments from SQL (this is the commented version, hence the x modifier):

trim( preg_replace( '@
(([\'"]).*?[^\\\]\2) # $1 : Skip single & double quoted expressions
|(                   # $3 : Match comments
    (?:\#|--).*?$    # - Single line comment
    |                # - Multi line (nested) comments
     /\*             #   . comment open marker
        (?: [^/*]    #   . non comment-marker characters
            |/(?!\*) #   . not a comment open
            |\*(?!/) #   . not a comment close
            |(?R)    #   . recursive case
        )*           #   . repeat eventually
    \*\/             #   . comment close marker
)\s*                 # Trim after comments
|(?<=;)\s+           # Trim after semi-colon
@msx', '$1', $sql ) );

Condensed version:

trim( preg_replace( '@(([\'"]).*?[^\\\]\2)|((?:\#|--).*?$|/\*(?:[^/*]|/(?!\*)|\*(?!/)|(?R))*\*\/)\s*|(?<=;)\s+@ms', '$1', $sql ) );

  • Trying to understand this regex, I'm like ¯\_(ツ)_/¯ (4 upvotes)
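
The core trick in that pattern is less scary than it looks: match string literals first and put them back untouched, so that comment markers inside them are never seen as comments. Here is a rough Python sketch of just that idea. It is not a port of the PHP pattern: Python's `re` has no `(?R)` recursion, so nested block comments are not handled, and it assumes T-SQL quoting (`''` as the escaped quote) rather than backslash escapes. The names `TOKEN` and `strip_comments_keep_strings` are invented for the example.

```python
import re

TOKEN = re.compile(
    r"""
    ('(?:''|[^'])*')    # group 1: single-quoted literal, a doubled quote is an escape
    | --[^\r\n]*        # line comment, up to but not including the line break
    | /\*.*?\*/         # block comment, NOT nesting-aware
    """,
    re.VERBOSE | re.DOTALL,
)


def strip_comments_keep_strings(sql):
    """Drop comments but put string literals back unchanged (hypothetical helper)."""
    # group(1) is set only when a literal matched; comments yield None -> "".
    return TOKEN.sub(lambda m: m.group(1) or "", sql)


with_dashes = "SELECT * FROM foo WHERE a = 'This has -- dashes' -- real comment"
with_block = "SELECT 'it''s /* not */' /* gone */ FROM t"

print(strip_comments_keep_strings(with_dashes))
print(strip_comments_keep_strings(with_block))
```

Because the alternation is tried left to right at each position, a quote always starts a literal before the comment branches get a chance to look inside it.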

Fai*_*Dev 5

Use this code:

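using System;
using System.Collections.Specialized;
using System.Text.RegularExpressions;

// subjectString holds the T-SQL script to scan.
StringCollection resultList = new StringCollection();
try {
    // Matches a block comment (with at most one level of nested comments inside)
    // or a line comment that is terminated by a line break.
    Regex regexObj = new Regex(@"/\*(?>(?:(?!\*/|/\*).)*)(?>(?:/\*(?>(?:(?!\*/|/\*).)*)\*/(?>(?:(?!\*/|/\*).)*))*).*?\*/|--.*?\r?[\n]", RegexOptions.Singleline);
    Match matchResult = regexObj.Match(subjectString);
    while (matchResult.Success) {
        resultList.Add(matchResult.Value);
        matchResult = matchResult.NextMatch();
    }
} catch (ArgumentException ex) {
    // Syntax error in the regular expression
}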

With the following input:

-- This is Comment 1
SELECT Foo FROM Bar
GO

-- This is
-- Comment 2
UPDATE Bar SET Foo = 'Foo'
GO

/* This is Comment 3 */
DELETE FROM Bar WHERE Foo = 'Foo'

/* This is a
multi-line comment */
DROP TABLE Bar

/* comment /* nesting */ of /* two */ levels supported */
foo...

it produces these matches:

-- This is Comment 1
-- This is
-- Comment 2
/* This is Comment 3 */
/* This is a
multi-line comment */
/* comment /* nesting */ of /* two */ levels supported */

Note that this only matches comments nested up to two levels deep, although I have never in my life seen more than one level used. Ever.
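
If the two-level limit ever becomes a problem, a depth counter removes it. Below is a small Python sketch of that alternative, a hand-written scanner rather than a regex, so it is a different technique from the pattern above. The function name `find_block_comments` is made up, and the sketch deliberately ignores string literals and `--` line comments to stay short.

```python
def find_block_comments(sql):
    """Return each outermost /* ... */ comment, nested to any depth (hypothetical helper)."""
    comments = []
    depth = 0   # how many "/*" are currently open
    start = 0   # index where the outermost comment began
    i = 0
    while i < len(sql):
        pair = sql[i:i + 2]
        if pair == "/*":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                comments.append(sql[start:i])
        else:
            i += 1
    return comments


two_levels = "/* comment /* nesting */ of /* two */ levels supported */\nfoo..."
three_levels = "a /* 1 /* 2 /* 3 */ */ */ b"

print(find_block_comments(two_levels))
print(find_block_comments(three_levels))
```

An unterminated comment is simply not reported here; whether that should be an error is a design decision left open.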


dri*_*zin 5

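I wrote this function to remove all SQL comments using plain regular expressions. It removes line comments (even when no line break follows them) and block comments (even when they contain nested block comments). The function can also replace literals, which is useful if you are searching for something inside SQL procedures but want to ignore strings.

My code is based on this answer (about C# comments), so I had to change line comments from "//" to "--". More importantly, I had to rewrite the block-comment regex (using balancing groups), because SQL allows nested block comments while C# does not.

There is also a "preservePositions" parameter which, instead of stripping the comments out, fills them with whitespace. That is useful if you want to keep the original position of each SQL command, in case you need to manipulate the original script while keeping its original comments.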

// Requires: using System; using System.Text.RegularExpressions;
Regex everythingExceptNewLines = new Regex("[^\r\n]");
public string RemoveComments(string input, bool preservePositions, bool removeLiterals=false)
{
    //based on /sf/ask/246702221/#3524689

    var lineComments = @"--(.*?)\r?\n";
    var lineCommentsOnLastLine = @"--(.*?)$"; // because it's possible that there's no \r\n after the last line comment
    // literals ('literals'), bracketedIdentifiers ([object]) and quotedIdentifiers ("object"), they follow the same structure:
    // there's the start character, any consecutive pairs of closing characters are considered part of the literal/identifier, and then comes the closing character
    var literals = @"('(('')|[^'])*')"; // 'John', 'O''malley''s', etc
    var bracketedIdentifiers = @"\[((\]\])|[^\]])* \]"; // [object], [ % object]] ], etc
    var quotedIdentifiers = @"(\""((\""\"")|[^""])*\"")"; // "object", "object[]", etc - when QUOTED_IDENTIFIER is set to ON, they are identifiers, else they are literals
    //var blockComments = @"/\*(.*?)\*/";  //the original code was for C#, but Microsoft SQL allows a nested block comments // //https://msdn.microsoft.com/en-us/library/ms178623.aspx
    //so we should use balancing groups // http://weblogs.asp.net/whaggard/377025
    var nestedBlockComments = @"/\*
                                (?>
                                /\*  (?<LEVEL>)      # On opening push level
                                | 
                                \*/ (?<-LEVEL>)     # On closing pop level
                                |
                                (?! /\* | \*/ ) . # Match any char unless it starts one of the
                                )*                         # /* or */ markers; * rather than + so an empty /**/ comment matches too
                                (?(LEVEL)(?!))             # If level exists then fail
                                \*/";

    string noComments = Regex.Replace(input,
            nestedBlockComments + "|" + lineComments + "|" + lineCommentsOnLastLine + "|" + literals + "|" + bracketedIdentifiers + "|" + quotedIdentifiers,
        me => {
            if (me.Value.StartsWith("/*") && preservePositions)
                return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks // return new string(' ', me.Value.Length);
            else if (me.Value.StartsWith("/*") && !preservePositions)
                return "";
            else if (me.Value.StartsWith("--") && preservePositions)
                return everythingExceptNewLines.Replace(me.Value, " "); // preserve positions and keep line-breaks
            else if (me.Value.StartsWith("--") && !preservePositions)
                return everythingExceptNewLines.Replace(me.Value, ""); // preserve only line-breaks // Environment.NewLine;
            else if (me.Value.StartsWith("[") || me.Value.StartsWith("\""))
                return me.Value; // do not remove object identifiers ever
            else if (!removeLiterals) // Keep the literal strings
                return me.Value;
            else if (removeLiterals && preservePositions) // remove literals, but preserving positions and line-breaks
            {
                var literalWithLineBreaks = everythingExceptNewLines.Replace(me.Value, " ");
                return "'" + literalWithLineBreaks.Substring(1, literalWithLineBreaks.Length - 2) + "'";
            }
            else if (removeLiterals && !preservePositions) // wrap completely all literals
                return "''";
            else
                throw new NotImplementedException();
        },
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);
    return noComments;
}

Test 1 (first the original, then with comments removed, and finally with comments and literals removed)

[select /* block comment */ top 1 'a' /* block comment /* nested block comment */*/ from  sys.tables --LineComment
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables --FinalLineComment]

[select                     top 1 'a'                                               from  sys.tables              
union
select top 1 '/* literal with */-- lots of comments symbols' from sys.tables                   ]

[select                     top 1 ' '                                               from  sys.tables              
union
select top 1 '                                             ' from sys.tables                   ]

Test 2 (first the original, then with comments removed, and finally with comments and literals removed)

Original:
[create table [/*] /* 
  -- huh? */
(
    "--
     --" integer identity, -- /*
    [*/] varchar(20) /* -- */
         default '*/ /* -- */' /* /* /* */ */ */
);
            go]


[create table [/*]    

(
    "--
     --" integer identity,      
    [*/] varchar(20)         
         default '*/ /* -- */'                  
);
            go]


[create table [/*]    

(
    "--
     --" integer identity,      
    [*/] varchar(20)         
         default '           '                  
);
            go]
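
The position-preserving replacement seen in the outputs above boils down to one step: overwrite every character of the comment except line breaks with a space. A minimal Python sketch of only that step follows (the names `NOT_NEWLINE` and `blank_out` are invented here, and the comment span is located with plain `index` calls instead of the full regex):

```python
import re

# Same idea as the everythingExceptNewLines regex in the C# function.
NOT_NEWLINE = re.compile(r"[^\r\n]")


def blank_out(text):
    """Replace every character except CR/LF with a space (hypothetical helper)."""
    return NOT_NEWLINE.sub(" ", text)


sql = "SELECT 1 /* a\nb */ FROM t"

# Locate the single block comment in this toy input.
start = sql.index("/*")
end = sql.index("*/") + 2

masked = sql[:start] + blank_out(sql[start:end]) + sql[end:]

# Length, line count and the offset of every remaining token are unchanged.
print(len(sql) == len(masked))
print(sql.index("FROM") == masked.index("FROM"))
```

Since offsets and line numbers survive, anything found in the masked text can be mapped straight back onto the original script.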