How to remove all comments from a file while preserving escaped hash characters

Mik*_*elo 6 text-processing

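I know this has been asked before, but my case is slightly different: I need to remove all comments, while keeping any escaped # and any other # that does not start a comment (one between single or double quotes).

Starting from the following text: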

test
# comment
comment on midline # comment
escaped hash "\# this is an escaped hash"
escaped hash "\\# this is not a comment"
not a comment "# this is not a comment - double apices"
not a comment '# this is not a comment - single apices'
this is a comment \\# this is a comment
this is not a comment \# this is not a comment

I would like to obtain

test
comment on midline
escaped hash "\# this is an escaped hash"
escaped hash "\\# this is not a comment"
not a comment "# this is not a comment - double apices"
not a comment '# this is not a comment - single apices'
this is a comment \\
this is not a comment \# this is not a comment

I tried

grep -o '^[^#]*' file
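
but this also deletes everything after an escaped hash.

Note: the text I am working with does contain escaped # (\#) but no double-escaped # (\\#), so whether those are kept does not matter to me. I think removing them is cleaner, since in that case the hash is in fact not escaped.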

don*_*sti 5

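With sed you can delete the lines that start with # (preceded by zero or more blanks), and remove every string that starts with a # that does not follow a backslash (and only if it is not between quotes; see note 1):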

sed '/^[[:blank:]]*#/d
/["'\''].*#.*["'\'']/!{
s/\\\\#.*/\\\\/
s/\([^\]\)#.*/\1/
}' infile

1: This solution assumes there is a single pair of quotes on a line.
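
If a line can contain more than one pair of quotes, a character-by-character scan is one possible way around that assumption. The following is only a rough sketch, not part of the answer above: `strip_comments` is a hypothetical helper name, and it assumes that a backslash always protects the next character (inside or outside quotes), that quotes do not span lines, and that trailing blanks left in front of a removed comment should be trimmed.

```shell
# Hypothetical helper: strip unquoted, unescaped "#" comments with awk.
# Assumptions: backslash escapes the next character everywhere,
# and quoted strings never continue onto the next line.
strip_comments() {
  awk '
  {
    out = ""; quote = ""; cut = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      if (c == "\\" && i < n) {
        # Copy the backslash and the character it protects, then skip both.
        out = out c substr($0, i + 1, 1)
        i++
        continue
      }
      if (quote != "") {
        # Inside quotes: only the matching quote character ends the string.
        if (c == quote) quote = ""
      } else if (c == "\"" || c == "\047") {
        # \047 is the single quote character.
        quote = c
      } else if (c == "#") {
        # Unquoted, unescaped hash: the rest of the line is a comment.
        cut = 1
        break
      }
      out = out c
    }
    if (cut) {
      sub(/[ \t]+$/, "", out)   # trim blanks left before the comment
      if (out == "") next       # drop lines that held only a comment
    }
    print out
  }' "$@"
}
```

It can be called as `strip_comments infile` or at the end of a pipeline. An unbalanced quote (for example an apostrophe in plain text) makes the rest of that line count as quoted, so a comment after it would be kept.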