Szy*_*zyk 10 python sql postgresql text-parsing
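I have a problem stripping comments and empty lines from an existing SQL file. The file has more than 10k lines, so cleaning it by hand is not an option.
I have a small Python script, but I don't know how to handle comments inside multi-line inserts.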
f = open('file.sql', 'r')
# Keep only lines that are neither '--' comments nor whitespace-only.
t = list(filter(lambda x: not x.startswith('--')
                and not x.isspace(),
                f.readlines()))
f.close()
t  # <- here the cleaned data should be
This should be cleaned up:
-- normal sql comment
This should stay as it is:
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
[...]
END;
$$;
INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text
\'
[...]
');
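To make the problem concrete, here is a minimal sketch that runs the same line filter on a short in-memory sample instead of the real file. The sample string and the names `sample`, `cleaned` and `lost` are made up for illustration.

```python
# The same filter as in the script above, applied to a small sample.
sample = (
    "-- normal sql comment\n"
    "\n"
    "INSERT INTO public.texts (multilinetext) VALUES ('\n"
    "and more lines here\n"
    "-- part of text\n"
    "');\n"
)

lines = sample.splitlines(keepends=True)
cleaned = [x for x in lines if not x.startswith('--') and not x.isspace()]

# Non-blank lines that the filter removed. The second one was part of
# the string literal and should have been kept.
lost = [x for x in lines if x not in cleaned and not x.isspace()]
print(lost)
```

The filter cannot tell a real comment from a line of text that merely starts with `--`, because it looks at each line in isolation.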
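Try the sqlparse module.
Updated example: it keeps comments inside insert values as well as comments inside CREATE FUNCTION blocks. You can tweak it further to adjust the behavior: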
import sqlparse
from sqlparse import tokens
queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);
'''
IGNORE = set(['CREATE FUNCTION',])  # extend this

def _filter(stmt, allow=0):
    # Collect the leading DDL/keyword tokens to identify the statement type.
    ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
    start = ' '.join(d.value for d in ddl[:2])
    if ddl and start in IGNORE:
        allow = 1
    # Yield every token except top-level comments, unless the statement is ignored.
    for tok in stmt.tokens:
        if allow or not isinstance(tok, sqlparse.sql.Comment):
            yield tok

for stmt in sqlparse.split(queries):
    sql = sqlparse.parse(stmt)[0]
    print(sqlparse.sql.TokenList([t for t in _filter(sql)]))
Output:
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT * FROM TABLE foo;
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);
Adding an updated answer :)
import sqlparse
sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""
print(sqlparse.format(sql_example, strip_comments=True).strip())
Output:
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
It achieves the same result, but it also covers all the other corner cases and is more concise.
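Where sqlparse is not available, the underlying idea (skip over string literals and dollar-quoted bodies, drop `--` comments everywhere else) can be roughly sketched with the standard library alone. This is an illustration under stated assumptions, not a replacement for a real parser: the names `strip_sql_comments`, `PROTECTED_OR_COMMENT` and `NEWLINE_MARK` are hypothetical, backslash is treated as an escape character inside single quotes (as in the question's dump), the input is assumed to contain no NUL characters, and `/* ... */` comments and double-quoted identifiers are not handled.

```python
import re

# One pattern for everything that needs special treatment. Quoted text is
# matched first so that a '--' inside it is never seen as a comment.
PROTECTED_OR_COMMENT = re.compile(
    r"""
      (?P<quoted>'(?:[^'\\]|\\.|'')*')
    | (?P<dollar>(?P<tag>\$(?:[A-Za-z_]\w*)?\$).*?(?P=tag))
    | (?P<comment>--[^\n]*)
    """,
    re.DOTALL | re.VERBOSE,
)

# Temporary stand-in for newlines inside protected text (assumption: the
# SQL itself never contains a NUL character).
NEWLINE_MARK = "\x00"


def strip_sql_comments(sql):
    """Drop '--' comments and blank lines that sit outside quoted text."""
    def replace(match):
        if match.group("comment") is not None:
            return ""
        # Hide newlines of literals and dollar-quoted bodies so the
        # line-based blank-line filter below cannot touch them.
        return match.group(0).replace("\n", NEWLINE_MARK)

    masked = PROTECTED_OR_COMMENT.sub(replace, sql)
    kept = [line.rstrip() for line in masked.split("\n") if line.strip()]
    return "\n".join(kept).replace(NEWLINE_MARK, "\n") + "\n"


sample = r"""-- normal sql comment

CREATE FUNCTION func1(a integer) RETURNS void
    LANGUAGE plpgsql
    AS $$
BEGIN
    -- comment
END;
$$;

SELECT -- comment
* FROM foo; -- trailing

INSERT INTO public.texts (multilinetext) VALUES ('
and more lines here \'
-- part of text
\'

');
"""

cleaned = strip_sql_comments(sample)
print(cleaned)
```

Masking the newlines inside protected spans keeps the blank-line filter line-based and simple. For a real 10k-line dump the sqlparse approach above is still the safer choice.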