How do I pipe many bash commands from Python?

Raf*_*ios 1 python bash subprocess pipe

Hi, I'm trying to call the following command from Python:

comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v "#" | sed "s/\t//g"

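How can I make the call when the inputs to the comm command are themselves piped?

Is there a simple, straightforward way to do it?

I tried the subprocess module: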

subprocess.call("comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'")

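It didn't work; it says: OSError: [Errno 2] No such file or directory

Or do I have to create the different calls individually and then connect them using PIPE, as described in the subprocess documentation: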

p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

Cha*_*ffy 7

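Process substitution (<()) is a bash-only feature. Thus, you need a shell, but it can't be just any shell (such as /bin/sh, which is what shell=True uses on non-Windows platforms): it needs to be bash.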

subprocess.call(['bash', '-c', "comm -3 <(awk '{print $1}' File1.txt | sort | uniq) <(awk '{print $1}' File2.txt | sort | uniq) | grep -v '#' | sed 's/\t//g'"])
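As an alternative spelling of the same idea, here is a minimal sketch that keeps shell=True but swaps the shell via the executable argument, which on POSIX replaces the default /bin/sh. It assumes bash is installed at /bin/bash (adjust for your system); the helper name run_in_bash is hypothetical and only for illustration.

```python
import subprocess

def run_in_bash(script):
    """Run a script string under bash and return its stdout as text.

    Assumes bash lives at /bin/bash; run_in_bash is an illustrative name.
    """
    completed = subprocess.run(
        script,
        shell=True,
        executable='/bin/bash',   # replaces the default /bin/sh on POSIX
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output to str (works on 3.5+)
        check=True,               # raise CalledProcessError on failure
    )
    return completed.stdout

# Process substitution is accepted because bash, not sh, parses the string.
print(run_in_bash("cat <(echo from-bash)"))
```

The explicit ['bash', '-c', ...] list form shown above does the same job and additionally lets you append positional arguments, so it is usually the more flexible choice.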

By the way, if you're going to go this route with arbitrary filenames, pass them out-of-band (as below: passing _ as $0, File1.txt as $1, and File2.txt as $2):

subprocess.call(['bash', '-c',
  '''comm -3 <(awk '{print $1}' "$1" | sort | uniq) '''
  '''        <(awk '{print $1}' "$2" | sort | uniq) '''
  '''        | grep -v '#' | tr -d "\t"''',
  '_', "File1.txt", "File2.txt"])
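To see why out-of-band passing is safe, the following sketch prints what bash receives as $0, $1 and $2. The helper name bash_positional_args and the sample filename are hypothetical; the only behavior relied on is that arguments after the bash -c script string become positional parameters.

```python
import subprocess

def bash_positional_args(*args):
    """Return what bash sees as $0, $1 and $2, one list item per value.

    bash_positional_args is a hypothetical helper for illustration.
    """
    # The script text is fixed; the file names never become part of it.
    script = 'printf "%s\\n" "$0" "$1" "$2"'
    completed = subprocess.run(
        ['bash', '-c', script, '_'] + list(args),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return completed.stdout.splitlines()

# A name containing spaces and shell metacharacters arrives intact as data;
# it is never parsed as shell code.
print(bash_positional_args('my file; echo oops.txt', 'File2.txt'))
```

Had the same name been interpolated into the script string instead, bash would have parsed the semicolon as a command separator.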

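That said, the best-practices approach is indeed to set up the chain yourself. The code below was tested with Python 3.6 (note the need for the pass_fds argument to subprocess.Popen, which makes the file descriptors referred to via the /dev/fd/## links available to the child process):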

import subprocess

# Skip lines containing '#', and print each distinct first field once.
awk_filter = '''! /#/ && !seen[$1]++ { print $1 }'''

p1 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File1.txt', 'r'),
                      stdout=subprocess.PIPE)
p2 = subprocess.Popen(['sort', '-u'],
                      stdin=p1.stdout,
                      stdout=subprocess.PIPE)
p3 = subprocess.Popen(['awk', awk_filter],
                      stdin=open('File2.txt', 'r'),
                      stdout=subprocess.PIPE)
p4 = subprocess.Popen(['sort', '-u'],
                      stdin=p3.stdout,
                      stdout=subprocess.PIPE)
p5 = subprocess.Popen(['comm', '-3',
                       ('/dev/fd/%d' % (p2.stdout.fileno(),)),
                       ('/dev/fd/%d' % (p4.stdout.fileno(),))],
                      pass_fds=(p2.stdout.fileno(), p4.stdout.fileno()),
                      stdout=subprocess.PIPE)
p6 = subprocess.Popen(['tr', '-d', '\t'],
                      stdin=p5.stdout,
                      stdout=subprocess.PIPE)
result = p6.communicate()
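The linear parts of the chain above (awk into sort, comm into tr) repeat one pattern, which can be sketched as a small helper. The name run_pipeline is hypothetical and not part of subprocess. The sketch also closes the parent's copy of each intermediate pipe, as the documentation snippet quoted in the question does, and collects every stage's exit code. It does not cover the /dev/fd fan-in that comm needs; that part still requires pass_fds as shown above.

```python
import subprocess

def run_pipeline(commands):
    """Chain argv lists with pipes, without a shell.

    Returns (stdout_bytes, exit_codes). run_pipeline is a hypothetical
    helper written for illustration.
    """
    procs = []
    prev_stdout = None  # the first stage inherits the parent's stdin
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout,
                                stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Drop the parent's copy so the upstream stage can receive
            # SIGPIPE if this stage exits early.
            prev_stdout.close()
        prev_stdout = proc.stdout
        procs.append(proc)
    output = procs[-1].communicate()[0]
    exit_codes = [p.wait() for p in procs]
    return output, exit_codes

# Two-stage example: the external printf emits three lines, sort -u
# sorts and de-duplicates them.
out, codes = run_pipeline([['printf', 'b\\na\\nb\\n'], ['sort', '-u']])
print(out, codes)
```

Checking the exit codes matters because, unlike a shell with pipefail set, nothing here reports a failing intermediate stage on its own.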

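This is a lot more code, but (assuming the filenames are parameterized in the real world) it is also safer code: you aren't vulnerable to bugs like ShellShock, which are triggered by the simple act of starting a shell, and you don't need to worry about passing variables out-of-band to avoid injection attacks (except in the context of arguments to commands, such as awk, that are themselves scripting-language interpreters).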


That said, another thing to consider is simply implementing the whole thing in native Python.

lines_1 = set(line.split()[0] for line in open('File1.txt', 'r') if not '#' in line)
lines_2 = set(line.split()[0] for line in open('File2.txt', 'r') if not '#' in line)
not_common = (lines_1 - lines_2) | (lines_2 - lines_1)
for line in sorted(not_common):
  print(line)
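The same set logic can be wrapped in reusable functions. This is a sketch under a few assumptions of my own: blank lines are skipped (the one-liners above would raise IndexError on them), the files are closed explicitly, and the function names are hypothetical.

```python
def first_fields(lines):
    """Collect the first whitespace-separated field of each line,
    skipping blank lines and lines that contain '#'."""
    fields = set()
    for line in lines:
        parts = line.split()
        if parts and '#' not in line:
            fields.add(parts[0])
    return fields

def not_common(lines_1, lines_2):
    """Sorted first fields that occur in exactly one of the two inputs."""
    # ^ is the set symmetric difference, the same as (a - b) | (b - a).
    return sorted(first_fields(lines_1) ^ first_fields(lines_2))

def not_common_in_files(path_1, path_2):
    """Same as not_common, reading from two file paths."""
    with open(path_1, 'r') as file_1, open(path_2, 'r') as file_2:
        return not_common(file_1, file_2)
```

Calling not_common_in_files('File1.txt', 'File2.txt') and printing each item reproduces the loop above. Because the core functions accept any iterable of lines, they can also be exercised with plain lists.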