Using subprocess to write output to a file in HDFS

Question

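I have a script that reads text line by line, modifies each line slightly, and then writes the line to a file. Reading the text works fine; the problem is that I cannot write the text out. Here is my code.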

cat = subprocess.Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"], stdout=subprocess.PIPE)
for line in cat.stdout:
    line = line+"Blah";
    subprocess.Popen(["hadoop", "fs", "-put", "/user/test/moddedfile.txt"], stdin=line)

This is the error I get.

AttributeError: 'str' object has no attribute 'fileno'
cat: Unable to write to output stream.

Answer 1

jfs*_*jfs 5

The stdin argument does not accept a string. It should be PIPE, None, or an existing file (something with a valid .fileno(), or an integer file descriptor).

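from subprocess import Popen, PIPE

# Stream the source file out of HDFS through a pipe.
cat = Popen(["hadoop", "fs", "-cat", "/user/test/myfile.txt"],
            stdout=PIPE, bufsize=-1)
# "-" tells put to read the data from stdin.
put = Popen(["hadoop", "fs", "-put", "-", "/user/test/moddedfile.txt"],
            stdin=PIPE, bufsize=-1)
for line in cat.stdout:
    line += b"Blah"  # pipes carry bytes; line still ends with its newline
    put.stdin.write(line)

cat.stdout.close()
cat.wait()
put.stdin.close()  # send EOF so put can finish
put.wait()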
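
The same two-pipe pattern can be tried without a Hadoop cluster. The sketch below is only an illustration under stated assumptions: two small Python child processes stand in for `hadoop fs -cat` and `hadoop fs -put -`, and the names `copy_with_suffix`, `PRODUCER_SCRIPT`, and `CONSUMER_SCRIPT` are hypothetical. Unlike the code above, it strips the line ending before appending the suffix, so the suffix lands on the same line as the text.

```python
import subprocess
import sys

# Assumption: stand-in for `hadoop fs -cat`; it just prints two lines.
PRODUCER_SCRIPT = "print('first'); print('second')"

# Assumption: stand-in for `hadoop fs -put - <dest>`; it copies stdin to a file.
CONSUMER_SCRIPT = """
import sys
with open(sys.argv[1], "wb") as out:
    out.write(sys.stdin.buffer.read())
"""


def copy_with_suffix(dest_path, suffix=b"Blah"):
    """Pipe lines from one child process to another, appending suffix to each."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER_SCRIPT],
        stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER_SCRIPT, dest_path],
        stdin=subprocess.PIPE)

    for line in producer.stdout:
        # Drop the line ending first so the suffix stays on the same line.
        consumer.stdin.write(line.rstrip(b"\r\n") + suffix + b"\n")

    producer.stdout.close()
    producer.wait()
    consumer.stdin.close()  # EOF tells the consumer there is no more input
    consumer.wait()
    return producer.returncode, consumer.returncode
```

Calling `copy_with_suffix("out.txt")` should leave each produced line in `out.txt` with `Blah` appended. Both return codes are handed back so the caller can check that neither child failed.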
