How to fix the "TypeError: cannot serialize '_io.BufferedReader' object" error when trying multiprocessing

Question

Ars*_*lla 6 python windows pool multiprocessing python-3.x

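I'm trying to switch my code from threading to multiprocessing to measure its performance, and hopefully get better brute-force throughput, since my program is meant to brute-force password-protected .zip files. But whenever I try to run the program, I get this: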

BruteZIP2.py -z "Generic ZIP.zip" -f  Worm.txt
Traceback (most recent call last):
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 40, in <module>
    main(args.zip, args.file)
  File "C:\Users\User\Documents\Jetbrains\PyCharm\BruteZIP\BruteZIP2.py", line 34, in main
    p.start()
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object


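I did find threads from people with the same problem, but they were all unanswered or unresolved. I also tried inserting Pool above p.start(), since I believe this is caused by my being on a Windows-based machine, but that didn't help. My code is as follows: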

  import argparse
  from multiprocessing import Process
  import zipfile

  parser = argparse.ArgumentParser(description="Unzips a password protected .zip by performing a brute-force attack using either a word list, password list or a dictionary.", usage="BruteZIP.py -z zip.zip -f file.txt")
  # Creates -z arg
  parser.add_argument("-z", "--zip", metavar="", required=True, help="Location and the name of the .zip file.")
  # Creates -f arg
  parser.add_argument("-f", "--file", metavar="", required=True, help="Location and the name of the word list/password list/dictionary.")
  args = parser.parse_args()


  def extract_zip(zip_file, password):
      try:
          zip_file.extractall(pwd=password)
          print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
      except:
          # If a password fails, it moves to the next password without notifying the user. If all passwords fail, it will print nothing in the command prompt.
          print(f"Incorrect password: {password.decode('utf-8')}")
          # pass


  def main(zip, file):
      if (zip == None) | (file == None):
          # If the args are not used, it displays how to use them to the user.
          print(parser.usage)
          exit(0)
      zip_file = zipfile.ZipFile(zip)
      # Opens the word list/password list/dictionary in "read binary" mode.
      txt_file = open(file, "rb")
      for line in txt_file:
          password = line.strip()
          p = Process(target=extract_zip, args=(zip_file, password))
          p.start()
          p.join()


  if __name__ == '__main__':
      # BruteZIP.py -z zip.zip -f file.txt.
      main(args.zip, args.file)


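As I said before, I believe this happens mainly because I'm on a Windows-based machine right now. I shared my code with a few other people who are on Linux-based machines, and they had no problem running the code above.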

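My main goal is to start 8 processes (or a pool of 8) to maximize the number of attempts made compared to threading, but since I can't get rid of the TypeError: cannot serialize '_io.BufferedReader' object message, I'm not sure what to do here or how to go about fixing it. Any assistance would be appreciated.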

Answer 1

Jea*_*bre 7

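File handles don't serialize well... but you can send the name of the zip file instead of the zip file handle (a string serializes between processes without trouble). Also avoid using zip as the name for your filename, since it shadows a built-in. I chose zip_filename: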

p = Process(target=extract_zip, args=(zip_filename, password))


Then:

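def extract_zip(zip_filename, password):
    try:
        # Open the archive inside the worker; only the filename string is pickled.
        zip_file = zipfile.ZipFile(zip_filename)
        zip_file.extractall(pwd=password)
        print(f"[+] Password for the .zip: {password.decode('utf-8')} \n")
    except Exception:
        print(f"Incorrect password: {password.decode('utf-8')}")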

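
To see why the filename works where the handle does not, here is a minimal sketch that checks which objects survive pickling. The traceback in the question goes through popen_spawn_win32.py and reduction.dump, which pickles the Process object together with its args. As far as I know, Linux has traditionally defaulted to fork, which copies the parent's memory instead, and that would explain why the same code ran there. The helper names can_pickle and demo, and the throwaway demo.zip archive, are made up for illustration.

```python
import os
import pickle
import tempfile
import zipfile


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False if pickling is refused."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


def demo():
    # Build a small throwaway archive so the demo does not depend on any real file.
    with tempfile.TemporaryDirectory() as tmp:
        zip_filename = os.path.join(tmp, "demo.zip")
        with zipfile.ZipFile(zip_filename, "w") as zf:
            zf.writestr("hello.txt", "hello")

        with open(zip_filename, "rb") as handle, zipfile.ZipFile(zip_filename) as zip_file:
            return {
                "filename": can_pickle(zip_filename),  # plain str
                "password": can_pickle(b"secret"),     # plain bytes
                "file handle": can_pickle(handle),     # _io.BufferedReader
                "ZipFile": can_pickle(zip_file),       # wraps an open file handle
            }


if __name__ == "__main__":
    for name, ok in demo().items():
        print(f"{name}: {'picklable' if ok else 'NOT picklable'}")
```

Anything passed in args has to pass this kind of check when the child process is spawned, so passing strings and bytes and reopening the file in the worker is the safe pattern.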

Another problem is that your code won't run in parallel, because of:

      p.start()
      p.join()


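p.join waits for the process to finish before the loop starts the next one, so only one process ever runs at a time... which makes it almost useless. You have to store the Process objects and join them at the end.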
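
A minimal sketch of that "start everything, then join" pattern, assuming a stand-in worker. The names attempt and run_all are hypothetical, and attempt only prints what it would try instead of touching a real archive:

```python
from multiprocessing import Process


def attempt(zip_filename, password):
    """Stand-in for the real worker: only describes the attempt."""
    message = f"trying {password.decode('utf-8')} on {zip_filename}"
    print(message)
    return message


def run_all(zip_filename, passwords):
    # Create and start every process first...
    processes = [Process(target=attempt, args=(zip_filename, pw)) for pw in passwords]
    for p in processes:
        p.start()
    # ...and only then wait for them, so their work can overlap.
    for p in processes:
        p.join()
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    print(run_all("Generic ZIP.zip", [b"alpha", b"beta", b"gamma"]))
```

Note that this starts one process per password, which is fine for three candidates but not for a whole word list.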

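This can lead to other problems: creating too many processes in parallel can be a problem for your machine, and past a certain point it won't help much. Consider using a multiprocessing.Pool to limit the number of workers.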

A simple example is:

with multiprocessing.Pool(5) as p:
    print(p.map(f, [1, 2, 3, 4, 5, 6, 7]))


Adapted to your example:

with multiprocessing.Pool(5) as p:
    p.starmap(extract_zip, [(zip_filename,line.strip()) for line in txt_file])


(starmap expands each tuple into 2 separate arguments to fit your extract_zip function, as explained in "Python multiprocessing pool.map for multiple arguments")
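
Putting the pieces together, a complete script might look roughly like the sketch below. It is not a drop-in replacement for your program: try_password, crack, the dest parameter and the worker count of 8 are my own choices for illustration, and it reads the two filenames straight from sys.argv instead of argparse. The `if __name__ == "__main__"` guard matters on Windows, because each worker re-imports the main module.

```python
import sys
import zipfile
from multiprocessing import Pool


def try_password(zip_filename, password, dest="."):
    """Return password if the archive extracts with it, otherwise None."""
    try:
        # Open the archive here, inside the worker, so no file handle crosses processes.
        with zipfile.ZipFile(zip_filename) as zip_file:
            zip_file.extractall(path=dest, pwd=password)
        return password
    except Exception:
        # Any failure (wrong password, unreadable archive, ...) is treated as a miss.
        return None


def crack(zip_filename, wordlist_filename, workers=8):
    # Read the candidates in the parent; only strings and bytes are sent to the workers.
    with open(wordlist_filename, "rb") as txt_file:
        jobs = [(zip_filename, line.strip()) for line in txt_file]
    with Pool(workers) as pool:
        results = pool.starmap(try_password, jobs)
    return [pw for pw in results if pw is not None]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python brute_pool.py "Generic ZIP.zip" Worm.txt
    for found in crack(sys.argv[1], sys.argv[2]):
        print(f"[+] Password for the .zip: {found.decode('utf-8')}")
```

Returning the password from the worker, rather than printing inside it, lets the parent collect the results in one place. starmap still goes through the whole list even after a hit, so stopping early would need something like imap_unordered plus pool.terminate().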

Archived:	6 years, 11 months ago
Views:	16650
Last activity:	6 years, 11 months ago