Related problems and solutions (0)

Mandatory use of if __name__ == "__main__" on Windows when using multiprocessing

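When using multiprocessing in Python on Windows, the entry point of the program should be protected. The documentation says: "Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process)". Can anyone explain what exactly this means?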
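One way to see what the quoted sentence means, without starting any processes: on Windows, multiprocessing uses the spawn start method, where each child starts a fresh interpreter and re-imports the main module under the name `__mp_main__`. The sketch below only simulates that re-import with `runpy.run_path`. The script text and the `run_as` helper are made up for illustration.

```python
import runpy
import tempfile
import textwrap
from pathlib import Path

# Hypothetical main script: one unguarded statement and one guarded block.
SCRIPT = textwrap.dedent('''
    events = []
    events.append("top-level code ran")      # runs on every import of the module
    if __name__ == "__main__":
        events.append("guarded code ran")    # runs only when launched directly
''')


def run_as(name: str) -> list:
    """Execute SCRIPT with __name__ set to `name` and return the recorded events."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "main_script.py"
        path.write_text(SCRIPT)
        return runpy.run_path(str(path), run_name=name)["events"]


parent_events = run_as("__main__")     # the script launched directly
child_events = run_as("__mp_main__")   # simulated re-import inside a spawned child
print(parent_events)
print(child_events)
```

Anything outside the guard runs again in every child, so code that starts processes has to sit inside the guard. Otherwise each child would try to start children of its own.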

python windows multiprocessing

prt*_*kms

lucky-day

11
Score

2
Answers

5696
Views

Why does this Python multiprocessing script slow down after a while?

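Building on the script from this answer, I have the following scenario: a folder containing 2500 large text files (~55 MB each), all tab-delimited. Web logs, basically.

I need to MD5-hash the second "column" in each row of each file and save the modified files elsewhere. The source files are on a mechanical disk and the destination files are on an SSD.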
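The per-line transformation described above might look roughly like this. It is a minimal sketch, not the code from the question: the `hash_second_column` name, the UTF-8 encoding, and the sample line are assumptions.

```python
import hashlib


def hash_second_column(line: str) -> str:
    """Return `line` with its second tab-separated field replaced by its MD5 hex digest."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        # Assumes the field is text that can be encoded as UTF-8.
        fields[1] = hashlib.md5(fields[1].encode("utf-8")).hexdigest()
    return "\t".join(fields) + "\n"


# Made-up sample row standing in for one web-log line.
sample = "2017-05-23\tuser@example.com\t/index.html\n"
print(hash_second_column(sample), end="")
```

Lines with fewer than two fields are passed through unchanged, which avoids an `IndexError` on blank or malformed rows.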

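The script processes the first 25 (or so) files really quickly. It then slows WAY down. Based on the first 25 files, it should complete all of them in about 2 minutes. Based on the performance after that, however, it will take 15 minutes (or so) to complete them all.

It runs on a server with 32 GB of RAM, and Task Manager rarely shows more than 6 GB in use. I have it set to launch 6 processes, but CPU usage on the cores is low, rarely getting above 15%.

Why is it slowing down? Disk read/write issues? The garbage collector? Bad code? Any ideas on how to speed it up?

Here is the script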

import os

import multiprocessing
from multiprocessing import Process
import threading
import hashlib

class ThreadRunner(threading.Thread):
    """ This class represents a single instance of a running thread"""
    def __init__(self, fileset, filedirectory):
        threading.Thread.__init__(self)
        self.files_to_process = fileset
        self.filedir          = filedirectory

    def run(self):
        for current_file in self.files_to_process:

            # Open the current file as read only
            active_file_name = self.filedir + "/" + current_file
            output_file_name = "D:/hashed_data/" + "hashed_" + current_file

            active_file = open(active_file_name, …


python performance multiprocessing

Cla*_*lay

2017-05-23

8
Score

1
Answers

2642
Views

PyTorch tutorial freeze_support() issue

I tried to follow the PyTorch tutorial: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#sphx-glr-beginner-blitz-cifar10-tutorial-py.

The complete code is here:

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


# Loading and normalizing CIFAR10
transform = transforms.Compose(
    [transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', …


python machine-learning pytorch

opt*_*fan

2020-11-03

7
Score

1
Answers

5705
Views

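Why is this message printed more than once during multiprocessing with concurrent.futures.ProcessPoolExecutor()?

The statement "I should appear only once" should appear only once. I cannot understand why it appears 3 times... I understand that my code starts 3 further processes. But in those three processes only funktion0() is called. Why is the statement "I should appear only once" included in these 3 extra processes? Can someone explain?

Code: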

from datetime import datetime
#print(datetime.now().time())

from time import time, sleep
#print(time())
print("I should appear only once")
from concurrent import futures


def funktion0(arg0):
    sleep(arg0)
    print(f"ich habe {arg0} sek. gewartet, aktuelle Zeit: {datetime.now().time()}")

if __name__=="__main__":

    with futures.ProcessPoolExecutor(max_workers=3) as obj0:
        obj0.submit(funktion0, 5)
        obj0.submit(funktion0, 10)
        obj0.submit(funktion0, 15)
        obj0.submit(funktion0, 20)
        print("alle Aufgaben gestartet")

    print("alle Aufgaben erledigt")


Expected output:

I should appear only once
alle Aufgaben gestartet
ich habe 5 sek. gewartet, aktuelle Zeit: 18:32:51.926288
ich habe …


python windows python-3.x

vos*_*eta

2019-02-03

6
Score

1
Answers

37
Views

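Easy parallelization of numpy.apply_along_axis()?

How can the application of a function to the elements of a NumPy array through numpy.apply_along_axis() be parallelized so as to take advantage of multiple cores? This seems like a natural thing to do in the common case where all the calls to the applied function are independent.

In my particular case, if this matters, the axis of application is axis 0: np.apply_along_axis(func, axis=0, arr=param_grid) (np being NumPy).

I had a quick look at Numba, but I can't seem to get this parallelization with a loop like the following: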

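@numba.jit(parallel=True)
def apply_along_axis_0(func, params):  # hypothetical wrapper so the decorator has a function to apply to
    result = np.empty(shape=params.shape[1:])
    for index in np.ndindex(*result.shape):  # All the indices of params[0,...]
        result[index] = func(params[(slice(None),) + index])  # Applying func along axis 0
    return result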


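There is apparently also a compilation option in NumPy for parallelization through OpenMP, but it does not seem to be accessible through MacPorts.

One could also cut the array into a few pieces and use threads (so as to avoid copying the data), applying the function to each piece in parallel. This is more complex than what I am looking for (and might not work if the Global Interpreter Lock is not released enough).

It would be really nice to be able to use multiple cores in a simple way for simple parallelizable tasks such as applying a function to all the elements of an array (which is essentially what is needed here, except that the function func() takes a 1D array of parameters).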
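The chunk-and-threads idea mentioned above could be sketched as follows. This is an illustration under stated assumptions, not a tested answer to the question: the `parallel_apply_along_axis0` helper and its `n_chunks` parameter are hypothetical, the array is assumed to have at least two dimensions, and `func` is assumed to return a scalar for each 1D slice. As the question notes, threads only help to the extent that `func` releases the Global Interpreter Lock.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parallel_apply_along_axis0(func, arr, n_chunks=4):
    """Apply `func` along axis 0 of `arr`, spreading chunks of axis 1 over threads."""
    # Avoid empty chunks, which np.apply_along_axis rejects.
    n_chunks = max(1, min(n_chunks, arr.shape[1]))
    # Split along axis 1 so each chunk keeps the full axis 0 that `func` consumes.
    chunks = np.array_split(arr, n_chunks, axis=1)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = list(pool.map(lambda c: np.apply_along_axis(func, 0, c), chunks))
    # Axis 0 has been reduced away, so the former axis 1 is now axis 0.
    return np.concatenate(results, axis=0)


# Made-up parameter grid: 4 parameters on a 3 x 2 grid.
param_grid = np.arange(24, dtype=float).reshape(4, 3, 2)
print(parallel_apply_along_axis0(np.sum, param_grid, n_chunks=2))
```

Replacing the thread pool with a process pool would sidestep the GIL, at the cost of pickling each chunk and requiring `func` to be importable by the workers.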

python arrays parallel-processing performance numpy

Eri*_*got

2017-08-08

4
Score

1
Answers

2038
Views

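How to use global/common variables in Python multiprocessing

I recently started using multiprocessing in Python, and I have the following code to update list items from multiple processes. But it gives an empty list.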

from multiprocessing import Pool
import time

global_list = list()


def testfun(n):
    print('started ', n)
    time.sleep(1)
    global_list.append(n)
    print('completed ', n)


def call_multiprocessing_function():
    mytasks = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n']
    with Pool() as pool:
        pool.map(testfun, mytasks)


if __name__ == "__main__":
    print('starting the script')

    print(global_list)
    call_multiprocessing_function()
    print(global_list)

    print('completed the script')


I get the following output

starting the script
[]
started  a
started  b
started  c
started  d
completed  a
started  e
completed  b
started  f
completed  c …
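Each worker process has its own copy of `global_list`, so the appends happen in the children and the parent's list stays empty. One common way around this is to return results from the worker and let `pool.map` collect them. The following is a sketch rather than the accepted answer: the names `work` and `run_parallel` and the pool size of 2 are assumptions.

```python
from multiprocessing import Pool


def work(n):
    """Return the result instead of appending to a module-level list."""
    return n


def run_parallel(tasks):
    # pool.map sends each return value back to the parent process
    # and preserves the order of `tasks`.
    with Pool(processes=2) as pool:
        return pool.map(work, tasks)


if __name__ == "__main__":
    collected = run_parallel(["a", "b", "c"])
    print(collected)
```

If the workers really must update shared state, `multiprocessing.Manager().list()` provides a list proxy that can be passed to them, at the cost of inter-process communication on every access.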


python multiprocessing python-3.x python-multiprocessing

new*_*bie

2018-03-16

2
Score

1
Answers

80
Views

Tag statistics

python ×6

multiprocessing ×3

performance ×2

python-3.x ×2

windows ×2

arrays ×1

machine-learning ×1

numpy ×1

parallel-processing ×1

python-multiprocessing ×1

pytorch ×1
