Meaning of inter_op_parallelism_threads and intra_op_parallelism_threads

Question

its*_*ral 50 python parallel-processing distributed-computing tensorflow

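Can someone explain the following TensorFlow terms?

inter_op_parallelism_threads
intra_op_parallelism_threads

Alternatively, please link to a source that explains them correctly.

I ran some tests varying these parameters, but the results were inconsistent and I could not draw any conclusion.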

Answer 1

mrr*_*rry 62

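The inter_op_parallelism_threads and intra_op_parallelism_threads options are documented in the source of the tf.ConfigProto protocol buffer. These options configure two thread pools that TensorFlow uses to parallelize execution, as the comments describe: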

// The execution of an individual op (for some op types) can be
// parallelized on a pool of intra_op_parallelism_threads.
// 0 means the system picks an appropriate number.
int32 intra_op_parallelism_threads = 2;

// Nodes that perform blocking operations are enqueued on a pool of
// inter_op_parallelism_threads available in each process.
//
// 0 means the system picks an appropriate number.
//
// Note that the first Session created in the process sets the
// number of threads for all future sessions unless use_per_session_threads is
// true or session_inter_op_thread_pool is configured.
int32 inter_op_parallelism_threads = 5;

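There are several possible forms of parallelism when running a TensorFlow graph, and these options provide some control over multi-core CPU parallelism: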

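If you have an operation that can be parallelized internally, such as matrix multiplication (tf.matmul()) or a reduction (e.g. tf.reduce_sum()), TensorFlow will execute it by scheduling tasks in a thread pool with intra_op_parallelism_threads threads. This configuration option therefore controls the maximum parallel speedup for a single operation. Note that if you run multiple operations in parallel, these operations will share this thread pool.
If your TensorFlow graph contains many independent operations (that is, there is no directed path between them in the dataflow graph), TensorFlow will attempt to run them concurrently, using a thread pool with inter_op_parallelism_threads threads. If those operations have a multithreaded implementation, they will (in most cases) share the same thread pool for intra-op parallelism.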
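
The two-pool arrangement described above can be modeled with the Python standard library. This is only a toy sketch of the scheduling idea, not TensorFlow's implementation: the pool sizes, `parallel_sum`, and `run_independent_ops` are names invented for illustration, and because of Python's GIL it shows the structure rather than a real CPU speedup.

```python
from concurrent.futures import ThreadPoolExecutor

INTRA_OP_THREADS = 4  # illustrative value, stands in for intra_op_parallelism_threads
INTER_OP_THREADS = 2  # illustrative value, stands in for inter_op_parallelism_threads

# One shared pool for work inside an op, one pool for running whole ops.
intra_pool = ThreadPoolExecutor(max_workers=INTRA_OP_THREADS)
inter_pool = ThreadPoolExecutor(max_workers=INTER_OP_THREADS)


def parallel_sum(values, shards=INTRA_OP_THREADS):
    """A single 'op' that splits its work across the shared intra-op pool."""
    chunks = [values[i::shards] for i in range(shards)]
    return sum(intra_pool.map(sum, chunks))


def run_independent_ops(inputs):
    """Independent 'ops' are dispatched concurrently on the inter-op pool."""
    futures = [inter_pool.submit(parallel_sum, v) for v in inputs]
    return [f.result() for f in futures]


results = run_independent_ops([list(range(100)), list(range(1000))])
print(results)
```

Both reductions run at the same time on the inter-op pool, and each one fans its chunks out to the same intra-op pool, which mirrors the sharing behavior the answer describes.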

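Finally, both configuration options take a default value of 0, which means "the system picks an appropriate number." Currently, this means that each thread pool will have one thread per CPU core in your machine.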
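
The answer is written against the TF 1.x tf.ConfigProto API. As a hedged sketch for readers on TensorFlow 2.x, the same two settings are also exposed through tf.config.threading; the thread counts below are arbitrary example values, and the calls are assumed to run at program start, before TensorFlow has executed any op.

```python
import tensorflow as tf

# Example values only; passing 0 leaves the choice to the system.
# These setters must be called before the TensorFlow runtime is initialized.
tf.config.threading.set_intra_op_parallelism_threads(4)
tf.config.threading.set_inter_op_parallelism_threads(2)

# Read the settings back to confirm what was configured.
intra = tf.config.threading.get_intra_op_parallelism_threads()
inter = tf.config.threading.get_inter_op_parallelism_threads()
print(intra, inter)
```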

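These options control the maximum parallelism you can get when running a TensorFlow graph. However, intra-op parallelism depends on the operations you run having parallel implementations (as many of the standard kernels do), and inter-op parallelism depends on the graph containing independent ops that can run at the same time. If (for example) your graph is a linear chain of operations and those operations have only serial implementations, these options will not add any parallelism. These options have nothing to do with fault tolerance (or distributed execution). (4 upvotes)
So these two options apply only to CPUs and not to GPUs? If I have a tf.add_n op that depends on several parallel matrix multiplications and runs on a GPU, how is it parallelized by default, and can that be controlled? (2 upvotes)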

Answer 2

mrk*_*mrk 5

To get the best performance from a machine, change the parallelism threads and OpenMP settings for the TensorFlow backend (from here) as follows:

import tensorflow as tf

# NUM_PARALLEL_EXEC_UNITS denotes the number of cores per socket in the machine.
# The value 4 is only an example; per the source, 0 lets the system choose
# appropriate settings.
NUM_PARALLEL_EXEC_UNITS = 4

# TF 1.x API; under TF 2.x the same names are available in tf.compat.v1.
config = tf.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=2,
    allow_soft_placement=True,
    device_count={'CPU': NUM_PARALLEL_EXEC_UNITS})

session = tf.Session(config=config)
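
The answer mentions OpenMP settings, but the snippet above configures only the TensorFlow thread pools. A minimal sketch of the OpenMP side follows, assuming a TensorFlow build backed by an OpenMP runtime (for example one linked against Intel MKL) that reads these environment variables. The values are illustrative and are not taken from the original answer.

```python
import os

NUM_PARALLEL_EXEC_UNITS = 4  # example value: cores per socket

# Set these before importing TensorFlow so the OpenMP runtime sees them at startup.
# OMP_NUM_THREADS caps the number of threads used in OpenMP parallel regions.
os.environ["OMP_NUM_THREADS"] = str(NUM_PARALLEL_EXEC_UNITS)
# KMP_BLOCKTIME (Intel OpenMP runtime) is the time in milliseconds a thread
# waits after finishing a parallel region before going to sleep.
os.environ["KMP_BLOCKTIME"] = "30"

print(os.environ["OMP_NUM_THREADS"], os.environ["KMP_BLOCKTIME"])
```

On builds that do not use OpenMP these variables are simply ignored, so whether they help depends on how the installed TensorFlow was compiled.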

In answer to the comment below: [source]

allow_soft_placement=True

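If you would like TensorFlow to automatically choose an existing, supported device to run an operation when the specified device does not exist, set allow_soft_placement to True in the configuration options when creating the session. In short, it lets TensorFlow fall back to another device instead of raising an error. Note that it does not control GPU memory: dynamic GPU memory allocation is a separate setting, gpu_options.allow_growth.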
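
The fallback behavior can be pictured with a small toy model. This is not TensorFlow's placer: `place_op` and its arguments are hypothetical names used only to show the decision that allow_soft_placement changes, and the real library raises its own error type rather than ValueError.

```python
def place_op(requested_device, available_devices, allow_soft_placement=False):
    """Toy model of device placement (hypothetical, not TensorFlow code)."""
    if requested_device in available_devices:
        return requested_device
    if allow_soft_placement and available_devices:
        # Requested device is missing: fall back to a device that exists.
        return available_devices[0]
    raise ValueError(f"Cannot place op on {requested_device!r}")


# A GPU is requested on a CPU-only machine; soft placement falls back to the CPU.
print(place_op("/gpu:0", ["/cpu:0"], allow_soft_placement=True))
```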

What is "allow_soft_placement=True"? (5 upvotes)

Archived:	9 years ago
Views:	26077
Last recorded:	6 years, 2 months ago