TensorFlow inference graph performance optimization


I'm trying to understand some surprising results I'm seeing when executing a TF graph. The graph I'm working with is just a forest (a bunch of trees): a simple forward-inference graph, nothing to do with training. I'm sharing snippets of two implementations.

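For context, here is a minimal sketch of the shared setup the snippets assume. The real cond and body do a feature comparison at each tree node; the flat children encoding and the toy values below are placeholders of my own, only meant to make the snippets self-contained:

import tensorflow as tf

INT_TYPE = tf.int32

# Flat encoding of all trees (toy data, illustration only): node i moves
# to children[i] on each step, and a leaf points at itself.
children = [1, 2, 2, 4, 4]
scores = [0.0, 0.0, 0.3, 0.0, -0.1]
tree_offsets = [0, 3]               # root node index of each tree
n_trees = len(tree_offsets)

children_tensor = tf.constant(children, dtype=INT_TYPE, name="children")
score_tensor = tf.constant(scores, dtype=tf.float32, name="score_tensor")

def cond(node):
    # keep descending while any current node is not yet a leaf
    return tf.reduce_any(tf.not_equal(tf.gather(children_tensor, node), node))

def body(node):
    # move every node one level down; leaves map to themselves, so this
    # is a no-op for trees that have already finished
    return (tf.gather(children_tensor, node),)
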
Snippet 1:

with tf.name_scope("main"):

    def get_tree_output(offset):
        loop_vars = (offset,)
        leaf_indice = tf.while_loop(cond,
                                    body,
                                    loop_vars,
                                    back_prop=False,
                                    parallel_iterations=1,
                                    name="while_loop")
        # return the leaf index reached for this tree; map_fn stacks these
        return leaf_indice

    leaf_indices = tf.map_fn(get_tree_output,
                             tree_offsets_tensor,
                             dtype=INT_TYPE,
                             parallel_iterations=n_trees,
                             back_prop=False,
                             name="tree-scores")

    tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")

    output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")

Snippet 2:

with tf.name_scope("main"):
    tree_offsets_tensor = tf.constant(tree_offsets, dtype=INT_TYPE, name="tree_offsets_tensor")
    loop_vars = (tree_offsets_tensor,)
    leaf_indices = tf.while_loop(cond,
                                 body,
                                 loop_vars,
                                 back_prop=False,
                                 parallel_iterations=n_trees,
                                 name="while_loop")

    tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")

    output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")

The rest of the code is exactly the same in both cases: the constant tensors, the variables, and the cond and body of the while loop. Threading and parallelism settings are also identical.

Snippet 2: takes about 500 microseconds per inference.
Snippet 1: takes about 12 milliseconds per inference.
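
The kind of timing harness I mean, for concreteness (the warm-up and iteration counts are arbitrary choices):

import time
import tensorflow as tf

with tf.Session() as sess:
    for _ in range(10):      # warm up to exclude one-time setup costs
        sess.run(output)

    n_runs = 1000
    start = time.time()
    for _ in range(n_runs):
        sess.run(output)
    elapsed = time.time() - start
    print("avg inference: %.1f us" % (elapsed / n_runs * 1e6))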

The difference is that in snippet 1 I apply map_fn over tree_offsets_tensor, while in snippet 2 I get rid of map_fn and use the tensor directly. So, as I understand it, in snippet 1 get_tree_output is called with one element of tree_offsets_tensor at a time, which builds one while_loop per offset value, whereas in snippet 2 there is a single while_loop that takes all the offset values at once (basically the whole tree_offsets_tensor).
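
To make the structural difference concrete, here is a toy reduction of the two patterns with a trivial cond and body (values made up for illustration):

import tensorflow as tf

elems = tf.constant([3, 1, 4, 1, 5])

# Snippet 1 pattern: map_fn builds one while_loop *per element*; each
# scalar loop runs, and pays its control-flow overhead, separately.
per_element = tf.map_fn(
    lambda x: tf.while_loop(lambda v: v < 100,
                            lambda v: v * 2,
                            (x,),
                            back_prop=False),
    elems, dtype=tf.int32, back_prop=False)

# Snippet 2 pattern: a single while_loop whose loop variable is the whole
# vector; one loop body advances all elements at once.
vectorized = tf.while_loop(
    lambda v: tf.reduce_any(v < 100),
    lambda v: tf.where(v < 100, v * 2, v),
    (elems,),
    back_prop=False)

Both produce the same values; the difference is purely in how many control-flow constructs end up in the graph.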

I also tried another variant of snippet 1, where instead of using map_fn I hand-wrote the loop as a Python for loop:

Snippet 1 (for-loop variant):

output = 0
with tf.name_scope("main"):
    for offset in tree_offsets:
        loop_vars = (offset,)
        leaf_indice = tf.while_loop(cond,
                                    body,
                                    loop_vars,
                                    back_prop=False,
                                    parallel_iterations=1,
                                    name="while_loop")
        tree_score = tf.gather(score_tensor, leaf_indice, name="tree-scores")
        output = tf.add(tree_score, output)

    #leaf_indices = tf.map_fn(get_tree_output,
    #    tree_offsets_tensor, dtype=INT_TYPE,
    #    parallel_iterations=n_trees, back_prop=False,
    #    name="tree-scores")

    #tree_scores = tf.gather(score_tensor, leaf_indices, name="tree-scores")

    #output = tf.reduce_sum(tree_scores, name="sum-output")
    output = tf.sigmoid(output, name="sigmoid-output")

This gives a modest improvement: about 9 milliseconds per inference.
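
In case it helps pin down where the extra time goes, a per-op step trace can be captured with RunMetadata and inspected in chrome://tracing (a sketch; the output file name is arbitrary):

import tensorflow as tf
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(output, options=run_options, run_metadata=run_metadata)

# Dump a Chrome-trace JSON of per-op execution times.
tl = timeline.Timeline(run_metadata.step_stats)
with open("timeline.json", "w") as f:
    f.write(tl.generate_chrome_trace_format())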