TensorFlow推断

Question

TensorFlow推断

我一直在研究这个问题.我发现了很多文章; 但没有一个真正只显示张量流推理作为一个简单的推论.它始终"使用服务引擎"或使用预编码/定义的图形.

问题出在这里:我有一个偶尔检查更新模型的设备.然后,它需要加载该模型并通过模型运行输入预测.

在keras这很简单:建立一个模型; 训练模型和调用model.predict().在scikit中学习同样的事情.

我能够抓住一个新模型并加载它; 我可以打印出所有的重量; 但我怎么在世界上推断它？

加载模型和打印重量的代码:

    with tf.Session() as sess:
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
        new_saver.restore(sess, MODEL_PATH)
        for var in tf.trainable_variables():
            print(sess.run(var))

Run Code Online (Sandbox Code Playgroud)

我打印出了我的所有收藏品,我有:['queue_runners','变量','损失','摘要','train_op','cond_context','trainable_variables']

我尝试使用sess.run(train_op); 然而,刚开始完整的训练课程; 这不是我想做的.我只想对我提供的不是TF记录的不同输入集进行推理.

再详细一点:

该设备可以使用C++或Python; 只要我能产生.exe.如果我想要提供系统,我可以设置一个feed dict.我用TFRecords训练过; 但在生产中,我不打算使用TFRecords; 它是一个真实/近实时系统.

感谢您的任何意见.我将样本代码发布到此repo:https://github.com/drcrook1/CIFAR10/TensorFlow,它执行所有培训和样本推断.

任何提示都非常感谢!

------------编辑-----------------我重建模型如下:

def inference(images):
    '''
    Portion of the compute graph that takes an input and converts it into a Y output
    '''
    with tf.variable_scope('Conv1') as scope:
        C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
        C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
        P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
    with tf.variable_scope('Dense1') as scope:
        P_1 = tf.reshape(C_1_2, (CONSTANTS.BATCH_SIZE, -1))
        dim = P_1.get_shape()[1].value
        D_1 = ld.mlp_layer(P_1, dim, NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
    with tf.variable_scope('Dense2') as scope:
        D_2 = ld.mlp_layer(D_1, NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
    H = tf.nn.softmax(D_2, name='prediction')
    return H

Run Code Online (Sandbox Code Playgroud)

注意我将名称'prediction'添加到TF操作中,以便稍后检索它.

训练时,我使用输入管道进行tfrecords和输入队列.

GRAPH = tf.Graph()
with GRAPH.as_default():
    examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                          batch_size=CONSTANTS.BATCH_SIZE,
                                          img_shape=CONSTANTS.IMAGE_SHAPE,
                                          num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
    examples = tf.reshape(examples, [CONSTANTS.BATCH_SIZE, CONSTANTS.IMAGE_SHAPE[0],
                                     CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]])
    logits = Vgg3CIFAR10.inference(examples)
    loss = Vgg3CIFAR10.loss(logits, labels)
    OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)

Run Code Online (Sandbox Code Playgroud)

我试图在图中加载的操作上使用feed_dict; 然而现在只是悬挂....

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        #sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta', clear_devices=True)
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        rand = np.random.rand(1, 32, 32, 3)
        print(rand)
        print(pred)
        print(sess.run(pred, feed_dict={images: rand}))
        print('done')

run_inference()

Run Code Online (Sandbox Code Playgroud)

我认为这不起作用,因为原始网络是使用TFRecords训练的.在样本CIFAR数据集中,数据很小; 我们的真实数据集非常庞大,我理解TFRecords是培训网络的默认最佳实践.feed_dict从生产的角度来看非常完美; 我们可以启动一些线程并从我们的输入系统中填充该东西.

所以我想我有一个受过训练的网络,我可以得到预测操作; 但是如何告诉它停止使用输入队列并开始使用feed_dict？请记住,从生产的角度来看,我无法获得科学家所做的任何事情.他们做他们的事; 我们使用任何商定的标准将其粘贴在生产中.

-------输入OPS --------

tf.Operation'input/input_producer/Const'type = Const,tf.Operation'input/input_producer/Size'type = Const,tf.Operation'input/input_producer/Greater/y'type = Const,tf.Operation'input/input_producer/Greater'type = Greater,tf.Operation'input/input_producer/Assert/Const'type = Const,tf.Operation'input/input_producer/Assert/Assert/data_0'type = Const,tf.Operation'input/input_producer/Assert/Assert'type = Assert,tf.Operation'input/input_producer/Identity'type = Identity,tf.Operation'input/input_producer/RandomShuffle'type = RandomShuffle,tf.Operation'input/input_producer'type = FIFOQueueV2,tf.操作'input/input_producer/input_producer_EnqueueMany'type = QueueEnqueueManyV2,tf.Operation'input/input_producer/input_producer_Close'type = QueueCloseV2,tf.Operation'input/input_producer/input_producer_Close_1'type = QueueCloseV2,tf.Operation'input/input_producer/input_producer_Size' type = QueueSizeV2,tf.Operation'input/input_producer/Cast'type = Cast,tf.Operation'input/input_produc er/mul/y'type = Const,tf.Operation'input/input_producer/mul'type = Mul,tf.Operation'input/input_producer/fraction_of_32_full/tags'type = Const,tf.Operation'input/input_producer/fraction_of_32_full' type = ScalarSummary,tf.Operation'input/TFRecordReaderV2'type = TFRecordReaderV2,tf.Operation'inputReadReadV2'type = ReaderReadV2,

------ END INPUT OPS -----

----更新3 ----

我相信我需要做的是杀死用TF Records训练的图形的输入部分,并将第一层的输入重新连接到新的输入.它有点像进行手术; 但如果我使用TFRecords训练就像听起来一样疯狂,这是我能找到推理的唯一方法......

完整图表:

要杀的部分:

所以我认为问题变成:如何杀死图表的输入部分并用feed_dict替换它？

对此的跟进将是:这真的是正确的方法吗？这似乎是疯子.

----结束更新3 ----

---链接到检查点文件---

https://drcdata.blob.core.windows.net/checkpoints/CIFAR_10_VGG3_50neuron_1pool_1e-3lr_adam.model.zip?st=2017-05-01T21%3A56%3A00Z&se=2020-05-02T21%3A56%3A00Z&sp=rl&sv=2015- 12-11&SR = b&SIG = oBCGxlOusB4NOEKnSnD%2FTlRYa5NKNIwAX1IyuZXAr9o%3D

- 指向检查点文件的链接---

-----更新4 -----

我放弃了,只是用"正常"方式进行推理,假设我可以让科学家只是腌制他们的模型而我们可以抓住模型泡菜; 解压缩然后对其进行推理.所以为了测试我尝试了正常的方式,假设我们已经解压缩它...它不值得一个豆...

import tensorflow as tf
import CONSTANTS
import Vgg3CIFAR10
import numpy as np
from scipy import misc
import time

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))

logits = Vgg3CIFAR10.inference(images)

def run_inference():
'''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.import_meta_graph(MODEL_PATH + '.meta')#, import_scope='1', input_map={'input:0': images})
        new_saver.restore(sess, MODEL_PATH)
        pred = tf.get_default_graph().get_operation_by_name('prediction')
        enq = sess.graph.get_operation_by_name(enqueue_op)
        #tf.train.start_queue_runners(sess)
        print(rand)
        print(pred)
        print(enq)
        for i in range(1, 25):
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            print(sess.run(logits, feed_dict={images : img}))
            time.sleep(3)
        print('done')

run_inference()

Run Code Online (Sandbox Code Playgroud)

Tensorflow最终使用加载模型中的推理函数构建新图形; 然后它将所有其他东西从其他图形追加到它的末尾.那么当我填充一个期望得到推论的feed_dict时; 我只是得到一堆随机垃圾,好像它是第一次通过网络...

再次; 这看起来很疯狂; 我是否真的需要编写自己的框架来序列化和反序列化随机网络？这必须在......之前完成.

-----更新4 -----

再次; 谢谢!

Answer 1

Dav*_*ook 17

好吧,这花了太多时间才弄明白; 所以这是世界其他地方的答案.

快速提醒:我需要坚持一个可以动态加载和推断的模型,而不知道它是如何工作的下层或内部.

步骤1:将模型创建为类,理想情况下使用接口定义

class Vgg3Model:

    NUM_DENSE_NEURONS = 50
    DENSE_RESHAPE = 32 * (CONSTANTS.IMAGE_SHAPE[0] // 2) * (CONSTANTS.IMAGE_SHAPE[1] // 2)

    def inference(self, images):
        '''
        Portion of the compute graph that takes an input and converts it into a Y output
        '''
        with tf.variable_scope('Conv1') as scope:
            C_1_1 = ld.cnn_layer(images, (5, 5, 3, 32), (1, 1, 1, 1), scope, name_postfix='1')
            C_1_2 = ld.cnn_layer(C_1_1, (5, 5, 32, 32), (1, 1, 1, 1), scope, name_postfix='2')
            P_1 = ld.pool_layer(C_1_2, (1, 2, 2, 1), (1, 2, 2, 1), scope)
        with tf.variable_scope('Dense1') as scope:
            P_1 = tf.reshape(P_1, (-1, self.DENSE_RESHAPE))
            dim = P_1.get_shape()[1].value
            D_1 = ld.mlp_layer(P_1, dim, self.NUM_DENSE_NEURONS, scope, act_func=tf.nn.relu)
        with tf.variable_scope('Dense2') as scope:
            D_2 = ld.mlp_layer(D_1, self.NUM_DENSE_NEURONS, CONSTANTS.NUM_CLASSES, scope)
        H = tf.nn.softmax(D_2, name='prediction')
        return H

    def loss(self, logits, labels):
        '''
        Adds Loss to all variables
        '''
        cross_entr = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
        cross_entr = tf.reduce_mean(cross_entr)
        tf.summary.scalar('cost', cross_entr)
        tf.add_to_collection('losses', cross_entr)
        return tf.add_n(tf.get_collection('losses'), name='total_loss')

Run Code Online (Sandbox Code Playgroud)

第2步:使用您想要的任何输入训练您的网络; 在我的例子中,我使用了Queue Runners和TF Records.请注意,此步骤由不同的团队完成,该团队对模型进行迭代,构建,设计和优化.这也会随着时间而改变.它们产生的输出必须能够从远程位置拉出,因此我们可以在设备上动态加载更新的模型(重新刷新硬件是一种痛苦,特别是如果它在地理上分布的话).在这种情况下; 团队删除与图形保护程序关联的3个文件; 但也是用于该训练课程的模型的泡菜

model = vgg3.Vgg3Model()

def create_sess_ops():
    '''
    Creates and returns operations needed for running
    a tensorflow training session
    '''
    GRAPH = tf.Graph()
    with GRAPH.as_default():
        examples, labels = Inputs.read_inputs(CONSTANTS.RecordPaths,
                                          batch_size=CONSTANTS.BATCH_SIZE,
                                          img_shape=CONSTANTS.IMAGE_SHAPE,
                                          num_threads=CONSTANTS.INPUT_PIPELINE_THREADS)
        examples = tf.reshape(examples, [-1, CONSTANTS.IMAGE_SHAPE[0],
                                     CONSTANTS.IMAGE_SHAPE[1], CONSTANTS.IMAGE_SHAPE[2]], name='infer/input')
        logits = model.inference(examples)
        loss = model.loss(logits, labels)
        OPTIMIZER = tf.train.AdamOptimizer(CONSTANTS.LEARNING_RATE)
        gradients = OPTIMIZER.compute_gradients(loss)
        apply_gradient_op = OPTIMIZER.apply_gradients(gradients)
        gradients_summary(gradients)
        summaries_op = tf.summary.merge_all()
        return [apply_gradient_op, summaries_op, loss, logits], GRAPH

def main():
    '''
    Run and Train CIFAR 10
    '''
    print('starting...')
    ops, GRAPH = create_sess_ops()
    total_duration = 0.0
    with tf.Session(graph=GRAPH) as SESSION:
        COORDINATOR = tf.train.Coordinator()
        THREADS = tf.train.start_queue_runners(SESSION, COORDINATOR)
        SESSION.run(tf.global_variables_initializer())
        SUMMARY_WRITER = tf.summary.FileWriter('Tensorboard/' + CONSTANTS.MODEL_NAME, graph=GRAPH)
        GRAPH_SAVER = tf.train.Saver()

        for EPOCH in range(CONSTANTS.EPOCHS):
            duration = 0
            error = 0.0
            start_time = time.time()
            for batch in range(CONSTANTS.MINI_BATCHES):
                _, summaries, cost_val, prediction = SESSION.run(ops)
                error += cost_val
            duration += time.time() - start_time
            total_duration += duration
            SUMMARY_WRITER.add_summary(summaries, EPOCH)
            print('Epoch %d: loss = %.2f (%.3f sec)' % (EPOCH, error, duration))
            if EPOCH == CONSTANTS.EPOCHS - 1 or error < 0.005:
                print(
                'Done training for %d epochs. (%.3f sec)' % (EPOCH, total_duration)
            )
                break
        GRAPH_SAVER.save(SESSION, 'models/' + CONSTANTS.MODEL_NAME + '.model')
        with open('models/' + CONSTANTS.MODEL_NAME + '.pkl', 'wb') as output:
            pickle.dump(model, output)
        COORDINATOR.request_stop()
        COORDINATOR.join(THREADS)

Run Code Online (Sandbox Code Playgroud)

第3步:运行一些推理.加载你的酸洗模型; 通过在新占位符中管道到logits来创建一个新图形; 然后调用会话恢复.不要恢复整个图形; 只是变量.

MODEL_PATH = 'models/' + CONSTANTS.MODEL_NAME + '.model'
imgs_bsdir = 'C:/data/cifar_10/train/'

images = tf.placeholder(tf.float32, shape=(1, 32, 32, 3))
with open('models/vgg3.pkl', 'rb') as model_in:
model = pickle.load(model_in)
logits = model.inference(images)

def run_inference():
    '''Runs inference against a loaded model'''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        new_saver = tf.train.Saver()
        new_saver.restore(sess, MODEL_PATH)
        print("Starting...")
        for i in range(20, 30):
            print(str(i) + '.png')
            img = misc.imread(imgs_bsdir + str(i) + '.png').astype(np.float32) / 255.0
            img = img.reshape(1, 32, 32, 3)
            pred = sess.run(logits, feed_dict={images : img})
            max_node = np.argmax(pred)
            print('predicted label: ' + str(max_node))
        print('done')

run_inference()

Run Code Online (Sandbox Code Playgroud)

肯定有方法可以使用接口改进这一点,并且可能更好地打包一切; 但这是有效的,并为我们如何向前发展奠定了基础.

最后注意 当我们最终将其推向生产时,我们最终不得不将愚蠢的mymodel_model.py文件向下发送以构建图形.因此,我们现在为所有模型强制执行命名约定,并且还有生产模型运行的编码标准,以便我们可以正确地执行此操作.

祝好运!

对于那些感兴趣的人来说，这是太多的歌舞了。现在，我们改用CNTK ...为所有团队标准化所有内容并使之始终投入生产变得容易了。 (2认同)

Answer 2

Dav*_*rks 5

虽然它不像model.predict()那样剪切和干燥,但它仍然非常微不足道.

在你的模型中,你应该有一个张量来计算你感兴趣的最终输出,让我们说出张量output.您目前可能只有一个损失功能.如果是这样,则创建另一个张量(模型中的变量),它实际上计算出你想要的输出.

例如,如果您的损失函数是:

tf.nn.sigmoid_cross_entropy_with_logits(last_layer_activation, labels)

Run Code Online (Sandbox Code Playgroud)

并且您希望您的输出在每个类的[0,1]范围内,创建另一个变量:

output = tf.sigmoid(last_layer_activation)

Run Code Online (Sandbox Code Playgroud)

现在,当你打电话时sess.run(...)请求output张量.不要求您通常会训练它的优化OP.当您请求此变量时,tensorflow将执行生成该值所需的最少工作(例如,它不会打扰backprop,损失函数以及所有这些,因为简单的前馈传递是计算所需的全部内容output.

因此,如果您正在创建一个服务来返回模型的推断,那么您需要将模型保存在内存/ gpu中,并重复:

sess.run(output, feed_dict={X: input_data})

Run Code Online (Sandbox Code Playgroud)

您不需要为标签提供它,因为tensorflow不会费心计算生成您请求的输出所不需要的操作.您无需更改模型或任何其他内容.

虽然这种方法可能不那么明显,因为model.predict(...)我认为它的灵活性要大得多.如果你开始玩更复杂的模型,你可能会学会喜欢这种方法.model.predict()就像"在盒子里思考".

归档时间：	8 年，10 月前
查看次数：	17629 次
最近记录：	6 年，9 月前