Posts by Jia*_*nbo

How to use dataset.shard in TensorFlow?

Recently I've been studying the Dataset API in TensorFlow, which has a method, dataset.shard(), intended for distributed computing.

This is what the TensorFlow documentation says:

Creates a Dataset that includes only 1/num_shards of this dataset.

d = tf.data.TFRecordDataset(FLAGS.input_file)
d = d.shard(FLAGS.num_workers, FLAGS.worker_index)
d = d.repeat(FLAGS.num_epochs)
d = d.shuffle(FLAGS.shuffle_buffer_size)
d = d.map(parser_fn, num_parallel_calls=FLAGS.num_map_threads)
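The documented behavior ("includes only 1/num_shards of this dataset") can be modeled in plain Python to see which elements each worker ends up with. This is a sketch under the assumption that `shard(num_shards, index)` keeps the elements whose position `i` satisfies `i % num_shards == index`, as the TensorFlow API reference describes; the `shard` helper and the `record-N` data are hypothetical stand-ins, not the tf.data API. It also hints at why the documented snippet reads `FLAGS.worker_index`: presumably each worker runs the same script with its own index and calls `shard` once, rather than one program building every shard.

```python
from itertools import islice


def shard(elements, num_shards, index):
    """Pure-Python model of the documented shard semantics (hypothetical helper,
    not tf.data): keep every element at position i where i % num_shards == index."""
    if not 0 <= index < num_shards:
        raise ValueError("index must be in [0, num_shards)")
    # islice(iterable, start, stop, step): start at `index`, take every num_shards-th item
    return islice(elements, index, None, num_shards)


records = [f"record-{i}" for i in range(7)]  # hypothetical input records
num_workers = 2

# Each worker would normally compute only its own shard, using its own worker index.
shards = {w: list(shard(records, num_workers, w)) for w in range(num_workers)}
print(shards[0])  # ['record-0', 'record-2', 'record-4', 'record-6']
print(shards[1])  # ['record-1', 'record-3', 'record-5']
```

In this model the shards are disjoint and together cover every record; with 7 records and 2 workers, worker 0 gets one more element than worker 1.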

The method is said to return a portion of the original dataset. If I have two workers, should I do the following?

d_0 = d.shard(FLAGS.num_workers, worker_0)
d_1 = d.shard(FLAGS.num_workers, worker_1)
......
iterator_0 = d_0.make_initializable_iterator()
iterator_1 = d_1.make_initializable_iterator()

for worker_id in workers:
    with tf.device(worker_id):
        if worker_id == 0:
            data = iterator_0.get_next()
        else:
            data = iterator_1.get_next()
        ......

Because the documentation doesn't say how the subsequent calls should be made, I'm a bit confused here.

Thanks!

tensorflow tensorflow-datasets

Score: 7 · Answers: 1 · Views: 4686

TensorFlow Dataset preprocessing: done once for the whole dataset, or on each call to iterator.get_next()?

Hi, I'm studying the Dataset API in TensorFlow, and I have a question about the dataset.map() function, which performs data preprocessing.

file_names = ["image1.jpg", "image2.jpg", ......]
im_dataset = tf.data.Dataset.from_tensor_slices(file_names)
# Pass the function object itself to tf.py_func; calling image_parser() here would be a bug
im_dataset = im_dataset.map(lambda image: tuple(tf.py_func(image_parser, [image], [tf.float32, tf.float32, tf.float32])))
im_dataset = im_dataset.batch(batch_size)
iterator = im_dataset.make_initializable_iterator()

The dataset takes image file names and parses each one into 3 tensors (3 pieces of information about the image).

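If my training folder contains a large number of images, preprocessing them takes a long time. My question is: since the Dataset API is said to be designed for efficient input pipelines, does it preprocess the whole dataset before feeding it to my workers (say, GPUs), or does it preprocess only one batch of images each time I call iterator.get_next()?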
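To make the two alternatives concrete, here is a pure-Python generator analogy of a `map` followed by `batch`. It assumes, as the tf.data guide describes its pipelines, that transformations stream elements on demand instead of materializing the whole dataset up front; `parse` is a hypothetical stand-in for `image_parser`, and none of the helpers below are the tf.data API.

```python
calls = []  # records every file name that has actually been parsed


def parse(name):
    # Hypothetical stand-in for image_parser: log the call, return 3 values
    calls.append(name)
    return (name, len(name), name.upper())


def lazy_map(fn, items):
    # Apply fn to one element at a time, only when the consumer asks for it
    for x in items:
        yield fn(x)


def lazy_batch(items, batch_size):
    # Group consecutive elements into lists of batch_size (last batch may be smaller)
    batch = []
    for x in items:
        batch.append(x)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


file_names = [f"image{i}.jpg" for i in range(1, 7)]
pipeline = lazy_batch(lazy_map(parse, file_names), batch_size=2)

first = next(pipeline)
after_first = len(calls)   # only the first batch has been parsed so far
second = next(pipeline)
after_second = len(calls)  # the second batch is parsed only when requested
print(after_first, after_second)
```

In this model, parsing work happens only as batches are requested. tf.data's `prefetch` is intended to let that work run ahead of the consumer and overlap with training, which plain generators like these do not do.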

python tensorflow tensorflow-datasets

Score: 6 · Answers: 1 · Views: 2089

Dataset API does not pass dimensionality information for its output tensor when using py_func

To reproduce my problem, try this first (mapping with py_func):

import tensorflow as tf
import numpy as np
def image_parser(image_name):
    a = np.array([1.0,2.0,3.0], dtype=np.float32)
    return a

images = [[1,2,3],[4,5,6]]
im_dataset = tf.data.Dataset.from_tensor_slices(images)
im_dataset = im_dataset.map(lambda image:tuple(tf.py_func(image_parser, [image], [tf.float32])), num_parallel_calls = 2)
im_dataset = im_dataset.prefetch(4)
iterator = im_dataset.make_initializable_iterator()
print(im_dataset.output_shapes)

It will give you (TensorShape(None),)

However, if you try this (using direct tensorflow mapping instead of py_func):

import tensorflow as tf
import numpy as np

def image_parser(image_name):
    return image_name

images = [[1,2,3],[4,5,6]] …

python tensorflow tensorflow-datasets

Score: 6 · Answers: 1 · Views: 552

Django Channels group_send not working

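I am trying to implement a bidding module with django-channels. Basically, I broadcast every message I receive from a client. The relevant part of my consumer is shown in the snippet below: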

class BidderConsumer(AsyncJsonWebsocketConsumer):

    async def connect(self):
        print("Connected")
        await self.accept()
        # Add to group
        self.channel_layer.group_add("bidding", self.channel_name)
        # Add channel to group
        await self.send_json({"msg_type": "connected"})

    async def receive_json(self, content, **kwargs):
        price = int(content.get("price"))
        item_id = int(content.get("item_id"))
        print("receive price ", price)
        print("receive item_id ", item_id)
        if not price or not item_id:
            await self.send_json({"error": "invalid argument"})

        item = await get_item(item_id)
        # Update bidding price
        if price > item.price:
            item.price = price
            await save_item(item)
            # Broadcast new bidding price
            print("performing group send")
            await …
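The broadcast pattern described above (add each connection to a group on connect, then send to that group) can be modeled with a toy in-memory layer. This is a hypothetical sketch: `InMemoryLayer`, its methods, and the channel names are illustrative and are not the django-channels implementation. It also illustrates a general asyncio point that may be relevant to `connect()`, assuming the real `group_add` is likewise a coroutine in an async consumer: calling a coroutine function without `await` only creates a coroutine object, so its body never runs.

```python
import asyncio
from collections import defaultdict


class InMemoryLayer:
    """Hypothetical toy channel layer (NOT django-channels): groups map to channel names."""

    def __init__(self):
        self.groups = defaultdict(set)    # group name -> set of channel names
        self.inboxes = defaultdict(list)  # channel name -> delivered messages

    async def group_add(self, group, channel):
        self.groups[group].add(channel)

    async def group_send(self, group, message):
        # Deliver the message to every channel currently in the group
        for channel in self.groups[group]:
            self.inboxes[channel].append(message)


async def main():
    layer = InMemoryLayer()
    # Missing `await`: this only creates a coroutine object; client-a is never added.
    pending = layer.group_add("bidding", "client-a")
    pending.close()  # discard it explicitly to avoid a "never awaited" warning
    await layer.group_add("bidding", "client-b")
    await layer.group_send("bidding", {"price": 120})
    return layer


layer = asyncio.run(main())
print(dict(layer.inboxes))  # only client-b received the broadcast
```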

django django-channels

Score: 5 · Answers: 1 · Views: 5682

Django REST framework: how to access a serializer's fields

I have a model named Video:

class Video(models.Model):
    created = models.DateTimeField(auto_now_add=True)
    name = models.CharField(max_length=100, blank=False, null=False, default='', unique=True)
    file = models.FileField(upload_to='videos/', blank=False, null=False)
    owner = models.ForeignKey('auth.User', related_name='videos', on_delete=models.CASCADE, verbose_name='')

    def __str__(self):
        return self.name + ': ' + self.file.name

    class Meta:
        ordering = ('created',)

Its serializer:

class VideoSerializer(serializers.ModelSerializer):
    owner = serializers.ReadOnlyField(source='owner.username')

    class Meta:
        model = Video
        fields = ['name', 'file', 'owner']

I'm trying to access the serializer's fields in my view, because I need them for some processing:

def post(self, request):
    serializer = VideoSerializer(data=request.data)
    if serializer.is_valid():
        # I need the name of the file!!!!!
        # accessing the fields below
        print(serializer.name)
        print(serializer.file.name)
        # …

django django-rest-framework

Score: 1 · Answers: 1 · Views: 2089