Merging duplicate indices in a sparse tensor

dtr*_*ers 2 tensorflow

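Say I have a sparse tensor with duplicate indices, and wherever they are duplicated I want to merge the values (sum them up). What is the best way to do this?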

Example:

indices = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]

sparse = tf.SparseTensor(indices, values, dense_shape=[10, 10])

result = tf.MAGIC(sparse)  # placeholder for the op being asked about

The result should be a sparse tensor (or a concrete value!) with the following values:

indices = [[1, 1], [1, 2], [1, 3]]
values = [1, 5, 4]
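
As a framework-free reference for the behaviour being asked for, here is a minimal plain-Python sketch. The helper name `coalesce` is hypothetical and is not a TensorFlow function; it only pins down the expected output for the example above.

```python
def coalesce(indices, values):
    """Sum values that share the same index, keeping first-seen order."""
    totals = {}
    for index, value in zip(indices, values):
        key = tuple(index)  # lists are not hashable, tuples are
        totals[key] = totals.get(key, 0) + value
    merged_indices = [list(key) for key in totals]
    merged_values = list(totals.values())
    return merged_indices, merged_values


merged_indices, merged_values = coalesce(
    [[1, 1], [1, 2], [1, 2], [1, 3]], [1, 2, 3, 4]
)
print(merged_indices)  # [[1, 1], [1, 2], [1, 3]]
print(merged_values)   # [1, 5, 4]
```

This relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward.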

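The only thing I have come up with is to concatenate the indices together to create an index hash, add that hash as a third dimension, and then reduce-sum over that third dimension.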

indices = [[1, 1, 11], [1, 2, 12], [1, 2, 12], [1, 3, 13]]
# sparseTensor is the 3-D SparseTensor built from the indices above
sparse_result = tf.sparse_reduce_sum(sparseTensor, reduction_axes=2, keep_dims=True)

But this feels very ugly.
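
One caveat with a hash built by concatenating digits: it only stays unique while every coordinate is a single digit, as in the 10 x 10 example. The sketch below uses two hypothetical helpers (`concat_digits` and `linearize`, neither of which is a TensorFlow function) and an assumed 100 x 100 shape to show the collision, and how a row-major offset avoids it.

```python
def concat_digits(index):
    """Hypothetical hash: glue the decimal digits of each coordinate together."""
    return int("".join(str(i) for i in index))


def linearize(index, shape):
    """Row-major offset, which is distinct for every in-range index."""
    offset = 0
    for i, size in zip(index, shape):
        offset = offset * size + i
    return offset


shape = [100, 100]  # assumed shape, larger than the 10 x 10 example
print(concat_digits([1, 12]), concat_digits([11, 2]))        # 112 112
print(linearize([1, 12], shape), linearize([11, 2], shape))  # 112 1102
```

For the 10 x 10 case in the question the two schemes agree: `[1, 2]` maps to 12 either way.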

kev*_*man 5

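Here is a solution using tf.segment_sum. The idea is to linearize the indices into a 1-D space, get the unique indices with tf.unique, run tf.segment_sum, and then convert the indices back to N-D space.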
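
To make the four steps concrete before involving TensorFlow, here is a plain-Python sketch of the same pipeline. The helper `unique_with_positions` is a hypothetical stand-in that imitates what tf.unique returns (unique values in first-seen order, plus the position of each input among them); it is not part of any library.

```python
def unique_with_positions(keys):
    """Return unique keys in first-seen order and each key's position among them."""
    unique, position, idx = [], {}, []
    for key in keys:
        if key not in position:
            position[key] = len(unique)
            unique.append(key)
        idx.append(position[key])
    return unique, idx


indices = [[1, 1], [1, 2], [1, 2], [1, 3]]
values = [1, 2, 3, 4]
num_cols = 10  # the dense shape is [10, 10]

# 1. Linearize each 2-D index to a single integer.
linearized = [row * num_cols + col for row, col in indices]

# 2. Find the unique keys and the segment id of every entry.
y, idx = unique_with_positions(linearized)

# 3. Sum the values that share a segment id.
summed = [0] * len(y)
for value, segment in zip(values, idx):
    summed[segment] += value

# 4. Convert the unique keys back to 2-D indices.
merged_indices = [[key // num_cols, key % num_cols] for key in y]

print(merged_indices)  # [[1, 1], [1, 2], [1, 3]]
print(summed)          # [1, 5, 4]
```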

# TensorFlow 1.x API (graph mode, evaluated in a session below)
import tensorflow as tf

indices = tf.constant([[1, 1], [1, 2], [1, 2], [1, 3]])
values = tf.constant([1, 2, 3, 4])

# Linearize the indices. If the dimensions of original array are
# [N_{k}, N_{k-1}, ... N_0], then simply matrix multiply the indices
# by [..., N_1 * N_0, N_0, 1]^T. For example, if the sparse tensor
# has dimensions [10, 6, 4, 5], then multiply by [120, 20, 5, 1]^T
# In your case, the dimensions are [10, 10], so multiply by [10, 1]^T

linearized = tf.matmul(indices, [[10], [1]])

# Get the unique indices, and their positions in the array
y, idx = tf.unique(tf.squeeze(linearized))

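# Use the positions of the unique values as the segment ids to
# sum the values that share an index. tf.segment_sum requires sorted
# segment ids, which holds here because the duplicates are adjacent;
# for arbitrarily ordered indices use tf.unsorted_segment_sum instead.
values = tf.segment_sum(values, idx)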

# Go back to N-D indices
y = tf.expand_dims(y, 1)
indices = tf.concat([y//10, y%10], axis=1)

tf.InteractiveSession()
print(indices.eval())
print(values.eval())
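
The multiplier vector described in the comments (for example [120, 20, 5, 1] for dimensions [10, 6, 4, 5]) can be derived for any shape. The following plain-Python sketch shows one way to do that; the helper names `row_major_strides`, `linearize`, and `unravel` are hypothetical and exist only for this illustration.

```python
def row_major_strides(shape):
    """Multiplier per axis: the product of all dimension sizes to its right."""
    strides = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        strides[axis] = strides[axis + 1] * shape[axis + 1]
    return strides


def linearize(index, strides):
    """Dot product of an N-D index with the strides."""
    return sum(i * s for i, s in zip(index, strides))


def unravel(offset, strides):
    """Invert linearize by repeated integer division and remainder."""
    index = []
    for s in strides:
        index.append(offset // s)
        offset %= s
    return index


strides = row_major_strides([10, 6, 4, 5])
print(strides)                   # [120, 20, 5, 1]
offset = linearize([3, 2, 1, 4], strides)
print(offset)                    # 409
print(unravel(offset, strides))  # [3, 2, 1, 4]
```

For the 2-D case above this reduces to the `y//10` and `y%10` used in the answer.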