Zio*_*fil 1 python numpy vectorization numpy-einsum
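I have a list L of tensors (ndarray objects), each with several indices. I need to contract these indices according to a graph of connections.
The connections are encoded in a list of tuples of the form ((m,i),(n,j)), meaning "contract the i-th index of the tensor L[m] with the j-th index of the tensor L[n]".
How can I handle non-trivial connectivity graphs? The first problem is that as soon as I contract a pair of indices, the result is a new tensor that does not belong to the list L. But even if I solved this (e.g. by giving a unique identifier to every index of every tensor), there is the issue that the contractions can be performed in any order, and some choices produce unnecessarily enormous intermediate tensors in mid-computation (even if the final result is small). Suggestions?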
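Memory considerations aside, I believe you can do the contractions in a single call to einsum, although you'll need some preprocessing. I'm not entirely sure what you mean by "as soon as I contract a pair of indices, the result is a new tensor that does not belong to the list L", but I think doing the contraction in a single step would solve exactly this problem.
I suggest using the alternative, numerically indexed syntax of einsum: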
    einsum(op0, sublist0, op1, sublist1, ..., [sublistout])
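To illustrate this calling convention, here is a minimal sketch comparing the usual string form with the sublist form. The array shapes and variable names are arbitrary choices for illustration, not something taken from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 3))
b = rng.random((4, 3))

# String form: contract the second index of a with the second index of b.
res_str = np.einsum('ij,kj->ik', a, b)

# Sublist form: each operand is followed by a list of integer labels.
# The repeated label (1) is summed over, and the trailing list gives
# the labels of the output axes.
res_sub = np.einsum(a, [0, 1], b, [2, 1], [0, 2])

print(res_sub.shape)
print(np.allclose(res_str, res_sub))
```

Because the labels are plain integers, they are easy to generate and rewrite programmatically.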
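So what you need to do is encode the indices to contract as sequences of integers. First, set up a range of unique indices, one per axis of every tensor, and keep another copy to be used as sublistout. Then, iterating over your connectivity graph, set each pair of contracted indices to the same label and, at the same time, remove the contracted indices from sublistout.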
    import numpy as np

    def contract_all(tensors, conns):
        '''
        Contract the tensors inside the list tensors
        according to the connectivities in conns

        Example input:
        tensors = [np.random.rand(2,3),np.random.rand(3,4,5),np.random.rand(3,4)]
        conns = [((0,1),(2,0)), ((1,1),(2,1))]
        returned shape in this case is (2,3,5)
        '''

        ndims = [t.ndim for t in tensors]
        totdims = sum(ndims)
        dims0 = np.arange(totdims)
        # keep track of sublistout throughout
        sublistout = set(dims0.tolist())
        # cut up the index array according to tensors
        # (throw away empty list at the end)
        inds = np.split(dims0, np.cumsum(ndims))[:-1]
        # we also need to convert to a list, otherwise einsum chokes
        inds = [ind.tolist() for ind in inds]

        # if there were no contractions, we'd call np.einsum with the
        # tensors and inds interleaved, followed by sublistout;
        # instead we need to loop over the connectivity graph
        # and manipulate the indices
        for (m, i), (n, j) in conns:
            # tensors[m][i] contracted with tensors[n][j]

            # remove the old indices from sublistout which is a set
            sublistout -= {inds[m][i], inds[n][j]}

            # contract the indices
            inds[n][j] = inds[m][i]

        # zip and flatten the tensors and indices
        args = [subarg for arg in zip(tensors, inds) for subarg in arg]

        # assuming there are no multiple contractions, we're done here;
        # sort the remaining labels, since set iteration order is not guaranteed
        return np.einsum(*args, sorted(sublistout))
A trivial example:
    >>> tensors = [np.random.rand(2,3), np.random.rand(4,3)]
    >>> conns = [((0,1),(1,1))]
    >>> contract_all(tensors,conns)
    array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
           [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])
    >>> np.einsum('ij,kj',tensors[0],tensors[1])
    array([[ 1.51970003,  1.06482209,  1.61478989,  1.86329518],
           [ 1.16334367,  0.60125945,  1.00275992,  1.43578448]])
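If there are multiple contractions (the same index contracted with more than one partner), the logistics in the loop become a bit more complex, because we need to handle all the duplicates. The logic, however, is the same. Furthermore, the above is obviously missing checks to ensure that the corresponding indices can actually be contracted.
In hindsight I realized that the default sublistout doesn't have to be specified; einsum uses that order anyway. I decided to leave the variable in the code, because if we want a non-trivial output index order we'll have to handle that variable appropriately, and it might come in handy.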
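The missing checks could be sketched roughly as follows. The helper name `check_conns` and its error messages are hypothetical, and it assumes the simple case handled by the code here: each (tensor, axis) pair takes part in at most one contraction.

```python
import numpy as np

def check_conns(tensors, conns):
    """Hypothetical helper: raise ValueError if conns cannot be contracted."""
    seen = set()
    for (m, i), (n, j) in conns:
        for t, ax in ((m, i), (n, j)):
            # the tensor position and the axis must both exist
            if not 0 <= t < len(tensors):
                raise ValueError(f"no tensor at position {t}")
            if not 0 <= ax < tensors[t].ndim:
                raise ValueError(f"tensor {t} has no axis {ax}")
            # reject axes that take part in more than one contraction
            if (t, ax) in seen:
                raise ValueError(f"axis {ax} of tensor {t} is used more than once")
            seen.add((t, ax))
        # contracted axes must have the same length
        if tensors[m].shape[i] != tensors[n].shape[j]:
            raise ValueError(
                f"size mismatch: {tensors[m].shape[i]} vs {tensors[n].shape[j]}"
            )

tensors = [np.zeros((2, 3)), np.zeros((3, 4, 5)), np.zeros((3, 4))]
check_conns(tensors, [((0, 1), (2, 0)), ((1, 1), (2, 1))])  # passes silently
```

Calling such a helper before building the einsum arguments would turn a shape mismatch into an early, readable error rather than one raised from inside einsum.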
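As a small illustration of the output ordering, the sketch below compares the implicit output (no output list given) with an explicitly permuted one. The shapes are arbitrary and chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((2, 3))
b = rng.random((4, 3))

# No output list: the labels that appear only once (0 and 2) are kept,
# in ascending order, so the result has shape (2, 4).
implicit = np.einsum(a, [0, 1], b, [2, 1])

# Explicit output list with the free labels swapped: shape (4, 2).
swapped = np.einsum(a, [0, 1], b, [2, 1], [2, 0])

print(implicit.shape, swapped.shape)
print(np.allclose(swapped, implicit.T))
```

So a custom axis order for the result only requires permuting the final list of labels.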
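As for optimizing the contraction order, NumPy 1.12 added internal optimization to np.einsum (as noted by @hpaulj in a now-deleted comment). That version introduced the optional optimize keyword argument of np.einsum, which lets it choose a contraction order that cuts down computation time at the cost of extra memory for intermediate arrays. Passing 'greedy' or 'optimal' as the optimize keyword makes numpy choose a contraction order roughly in decreasing order of the dimension sizes.
The options available for the optimize keyword come from the function np.einsum_path, which at the time of writing appeared to be missing from the online documentation (help() fortunately works):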
    einsum_path(subscripts, *operands, optimize='greedy')

    Evaluates the lowest cost contraction order for an einsum expression by
    considering the creation of intermediate arrays.
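The contraction path returned by np.einsum_path can also be passed as the optimize argument of np.einsum. In your question you were worried about using too much memory, so I suspect that the default of no optimization (potentially longer runtime, but a smaller memory footprint) may be what you want.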
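A minimal sketch of how the path can be computed once and reused, assuming a NumPy version that has the optimize keyword. The three-matrix chain and its shapes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 50))
b = rng.random((50, 60))
c = rng.random((60, 4))

# Same sublist-style arguments that would be passed to np.einsum.
args = (a, [0, 1], b, [1, 2], c, [2, 3], [0, 3])

# einsum_path returns the contraction path and a printable cost report.
path, report = np.einsum_path(*args, optimize='greedy')
print(path)
print(report)

# Default call (no optimization) versus reusing the precomputed path.
res_plain = np.einsum(*args)
res_opt = np.einsum(*args, optimize=path)
print(np.allclose(res_plain, res_opt))
```

The report lists the estimated size of the largest intermediate array, which is one way to judge whether an optimized path is affordable in terms of memory.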