Related questions and solutions (0)

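Faster alternative to numpy.where?

I have a 3D array filled with integers from 0 to N. I need a list of the indices at which the array equals 1, 2, 3, ..., N. I can do this with np.where as follows: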

import numpy as np

N = 300
shape = (1000, 1000, 10)
data = np.random.randint(0, N + 1, shape)
# One tuple of (i, j, k) index arrays per value 1..N
indx = [np.where(data == i_id) for i_id in range(1, data.max() + 1)]

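But this is slow. According to the question "Fast Python Numpy where functionality?", it should be possible to speed up the index search considerably, but I have not been able to transfer the methods proposed there to my problem of getting the actual indices. What is the best way to speed up the code above?

As an add-on: I want to store the indices later, so it makes sense to use np.ravel_multi_index to reduce the storage from three index arrays to a single one, i.e.: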

indx = [np.ravel_multi_index(np.where(data == i_id), data.shape) for i_id in range(1, data.max()+1)]

This is closer to Matlab's find function. Can it be incorporated directly into a solution that does not use np.where?
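
One possible direction, offered as a sketch rather than as the accepted answer: sort the flattened array once and slice the sorted order into per-value groups, which yields flat indices directly without calling np.where for each value. The helper name `group_flat_indices` and the small demo array are assumptions made for illustration; the result is checked against the np.ravel_multi_index formulation from the question.

```python
import numpy as np

def group_flat_indices(data, n_max):
    """Return a list whose k-th entry holds the flat indices where data == k + 1."""
    flat = data.ravel()
    # A stable argsort keeps flat indices in ascending order within each value.
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    # boundaries[k] is the first position in sorted_vals holding a value >= k + 1.
    boundaries = np.searchsorted(sorted_vals, np.arange(1, n_max + 2))
    return [order[boundaries[k]:boundaries[k + 1]] for k in range(n_max)]

# Small demo array (hypothetical sizes) so the comparison with np.where stays cheap.
rng = np.random.default_rng(0)
N = 5
data = rng.integers(0, N + 1, size=(20, 20, 4))

flat_groups = group_flat_indices(data, N)

# Reference result using the formulation from the question.
reference = [
    np.ravel_multi_index(np.where(data == i), data.shape) for i in range(1, N + 1)
]
matches = all(np.array_equal(a, b) for a, b in zip(flat_groups, reference))

# The 3D indices can be recovered on demand for any single value.
idx3d_for_1 = np.unravel_index(flat_groups[0], data.shape)
```

The sort costs O(M log M) for M elements in total, whereas the list comprehension scans the whole array once per value, so the gain should grow with N. Actual timings depend on the data and are not measured here.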

python numpy

jac*_*cob

2017-05-23

12
Score

2
Answers

9201
Views

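Get a list of all indices of repeated elements in a numpy array

I am trying to get the indices of all repeated elements in a numpy array, but the solution I have found so far is very inefficient for large input arrays (> 20000 elements): it takes about 9 seconds. The idea is simple: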

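records_array is a numpy array of timestamps (datetime) from which we want to extract the indices of the repeated timestamps
time_array is a numpy array containing all the timestamps that are repeated in records_array
records is a django QuerySet (which can easily be converted to a list) containing some Record objects. We want to create a list of pairs formed by all possible combinations of the tagId attribute of Record, corresponding to the repeated timestamps found in records_array.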
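
The question does not show how time_array is built. As a minimal sketch under that caveat, it could be derived from records_array with np.unique and its return_counts option; the sample timestamps below are hypothetical.

```python
import numpy as np

# Hypothetical sample timestamps standing in for records_array.
records_array = np.array(
    [
        "2014-01-01T10:00", "2014-01-01T10:05", "2014-01-01T10:00",
        "2014-01-01T10:10", "2014-01-01T10:05", "2014-01-01T10:00",
    ],
    dtype="datetime64[m]",
)

# Sorted unique timestamps together with how often each one occurs.
values, counts = np.unique(records_array, return_counts=True)

# Keep only the timestamps that occur more than once.
time_array = values[counts > 1]
```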

Here is my current working (but inefficient) code:

import itertools
import numpy as np

tag_couples = []
for t in time_array:
    users_inter = np.nonzero(records_array == t)[0]  # Get all repeated timestamps in records_array for time t
    l = [str(records[i].tagId) for i in users_inter]  # Create a temporary list containing all tagIds recorded at time t
    if l.count(l[0]) != len(l):  # Remove tuples formed by the first tag repeated
        tag_couples += [x for x in itertools.combinations(list(set(l)), 2)]  # Remove duplicates with list(set(l)) and append all …
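
The loop above rescans records_array once per repeated timestamp. A possible alternative, sketched here under assumptions and not taken from any answer on the page, sorts the array once and splits the sort order into runs of equal values. The helper name `repeated_index_groups`, the integer timestamps, and the plain `tag_ids` list (standing in for the tagId attributes of the QuerySet) are all hypothetical.

```python
import itertools
import numpy as np

def repeated_index_groups(records_array):
    """Return one index array per value that occurs more than once."""
    order = np.argsort(records_array, kind="stable")
    sorted_records = records_array[order]
    # Start position and length of each run of equal values in the sorted array.
    _, starts, counts = np.unique(
        sorted_records, return_index=True, return_counts=True
    )
    groups = np.split(order, starts[1:])
    return [g for g, c in zip(groups, counts) if c > 1]

# Hypothetical stand-ins: integer timestamps and a plain list of tag ids.
records_array = np.array([10, 20, 10, 30, 20, 10])
tag_ids = ["a", "b", "c", "d", "b", "a"]

tag_couples = []
for idx in repeated_index_groups(records_array):
    # Deduplicate the tags seen at this timestamp; sorting makes the pair order deterministic.
    tags = sorted({tag_ids[i] for i in idx})
    tag_couples.extend(itertools.combinations(tags, 2))
```

A timestamp whose records all carry the same tag contributes no pairs, because itertools.combinations of a single element is empty; that covers the explicit count check in the original loop. Unlike list(set(l)), the sorted call fixes the order of tags inside each pair.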

python arrays django numpy

mor*_*ens

lucky-day

9
Score

2
Answers

10k
Views