如何在 Julia 中获取索引列表的补充?

Dav*_*ira 18 filtering list-comprehension julia

假设我有一个变量i = [1,3,5],它是在我对数组应用过滤器时获得的。现在,假设这个数组有 10 个元素,我想获得“补”索引。我的意思是,我想获得ic = [2,4,6,7,8,9,10].

是否有一种干净而简短的方法来获得这个索引的补充列表?

我的意思是,我可以用常规循环来做到这一点,但是有没有办法通过列表理解来做到这一点?

And*_*kin 13

您可以使用setdiff功能

julia> x = [1, 3, 5];
julia> y = collect(1:10);
julia> setdiff(y, x)
7-element Vector{Int64}:
  2
  4
  6
  7
  8
  9
 10
Run Code Online (Sandbox Code Playgroud)

性能方面,基于循环的实现更好,因为我们可以考虑到新索引是原始索引的子集

function mysetdiff(y, x)
    res = Vector{eltype(y)}(undef, length(y) - length(x))
    i = 1
    @inbounds for el in y
        el ? x && continue
        res[i] = el
        i += 1
    end

    res
end
Run Code Online (Sandbox Code Playgroud)

和比较

using BenchmarkTools

@btime [z for z ? $y if z ? $x]
# 141.893 ns (4 allocations: 288 bytes)
@btime setdiff($y, $x)
# 477.056 ns (8 allocations: 688 bytes)
@btime mysetdiff($y, $x)
# 46.434 ns (1 allocation: 144 bytes)
Run Code Online (Sandbox Code Playgroud)


Hél*_*ins 12

您需要获取所有1:10不在i. 所以使用列表理解:

julia> i = [1,3,5];

julia> ic = [x for x ? 1:10 if x ? i]
7-element Array{Int64,1}:
  2
  4
  6
  7
  8
  9
 10
Run Code Online (Sandbox Code Playgroud)

你可以打字?在 REPL 中使用\in Tab. ? 可以通过\notin Tab. 如果您在不允许这样做的环境中编码,则可以从此处复制或键入以下内容之一:

julia> ic = [x for x in 1:10 if !(x in i)]
Run Code Online (Sandbox Code Playgroud)
julia> ic = [x for x in 1:10 if !in(x, i)] # operators are functions
Run Code Online (Sandbox Code Playgroud)

表现

如果你关心性能,这里是基准:

julia> @benchmark ic = [x for x ? 1:10 if x ? i]
BenchmarkTools.Trial: 
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     462.274 ns (0.00% GC)
  median time:      471.168 ns (0.00% GC)
  mean time:        497.947 ns (2.86% GC)
  maximum time:     13.115 ?s (95.25% GC)
  --------------
  samples:          10000
  evals/sample:     197
Run Code Online (Sandbox Code Playgroud)


Bog*_*ski 12

如果您关心性能,您还可以考虑:

julia> @benchmark deleteat!([1:10;], $i) # indices must be unique and sorted
BenchmarkTools.Trial:
  memory estimate:  160 bytes
  allocs estimate:  1
  --------------
  minimum time:     53.798 ns (0.00% GC)
  median time:      60.790 ns (0.00% GC)
  mean time:        71.125 ns (1.76% GC)
  maximum time:     618.946 ns (77.28% GC)
  --------------
  samples:          10000
  evals/sample:     987
Run Code Online (Sandbox Code Playgroud)

julia> @benchmark (x = trues(10); x[$i] .= false; findall(x)) # if Bool-array is enough for you you can skip the last step to save 50% of time
BenchmarkTools.Trial:
  memory estimate:  272 bytes
  allocs estimate:  3
  --------------
  minimum time:     124.863 ns (0.00% GC)
  median time:      134.033 ns (0.00% GC)
  mean time:        157.668 ns (6.84% GC)
  maximum time:     3.000 ?s (95.44% GC)
  --------------
  samples:          10000
  evals/sample:     905
Run Code Online (Sandbox Code Playgroud)

与之前提出的:

julia> @benchmark ic = [x for x ? 1:10 if x ? $i]
BenchmarkTools.Trial:
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     170.714 ns (0.00% GC)
  median time:      196.571 ns (0.00% GC)
  mean time:        222.510 ns (4.43% GC)
  maximum time:     3.078 ?s (91.01% GC)
  --------------
  samples:          10000
  evals/sample:     700
Run Code Online (Sandbox Code Playgroud)

julia> @benchmark setdiff($[1:10;], $i)
BenchmarkTools.Trial:
  memory estimate:  672 bytes
  allocs estimate:  7
  --------------
  minimum time:     504.145 ns (0.00% GC)
  median time:      514.508 ns (0.00% GC)
  mean time:        589.584 ns (2.75% GC)
  maximum time:     8.954 ?s (90.69% GC)
  --------------
  samples:          10000
  evals/sample:     193
Run Code Online (Sandbox Code Playgroud)

和(来自另一篇文章的自定义实现)

julia> @benchmark mysetdiff($[1:10;], $i)
BenchmarkTools.Trial:
  memory estimate:  144 bytes
  allocs estimate:  1
  --------------
  minimum time:     44.748 ns (0.00% GC)
  median time:      46.869 ns (0.00% GC)
  mean time:        52.780 ns (1.75% GC)
  maximum time:     431.919 ns (88.49% GC)
  --------------
  samples:          10000
  evals/sample:     990
Run Code Online (Sandbox Code Playgroud)

  • 啊 - 是的,因为你的位置与“删除!”相反。 (2认同)

nic*_*y12 7

由于您已经使用了过滤,因此也可以反过来使用。或者,如果您拥有用于 get 的函数,则i可以将其取反以获取索引的补集。

julia> i = [1,3,5];

julia> filter(x -> x ? i, 1:10)
7-element Array{Int64,1}:
  2
  4
  6
  7
  8
  9
 10
Run Code Online (Sandbox Code Playgroud)

基准

@benchmark filter(x -> x ? i, 1:10)
BenchmarkTools.Trial:
  memory estimate:  304 bytes
  allocs estimate:  2
  --------------
  minimum time:     501.036 ns (0.00% GC)
  median time:      511.399 ns (0.00% GC)
  mean time:        546.609 ns (1.26% GC)
  maximum time:     7.516 ?s (92.31% GC)
  --------------
  samples:          10000
  evals/sample:     193
Run Code Online (Sandbox Code Playgroud)

参考方案

@benchmark ic = [x for x ? 1:10 if x ? i]
BenchmarkTools.Trial:
  memory estimate:  288 bytes
  allocs estimate:  4
  --------------
  minimum time:     626.036 ns (0.00% GC)
  median time:      646.154 ns (0.00% GC)
  mean time:        692.179 ns (2.72% GC)
  maximum time:     26.065 ?s (95.43% GC)
  --------------
  samples:          10000
  evals/sample:     169
Run Code Online (Sandbox Code Playgroud)


Prz*_*fel 6

InvertedIndices包中添加到列表中的另一种值得了解的语法(虽然不是太快)(通常它也与 一起加载DataFrames):

(1:10)[Not(i)]
Run Code Online (Sandbox Code Playgroud)
julia> @benchmark (1:10)[Not($i)]
BenchmarkTools.Trial:
  memory estimate:  1.42 KiB
  allocs estimate:  39
  --------------
  minimum time:     1.130 ?s (0.00% GC)
  median time:      1.320 ?s (0.00% GC)
  mean time:        1.713 ?s (5.05% GC)
  maximum time:     324.000 ?s (98.81% GC)
  --------------
  samples:          10000
  evals/sample:     10
Run Code Online (Sandbox Code Playgroud)

  • “Not”实际上来自 InvertedIndices (2认同)