What is the best way to detect and remove duplicate rows from an array in Julia?
x = Integer.(round.(10 .* rand(1000,4)))
# In R I would apply the duplicated function.
x = x[!duplicated(x), ]
`unique` is what you are looking for (note that this covers only the removal part; detection is addressed further down):
julia> x = Integer.(round.(10 .* rand(1000,4)))
1000×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
julia> unique(x, 1)
973×4 Array{Int64,2}:
7 3 10 1
7 4 8 9
7 7 3 0
3 4 8 2
⋮
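Note that on Julia 0.7 and later the dimension is passed as a keyword argument, so the call above becomes `unique(x, dims=1)`. A minimal sketch:

```julia
# Julia ≥ 0.7: deduplicate rows with the dims keyword
x = rand(1:10, 1000, 4)
y = unique(x; dims=1)   # only whole rows are dropped; columns are preserved
size(y, 2) == 4
```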
As for the detection part, a dirty fix would be to edit this line in Base's generated `unique` implementation:
@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))
to:
(@nref $N A d->d == dim ? sort!(uniquerows) : (indices(A, d))), uniquerows
Alternatively, you could define your own `unique2` with the above-mentioned change:
using Base.Cartesian
import Base.Prehashed
@generated function unique2(A::AbstractArray{T,N}, dim::Int) where {T,N}
    ......  # body copied from Base's `unique(A, dim)`, with the return line modified as above
end
julia> y, idx = unique2(x, 1)
julia> y
960×4 Array{Int64,2}:
8 3 1 5
8 3 1 6
1 1 0 1
8 10 1 10
9 1 8 7
⋮
julia> setdiff(1:1000, idx)
40-element Array{Int64,1}:
99
120
132
140
216
227
⋮
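If you would rather not copy code out of Base, a self-contained alternative can track first occurrences with a `Set`. `unique_rows` below is an illustrative helper (not a library function) that returns both the deduplicated rows and the indices of the rows that were kept:

```julia
# Illustrative helper: deduplicate the rows of a matrix and also
# return the indices of the rows that survived (first occurrences).
function unique_rows(A::AbstractMatrix)
    seen = Set{Vector{eltype(A)}}()
    keep = Int[]
    for i in axes(A, 1)
        row = A[i, :]            # materialize the row as a Vector
        if !(row in seen)
            push!(seen, row)
            push!(keep, i)
        end
    end
    return A[keep, :], keep
end

y, idx = unique_rows([1 2; 3 4; 1 2])
```

As in the session above, `setdiff(axes(A, 1), idx)` then gives the row numbers of the duplicates.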
The benchmark on my machine is:
x = rand(1:10,1000,4) # 48 dups
@btime unique2($x, 1);
  124.342 μs (31 allocations: 145.97 KiB)
@btime duplicated($x);
  407.809 μs (9325 allocations: 394.78 KiB)
x = rand(1:4,1000,4) # 751 dups
@btime unique2($x, 1);
  66.062 μs (25 allocations: 50.30 KiB)
@btime duplicated($x);
  222.337 μs (4851 allocations: 237.88 KiB)
The results show that the convoluted metaprogramming/hash-table approach in Base benefits a lot from its lower memory allocation.
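For completeness, R's `duplicated` (a Boolean mask marking rows that repeat an earlier row) can be approximated in a few lines; `duplicated_rows` is an illustrative name, not a standard function:

```julia
# Sketch of an R-style duplicated: true for every row that has
# already appeared earlier in the matrix.
function duplicated_rows(A::AbstractMatrix)
    seen = Set{Vector{eltype(A)}}()
    mask = falses(size(A, 1))
    for i in axes(A, 1)
        row = A[i, :]
        if row in seen
            mask[i] = true
        else
            push!(seen, row)
        end
    end
    return mask
end

x = [1 2; 3 4; 1 2]
x[.!duplicated_rows(x), :]   # drop repeated rows, as in R's x[!duplicated(x), ]
```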