Converting type Array{Union{Missing,Float64},1} to Array{Float64,1} in Julia

Sch*_*tor 2 julia

I have some float arrays with missing values, so their type is Array{Union{Missing, Float64},1}. Is there a command to convert the non-missing part to Array{Float64,1}?

Col*_*ers 6

Here are three solutions, in order of preference (thanks to @BogumilKaminski for the first one):

f1(x) = collect(skipmissing(x))
f2(x) = Float64[ a for a in x if !ismissing(a) ]
f3(x) = x[.!ismissing.(x)]

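f1 wraps the array in a lazy skipmissing iterator (useful on its own for e.g. iteration) and then materializes the result as an array via collect.

f2 uses an array comprehension with a filter (a for loop under the hood) but is likely to be slower than f1 since the final array length is not computed ahead of time.

f3 uses broadcasting, and allocates temporaries in the process, and so is likely to be the slowest of the three. Note also that indexing preserves the element type, so f3 returns an Array{Union{Missing,Float64},1} that merely contains no missing values; wrap it in convert(Vector{Float64}, ...) if the narrower type is required.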

We can verify the above with a simple benchmark:

using BenchmarkTools
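x = Array{Union{Missing,Float64}}(missing, 100);  # start with every entry missing
inds = unique(rand(1:100, 50));                   # up to 50 distinct random positions
x[inds] = randn(length(inds));                    # fill those positions with floats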
@btime f1($x);
@btime f2($x);
@btime f3($x);

Resulting in:

julia> @btime f1($x);
  377.186 ns (7 allocations: 1.22 KiB)

julia> @btime f2($x);
  471.204 ns (8 allocations: 1.23 KiB)

julia> @btime f3($x);
  732.726 ns (6 allocations: 4.80 KiB)
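
Beyond timing, a quick sanity check of what each function returns can be useful. The following is a minimal sketch on a small made-up input (the data and the name y3_narrow are illustrative assumptions, not part of the benchmark above); it compares the values and prints the element type of each result.

```julia
# Illustrative input (not the benchmark data): five entries, two of them missing
x = Union{Missing,Float64}[1.0, missing, 2.5, missing, 4.0]

f1(x) = collect(skipmissing(x))
f2(x) = Float64[a for a in x if !ismissing(a)]
f3(x) = x[.!ismissing.(x)]

y1, y2, y3 = f1(x), f2(x), f3(x)

# Same values, same order, in all three cases
@assert y1 == y2 == y3 == [1.0, 2.5, 4.0]

# Element types of the results
println(eltype(y1))  # Float64
println(eltype(y2))  # Float64
println(eltype(y3))  # Union{Missing, Float64}

# Hypothetical follow-up step: narrow f3's result once no missing values remain
y3_narrow = convert(Vector{Float64}, y3)
println(eltype(y3_narrow))  # Float64
```

The convert call only succeeds because y3 no longer holds any missing entries; on an array that still contains missing it would throw an error.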

  • I would normally write "skipmissing(x)" (which returns a lazy wrapper) and "collect(skipmissing(x))" to materialize an array. This should be faster than "h". (6 upvotes)
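
To illustrate the lazy wrapper mentioned in the comment, here is a small sketch, assuming a made-up five-element input (the variable names are illustrative). It shows that reductions and comprehensions can consume skipmissing(x) directly, so collect is only needed when an actual Array is required.

```julia
# Illustrative input: five entries, two of them missing
x = Union{Missing,Float64}[1.0, missing, 2.5, missing, 4.0]

itr = skipmissing(x)               # lazy wrapper; no filtered copy is made here

total   = sum(itr)                 # reductions accept the wrapper directly
largest = maximum(itr)

squares = [v^2 for v in itr]       # iteration skips the missing entries

y = collect(itr)                   # materialize only when an Array is needed

println(total)      # 7.5
println(largest)    # 4.0
println(squares)    # [1.0, 6.25, 16.0]
println(typeof(y))  # Vector{Float64}
```

On Julia versions before 1.6 the last line prints Array{Float64,1}, which is the same type under its older display name.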