Ant*_*rev 5 arrays julia categorical-data
What is the best way to convert a categorical array into a plain numeric array? For example:
using CategoricalArrays
a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])
b = recode(a, "X"=>1, "Y"=>2, "Z"=>3)
The result of the conversion is still a categorical array, even if we explicitly annotate the types of the assigned values:
b = recode(a, "X"=>1::Int64, "Y"=>2::Int64, "Z"=>3::Int64)
It looks like a different approach is needed here, but I can't figure out which direction to take.
You have two natural options:
julia> recode(unwrap.(a), "X"=>1, "Y"=>2, "Z"=>3)
7-element Vector{Int64}:
1
1
2
3
2
2
3
or
julia> mapping = Dict("X"=>1, "Y"=>2, "Z"=>3)
Dict{String, Int64} with 3 entries:
"Y" => 2
"Z" => 3
"X" => 1
julia> [mapping[v] for v in a]
7-element Vector{Int64}:
1
1
2
3
2
2
3
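The Dict approach is slower, but it is more flexible if you need to map many levels.
The key function here is unwrap, which strips the "categorical" wrapper from a CategoricalValue (with the Dict approach, unwrap is called automatically).
Also note that if you just want the levelcodes of the values stored in a CategoricalArray (which is what R does by default), you can do: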
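To make the role of unwrap concrete, here is a minimal sketch that reuses the array a from the question. The names v, s, and plain are only illustrative and do not come from the original answer.

```julia
using CategoricalArrays

a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])

v = a[1]            # indexing yields a CategoricalValue, not a String
s = unwrap(v)       # the plain String stored inside the wrapper
plain = unwrap.(a)  # broadcasting unwrap gives an ordinary, non-categorical array

println(typeof(v))
println(typeof(s))
println(typeof(plain))
```

Because plain is no longer categorical, passing it to recode produces a regular Vector rather than a CategoricalArray.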
julia> levelcode.(a)
7-element Vector{Int64}:
1
1
2
3
2
2
3
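One point worth keeping in mind: the codes returned by levelcode follow the order of the array's levels, not a mapping you choose. The sketch below assumes the default (sorted) level order for a, and then reorders the levels of a second, illustrative array b with levels! to show how the codes change.

```julia
using CategoricalArrays

a = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])
codes_default = levelcode.(a)      # codes follow the default level order X, Y, Z

# b is a hypothetical second array with the same data but reversed level order
b = CategoricalArray(["X", "X", "Y", "Z", "Y", "Y", "Z"])
levels!(b, ["Z", "Y", "X"])
codes_reordered = levelcode.(b)    # same data, different codes

println(codes_default)
println(codes_reordered)
```

If you need a specific value for each level regardless of level order, the recode or Dict approaches above are the safer choice.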
Also note that with levelcode, missing is mapped to missing:
julia> x = CategoricalArray(["Y", "X", missing, "Z"])
4-element CategoricalArray{Union{Missing, String},1,UInt32}:
"Y"
"X"
missing
"Z"
julia> levelcode.(x)
4-element Vector{Union{Missing, Int64}}:
2
1
missing
3
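If you need a purely numeric array without a Union{Missing, Int64} element type, one possible approach is to replace the missing codes with a sentinel using Base's coalesce. The sentinel value 0 here is an arbitrary assumption, chosen because valid level codes start at 1.

```julia
using CategoricalArrays

x = CategoricalArray(["Y", "X", missing, "Z"])

codes = levelcode.(x)               # missing entries stay missing
filled = coalesce.(codes, 0)        # replace missing with the sentinel 0 (an assumption)

println(filled)
```

Alternatively, collect(skipmissing(codes)) drops the missing entries instead of replacing them, at the cost of changing the array's length.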