How to one-hot encode factor variables with data.table?

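For those unfamiliar, one-hot encoding simply refers to converting a column of categories (i.e. a factor) into multiple columns of binary indicator variables, where each new column corresponds to one of the classes of the original column. This example will explain it better: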

library(data.table)

dt <- data.table(
  ID=1:5, 
  Color=factor(c("green", "red", "red", "blue", "green"), levels=c("blue", "green", "red", "purple")),
  Shape=factor(c("square", "triangle", "square", "triangle", "circle"))
)

dt
   ID Color    Shape
1:  1 green   square
2:  2   red triangle
3:  3   red   square
4:  4  blue triangle
5:  5 green   circle

# one hot encode the colors
color.binarized <- dcast(dt[, list(V1=1, ID, Color)], ID ~ Color, fun=sum, value.var="V1", drop=c(TRUE, FALSE))

# Prepend Color_ in front of each one-hot-encoded feature
setnames(color.binarized, setdiff(colnames(color.binarized), "ID"), paste0("Color_", setdiff(colnames(color.binarized), "ID")))

# one …
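For comparison, here is a minimal sketch of the same idea without dcast: loop over the factor's levels and add one 0/1 column per level with set(). The helper name one_hot and its arguments id and col are hypothetical, introduced only for illustration; they are not part of data.table or of the original question. It assumes the data.table package is installed.

```r
library(data.table)

# Same Color column as in the question; "purple" is a level with no rows.
dt <- data.table(
  ID = 1:5,
  Color = factor(c("green", "red", "red", "blue", "green"),
                 levels = c("blue", "green", "red", "purple"))
)

# Hypothetical helper (not part of data.table): one 0/1 column per factor level.
one_hot <- function(dt, id, col) {
  out <- dt[, id, with = FALSE]   # new table holding only the ID column
  for (lv in levels(dt[[col]])) {
    # TRUE/FALSE becomes 1/0; an NA in the factor stays NA
    set(out, j = paste0(col, "_", lv), value = as.integer(dt[[col]] == lv))
  }
  out
}

encoded <- one_hot(dt, "ID", "Color")
print(encoded)
```

Because levels() also returns unused levels, the result includes a Color_purple column that is all zeros, which is the effect the drop=c(TRUE, FALSE) argument has in the dcast call above. The original table dt is left unchanged.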

r data.table

Votes: 10
Answers: 3
Views: 4923
