Factor with 6 levels from a continuous variable

Kat*_*ina 2 r factors

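I have a continuous frequency variable that ranges from 0 to 6.115053. I need to split it into 6 levels, because that would make my analysis more readable.

I tried: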

frequency.new <-  hist(all$frequency, 6, plot = FALSE)
all$frequency <- as.factor(frequency.new)

But I get an error that I don't understand:

Error in sort.list(y) : 
  'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

Can someone help me?

Thanks a lot!

Katerina

Rei*_*son 7

You should look at the cut() function in base R. Before going any further, you should also take note of the last line of my answer.

> set.seed(42)
> cut(runif(50), 6)
 [1] (0.825,0.99]    (0.825,0.99]    (0.167,0.332]   (0.825,0.99]   
 [5] (0.496,0.661]   (0.496,0.661]   (0.661,0.825]   (0.00296,0.167]
 [9] (0.496,0.661]   (0.661,0.825]   (0.332,0.496]   (0.661,0.825]  
[13] (0.825,0.99]    (0.167,0.332]   (0.332,0.496]   (0.825,0.99]   
[17] (0.825,0.99]    (0.00296,0.167] (0.332,0.496]   (0.496,0.661]  
[21] (0.825,0.99]    (0.00296,0.167] (0.825,0.99]    (0.825,0.99]   
[25] (0.00296,0.167] (0.496,0.661]   (0.332,0.496]   (0.825,0.99]   
[29] (0.332,0.496]   (0.825,0.99]    (0.661,0.825]   (0.661,0.825]  
[33] (0.332,0.496]   (0.661,0.825]   (0.00296,0.167] (0.825,0.99]   
[37] (0.00296,0.167] (0.167,0.332]   (0.825,0.99]    (0.496,0.661]  
[41] (0.332,0.496]   (0.332,0.496]   (0.00296,0.167] (0.825,0.99]   
[45] (0.332,0.496]   (0.825,0.99]    (0.825,0.99]    (0.496,0.661]  
[49] (0.825,0.99]    (0.496,0.661]  
6 Levels: (0.00296,0.167] (0.167,0.332] (0.332,0.496] ... (0.825,0.99]

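cut() returns a factor that indexes which of the (in this case 6) groups each observation falls into. Called like this, it simply splits the range of the data into 6 intervals of equal width. Read ?cut for details on what happens to values that lie exactly on the extremes of the intervals.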
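Applied to a data frame column like the one in the question, this might look as follows. It is only a sketch: `dat` and `frequency_grp` are made-up names and the values are simulated, standing in for the real `all$frequency`. The second half illustrates the point about the extremes: with explicit break points, a value equal to the lowest break is not assigned to any interval unless include.lowest = TRUE.

```r
# Hypothetical stand-in for the asker's data frame `all` (simulated values)
set.seed(1)
dat <- data.frame(frequency = runif(100, min = 0, max = 6.115053))

# Six equal-width groups; the result is stored as a new factor column
dat$frequency_grp <- cut(dat$frequency, breaks = 6)
table(dat$frequency_grp)

# With explicit break points the intervals are (lower, upper] by default,
# so a value sitting exactly on the lowest break becomes NA ...
edge <- c(0, 0.5, 1)
open_low <- cut(edge, breaks = c(0, 0.5, 1))
is.na(open_low)

# ... unless the lowest interval is closed on the left as well
closed_low <- cut(edge, breaks = c(0, 0.5, 1), include.lowest = TRUE)
levels(closed_low)
```

Keeping the original numeric column and adding the factor as a new column means the unbinned values stay available for later analysis.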

Your code fails because the object returned by hist() is a list that contains far more than your data sorted into groups:

> foo <- hist(runif(50), breaks = 6, plot = FALSE)
> str(foo)
List of 7
 $ breaks     : num [1:6] 0 0.2 0.4 0.6 0.8 1
 $ counts     : int [1:5] 12 13 7 13 5
 $ intensities: num [1:5] 1.2 1.3 0.7 1.3 0.5
 $ density    : num [1:5] 1.2 1.3 0.7 1.3 0.5
 $ mids       : num [1:5] 0.1 0.3 0.5 0.7 0.9
 $ xname      : chr "runif(50)"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"
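A quick way to check this for yourself is sketched below. The object name `h` is arbitrary, and the error is captured with tryCatch() rather than printed, since the exact message text appears to differ between R versions.

```r
set.seed(42)
h <- hist(runif(50), breaks = 6, plot = FALSE)

is.list(h)              # the return value is a list ...
inherits(h, "histogram") # ... with class "histogram"
is.atomic(h)            # it is not an atomic vector

# Capture the failure instead of stopping the script;
# res holds the condition object if as.factor() signals an error
res <- tryCatch(as.factor(h), error = function(e) e)
inherits(res, "error")
```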

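So you can't convert it to a factor; R doesn't know how to do that. Also note that hist() does not split the data into 6 groups: breaks = 6 is only a suggestion (the output above has 5 bins), and the returned object holds the information used to draw the histogram, not a group label for each observation. Note too that hist() produces "pretty" breaks, unlike cut(). If you want those pretty breaks, you can reproduce what hist() does via: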

> set.seed(42)
> x <- runif(50)
> brks <- pretty(range(x), n = 6, min.n = 1)
> cut(x, breaks = brks)
 [1] (0.8,1]   (0.8,1]   (0.2,0.4] (0.8,1]   (0.6,0.8] (0.4,0.6] (0.6,0.8]
 [8] (0,0.2]   (0.6,0.8] (0.6,0.8] (0.4,0.6] (0.6,0.8] (0.8,1]   (0.2,0.4]
[15] (0.4,0.6] (0.8,1]   (0.8,1]   (0,0.2]   (0.4,0.6] (0.4,0.6] (0.8,1]  
[22] (0,0.2]   (0.8,1]   (0.8,1]   (0,0.2]   (0.4,0.6] (0.2,0.4] (0.8,1]  
[29] (0.4,0.6] (0.8,1]   (0.6,0.8] (0.8,1]   (0.2,0.4] (0.6,0.8] (0,0.2]  
[36] (0.8,1]   (0,0.2]   (0.2,0.4] (0.8,1]   (0.6,0.8] (0.2,0.4] (0.4,0.6]
[43] (0,0.2]   (0.8,1]   (0.4,0.6] (0.8,1]   (0.8,1]   (0.6,0.8] (0.8,1]  
[50] (0.6,0.8]
Levels: (0,0.2] (0.2,0.4] (0.4,0.6] (0.6,0.8] (0.8,1]
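If a histogram object already exists, its break points can also be handed to cut() directly. The sketch below uses simulated values on roughly the scale described in the question (the names `x`, `h` and `grp` are arbitrary) and checks that the resulting groups agree with the bin counts hist() reports. include.lowest = TRUE is set because hist() uses that as its default.

```r
set.seed(42)
x <- runif(50, min = 0, max = 6.115053)  # simulated, not the asker's data

h <- hist(x, breaks = 6, plot = FALSE)

# Reuse the histogram's break points for the grouping
grp <- cut(x, breaks = h$breaks, include.lowest = TRUE)

nlevels(grp)                            # number of bins hist() actually chose
all(as.vector(table(grp)) == h$counts)  # do the group sizes match the bin counts?
```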

But you should ask yourself why you want to bin your data at all, and whether doing so makes sense.