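I am working with an educational dataset: the answers of 426 students to 8 multiple-choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1, 2, or 3) taught their course.
As it stands, my data sits nicely in data.df, which looks like this: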
str(data.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: int 1 1 1 1 1 1 0 0 0 1 ...
$ ques02: int 0 0 1 1 1 1 1 1 1 1 ...
$ ques03: int 0 0 1 1 0 0 1 1 0 1 ...
$ ques04: int 1 0 1 1 1 1 1 1 1 1 ...
$ ques05: int 0 0 0 0 1 0 0 0 0 0 ...
$ ques06: int 1 0 1 1 0 1 1 1 1 1 ...
$ ques07: int 0 0 1 1 0 1 1 0 0 1 ...
$ ques08: int 0 0 1 1 1 0 1 1 0 1 ...
$ inst : num 1 1 1 1 1 1 1 1 1 1 ...
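But those ques0x values are not really integers. Rather, I think it would be better to have R treat them as experimental factors. The same goes for the "inst" values.
I would love to turn those ints and nums into factors.
Ideally, an elegant solution would produce a data frame (I'll call it factorData.df) that looks like this: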
str(factorData.df)
'data.frame': 426 obs. of 9 variables:
$ ques01: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 1 1 1 2 ...
$ ques02: Factor w/ 2 levels "0","1": 1 1 2 2 2 2 2 2 2 2 ...
$ ques03: Factor w/ 2 levels "0","1": 1 1 2 2 1 1 2 2 1 2 ...
$ ques04: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ ques05: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 1 1 ...
$ ques06: Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 2 ...
$ ques07: Factor w/ 2 levels "0","1": 1 1 2 2 1 2 2 1 1 2 ...
$ ques08: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 2 2 1 2 ...
$ inst : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
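Whatever solution you folks come up with, I expect it should generalize easily to any number n of variables that need to be reclassified, and should work for the most common conversions (int -> factor and num -> int, for example).
At the moment my clunky code is just 9 separate factor() statements, one per variable, like this: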
factorData.df$ques01
I am brand new to R, to programming, and to stackoverflow. Please be gentle, and thanks in advance for your help!
Sha*_*ane 11
I imagine there are better ways, but here are two options:
# use a sample data set
> str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
> data.df <- cars
You can use lapply:
> data.df <- data.frame(lapply(data.df, factor))
Or a for statement:
> for (i in seq_along(data.df)) data.df[[i]] <- as.factor(data.df[[i]])
Either way, you end up with what you want:
> str(data.df)
'data.frame': 50 obs. of 2 variables:
$ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
$ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
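A small variation on the lapply option, as a sketch rather than a definitive recipe: assigning the result into data.df[] keeps the existing data frame instead of rebuilding it with data.frame(), which (with its default check.names = TRUE) can rewrite column names that are not syntactically valid. The toy.df data frame and its "ques 01" column name are made up purely for illustration.

```r
# Toy data with a non-syntactic column name (hypothetical, for illustration only)
toy.df <- data.frame(`ques 01` = c(1L, 0L, 1L),
                     inst = c(1, 2, 3),
                     check.names = FALSE)

# Assigning into toy.df[] replaces every column with its factor version
# while keeping the data frame itself, including the original column names
toy.df[] <- lapply(toy.df, factor)

str(toy.df)
```

With this form the column is still called "ques 01" afterwards; wrapping the lapply result in data.frame() with its defaults would rename it.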
I found another solution in the plyr package:
# load the package and data
> library(plyr)
> data.df <- cars
Use the colwise function:
> data.df <- colwise(factor)(data.df)
> str(data.df)
'data.frame': 50 obs. of 2 variables:
$ speed: Factor w/ 19 levels "4","7","8","9",..: 1 1 2 2 3 4 5 5 5 6 ...
$ dist : Factor w/ 35 levels "2","4","10","14",..: 1 3 2 9 5 3 7 11 14 6 ...
Incidentally, if you look inside the colwise function, it simply uses lapply:
df <- as.data.frame(lapply(filtered, .fun, ...))
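Since everything above comes down to lapply over columns, the generalization the question asks for (any number of variables, different conversions) could be sketched as a small helper. This is only one possible shape: convert_cols is a hypothetical name, not a function from base R or plyr, and toy.df holds made-up values arranged like the question's data, with an extra score column added for the num -> int case.

```r
# Hypothetical helper: apply one conversion function to the named columns only
convert_cols <- function(df, cols, fun, ...) {
  df[cols] <- lapply(df[cols], fun, ...)
  df
}

# Toy data shaped like the question's (made-up values, for illustration only)
toy.df <- data.frame(
  ques01 = c(1L, 0L, 1L, 1L),
  ques02 = c(0L, 0L, 1L, 1L),
  inst   = c(1, 1, 2, 3),
  score  = c(2, 0, 4, 4)
)

# int -> factor and num -> factor for the question and instructor columns
factor_cols <- c(grep("^ques", names(toy.df), value = TRUE), "inst")
toy.df <- convert_cols(toy.df, factor_cols, factor)

# num -> int for the remaining numeric column
toy.df <- convert_cols(toy.df, "score", as.integer)

str(toy.df)
```

Selecting columns by name pattern keeps the call the same length whether there are 8 question columns or 80, and any one-argument conversion (factor, as.integer, as.character) can be passed as fun.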