Posts by Lea*_*r27

How to summarize the count of unique values of a categorical variable in R

Suppose I have a dataset, data:

x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)

x1 x2
a  a1
a  a1 
a  a2
a  a1
a  a2
a  a3
b  b1
b  b1
b  b2 
b  b2

I want to find the number of unique values of x2 for each value of x1.

For example, a has only 3 unique values (a1, a2, a3) and b has 2 (b1, b2).

I used aggregate(x1~.,data,sum), but it didn't work because these are factors, not integers.

Please help.
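A minimal sketch of one common approach (an assumption on my part; the accepted answer is not shown in this listing): put x2 on the left of the formula, group by x1, and pass a function that counts distinct values as FUN instead of sum. It uses the example data from the question.

```r
# Example data from the question
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# For each level of x1, count how many distinct x2 values occur.
# length(unique(v)) works for both character and factor columns.
counts <- aggregate(x2 ~ x1, data = data, FUN = function(v) length(unique(v)))
print(counts)
```

This should report 3 for a and 2 for b. tapply(data$x2, data$x1, function(v) length(unique(v))) gives the same counts as a named vector instead of a data frame.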

r unique summarization distinct-values

Score: 4 | Answers: 1 | Views: 5,076

How to remove question marks (?) from a dataset in R

Hi everyone, I'm analyzing the UCI Adult census data. Every missing value in the data is marked with a question mark (?).

I want to replace all the question marks with NA.

I tried:

library(XML)
census<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=F,na.strings="?")
names(census)<-c("Age","Workclass","Fnlwght","Education","EducationNum","MaritalStatus","Occupation"   
  ,"Relationship" , "Race","Gender","CapitalGain","CapitalLoss","HoursPerWeek","NativeCountry","Salary"  )

table(census$Workclass)

                ?       Federal-gov         Local-gov      Never-worked           Private      Self-emp-inc 
             1836               960              2093                 7             22696              1116 
 Self-emp-not-inc         State-gov       Without-pay 
             2541              1298                14 

x <- ifelse(census$Workclass=="?",NA,census$Workclass)
table(x)
x
    1     2     3     4     5     6     7     8     9 
 1836   960  2093     7 22696  1116  2541  1298    14

But it doesn't work.

Please help.
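A sketch of a likely fix, based on the layout of adult.data: every field after the first is preceded by a space, so a missing value is stored as " ?" and na.strings = "?" never matches it. Passing strip.white = TRUE to read.csv() trims that space before the NA test. The ifelse() attempt prints integer codes because Workclass was read as a factor (the default for character columns before R 4.0.0) and ifelse() returns the factor's underlying codes. library(XML) is not needed, since read.csv() is part of base R. The rows below are made up so the example runs offline; only their comma-plus-space layout mirrors the real file.

```r
# Hypothetical rows in the same "value, value" layout as adult.data
# (made up for illustration; the real file has 15 columns).
sample_lines <- c(
  "39, State-gov, United-States",
  "54, ?, ?",
  "28, Private, Cuba"
)

# Fix 1: trim the leading space while reading, so " ?" becomes "?"
# and matches na.strings.
census <- read.csv(text = sample_lines, header = FALSE,
                   na.strings = "?", strip.white = TRUE)
names(census) <- c("Age", "Workclass", "NativeCountry")
print(table(census$Workclass, useNA = "ifany"))

# Fix 2: repair a column that was already read with " ?" kept as a level.
# Convert to character (avoiding factor codes), trim whitespace, then compare.
raw <- read.csv(text = sample_lines, header = FALSE, stringsAsFactors = TRUE)
wc <- trimws(as.character(raw$V2))
wc[wc == "?"] <- NA
print(table(wc, useNA = "ifany"))
```

For the real file, the same idea would be adding strip.white = TRUE to the original read.csv() call on the UCI URL.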

r na

Score: 2 | Answers: 1 | Views: 10k

Extracting values of a variable by frequency in R

Suppose I have a data frame:

 x  y
 a  1
 b  2
 a  3
 a  4
 b  5
 c  6
 a  7
 d  8
 a  9
 b 10
 e 12
 b 13
 c 15

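I want to create another data frame that contains only the values of x that occur at least 3 times (a and b in this case), together with their corresponding highest y value.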

So I want the output to be:

x   y
a   9
b   13

Here 9 and 13 are the highest y values of a and b, respectively.

I tried using:

sort-(table(x,y)) 

But it doesn't work.
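A base-R sketch of one possible approach (not necessarily what the posted answers do): count occurrences with table(), keep the values of x seen at least 3 times, then take the maximum y per remaining group with aggregate(). The data frame is rebuilt from the example above; the name df is an assumption.

```r
# Data frame from the question (named df here for illustration)
df <- data.frame(
  x = c("a","b","a","a","b","c","a","d","a","b","e","b","c"),
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15)
)

# How often each value of x occurs
freq <- table(df$x)

# Keep only the values of x that occur at least 3 times
frequent <- names(freq)[freq >= 3]

# Highest y for each remaining value of x
result <- aggregate(y ~ x, data = df[df$x %in% frequent, ], FUN = max)
print(result)
```

With this data, a occurs 5 times and b 4 times (c occurs only twice), so the result keeps a with a maximum y of 9 and b with 13, matching the expected output.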

r data-mining

Score: 2 | Answers: 2 | Views: 102

Tag statistics

r ×3

data-mining ×1

distinct-values ×1

na ×1

summarization ×1

unique ×1