Suppose I have a dataset, data:
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1,x2)
x1 x2
a a1
a a1
a a2
a a1
a a2
a a3
b b1
b b1
b b2
b b2
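I want to find the number of unique values of x2 for each value of x1. For example, a has only 3 unique values (a1, a2 and a3) and b has 2 (b1 and b2).
I used aggregate(x1~.,data,sum), but it did not work because these are factors, not integers.
Please help.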
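One possible approach, sketched here rather than offered as the only answer: `sum` is not defined for factors, so the aggregation can count distinct values with `length(unique(...))` instead. The block rebuilds the data to match the printed table; the name `counts` is only an illustrative choice.

```r
# Rebuild the example data so that it matches the printed table
x1 <- c("a","a","a","a","a","a","b","b","b","b")
x2 <- c("a1","a1","a2","a1","a2","a3","b1","b1","b2","b2")
data <- data.frame(x1, x2)

# For each value of x1, count the distinct x2 values
counts <- aggregate(x2 ~ x1, data = data,
                    FUN = function(v) length(unique(v)))
print(counts)

# Same idea with tapply, which returns a named vector instead of a data frame
per_group <- tapply(data$x2, data$x1, function(v) length(unique(v)))
print(per_group)
```

With this data the count is 3 for a and 2 for b. The formula is `x2 ~ x1` (count x2 within x1), the reverse of the `x1 ~ .` used in the attempt above.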
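Hi everyone, I am analyzing the UCI Adult census data.
The data has a question mark (?) for every missing value.
I want to replace all the question marks with NA.
I tried: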
library(XML)
census<-read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",header=F,na.strings="?")
names(census)<-c("Age","Workclass","Fnlwght","Education","EducationNum","MaritalStatus","Occupation"
,"Relationship" , "Race","Gender","CapitalGain","CapitalLoss","HoursPerWeek","NativeCountry","Salary" )
table(census$Workclass)
               ?      Federal-gov        Local-gov     Never-worked          Private     Self-emp-inc
            1836              960             2093                7            22696             1116
Self-emp-not-inc        State-gov      Without-pay
            2541             1298               14
x <- ifelse(census$Workclass=="?", NA, census$Workclass)
table(x)
x
    1     2     3     4     5     6     7     8     9
 1836   960  2093     7 22696  1116  2541  1298    14
But it does not work.
Please help.
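A likely cause, given as a sketch: the fields in adult.data are separated by a comma followed by a space, so the missing marker is read as " ?" (with a leading space) and matches neither `na.strings="?"` nor `=="?"`. In addition, calling `ifelse` on a factor returns its integer codes, which is why `table(x)` shows 1 to 9. The example below uses a small made-up sample in the same layout instead of downloading the file; `sample_text`, `census_small` and `raw` are hypothetical names.

```r
# Made-up three-row sample in the same ", "-separated layout as adult.data
sample_text <- paste(
  "39, State-gov, 77516",
  "54, ?, 180211",
  "25, Private, 226802",
  sep = "\n"
)

# Option 1: strip the surrounding white space while reading,
# so that na.strings = "?" can match the field
census_small <- read.csv(text = sample_text, header = FALSE,
                         na.strings = "?", strip.white = TRUE,
                         col.names = c("Age", "Workclass", "Fnlwght"))
print(table(census_small$Workclass, useNA = "ifany"))

# Option 2: the data is already loaded, so trim and replace afterwards
raw <- read.csv(text = sample_text, header = FALSE,
                col.names = c("Age", "Workclass", "Fnlwght"))
raw$Workclass <- trimws(as.character(raw$Workclass))
raw$Workclass[raw$Workclass == "?"] <- NA
```

The same two arguments (`na.strings = "?"`, `strip.white = TRUE`) can be passed to the original `read.csv` call on the full URL. Converting with `as.character` before replacing avoids the integer codes that `ifelse` produces from a factor.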
Suppose I have a data frame:
x y
a 1
b 2
a 3
a 4
b 5
c 6
a 7
d 8
a 9
b 10
e 12
b 13
c 15
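I want to create another data frame that contains only the values of x that occur at least 3 times (a and b, in this case), together with their corresponding highest y value.
So I want the output to be: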
x y
a 9
b 13
Here 9 and 13 are the highest values for a and b, respectively.
I tried using:
sort-(table(x,y))
But it does not work.
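A minimal base R sketch of one way to do this: count the occurrences of each x with `table`, keep the values that appear at least 3 times, then take the maximum y per group with `aggregate`. The names `df`, `counts`, `frequent` and `result` are illustrative assumptions, and the data frame is retyped from the table above.

```r
# Example data retyped from the question
df <- data.frame(
  x = c("a","b","a","a","b","c","a","d","a","b","e","b","c"),
  y = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15)
)

# How often does each value of x occur?
counts <- table(df$x)

# Keep only the values seen at least 3 times
frequent <- names(counts)[counts >= 3]

# Highest y for each of the remaining x values
result <- aggregate(y ~ x, data = df[df$x %in% frequent, ], FUN = max)
print(result)
```

For this data the result has two rows, a with 9 and b with 13. Filtering before aggregating keeps the two conditions (frequency and maximum) separate, which makes the threshold easy to change.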