How to count the elements in each column of a table in R

PNY*_*PNY 0 r count missing-data

I have a dataset that looks like this (it actually has more than 50 columns):

data <- read.csv("sample.csv")

subject gender  age type    satisfation     agree 
1   f   22  a   yes yes
2   f   23  b   no  yes 
3   f   21  b       no
4   m   24  c   yes yes 
5   f   22  b   no  yes
6   m       a   yes yes 
7       25  c   yes no
8   m   21  b   no  yes 
9   f   23  c   yes yes

I want to count the elements in each column (not counting NA) and export the result in the layout below:

subject gender  age type    satisfation     agree 
9   8   8   9   8   9

I wrote a script to do the counting:

counting <- function(x) {
  for(i in 1:length(data)) {
     data <- length(which(!is.na(x$i)))
      print(data)
  }
  return(data)
}   
counting(data)

It did not work: it returned 0 for every column.

dput(head(data, 9))

structure(list(subject = 1:9, gender = structure(c(2L, 2L, 2L, 
3L, 2L, 3L, 1L, 3L, 2L), .Label = c("", "f", "m"), class = "factor"), 
    age = c(22L, 23L, 21L, 24L, 22L, NA, 25L, 21L, 23L), type = structure(c(1L, 
    2L, 2L, 3L, 2L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"
    ), class = "factor"), satisfation = structure(c(3L, 2L, 1L, 
    3L, 2L, 3L, 3L, 2L, 3L), .Label = c("", "no", "yes"), class = "factor"), 
    agree = structure(c(2L, 3L, 1L, 3L, 2L, 3L, 1L, 3L, 2L), .Label = c("no", 
    "yes", "yes "), class = "factor"), time = c(23L, 54L, 67L, 
    324L, 87L, 12L, 756L, 34L, 98L), day = c(1L, 3L, 2L, 5L, 
    7L, 4L, 3L, 1L, 4L)), .Names = c("subject", "gender", "age", 
"type", "satisfation", "agree", "time", "day"), row.names = c(NA, 
9L), class = "data.frame")

Any suggestions for a script, please?

Thanks, everyone!

Sot*_*tos 7

Assuming you have already taken care of the NAs, just use colSums:

colSums(!is.na(df))
#     subject      gender         age        type satisfation       agree        time         day 
#           9           9           8           9           9           9           9           9 

Adding @DavidArenburg's suggestion to get around any NA trouble (blank strings are excluded as well):

colSums(!is.na(df) & df != "", na.rm = TRUE)
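To see how blank strings and `NA` differ when counting, here is a minimal sketch on a made-up three-row data frame. The name `toy` and its values are assumptions for illustration only, not the asker's data.

```r
# Hypothetical toy data: one blank gender, one missing age
toy <- data.frame(
  subject = 1:3,
  gender  = c("f", "", "m"),
  age     = c(22L, NA, 25L),
  stringsAsFactors = FALSE
)

# Counts every non-NA cell, so the blank gender is still counted
na_only <- colSums(!is.na(toy))

# Also excludes empty strings; comparing NA with "" yields NA,
# and na.rm = TRUE keeps any remaining NA out of the sums
non_blank <- colSums(!is.na(toy) & toy != "", na.rm = TRUE)

print(na_only)
print(non_blank)
```

Here `na_only` counts 3 values for `gender`, while `non_blank` counts 2, because the empty string is a value as far as `is.na` is concerned.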


Rap*_*l K 5

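When I loaded the table into R, the missing cells came in as blanks rather than NA. So when you read the .csv file, specify how NA is encoded. It looks like the missing values are encoded as "" or " ".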

Once you have real NAs, you can run this code, assuming your table is called df:

counts <- apply(df, 2, function(x) length(na.omit(x)))
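The same count can also be written as an explicit loop, which helps show why the function in the question returned 0: `x$i` looks for a column literally named `i`, finds none, and returns `NULL`. The following is a minimal sketch, assuming the columns already contain real `NA` values; `count_non_na` and `toy` are hypothetical names used only for illustration.

```r
# Hypothetical helper: count non-NA values per column with a loop
count_non_na <- function(x) {
  counts <- integer(ncol(x))
  names(counts) <- names(x)
  for (i in seq_along(x)) {
    # x[[i]] selects the i-th column; x$i would look for a column named "i"
    counts[i] <- sum(!is.na(x[[i]]))
  }
  counts
}

# Made-up data with one missing age
toy <- data.frame(subject = 1:3, age = c(22L, NA, 25L))
loop_counts <- count_non_na(toy)
print(loop_counts)
```

In practice the vectorised `colSums`, `apply`, or `sapply` forms are shorter, but the loop makes the indexing issue visible.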

Or, as @JasonAizkalns suggested:

data <- read.csv("sample.csv", na.strings = "") 
sapply(data, function(x) sum(!is.na(x)))
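Putting the read step and the count together, a self-contained sketch might look like the following. The inline `csv_text` is a made-up stand-in for `sample.csv`, and `strip.white = TRUE` is an added assumption to deal with stray spaces around values such as `"yes "`.

```r
# Hypothetical inline CSV standing in for sample.csv
csv_text <- "subject,gender,age,agree
1,f,22,yes
2,,23, yes
3,m,,no"

# Treat empty fields as NA and strip whitespace around unquoted values
dat <- read.csv(text = csv_text, na.strings = "", strip.white = TRUE)

# Non-NA count per column, as a named vector
counts <- sapply(dat, function(x) sum(!is.na(x)))

# One-row layout similar to the one requested in the question
result <- as.data.frame(t(counts))
print(result)
```

From there, something like `write.csv(result, "counts.csv", row.names = FALSE)` would export the row to a file; the file name is only an example.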