I have a data frame:
x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4
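and so on.
Basically, each user has rated many books, and each book has many ratings.
I need to extract some descriptive statistics by userId: the average number of ratings per user, the average rating, and so on.
Can anyone point me in the right direction?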
You can do these calculations with data.table.
If your data.frame is called books:
library(data.table)
# convert books to a data.table by reference (no copy is made)
setDT(books)
# average rating by user
books[, mean(rating), by=userId]
# userId V1
#1: 1 5.5
#2: 2 4.0
# average amount of ratings given :
books[, .N, by=userId][, mean(N)]
#[1] 1.5
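If both per-user statistics are wanted in one table, they can probably be computed in a single grouped call. The following is a minimal sketch that rebuilds the sample data from the question; the column names `n_ratings`, `mean_rating`, `avg_n_ratings` and `avg_user_rating` are illustrative choices and do not come from the original answer.

```r
library(data.table)

# Sample data from the question (assumed to be representative)
books <- data.table(
  userId = c(1, 1, 2),
  bookId = c(412, 454, 412),
  rating = c(6, 5, 4)
)

# One row per user: how many ratings they gave and their mean rating
user_stats <- books[, .(n_ratings = .N, mean_rating = mean(rating)), by = userId]

# Averages taken across users
overall <- user_stats[, .(avg_n_ratings = mean(n_ratings),
                          avg_user_rating = mean(mean_rating))]
```

Note that `avg_user_rating` is the mean of the per-user means, which weights every user equally; `books[, mean(rating)]` would instead weight every rating equally.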
小智 5
I'm not sure I understood your exact question/task, but the following may provide some insight:
data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4")
# Number of ratings per user
userFreq = data.frame(table(data$userId))
# Var1 Freq
# 1 1 2
# 2 2 1
# mean rating per userID
meanRatingPerUser = aggregate(data$rating, by=list(data$userId), FUN = mean )
# Group.1 x
# 1 1 5.5
# 2 2 4.0
# mean rating per book
meanRatingPerBook = aggregate(data$rating, by=list(data$bookId), FUN = mean )
# Group.1 x
# 1 412 5
# 2 454 5
# "Summary" function, applied per bookID
moreStats = aggregate(data$rating, by=list(data$bookId), FUN = summary )
# Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
# 1 412 4.0 4.5 5.0 5.0 5.5 6.0
# 2 454 5.0 5.0 5.0 5.0 5.0 5.0
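The question also asks for the average number of ratings per user, which the base R snippets above stop just short of. A minimal sketch using only base R, assuming the same sample data; the names `n_per_user`, `mean_per_user`, `user_stats` and `avg_n_ratings` are hypothetical and chosen for illustration:

```r
# Sample data from the question
data <- read.table(header = TRUE, text = "x userId bookId rating
1 1 412 6
2 1 454 5
3 2 412 4")

# Per-user count and per-user mean rating
n_per_user    <- tapply(data$rating, data$userId, length)
mean_per_user <- tapply(data$rating, data$userId, mean)

# Combine both into one data frame, one row per user
user_stats <- data.frame(
  userId      = names(n_per_user),
  n_ratings   = as.vector(n_per_user),
  mean_rating = as.vector(mean_per_user)
)

# Average number of ratings given per user
avg_n_ratings <- mean(user_stats$n_ratings)
```

`tapply` is used here instead of `aggregate` only because it returns plain named vectors, which are easy to combine; either approach should give the same per-user values.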