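Questions about the by and weighted.mean commands already exist, but none of them helped solve my problem. I'm new to R, and I'm more used to data-mining languages than to programming.
I have a data frame containing, for each individual (observation/row), income, education level, and sample weight. I want to compute the weighted mean of income by education level, and I want the result attached to each individual in a new column of the original data frame, like this: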
obs income education weight incomegroup
1. 1000 A 10 --> display weighted mean of income for education level A
2. 2000 B 1 --> display weighted mean of income for education level B
3. 1500 B 5 --> display weighted mean of income for education level B
4. 2000 A 2 --> display weighted mean of income for education level A
I tried:
data$incomegroup=by(data$education, function(x) weighted.mean(data$income, data$weight))
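This does not work. A weighted mean is somehow computed and shown in the incomegroup column, but I can't tell whether it is for the whole data set rather than by group, or for only one group. I read about the plyr package and aggregate, but neither seems to do what I'm after.
The ave{stats} command gives exactly what I'm looking for, but only for the simple mean: …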
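One possible way to get a per-group weighted mean back onto every row, sketched in base R only. It rebuilds the four example rows above as a data frame named data (the column names are taken from the example; the helper names idx and wm are made up for illustration). The idea is to split the row indices by education, so that income and weight are subset together inside each group:

```r
# Example rows from the question, rebuilt as a data frame.
data <- data.frame(
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# Split row indices (not the values) by group, so income and weight
# can both be subset with the same indices.
idx <- split(seq_len(nrow(data)), data$education)

# One weighted mean per education level; the result is named "A", "B".
wm <- sapply(idx, function(i) weighted.mean(data$income[i], data$weight[i]))

# Look up each row's group mean by name and store it in a new column.
data$incomegroup <- unname(wm[as.character(data$education)])

print(data)
# Group A: (1000*10 + 2000*2) / 12, about 1166.67
# Group B: (2000*1 + 1500*5) / 6,  about 1583.33
```

The attempt in the question passes the full data$income and data$weight columns to weighted.mean inside the function, so the grouping never reaches the calculation; subsetting both columns by the group's row indices is what makes the result group-specific.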
I have a list of dataframes for which I want to obtain (in a separate dataframe) the row mean of a specified column which may or may not exist in all dataframes of the list. My problem comes when the specified column does not exist in at least one of the dataframes of the list.
Assume the following example list of dataframes:
df1 <- read.table(text = 'X A B C
name1 1 2 3
name2 5 10 4',
header = …
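Since the example above is cut off, here is only a rough sketch of one way the missing-column case might be handled in base R. df1 reuses the values shown above; df2, the list name dfs, and the choice of column "B" are hypothetical and not from the question. It also assumes every data frame has the same rows in the same order. A data frame that lacks the column contributes NA, which rowMeans then skips:

```r
# df1 uses the values shown in the question; df2 is a hypothetical
# second data frame that has no column B.
df1 <- data.frame(X = c("name1", "name2"), A = c(1, 5), B = c(2, 10), C = c(3, 4))
df2 <- data.frame(X = c("name1", "name2"), A = c(3, 7), C = c(5, 6))
dfs <- list(df1, df2)

col <- "B"  # the specified column (assumed for illustration)

# One column per data frame: the values if the column exists, NA otherwise.
vals <- sapply(dfs, function(d) {
  if (col %in% names(d)) d[[col]] else rep(NA_real_, nrow(d))
})

# Row-wise mean across data frames, ignoring the NA placeholders.
result <- data.frame(X = df1$X, mean = rowMeans(vals, na.rm = TRUE))
print(result)
```

If the column is missing from every data frame, rowMeans with na.rm = TRUE returns NaN for each row, so that case may need a separate check.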