How can I use a weighted-mean estimator (including the bootstrap) in a seaborn factorplot?

Tim*_*Tim 4 python matplotlib seaborn

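I have a dataframe in which every row carries a weight that must be taken into account when computing the mean. I like seaborn factorplots and their bootstrapped 95% confidence intervals, but I cannot get seaborn to accept a new weighted-mean estimator.

Here is an example of what I would like to do.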

import numpy as np
import seaborn as sns

tips_all = sns.load_dataset("tips")
tips_all["weight"] = 10 * np.random.rand(len(tips_all))
sns.factorplot(x="size", y="total_bill",
               data=tips_all, kind="point")
# here I would like to have a mean estimator that computes a weighted mean
# the bootstrapped confidence intervals should also use this weighted mean estimator
# something like (tips_all["weight"] * tips_all["total_bill"]).sum() / tips_all["weight"].sum()
# but on bootstrapped samples (for the confidence interval)

Tim*_*Tim 5

From @mwaskom: https://github.com/mwaskom/seaborn/issues/722

It's not really supported, but I think it's possible to hack together a solution. This seems to work?

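import numpy as np
import seaborn as sns

tips = sns.load_dataset("tips")
tips["weight"] = 10 * np.random.rand(len(tips))

# Pack each value with its weight into one column of (value, weight) tuples.
# list() is needed on Python 3, where zip returns an iterator.
tips["tip_and_weight"] = list(zip(tips.tip, tips.weight))

def weighted_mean(x, **kws):
    # x is a sequence of (value, weight) pairs; split it back into two arrays
    val, weight = map(np.asarray, zip(*x))
    return (val * weight).sum() / weight.sum()

g = sns.factorplot(x="size", y="tip_and_weight", data=tips,
                   estimator=weighted_mean, orient="v")
g.set_axis_labels("size", "tip")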
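To sanity-check what the plot shows, the same quantity can be computed by hand. The sketch below is not seaborn's implementation. It assumes a plain percentile bootstrap in which (value, weight) pairs are resampled together, which is the effect the tuple column above is meant to achieve. The function name `bootstrap_weighted_ci` and the synthetic data are made up for illustration; in practice you would pass one group's `tip` and `weight` columns.

```python
import numpy as np

def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (values * weights).sum() / weights.sum()

def bootstrap_weighted_ci(values, weights, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap interval for the weighted mean (hypothetical helper).

    Rows are resampled with replacement, so each value keeps its own weight.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(values)
    # Each row of idx holds the row indices of one bootstrap resample
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = (values[idx] * weights[idx]).sum(axis=1) / weights[idx].sum(axis=1)
    half = (100 - ci) / 2
    low, high = np.percentile(boot, [half, 100 - half])
    return low, high

# Synthetic stand-in for one group of the data (assumption, not the tips dataset)
rng = np.random.default_rng(42)
vals = rng.normal(3.0, 1.0, size=200)
wts = 10 * rng.random(200)

est = weighted_mean(vals, wts)
low, high = bootstrap_weighted_ci(vals, wts)
print(est, low, high)
```

Running this per level of `size` should give point estimates that match the plotted points; the intervals will only agree approximately, since the resamples are random.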