Normalizing histogram bins in gnuplot

Question

shi*_*ght 6 gnuplot normalize histogram bins

I'm trying to plot a histogram whose bins are normalized by the number of elements in the bin.

I'm using the following:

binwidth=5
# map x to the centre of its bin, using the function's own width argument
bin(x,width)=width*floor(x/width) + width/2.0
plot 'file' using (bin($2, binwidth)):($4) smooth freq with boxes

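This gives me a basic histogram, but I want the value of each bin divided by the size of the bin. How can I do this in gnuplot, or with external tools if necessary?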

Answer 1

小智 9

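In gnuplot 4.4, functions take on a different property: they can execute several successive commands and then return a value (see gnuplot tricks). This means you can count the number of points, n, inside the gnuplot script itself, without knowing it in advance. The code below runs on a file "out.dat" containing a single column: a list of n samples from a normal distribution.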

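binwidth = 0.1
set boxwidth binwidth
sum = 0

# s(x) increments the counter as a side effect and returns 0
s(x)          = ((sum=sum+1), 0)
# map x to the centre of its bin
bin(x, width) = width*floor(x/width) + width/2.0

# first pass: count the points (plots zeros)
plot "out.dat" u ($1):(s($1))
# second pass: each point contributes 1/(binwidth*sum) to its bin
plot "out.dat" u (bin($1, binwidth)):(1.0/(binwidth*sum)) smooth freq w boxes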

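The first plot statement reads the data file and increments sum once for each point, plotting a zero each time.

The second plot statement then uses the value of sum to normalize the histogram.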
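To spell out why this normalizes the histogram, here is a short derivation. The symbols are notation introduced here, not taken from the answer: $n$ is the final value of sum, $w$ is binwidth, $c_k$ is the number of points falling in bin $k$, and $h_k$ is the height of that bin's box. Since `smooth freq` adds up the y values of all points that share an x value, each bin collects $c_k$ contributions of $1/(n w)$:

```latex
h_k = \sum_{i \,\in\, \text{bin } k} \frac{1}{n\,w} = \frac{c_k}{n\,w},
\qquad
\sum_k h_k\, w = \frac{1}{n} \sum_k c_k = 1 .
```

So the total area of the boxes is 1, and the plot can be read as a density estimate.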

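You can improve on this by making the second value of `s(x)` `NaN` and adding `notitle` to the first `plot` command. That way the counting pass is completely invisible in the graph, because gnuplot ignores `NaN` values when plotting =) (2 upvotes)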

Answer 2

小智 8

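In gnuplot 4.6, you can count the number of points with the `stats` command, which is faster than `plot`. You do not actually need a trick like `s(x)=((sum=sum+1),0)`; after running `stats 'out.dat' u 1`, the count is available directly in the variable `STATS_records`.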
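A minimal sketch of that approach, assuming gnuplot 4.6 or later and the same single-column "out.dat" as in the first answer. The variable name `n` and the bin width are illustrative choices, not part of the answer above:

```gnuplot
binwidth = 0.1
set boxwidth binwidth
# map x to the centre of its bin
bin(x, width) = width*floor(x/width) + width/2.0

# count the records in column 1; nooutput suppresses the printed summary
stats "out.dat" using 1 nooutput
n = STATS_records

# each point contributes 1/(binwidth*n), so the boxes have total area 1
plot "out.dat" using (bin($1, binwidth)):(1.0/(binwidth*n)) smooth freq with boxes notitle
```

This replaces the extra counting `plot` pass, so nothing spurious is drawn before the histogram.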

Answer 3

chl*_*chl 4

Here is how I would do it, with n=500 random Gaussian variates generated from R using the following command:

Rscript -e 'cat(rnorm(500), sep="\\n")' > rnd.dat

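I use exactly the same idea as you for defining a normalized histogram, where y is defined as 1/(binwidth * n), except that I use `int` instead of `floor` and I did not recenter the bin values. In short, this is a quick adaptation of the smooth.dem demo script, and a similar approach is described in Janert's textbook, Gnuplot in Action (Chapter 13, p. 257, freely available). You can replace my sample data file with `random-points`, which ships in the `demo` folder that comes with Gnuplot. Note that we need to tell Gnuplot the number of points, because it has no facility for counting the records in a file.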

bw1=0.1
bw2=0.3
n=500    # number of records in rnd.dat, supplied by hand
# int() truncates toward zero, so values in (-width, width) all land in bin 0
bin(x,width)=width*int(x/width)
set xrange [-3:3]
set yrange [0:1]
# plot title showing the bin width
tstr(w)=sprintf("Binwidth = %1.1f\n", w)
set multiplot layout 1,2
set boxwidth bw1
plot 'rnd.dat' using (bin($1,bw1)):(1./(bw1*n)) smooth frequency with boxes t tstr(bw1)
set boxwidth bw2
plot 'rnd.dat' using (bin($1,bw2)):(1./(bw2*n)) smooth frequency with boxes t tstr(bw2)
unset multiplot

Here is the result, with two bin widths:

[Figure: normalized histograms of rnd.dat, bin widths 0.1 and 0.3]

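That said, this really is a crude approach to histograms, and more elaborate solutions are readily available in R. The real problem is how to define a good bin width, and this issue has already been discussed on stats.stackexchange.com: the Freedman-Diaconis binning rule should not be too difficult to implement, although you will need to compute the interquartile range.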
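As a rough sketch of what that could look like inside gnuplot: the Freedman-Diaconis rule sets the bin width to 2·IQR·n^(-1/3). The snippet below assumes gnuplot 4.6 or later, where the `stats` command mentioned in another answer reports the quartiles and the record count. The variable names `iqr` and `bw` are illustrative, and the bins are recentered with `floor` rather than `int`:

```gnuplot
# quartiles and record count of column 1 of the sample file
stats "rnd.dat" using 1 nooutput
iqr = STATS_up_quartile - STATS_lo_quartile

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
bw = 2.0 * iqr / (1.0*STATS_records)**(1.0/3.0)

# map x to the centre of its bin
bin(x, width) = width*floor(x/width) + width/2.0
set boxwidth bw
plot "rnd.dat" using (bin($1, bw)):(1.0/(bw*STATS_records)) smooth freq with boxes \
     title sprintf("Binwidth = %.3f", bw)
```

The quartiles gnuplot reports may differ slightly from R's defaults, so the resulting width need not match R's `breaks="FD"` exactly.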

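Here is how R handles the same data set, using the default option (Sturges rule, which makes no difference in this particular case) and equally spaced bins like the ones used above.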

[Figure: R histograms of the same data, "Sturges" and "Binwidth = 0.1"]

The R code used is given below:

rnd <- scan("rnd.dat")   # assumption: rnd holds the samples written to rnd.dat
par(mfrow=c(1,2), las=1)
hist(rnd, main="Sturges", xlab="", ylab="", prob=TRUE)
hist(rnd, breaks=seq(-3.5,3.5,by=.1), main="Binwidth = 0.1", 
     xlab="", ylab="", prob=TRUE)

You can even see how R does its job by inspecting the values returned by a call to hist():

> str(hist(rnd, plot=FALSE))
List of 7
 $ breaks     : num [1:14] -3.5 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 ...
 $ counts     : int [1:13] 1 1 12 20 49 79 108 87 71 43 ...
 $ intensities: num [1:13] 0.004 0.004 0.048 0.08 0.196 0.316 0.432 0.348 0.284 0.172 ...
 $ density    : num [1:13] 0.004 0.004 0.048 0.08 0.196 0.316 0.432 0.348 0.284 0.172 ...
 $ mids       : num [1:13] -3.25 -2.75 -2.25 -1.75 -1.25 -0.75 -0.25 0.25 0.75 1.25 ...
 $ xname      : chr "rnd"
 $ equidist   : logi TRUE
 - attr(*, "class")= chr "histogram"

All this is to say that you can use R's results to process your data with Gnuplot if you like (although I would recommend using R directly :-).

Archived:	14 years, 10 months ago
Views:	17242
Last activity:	9 years, 2 months ago