Is there a configuration setting for how many pages ANALYZE will scan?

5 postgresql configuration

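Is there a setting that controls how many records/pages the PostgreSQL ANALYZE command scans? By default it appears to scan 30,000 pages and 30,000 records. The PostgreSQL server configuration file lists a ton of options, but I don't see anything specific to ANALYZE.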

Dan*_*ité 5

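Each column of a table has an attstattarget attribute (stored in pg_attribute) that says how much data should be stored for that column from the statistical sample collected by ANALYZE.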

It defaults to default_statistics_target, which itself defaults to 100.

In Statistics Used by the Planner, the documentation says:

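The amount of information stored in pg_statistic by ANALYZE, in particular the maximum number of entries in the most_common_vals and histogram_bounds arrays for each column, can be set on a column-by-column basis using the ALTER TABLE SET STATISTICS command, or globally by setting the default_statistics_target configuration variable.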

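The reason behind the 30,000 pages and rows is that the number of rows ANALYZE samples is 300 times the largest attstattarget among the table's columns. With the default of 100, that gives 300 * 100 = 30,000 rows.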
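The arithmetic above can be sketched in a few lines of Python. This is a simplified model, not PostgreSQL code: the function name `analyze_sample_rows` is hypothetical, and `None` is used here as a stand-in for a column that has no per-column override and therefore falls back to default_statistics_target.

```python
def analyze_sample_rows(column_targets, default_target=100):
    """Model of the sample size: 300 times the largest statistics target.

    column_targets: one entry per column; None means "no override"
    (an assumption of this sketch, standing in for the column using
    default_statistics_target).
    """
    effective = [default_target if t is None else t for t in column_targets]
    return 300 * max(effective)


# Three columns, all on the default target of 100.
print(analyze_sample_rows([None, None, None]))   # 30000

# Raising a single column to 1000 raises the sample for the whole table.
print(analyze_sample_rows([None, 1000, None]))   # 300000
```

The second call illustrates why the setting asked about is indirect: there is no "rows to scan" knob, but raising the statistics target on any one column enlarges the sample.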

The 300 comes from a statistical formula cited in the source code, in src/backend/commands/analyze.c:

    /*--------------------
     * The following choice of minrows is based on the paper
     * "Random sampling for histogram construction: how much is enough?"
     * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
     * Proceedings of ACM SIGMOD International Conference on Management
     * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
     * says that for table size n, histogram size k, maximum relative
     * error in bin size f, and error probability gamma, the minimum
     * random sample size is
     *      r = 4 * k * ln(2*n/gamma) / f^2
     * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
     *      r = 305.82 * k
     * Note that because of the log function, the dependence on n is
     * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
     * bin size error with probability 0.99.  So there's no real need to
     * scale for n, which is a good thing because we don't necessarily
     * know it at this point.
     *--------------------
     */
    stats->minrows = 300 * attr->attstattarget;
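
The two numbers in that comment can be checked directly from the formula it quotes, r = 4 * k * ln(2*n/gamma) / f^2. The Python below is only a verification sketch. The function names are made up for illustration, and the second function simply solves the same formula for f when r is fixed at 300 * k.

```python
import math


def rows_per_bucket(n, f=0.5, gamma=0.01):
    """Coefficient of k in r = 4 * k * ln(2*n/gamma) / f^2."""
    return 4 * math.log(2 * n / gamma) / f ** 2


def bin_error_for_sample(n, per_bucket=300, gamma=0.01):
    """Solve the same formula for f, given r = per_bucket * k."""
    return math.sqrt(4 * math.log(2 * n / gamma) / per_bucket)


# n = 10^6 rows, f = 0.5, gamma = 0.01
print(round(rows_per_bucket(10**6), 2))        # 305.82

# n = 10^12 rows with a 300*k sample
print(round(bin_error_for_sample(10**12), 2))  # 0.66
```

Going from a million rows to a trillion only moves the error bound from 0.5 to about 0.66, which is the weak dependence on n that the comment uses to justify a fixed multiplier of 300.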

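As for the number of pages: since rows don't span pages, at most N pages will be read to get N rows. I believe ANALYZE deliberately aims to draw its rows from as many different pages as possible in order to get the best sample. That makes sense, because rows stored in the same page are more likely to be correlated.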
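As a rough illustration of that bound, the number of pages read can be no more than the sample size, and no more than the table has. The helper below is a hypothetical sketch of that upper bound only. It does not model how ANALYZE actually picks pages.

```python
def max_pages_read(table_pages, sample_rows=30_000):
    """Upper bound on pages visited when sampling sample_rows rows.

    Each sampled row lives on exactly one page, so at most sample_rows
    pages are needed, and a table cannot supply more pages than it has.
    """
    return min(table_pages, sample_rows)


print(max_pages_read(1_000_000))  # 30000: large table, capped by the sample
print(max_pages_read(500))        # 500: small table, every page can be read
```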