Is there a configuration setting for how many pages ANALYZE will scan?

Question

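Is there a setting that controls how many rows/pages the PostgreSQL ANALYZE command scans? By default it appears to scan 30,000 pages and 30,000 rows. The PostgreSQL server configuration file lists tons of options, but I don't see anything specific to ANALYZE.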

Answer 1

Dan*_*ité 5

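Each column of a table has an attstattarget attribute (stored in pg_attribute) that tells ANALYZE how much data to store for that column from the statistics sample it collects.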

It defaults to default_statistics_target, which itself defaults to 100.

In "Statistics Used by the Planner", the documentation says:

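The amount of information stored in pg_statistic by ANALYZE, in particular the maximum number of entries in the most_common_vals and histogram_bounds arrays for each column, can be set on a column-by-column basis using the ALTER TABLE SET STATISTICS command, or globally by setting the default_statistics_target configuration variable.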

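The reason behind the 30,000 pages and rows is that the sample size, in rows, that ANALYZE considers is 300 times the largest attstattarget among the columns of the sampled table, which is 100 by default.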

The 300 comes from a statistical formula cited in the source code, in src/backend/commands/analyze.c:

    /*--------------------
     * The following choice of minrows is based on the paper
     * "Random sampling for histogram construction: how much is enough?"
     * by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
     * Proceedings of ACM SIGMOD International Conference on Management
     * of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
     * says that for table size n, histogram size k, maximum relative
     * error in bin size f, and error probability gamma, the minimum
     * random sample size is
     *      r = 4 * k * ln(2*n/gamma) / f^2
     * Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
     *      r = 305.82 * k
     * Note that because of the log function, the dependence on n is
     * quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
     * bin size error with probability 0.99.  So there's no real need to
     * scale for n, which is a good thing because we don't necessarily
     * know it at this point.
     *--------------------
     */
    stats->minrows = 300 * attr->attstattarget;


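As for the number of pages: since rows don't span pages, at most N pages will be read to get N rows. I believe ANALYZE deliberately aims to read as many distinct pages as possible in order to get the best sample. That makes sense, since rows stored in the same page are more likely to be correlated.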
