Oracle SQL query to calculate the average of a data set, excluding outliers

JT2*_*013 1 sql oracle average sql-navigator

I have a query with the correct conditions and the fields I want to display:

  SELECT t.business_process_id,
         COUNT (tsp.status) AS COUNT,
         ROUND (AVG (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS average,
         ROUND (MAX (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MAX,
         ROUND (MIN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MIN,
         ROUND (MEDIAN (tsp.end_date - tsp.start_date), 2) * 24 * 60 AS MEDIAN,
         ROUND (STDDEV (tsp.end_date - tsp.start_date), 2) AS std_deviation
    FROM transaction_status_period tsp, transaction t
   WHERE     t.trans_id = tsp.trans_id
         AND tsp.status = 'R'
         AND tsp.end_date IS NOT NULL
         AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
         AND EXTRACT (DAY FROM tsp.start_date) =
                 EXTRACT (DAY FROM tsp.end_date)
         AND EXTRACT (YEAR FROM tsp.start_date) =
                 EXTRACT (YEAR FROM tsp.end_date)
         AND EXTRACT (MONTH FROM tsp.start_date) =
                 EXTRACT (MONTH FROM tsp.end_date)
         AND EXTRACT (YEAR FROM tsp.start_date) = 2013
         AND NOT EXISTS
                     (SELECT 1
                        FROM transaction_status_period tsp1
                       WHERE     tsp1.trans_id = tsp.trans_id
                             AND tsp.userid = tsp1.userid
                             AND tsp1.status = 'S'
                             AND tsp1.timestamp < tsp.timestamp)
GROUP BY t.business_process_id

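The average this query computes is the average over the entire data set in question (year = 2013). Is there a way to have the query compute the average of (tsp.end_date - tsp.start_date) over the 2013 data with outliers excluded, that is, the average date difference in the range where most of the 2013 observations fall?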

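Could the percentile_cont function be used for this? I am not familiar with it, but I understand that it computes a percentile of a given column. In my case, I am looking for the average of the date difference (tsp.end_date - tsp.start_date), taken only over the bulk of the data points (excluding outliers).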

Any help would be greatly appreciated. Perhaps I am approaching this the wrong way.

jha*_*ham 5

Would something like this solve your problem?

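One option is to compute the average and standard deviation in an inline view and use them to define outliers, for example by treating anything more than two standard deviations from the mean as an outlier. The query below takes a related approach: the inline view assigns each row to a percentile bucket with ntile(100), ordered by (tsp.end_date - tsp.start_date), and the outer query keeps only buckets 10 through 90: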

SELECT calc.business_process_id,
 COUNT (calc.status) AS COUNT,
 ROUND (AVG (calc.end_date - calc.start_date), 2) * 24 * 60 AS average,
 ROUND (MAX (calc.end_date - calc.start_date), 2) * 24 * 60 AS MAX,
 ROUND (MIN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MIN,
 ROUND (MEDIAN (calc.end_date - calc.start_date), 2) * 24 * 60 AS MEDIAN,
 ROUND (STDDEV (calc.end_date - calc.start_date), 2) AS std_deviation
FROM (SELECT t.business_process_id,
         tsp.status,
         tsp.start_date,
         tsp.end_date, 
         ntile(100) over (order by (tsp.end_date-tsp.start_date)) as percentiles
      FROM transaction_status_period tsp, transaction t 
      WHERE     t.trans_id = tsp.trans_id
      AND tsp.status = 'R'
      AND tsp.end_date IS NOT NULL
      AND tsp.userid NOT IN ('X', 'Y', 'Z', 'A')
      AND EXTRACT (DAY FROM tsp.start_date) =
          EXTRACT (DAY FROM tsp.end_date)
      AND EXTRACT (YEAR FROM tsp.start_date) =
          EXTRACT (YEAR FROM tsp.end_date)
      AND EXTRACT (MONTH FROM tsp.start_date) =
          EXTRACT (MONTH FROM tsp.end_date)
      AND EXTRACT (YEAR FROM tsp.start_date) = 2013
      AND NOT EXISTS
             (SELECT 1
                FROM transaction_status_period tsp1
               WHERE     tsp1.trans_id = tsp.trans_id
                     AND tsp.userid = tsp1.userid
                     AND tsp1.status = 'S'
                     AND tsp1.timestamp < tsp.timestamp)
  ) calc
WHERE calc.percentiles >=10 
AND calc.percentiles <=90
GROUP BY calc.business_process_id  
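For comparison, the standard-deviation variant mentioned above could look roughly like the following. This is only a minimal sketch under stated assumptions: the `durations` rows are made-up sample data standing in for the filtered 2013 rows, `duration_min` is a hypothetical column holding (end_date - start_date) * 24 * 60, and the two-standard-deviation cutoff is an arbitrary choice that you may want to tune.

```sql
-- Hypothetical sample data: nine typical durations and one extreme value.
-- In the real query this would be the inline view over
-- transaction_status_period and transaction, with
-- duration_min = (tsp.end_date - tsp.start_date) * 24 * 60.
WITH durations AS (
  SELECT 'BP1' AS business_process_id, 10 AS duration_min FROM dual UNION ALL
  SELECT 'BP1', 11  FROM dual UNION ALL
  SELECT 'BP1', 12  FROM dual UNION ALL
  SELECT 'BP1', 10  FROM dual UNION ALL
  SELECT 'BP1', 11  FROM dual UNION ALL
  SELECT 'BP1', 12  FROM dual UNION ALL
  SELECT 'BP1', 10  FROM dual UNION ALL
  SELECT 'BP1', 11  FROM dual UNION ALL
  SELECT 'BP1', 12  FROM dual UNION ALL
  SELECT 'BP1', 500 FROM dual
),
-- Attach each group's mean and standard deviation to every row
-- using analytic (windowed) aggregates.
scored AS (
  SELECT d.business_process_id,
         d.duration_min,
         AVG (d.duration_min)
            OVER (PARTITION BY d.business_process_id) AS grp_avg,
         STDDEV (d.duration_min)
            OVER (PARTITION BY d.business_process_id) AS grp_std
    FROM durations d
)
-- Keep only rows within two standard deviations of their group's mean,
-- then aggregate what is left.
SELECT s.business_process_id,
       COUNT (*) AS kept_rows,
       ROUND (AVG (s.duration_min), 2) AS trimmed_average
  FROM scored s
 WHERE ABS (s.duration_min - s.grp_avg) <= 2 * s.grp_std
GROUP BY s.business_process_id;
```

With this sample data the 500-minute row lies more than two standard deviations from the group mean and is dropped, so the average is taken over the nine remaining rows. Note that the mean and standard deviation here are computed per business_process_id, and that extreme values inflate the standard deviation used to detect them, so this is a single-pass filter rather than a robust outlier test.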