Clojure独特+随机生成的流的复杂性

Mic*_*ent 5 clojure time-complexity

表达式的时间复杂度是多少

(doall (take n (distinct stream)))
Run Code Online (Sandbox Code Playgroud)

其中stream是懒惰地生成具有重复(可能是无限的)集合?

我猜这部分取决于重复的数量或机会stream?如果stream(repeatedly #(rand-int m)))哪里 m >= n呢?

我的估计:

对于结果列表中的每个元素,必须从流中至少实现一个元素。如果流中有重复项,则为多个。对于每次迭代,都有一个设置的查找和/或插入,但是由于它们接近恒定的时间,所以我们至少得到了:O(n*~1) = O(n)然后,重复项变得有些复杂。我的直觉是,也可以忽略重复项的复杂性,但是我不确定如何将其形式化。例如,我们不能只说某个常数为O(n*k*~1)= O(n)k因为在流中没有明显的最大k重复数。

让我用一些数据演示问题:

(defn stream [upper distinct-n]
  (let [counter (volatile! 0)]
    (doall (take distinct-n
                 (distinct
                  (repeatedly (fn []
                                (vswap! counter inc)
                                (rand-int upper))))))
    @counter))

(defn sample [times-n upper distinct-n]
  (->> (repeatedly times-n
                  #(stream upper distinct-n))
       frequencies
       (sort-by val)
       reverse))

(sample 10000 5 1) ;; ([1 10000])
(sample 10000 5 2) ;; ([2 8024] [3 1562] [4 334] [5 66] [6 12] [8 1] [7 1])
(sample 10000 5 3) ;; ([3 4799] [4 2898] [5 1324] [6 578] [7 236] [8 87] [9 48] [10 14] [11 10] [14 3] [12 2] [13 1])
(sample 10000 5 3) ;; ([3 4881] [4 2787] [5 1359] [6 582] [7 221] [8 107] [9 39] [10 12] [11 9] [12 1] [17 1] [13 1])
(sample 10000 5 4) ;; ([5 2258] [6 1912] [4 1909] [7 1420] [8 985] [9 565] [10 374] [11 226] [12 138] [13 89] [14 50] [15 33] [16 16] [17 9] [18 8] [20 5] [19 1] [23 1] [21 1])
(sample 10000 5 5) ;; ([8 1082] [9 1055] [7 1012] [10 952] [11 805] [6 778] [12 689] [13 558] [14 505] [5 415] [15 387] [16 338] [17 295] [18 203] [19 198] [20 148] [21 100] [22 96] [23 72] [24 53] [25 44] [26 40] [28 35] [27 31] [29 19] [30 16] [31 15] [32 13] [35 10] [34 6] [33 6] [42 3] [38 3] [45 3] [36 3] [37 2] [39 2] [52 1] [66 1] [51 1] [44 1] [41 1] [50 1] [60 1] [58 1])
Run Code Online (Sandbox Code Playgroud)

请注意,对于最后一个样本,迭代次数distinct最多可以达到66,尽管机会很小。还要注意的是增加n(sample 10000 n n)从流实现要素最有可能的数量似乎超过线性上去。

此图示出了从输入实现中元素的数目(从10000个样品最屡见不鲜)(doall (take n (repeatedly #(rand-int m)))为各种数量nm

数据

为了完整起见,这是我用来生成图表的代码:

(require '[com.hypirion.clj-xchart :as c])

(defn most-common [times-n upper distinct-n]
  (->> (repeatedly times-n
                   #(stream upper distinct-n))
       frequencies
       (sort-by #(- (val %)))
       ffirst))

(defn series [m]
  {(str "m = " m)
   (let [x (range 1 (inc m))]
     {:x x
      :y (map #(most-common 10000 m %)
              x)})})

(c/view
 (c/xy-chart
  (merge (series 10)
         (series 25)
         (series 50)
         (series 100))
  {:x-axis {:title "n"}
   :y-axis {:title "realized"}}))
Run Code Online (Sandbox Code Playgroud)

Clo*_*tly 2

您的问题称为优惠券收集器问题,预期的元素数量只需将 m/m + m/(m-1) 相加即可给出...直到您拥有 n 项:

(defn general-coupon-collector-expect
  "n: Cardinality of resulting set (# of uniuque coupons to collect)
   m: Cardinality of set to draw from (#coupons that exist)"
  [n m]
  {:pre [(<= n m)]}
  (double (apply + (mapv / (repeat m) (range m (- m n) -1)))))
(general-coupon-collector-expect 25 25)
;; => 95
;; This generates the data for you plot:
(for [x (range 10 101 5)]
  (general-coupon-collector-expect x 100))
Run Code Online (Sandbox Code Playgroud)

最坏的情况将是无限的。最好的情况是 N。平均情况是 O(N log N)。这忽略了检查元素是否已被绘制的复杂性。实际上,对于 clojure 集来说,它是 Log_32 N(在 中使用distinct)。