Complex data manipulation in Clojure

mon*_*962 4 clojure destructuring

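I'm working on a personal market-analysis project. I have a data structure representing all the recent turning points in the market, which looks like this: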

[{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
 {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
 {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}
 {:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}
 {:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}
 {:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}
 {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
 {:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}
 {:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}
 {:low 1.117575, :time "2016-08-02T06:00:00.000000Z"}
 {:low 1.117135, :time "2016-08-02T04:30:00.000000Z"}
 {:low 1.11624, :time "2016-08-02T02:00:00.000000Z"}
 {:low 1.115895, :time "2016-08-01T21:30:00.000000Z"}
 {:low 1.11552, :time "2016-08-01T11:45:00.000000Z"}
 {:low 1.11049, :time "2016-07-29T12:15:00.000000Z"}
 {:low 1.108825, :time "2016-07-29T08:30:00.000000Z"}
 {:low 1.10839, :time "2016-07-29T08:00:00.000000Z"}
 {:low 1.10744, :time "2016-07-29T05:45:00.000000Z"}
 {:low 1.10716, :time "2016-07-28T19:30:00.000000Z"}
 {:low 1.10705, :time "2016-07-28T18:45:00.000000Z"}
 {:low 1.106875, :time "2016-07-28T18:00:00.000000Z"}
 {:low 1.10641, :time "2016-07-28T05:45:00.000000Z"}
 {:low 1.10591, :time "2016-07-28T01:45:00.000000Z"}
 {:low 1.10579, :time "2016-07-27T23:15:00.000000Z"}
 {:low 1.105275, :time "2016-07-27T22:00:00.000000Z"}
 {:low 1.096135, :time "2016-07-27T18:00:00.000000Z"}]

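Conceptually, I want to match up :high/:low pairs and work out the price range (high - low) and the midpoint (the average of high and low), but I don't want to generate every possible pair.

What I want to do is start with the first item in the collection, {:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}, and walk "down" through the rest of the collection, creating a pair with each :low item UNTIL I hit the next :high item. Once I hit that next :high item, I'm not interested in any further pairs. In this case only a single pair is created, the :high with the first :low; I stop there because the next (third) item is a :high. The one generated record should look like {:price-range 0.000365, :midpoint 1.121272, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}

Next, I move on to the second item in the collection, {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}, and walk "down" through the rest of the collection, creating a pair with each :high item UNTIL I hit the next :low item. In this case I get 5 new records, pairing the :low with each of the 5 :high items that follow it consecutively; the first of these 5 records would look like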

{:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}

The second of these 5 records would look like

{:price-range 0.000835, :midpoint 1.1215075, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}]}

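And so on.

After that I hit a :low, so I stop there.

Then I move on to the 3rd item, {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}, and walk "down", creating pairs with each :low UNTIL I hit the next :high. In this case I get 0 pairs, because the :high is immediately followed by another :high. The same goes for the next 3 :high items, each of which is immediately followed by another :high.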

Next I reach the 7th item, {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}, which should generate a pair with each of the 19 :low items that follow it.

The result I generate would be a list of all the pairs created:

[{:price-range 0.000365, :midpoint 1.121272, :extremes [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}]}
 {:price-range 0.00064, :midpoint 1.12141, :extremes [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}{:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}]}
 ...

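If I were implementing this in something like Python, I'd probably use a couple of nested loops, use a break to exit the inner loop once I stop seeing :highs to pair with my :low (and vice versa), and accumulate all the generated records into an array as I work through the two loops. I just can't work out a good way to attack it in Clojure...

Any ideas?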

lee*_*ski 6

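First of all, you can rephrase the problem this way:

  1. You have to find all the boundary points, where a :high is followed by a :low or vice versa.
  2. For each boundary, you take the item just before it and build something from that item paired with each item after the boundary, but only up to the next boundary.

For simplicity, let's use the following data model: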

(def data0 [{:a 1} {:b 2} {:b 3} {:b 4} {:a 5} {:a 6} {:a 7}])

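The first part can be achieved with the partition-by function, which splits the input collection every time the function's return value changes from one item to the next: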

user> (def step1 (partition-by (comp boolean :a) data0))
#'user/step1
user> step1
(({:a 1}) ({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7}))
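
The boolean in (comp boolean :a) matters here: partition-by starts a new group whenever the key function's return value changes, and :a on its own returns a different number for each of the trailing {:a ...} items. A small sketch of the difference on the same data0 (the names by-value and by-kind are only for illustration):

```clojure
(def data0 [{:a 1} {:b 2} {:b 3} {:b 4} {:a 5} {:a 6} {:a 7}])

;; Keying on :a alone yields 1, nil, nil, nil, 5, 6, 7,
;; so each of the last three items lands in its own group.
(def by-value (partition-by :a data0))
;; => (({:a 1}) ({:b 2} {:b 3} {:b 4}) ({:a 5}) ({:a 6}) ({:a 7}))

;; Coercing to boolean yields true, false, false, false, true, true, true,
;; so consecutive items of the same kind stay together.
(def by-kind (partition-by (comp boolean :a) data0))
;; => (({:a 1}) ({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7}))
```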

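Now you need to take every two neighbouring groups and operate on them. The pairs of groups should look like this: [({:a 1}) ({:b 2} {:b 3} {:b 4})] [({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})]

This is achieved with the partition function: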

user> (def step2 (partition 2 1 step1))
#'user/step2
user> step2
((({:a 1}) ({:b 2} {:b 3} {:b 4})) 
 (({:b 2} {:b 3} {:b 4}) ({:a 5} {:a 6} {:a 7})))
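
In (partition 2 1 coll) the first argument is the window size and the second is the step, so a step of 1 produces overlapping windows: every group except the first and last appears once as the right side and once as the left side. A minimal illustration on plain numbers (sliding and chunks are illustrative names):

```clojure
;; Window size 2, step 1: overlapping neighbours.
(def sliding (partition 2 1 [1 2 3 4]))
;; => ((1 2) (2 3) (3 4))

;; Without a step argument the step defaults to the window size,
;; so the windows do not overlap.
(def chunks (partition 2 [1 2 3 4]))
;; => ((1 2) (3 4))
```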

You have to do something with every pair of groups. You can do that with map:

user> (def step3 (map (fn [[lbounds rbounds]]
                    (map #(vector (last lbounds) %)
                         rbounds))
                  step2))
#'user/step3
user> step3
(([{:a 1} {:b 2}] [{:a 1} {:b 3}] [{:a 1} {:b 4}]) 
 ([{:b 4} {:a 5}] [{:b 4} {:a 6}] [{:b 4} {:a 7}]))

But since you need one concatenated list rather than a list of groups, you'll want to use mapcat instead of map:

user> (def step3 (mapcat (fn [[lbounds rbounds]]
                           (map #(vector (last lbounds) %)
                                rbounds))
                         step2))
#'user/step3
user> step3
([{:a 1} {:b 2}] 
 [{:a 1} {:b 3}] 
 [{:a 1} {:b 4}] 
 [{:b 4} {:a 5}] 
 [{:b 4} {:a 6}] 
 [{:b 4} {:a 7}])
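
Put differently, mapcat behaves like map followed by concatenating the per-item results. A small sketch on numbers, where twice, nested and flat are hypothetical names used only for this illustration:

```clojure
;; A function that returns a collection for each input.
(defn twice [x] [x x])

;; map keeps one collection per input item.
(def nested (map twice [1 2 3]))
;; => ([1 1] [2 2] [3 3])

;; mapcat concatenates those collections into one flat sequence.
(def flat (mapcat twice [1 2 3]))
;; => (1 1 2 2 3 3)

;; The two are related by a plain concat.
(= flat (apply concat nested))
;; => true
```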

This is the result we want (almost, since for now we generate plain vectors rather than maps).

Now you can prettify it with the threading macro:

(->> data0
     (partition-by (comp boolean :a))
     (partition 2 1)
     (mapcat (fn [[lbounds rbounds]]
               (map #(vector (last lbounds) %)
                    rbounds))))

This gives you exactly the same result.
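
The reason the result is identical is that ->> only rearranges the code: it threads each intermediate value in as the last argument of the next form. A tiny sketch of that equivalence (nested-result and threaded-result are illustrative names):

```clojure
;; Nested form: read from the inside out.
(def nested-result
  (reduce + (map inc [1 2 3])))

;; Threaded form: read top to bottom; each result becomes
;; the last argument of the following form.
(def threaded-result
  (->> [1 2 3]
       (map inc)
       (reduce +)))

;; Both evaluate to 9.
(= nested-result threaded-result)
```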

Applied to your data (bound to data here), it looks almost the same; only the function that builds each result differs:

user> (defn hi-or-lo [item]
        (item :high (item :low)))
#'user/hi-or-lo
user> 
(->> data
     (partition-by (comp boolean :high))
     (partition 2 1)
     (mapcat (fn [[lbounds rbounds]]
               (let [left-bound (last lbounds)
                     left-val (hi-or-lo left-bound)]
                 (map #(let [right-val (hi-or-lo %)
                             diff (Math/abs (- right-val left-val))]
                         {:extremes [left-bound %]
                          :price-range diff
                          :midpoint (+ (min right-val left-val)
                                       (/ diff 2))})
                      rbounds))))
     (clojure.pprint/pprint))

It prints the following:

({:extremes
  [{:high 1.121455, :time "2016-08-03T05:15:00.000000Z"}
   {:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}],
  :price-range 3.6500000000017074E-4,
  :midpoint 1.1212725}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12173, :time "2016-08-03T04:30:00.000000Z"}],
  :price-range 6.399999999999739E-4,
  :midpoint 1.12141}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.121925, :time "2016-08-03T00:00:00.000000Z"}],
  :price-range 8.350000000001412E-4,
  :midpoint 1.1215074999999999}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12215, :time "2016-08-02T23:00:00.000000Z"}],
  :price-range 0.001060000000000061,
  :midpoint 1.12162}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12273, :time "2016-08-02T21:15:00.000000Z"}],
  :price-range 0.0016400000000000858,
  :midpoint 1.12191}
 {:extremes
  [{:low 1.12109, :time "2016-08-03T05:15:00.000000Z"}
   {:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}],
  :price-range 0.0022900000000001253,
  :midpoint 1.1222349999999999}
 {:extremes
  [{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
   {:low 1.119215, :time "2016-08-02T12:30:00.000000Z"}],
  :price-range 0.004164999999999974,
  :midpoint 1.1212975}
 {:extremes
  [{:high 1.12338, :time "2016-08-02T18:15:00.000000Z"}
   {:low 1.118755, :time "2016-08-02T12:00:00.000000Z"}],
  :price-range 0.004625000000000101,
  :midpoint 1.1210675}
 ...
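
For reuse, the same pipeline can be wrapped in a function. The sketch below is one possible way to do that, not part of the original answer: the name pair-extremes and the three-item sample are assumptions made for illustration, and the midpoint is computed as the plain mean of the two values, which is mathematically the same as min plus half the range but may differ in the last floating-point digit. It relies on the fact that a Clojure map called as a function accepts a default value, which is how hi-or-lo falls back to :low when :high is absent.

```clojure
(defn hi-or-lo
  "Returns the :high value of item, or its :low value when :high is absent."
  [item]
  (item :high (item :low)))

(defn pair-extremes
  "Pairs the last item of each run of highs (or lows) with every item
  of the run of the opposite kind that immediately follows it."
  [turning-points]
  (->> turning-points
       (partition-by (comp boolean :high))
       (partition 2 1)
       (mapcat (fn [[lbounds rbounds]]
                 (let [left-bound (last lbounds)
                       left-val   (hi-or-lo left-bound)]
                   (map (fn [right-bound]
                          (let [right-val (hi-or-lo right-bound)]
                            {:extremes    [left-bound right-bound]
                             :price-range (Math/abs (- right-val left-val))
                             ;; assumption: plain mean instead of min + range/2
                             :midpoint    (/ (+ left-val right-val) 2)}))
                        rbounds))))))

;; Hypothetical sample: one high followed by two lows.
(def sample [{:high 4.0} {:low 2.0} {:low 1.0}])

(pair-extremes sample)
;; => ({:extremes [{:high 4.0} {:low 2.0}], :price-range 2.0, :midpoint 3.0}
;;     {:extremes [{:high 4.0} {:low 1.0}], :price-range 3.0, :midpoint 2.5})
```

A run with no run of the opposite kind after it produces no pairs, so (pair-extremes [{:high 1.0} {:high 2.0}]) is empty.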

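As a more general answer to the "complex data manipulation" question, I'd suggest looking through all of the collection-manipulation functions in Clojure core and then trying to decompose any task into applications of those functions. There aren't many cases where you need anything beyond them.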
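
As one more illustration of that advice, the question's own wording (walk forward from each item and stop when the kind repeats) can also be expressed directly with take-while. This is an alternative sketch, not the approach used above; kind, pairs-from, all-pairs and the sample data are hypothetical names and values introduced for the example.

```clojure
(defn kind
  "Classifies a turning point as :high or :low."
  [item]
  (if (contains? item :high) :high :low))

(defn pairs-from
  "Pairs the first item with each following item of the opposite kind,
  stopping at the first following item of the same kind."
  [[x & more]]
  (map #(vector x %)
       (take-while #(not= (kind x) (kind %)) more)))

(defn all-pairs
  "Applies pairs-from to every suffix of items and concatenates the results."
  [items]
  (mapcat pairs-from
          (take-while seq (iterate rest items))))

;; Hypothetical sample data.
(def sample [{:high 3} {:low 1} {:high 4} {:high 5} {:low 2}])

(all-pairs sample)
;; => ([{:high 3} {:low 1}]
;;     [{:low 1} {:high 4}]
;;     [{:low 1} {:high 5}]
;;     [{:high 5} {:low 2}])
```

On this sample it yields the same pairs as the partition-by pipeline, at the cost of re-scanning the run that follows each item.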