在SBCL中进行多线程计算的正确方法

Question

在SBCL中进行多线程计算的正确方法

Fra*_*son 3 multithreading sbcl common-lisp

上下文

我需要使用多线程进行计算.我使用SBCL并且可移植性不是问题.我知道bordeaux-threads并lparallel存在,但我想在特定的SBCL线程实现提供的相对较低的级别实现一些东西.我需要最大速度,即使以牺牲可读性/编程工作为代价.

计算密集型操作的示例

我们可以定义一个充分的计算密集型函数,它将受益于多线程.

(defun intensive-sqrt (x)
  "Dummy calculation for intensive algorithm.
Approx 50 ms for 1e6 iterations."
  (let ((y x))
    (dotimes (it 1000000 t)
      (if (> y 1.01d0)
          (setf y (sqrt y))
          (setf y (* y y y))))
    y))

Run Code Online (Sandbox Code Playgroud)

将每个计算映射到一个线程并执行

给定参数列表llarg和函数列表fun,我们想要计算nthreads结果并返回结果列表res-list.以下是我使用我找到的资源(见下文).

(defmacro splice-arglist-help (fun arglist)
  "Helper macro.
   Splices a list 'arglist' (arg1 arg2 ...) into the function call of 'fun'
   Returns (funcall fun arg1 arg2 ...)"
  `(funcall ,fun ,@arglist))

(defun splice-arglist (fun arglist)
  (eval `(splice-arglist-help ,fun ,arglist)))

(defun maplist-fun-multi (fun llarg nthreads)
  "Maps 'fun' over list of argument lists 'llarg' using multithreading.
   Breaks up llarg and feeds it to each thread.
   Appends all the result lists at the end."
  (let ((thread-list nil)
        (res-list nil))
    ;; Create and run threads
    (dotimes (it nthreads t)
      (let ((larg-temp (elt llarg it)))
        (setf thread-list (append thread-list
                                  (list (sb-thread:make-thread
                                         (lambda ()
                                           (splice-arglist fun larg-temp))))))))
    ;; Join threads
    ;; Threads are joined in order, not optimal for speed.
    ;; Should be joined when finished ?
    (dotimes (it (list-length thread-list) t)
      (setf res-list (append res-list (list (sb-thread:join-thread (elt thread-list it))))))
    res-list))

Run Code Online (Sandbox Code Playgroud)

nthreads不一定与长度匹配llarg,但我为了简单的例子而避免额外的簿记.我也省略了declare用于优化的各种.

我们可以使用以下方法测试多线程并比较时序:

(defparameter *test-args-sqrt-long* nil)
(dotimes (it 10000 t)
  (push (list (+ 3d0 it)) *test-args-sqrt-long*))

(time (intensive-sqrt 5d0))
(time (maplist-fun-multi #'intensive-sqrt *test-args-sqrt-long* 100))

Run Code Online (Sandbox Code Playgroud)

线程数非常多.我认为最好的方法是使用与CPU一样多的线程,但我注意到性能下降在时间/操作方面几乎不可察觉.执行更多操作将涉及将输入列表分解为更小的部分.

以上代码输出,在2核/ 4线程机器上:

Evaluation took:
  0.029 seconds of real time
  0.015625 seconds of total run time (0.015625 user, 0.000000 system)
  55.17% CPU
  71,972,879 processor cycles
  22,151,168 bytes consed

Evaluation took:
  1.415 seconds of real time
  4.703125 seconds of total run time (4.437500 user, 0.265625 system)
  [ Run times consist of 0.205 seconds GC time, and 4.499 seconds non-GC time. ]
  332.37% CPU
  3,530,632,834 processor cycles
  2,215,345,584 bytes consed

Run Code Online (Sandbox Code Playgroud)

什么在困扰我

我给出的示例非常有效并且非常健壮(即结果不会在线程之间混淆,并且我没有遇到崩溃).速度增益也存在,并且计算确实在我测试此代码的机器上使用了几个核心/线程.但有一些我想要的意见/帮助:

使用参数列表llarg和larg-temp.这真的有必要吗？有没有办法避免操纵潜在的巨大列表？
线程按照它们存储在的顺序连接thread-list.我想如果每个操作都花费不同的时间来完成,这将不是最佳选择.有没有办法在每个线程完成时加入它们,而不是等待？

答案应该在我已经找到的资源中,但我发现更难以解决的更高级的东西.

迄今为止找到的资源

Answer 1

Sva*_*nte 5

文体问题

该splice-arglist是没有必要的帮手(所以我也会跳过它们的详细信息).apply改为使用你的线程函数:
```
(lambda ()
  (apply fun larg-temp))
```
Run Code Online (Sandbox Code Playgroud)
您不需要(也不应该)索引到列表中,因为每次查找都是O(n) - 您的循环是二次的.使用dolist简单的侧有效循环,或loop当你如并行迭代:
```
(loop :repeat nthreads
      :for args :in llarg
      :collect (sb-thread:make-thread (lambda () (apply fun args))))
```
Run Code Online (Sandbox Code Playgroud)
要在创建新的相同长度列表时查看列表,其中每个元素都是根据源列表中的相应元素计算的,请使用mapcar:
```
(mapcar #'sb-thread:join-thread threads)
```
Run Code Online (Sandbox Code Playgroud)

你的功能因此变成:

(defun map-args-parallel (fun arglists nthreads)
  (let ((threads (loop :repeat nthreads
                       :for args :in arglists
                       :collect (sb-thread:make-thread
                                 (lambda ()
                                   (apply fun args))))))
    (mapcar #'sb-thread:join-thread threads)))

Run Code Online (Sandbox Code Playgroud)

性能

你是对的,通常只创建与ca一样多的线程.可用的核心数量.如果通过始终创建n个线程来测试性能,然后加入它们,然后转到下一批,那么性能确实没有太大差异.这是因为效率低下在于创建线程.线程与进程一样资源密集.

通常做的是创建一个线程池,其中线程不会被连接,而是被重用.为此,您需要一些其他机制来传达参数和结果,例如通道(例如来自chanl).

但请注意,例如,lparallel已经提供了一个pmap功能,并且它做得对.这种包装器库的目的不仅是为用户(程序员)提供一个很好的界面,而且还要认真思考问题并合理地进行优化.我相信这pmap将比你的尝试快得多.

怎么样`(sb-thread:make-thread fun:arguments args)`？在调用`make-thread`之前重新绑定args怎么样？这会有用吗？ (2认同)

归档时间：	7 年，4 月前
查看次数：	241 次
最近记录：	7 年，4 月前