Why isn't my multithreading more efficient?

nba*_*lle 10 java performance benchmarking multithreading

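I designed a class that fills an integer array using varying numbers of threads, in order to see the power of multithreading. But according to my results, there isn't any...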

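The idea: fill an array of 100000000 integers with the value "1". Start with 1 thread (one thread fills the whole array) and increase the count up to 100 threads (each thread fills a subarray of size 100000000/nbThreads).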

Example: with 10 threads, I create 10 threads, each of which fills a subarray of 10000000 integers.

Here is my code:

public class ThreadedArrayFilling extends Thread{
    private int start;
    private int partitionSize;
    public static int[] data;
    public static final int SIZE = 100000000;
    public static final int NB_THREADS_MAX = 100;


    public static void main(String[] args){
        data = new int[SIZE];
        long startTime, endTime;
        int partition, startIndex, j;
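        ThreadedArrayFilling[] threads;

        for(int i = 1; i <= NB_THREADS_MAX; i++){
            startTime = System.currentTimeMillis();
            // Integer division: when SIZE is not divisible by i, the last SIZE % i elements are left unfilled.
            partition = SIZE / i;
            startIndex = 0;
            threads = new ThreadedArrayFilling[i];
            for(j = 0; j < i; j++){
                // The constructor starts the thread immediately.
                threads[j] = new ThreadedArrayFilling(startIndex, partition);
                startIndex += partition;
            }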
            for(j = 0; j < i; j++){
                try {
                    threads[j].join();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            endTime = System.currentTimeMillis();       
            System.out.println(i + " THREADS: " + (endTime - startTime) + "ms");
        }
    }

    public ThreadedArrayFilling(int start, int size){
        this.start = start;
        this.partitionSize = size;
        this.start();
    }

    public void run(){
        for(int i = 0; i < this.partitionSize; i++){
            data[this.start + i] = 1;
        }
    }

    public static String display(int[] d){
        String s = "[";

        for(int i = 0; i < d.length; i++){
            s += d[i] + ", ";
        }

        s += "]";
        return s;
    }

}

Here are my results:

1 THREADS: 196ms
2 THREADS: 208ms
3 THREADS: 222ms
4 THREADS: 213ms
5 THREADS: 198ms
6 THREADS: 198ms
7 THREADS: 198ms
8 THREADS: 198ms
9 THREADS: 198ms
10 THREADS: 206ms
11 THREADS: 201ms
12 THREADS: 197ms
13 THREADS: 198ms
14 THREADS: 204ms
15 THREADS: 199ms
16 THREADS: 203ms
17 THREADS: 234ms
18 THREADS: 225ms
19 THREADS: 235ms
20 THREADS: 235ms
21 THREADS: 234ms
22 THREADS: 221ms
23 THREADS: 211ms
24 THREADS: 203ms
25 THREADS: 206ms
26 THREADS: 200ms
27 THREADS: 202ms
28 THREADS: 204ms
29 THREADS: 202ms
30 THREADS: 200ms
31 THREADS: 206ms
32 THREADS: 200ms
33 THREADS: 205ms
34 THREADS: 203ms
35 THREADS: 200ms
36 THREADS: 206ms
37 THREADS: 200ms
38 THREADS: 204ms
39 THREADS: 205ms
40 THREADS: 201ms
41 THREADS: 206ms
42 THREADS: 200ms
43 THREADS: 204ms
44 THREADS: 204ms
45 THREADS: 206ms
46 THREADS: 203ms
47 THREADS: 204ms
48 THREADS: 204ms
49 THREADS: 201ms
50 THREADS: 205ms
51 THREADS: 204ms
52 THREADS: 207ms
53 THREADS: 202ms
54 THREADS: 207ms
55 THREADS: 207ms
56 THREADS: 203ms
57 THREADS: 203ms
58 THREADS: 201ms
59 THREADS: 206ms
60 THREADS: 206ms
61 THREADS: 204ms
62 THREADS: 201ms
63 THREADS: 206ms
64 THREADS: 202ms
65 THREADS: 206ms
66 THREADS: 205ms
67 THREADS: 207ms
68 THREADS: 210ms
69 THREADS: 207ms
70 THREADS: 203ms
71 THREADS: 207ms
72 THREADS: 205ms
73 THREADS: 203ms
74 THREADS: 211ms
75 THREADS: 202ms
76 THREADS: 207ms
77 THREADS: 204ms
78 THREADS: 212ms
79 THREADS: 203ms
80 THREADS: 210ms
81 THREADS: 206ms
82 THREADS: 205ms
83 THREADS: 203ms
84 THREADS: 203ms
85 THREADS: 209ms
86 THREADS: 204ms
87 THREADS: 206ms
88 THREADS: 208ms
89 THREADS: 263ms
90 THREADS: 216ms
91 THREADS: 230ms
92 THREADS: 216ms
93 THREADS: 230ms
94 THREADS: 234ms
95 THREADS: 234ms
96 THREADS: 217ms
97 THREADS: 229ms
98 THREADS: 228ms
99 THREADS: 215ms
100 THREADS: 232ms

What am I missing?

EDIT: Additional information:

My machine has a dual-core processor.

Expectations:

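  • I expected to see a large performance gain between 1 and 2 threads (making use of both cores)
  • I also expected to see a slowdown after that, as the number of threads grows large.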

But the results do not confirm my expectations. Are my expectations wrong, or is there a problem with my algorithm?

Mic*_*rdt 19

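With two cores, the best performance you can expect is for 2 threads to take half the time of one thread. Any additional threads beyond that only create useless overhead. That assumes you are completely CPU bound, which you actually aren't.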

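The question is why you see no improvement when going from 1 thread to 2. The reason is probably that your program is not CPU bound but memory bound. Your bottleneck is main memory access, and the 2 threads are taking turns writing to main memory. The actual CPU cores are doing nothing most of the time. You would see the expected difference if, instead of doing very little actual work on a large area of memory, you did a lot of CPU-intensive work on a small amount of memory, because then each CPU core could do its work entirely within its own cache.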
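
As a rough illustration of that contrast, here is a minimal sketch of a CPU-bound variant of the benchmark. The class name `CpuBoundDemo`, the `burn` function, and the iteration counts are all hypothetical choices made for this example, not part of the original code. Each thread does a lot of arithmetic on a local variable and touches almost no memory, so on a dual-core machine one would expect the 2-thread run to take noticeably less time than the 1-thread run. Actual timings depend on the hardware and the JIT.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CpuBoundDemo {
    // Hypothetical busy-work: pure arithmetic on a local variable, almost no memory traffic.
    static long burn(long iterations) {
        long x = 1;
        for (long i = 0; i < iterations; i++) {
            x = x * 6364136223846793005L + 1442695040888963407L;
        }
        return x;
    }

    // Splits totalIterations evenly across nbThreads threads; returns elapsed milliseconds.
    static long timeWithThreads(int nbThreads, long totalIterations) {
        long perThread = totalIterations / nbThreads;
        AtomicLong sink = new AtomicLong(); // consuming the results keeps the JIT from discarding the work
        Thread[] threads = new Thread[nbThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nbThreads; t++) {
            threads[t] = new Thread(() -> sink.addAndGet(burn(perThread)));
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        timeWithThreads(1, 200_000_000L); // warm-up run so the JIT compiles burn() first
        for (int n = 1; n <= 4; n++) {
            System.out.println(n + " THREADS: " + timeWithThreads(n, 2_000_000_000L) + "ms");
        }
    }
}
```

Unlike the array-filling loop, the working set here fits in registers, so the cores are not competing for main memory bandwidth.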


Gug*_*see 9

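Multithreading is extremely efficient when your software is CPU bound: there are many applications that are single threaded, and you can see them struggle on modern CPUs by maxing out only one core (this shows up very clearly in a CPU monitor).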

However, there is no point in starting more threads than the number of available (virtual) CPUs.

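A properly multithreaded application (number crunching, for example) creates a number of worker threads that is tied to the number of (virtual) CPUs available to the JVM.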
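
One way to tie the worker count to the CPUs the JVM reports is sketched below. The class name `PooledFill` and the chunking scheme are assumptions made for illustration. `Runtime.availableProcessors()` and `Executors.newFixedThreadPool` are standard JDK calls. Since the array fill in the question appears to be memory bound, this may well not run faster than a single thread; it only shows the sizing pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PooledFill {
    // Fills data with 1s using one worker per (virtual) CPU reported by the JVM.
    static void fill(int[] data) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> pending = new ArrayList<>();
            // Ceiling division, so the final chunk also covers any remainder.
            int chunk = (data.length + workers - 1) / workers;
            for (int from = 0; from < data.length; from += chunk) {
                final int lo = from;
                final int hi = Math.min(from + chunk, data.length);
                pending.add(pool.submit(() -> Arrays.fill(data, lo, hi, 1)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the chunk; rethrows any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[100000000];
        long start = System.currentTimeMillis();
        fill(data);
        System.out.println(Runtime.getRuntime().availableProcessors()
                + " WORKERS: " + (System.currentTimeMillis() - start) + "ms");
    }
}
```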

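  • Code does not necessarily have to be CPU bound to benefit from multithreading. Try accessing a database, for example: 1 thread doing 100 queries will be a lot slower than 10 threads doing 10 queries each. While some threads are waiting, sleeping, blocked on a lock or whatever you want to call it, other threads can continue to work. In such cases it makes sense to use more threads (even many more) than there are available CPUs. If that were not the case, multithreading would make no sense on single-core processors. (5 upvotes)
  • @nbarraille - Your code is probably not CPU bound but limited by memory access speed. (4 upvotes)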
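
The point about blocking work in the first comment can be sketched with a simulated query. Everything here is hypothetical: `BlockingDemo`, `fakeQuery`, and the 10 ms delay stand in for a real database call, and `Thread.sleep` only approximates waiting on I/O. Because the threads spend their time waiting rather than computing, 10 threads would be expected to finish far sooner than 1, even on a dual-core machine.

```java
public class BlockingDemo {
    // Hypothetical stand-in for a database query: the thread just waits.
    static void fakeQuery(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Runs totalQueries simulated queries split evenly over nbThreads; returns elapsed milliseconds.
    static long run(int nbThreads, int totalQueries, long queryMillis) {
        int perThread = totalQueries / nbThreads;
        Thread[] threads = new Thread[nbThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nbThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int q = 0; q < perThread; q++) {
                    fakeQuery(queryMillis);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            try {
                th.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("1 THREADS: " + run(1, 100, 10) + "ms");
        System.out.println("10 THREADS: " + run(10, 100, 10) + "ms");
    }
}
```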