Wil*_*ell 16 java concurrency multithreading multicore cpu-usage
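I'm running Ubuntu on a machine with a quad-core CPU. I've written some test Java code that spawns a given number of threads, each of which simply increments a volatile variable for a given number of iterations.
I would expect the running time not to increase significantly while the number of threads is less than or equal to the number of cores, i.e. 4. In fact, these are the "real" times reported by the UNIX time command:
1 thread: 1.005s
2 threads: 1.018s
3 threads: 1.528s
4 threads: 1.982s
5 threads: 2.479s
6 threads: 2.934s
7 threads: 3.356s
8 threads: 3.793s
This shows that adding one extra thread does not increase the time, as expected, but the time then does increase with 3 and 4 threads.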
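Because every thread performs the same fixed amount of work, ideal scaling would keep the wall-clock time close to the single-thread time up to the number of cores. The following is a minimal sketch that turns the timings listed above into ratios against the single-thread run; the class and method names (`ScalingFactors`, `factor`) are made up for illustration and are not part of the original code.

```java
import java.util.Locale;

class ScalingFactors {
    // "real" times in seconds copied from the list above; index 0 is the 1-thread run.
    static final double[] REAL_SECONDS = {
        1.005, 1.018, 1.528, 1.982, 2.479, 2.934, 3.356, 3.793
    };

    // Ratio of the run with `threads` threads to the single-thread run.
    // A value near 1.0 means the extra threads cost no extra wall-clock time.
    static double factor(int threads) {
        return REAL_SECONDS[threads - 1] / REAL_SECONDS[0];
    }

    public static void main(String[] args) {
        for (int n = 1; n <= REAL_SECONDS.length; n++) {
            System.out.printf(Locale.ROOT,
                    "%d threads: %.3f x the single-thread time%n", n, factor(n));
        }
    }
}
```

With perfect scaling on four cores the first four ratios would all be close to 1.0; the measured values start drifting away from that at three threads.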
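At first I thought this might be because the OS was preventing the JVM from using all the cores, but I ran top, and it clearly showed that with 3 threads, 3 cores were running at ~100%, and with 4 threads, 4 cores were maxed out.
My question is: why doesn't the code take roughly the same time on 3 or 4 CPUs as it does on 1 or 2, given that it is running in parallel on all cores?
Here is my main method for reference: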
class Example implements Runnable {
    // using this so the compiler does not optimise the computation away
    volatile int temp;

    void delay(int arg) {
        for (int i = 0; i < arg; i++) {
            for (int j = 0; j < 1000000; j++) {
                this.temp += i + j;
            }
        }
    }

    int arg;
    int result;

    Example(int arg) {
        this.arg = arg;
    }

    public void run() {
        delay(arg);
        result = 42;
    }

    public static void main(String[] args) {
        // Get the number of threads (the command line arg)
        int numThreads = 1;
        if (args.length > 0) {
            try {
                numThreads = Integer.parseInt(args[0]);
            } catch (NumberFormatException nfe) {
                System.out.println("First arg must be the number of threads!");
            }
        }
        // Start up the threads
        Thread[] threadList = new Thread[numThreads];
        Example[] exampleList = new Example[numThreads];
        for (int i = 0; i < numThreads; i++) {
            exampleList[i] = new Example(1000);
            threadList[i] = new Thread(exampleList[i]);
            threadList[i].start();
        }
        // wait for the threads to finish
        for (int i = 0; i < numThreads; i++) {
            try {
                threadList[i].join();
                System.out.println("Joined with thread, ret=" + exampleList[i].result);
            } catch (InterruptedException ie) {
                System.out.println("Caught " + ie);
            }
        }
    }
}
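Using multiple CPUs helps up to the point where you saturate some underlying resource.
In your case the underlying resource is not the number of CPUs but the number of L1 caches you have. It appears you have two cores, each with its own L1 data cache, and because you are hitting it with volatile writes, the L1 caches are your limiting factor.
Try accessing the L1 cache less, for example: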
public class Example implements Runnable {
    // using this so the compiler does not optimise the computation away
    volatile int temp;

    void delay(int arg) {
        for (int i = 0; i < arg; i++) {
            // accumulate in a local variable and touch the volatile field once per outer iteration
            int temp = 0;
            for (int j = 0; j < 1000000; j++) {
                temp += i + j;
            }
            this.temp += temp;
        }
    }

    int arg;
    int result;

    Example(int arg) {
        this.arg = arg;
    }

    public void run() {
        delay(arg);
        result = 42;
    }

    public static void main(String... ignored) {
        int MAX_THREADS = Integer.getInteger("max.threads", 8);
        long[] times = new long[MAX_THREADS + 1];
        for (int numThreads = MAX_THREADS; numThreads >= 1; numThreads--) {
            long start = System.nanoTime();
            // Start up the threads
            Thread[] threadList = new Thread[numThreads];
            Example[] exampleList = new Example[numThreads];
            for (int i = 0; i < numThreads; i++) {
                exampleList[i] = new Example(1000);
                threadList[i] = new Thread(exampleList[i]);
                threadList[i].start();
            }
            // wait for the threads to finish
            for (int i = 0; i < numThreads; i++) {
                try {
                    threadList[i].join();
                    System.out.println("Joined with thread, ret=" + exampleList[i].result);
                } catch (InterruptedException ie) {
                    System.out.println("Caught " + ie);
                }
            }
            long time = System.nanoTime() - start;
            times[numThreads] = time;
            System.out.printf("%d: %.1f ms%n", numThreads, time / 1e6);
        }
        // print each run's time relative to the single-thread run
        for (int i = 2; i <= MAX_THREADS; i++)
            System.out.printf("%d: %.3f time %n", i, (double) times[i] / times[1]);
    }
}
On my dual-core, hyper-threaded laptop it produces output of the form threads: factor
2: 1.093 time
3: 1.180 time
4: 1.244 time
5: 1.759 time
6: 1.915 time
7: 2.154 time
8: 2.412 time
compared with the original test
2: 1.092 time
3: 2.198 time
4: 3.349 time
5: 3.079 time
6: 3.556 time
7: 4.183 time
8: 4.902 time
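A common resource that gets over-utilised is the L3 cache. It is shared between CPUs, and while it allows a degree of concurrency, it does not scale well across many CPUs. I suggest you check what your Example code is doing and make sure the instances can run independently without using any shared resources. For example, most chips have a limited number of FPUs.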
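One way to follow the advice about independent work is to have each task accumulate into a local variable and hand back its result only once, so that no field is written concurrently. The sketch below is an illustration under that assumption, not the answerer's code; the names `IndependentWorkers`, `sumRange` and `parallelSum` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentWorkers {
    // Sums the integers in [from, to) using only a local accumulator.
    static long sumRange(int from, int to) {
        long local = 0;
        for (int j = from; j < to; j++) {
            local += j;
        }
        return local;
    }

    // Splits [0, tasks * perTask) into `tasks` disjoint ranges, one per pool thread,
    // and combines the per-task results on the calling thread.
    static long parallelSum(int tasks, int perTask) {
        ExecutorService pool = Executors.newFixedThreadPool(tasks);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int from = t * perTask;
                Callable<Long> task = () -> sumRange(from, from + perTask);
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(4, 1_000_000));
    }
}
```

The only cross-thread communication is the `Future` result of each task, so the inner loops can stay in registers instead of repeatedly writing a volatile field.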
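The Core i5 in a Lenovo X1 Carbon is not a quad-core processor. It is a dual-core processor with hyper-threading. When you are performing only trivial operations that do not cause frequent, long pipeline stalls, the hyper-threading scheduler has little opportunity to weave other operations into a stalled pipeline, and you will not see performance equivalent to four actual cores.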
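A quick way to see what the JVM believes about the machine is `Runtime.getRuntime().availableProcessors()`. The sketch below assumes the usual behaviour that this value counts logical processors (hardware threads) rather than physical cores, so a dual-core hyper-threaded CPU would typically report 4; the class name `LogicalCpus` is hypothetical.

```java
class LogicalCpus {
    // Number of processors the JVM reports as available to it.
    // On hyper-threaded hardware this usually counts hardware threads, not physical cores.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + logicalProcessors());
    }
}
```

If this prints 4 on a machine whose CPU has two physical cores, the scaling results above are better compared against two cores than against four.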