Rem*_*miX 6 java performance multithreading multicore intel
I'm experimenting with some multithreading constructs, but multithreading does not seem to be any faster than a single thread. I narrowed it down to a very simple test with a nested loop (10000x10000) in which the system only counts.
Below I posted the code for both single threading and multithreading and how they are executed.
The result is that the single thread completes the loop in about 110 ms, while the two threads also take about 112 ms.
I don't think the problem is the overhead of multithreading. If I submit only one of the two Runnables to the ThreadPoolExecutor, it executes in half the time of the single thread, which makes sense. But adding that second Runnable makes it 10 times slower. Both 3.00 GHz cores are running at 100%.
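I think it may be PC-specific, because someone else's PC showed a doubling in speed with multithreading. But then, what can I do about it? I have an Intel Pentium 4 3.00GHz (2 CPUs) and Java jre6.
Test code: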
// Single thread:
long start = System.nanoTime(); // Start timer
final int[] i0 = new int[1]; // Result holder, to keep the test fair (see below)
int i = 0; // Local counter
for (int x = 0; x < 10000; x++)
{
    for (int y = 0; y < 10000; y++)
    {
        i++; // Just counting...
    }
}
i0[0] = i; // Single store once counting is done
long end = System.nanoTime(); // Stop timer
This code executes in about 110 ms.
// Two threads:
start = System.nanoTime(); // Start timer
// Two of the same kind of variables to count with as in the single thread.
final int[] i1 = new int[1];
final int[] i2 = new int[1];
// First partial task (0-5000)
Thread t1 = new Thread() {
    @Override
    public void run()
    {
        int i = 0; // Local counter
        for (int x = 0; x < 5000; x++)
            for (int y = 0; y < 10000; y++)
                i++;
        i1[0] = i; // Single store once counting is done
    }
};
// Second partial task (5000-10000)
Thread t2 = new Thread() {
    @Override
    public void run()
    {
        int i = 0; // Local counter
        for (int x = 5000; x < 10000; x++)
            for (int y = 0; y < 10000; y++)
                i++;
        i2[0] = i; // Single store once counting is done
    }
};
// Start threads
t1.start();
t2.start();
// Wait for completion
try {
    t1.join();
    t2.join();
} catch (InterruptedException e) {
    e.printStackTrace();
}
end = System.nanoTime(); // Stop timer
This code executes in about 112 ms.
Edit: I changed the Runnables to Threads and got rid of the ExecutorService (to simplify the problem).
Edit: tried some of the suggestions.
sne*_*rch 12
You definitely don't want to keep polling Thread.isAlive() - this burns a lot of CPU cycles for no good reason. Use Thread.join() instead.
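As a minimal sketch of the difference, assuming a hypothetical helper named countOnWorker (not from the question), waiting with join() might look like this. The busy-wait it replaces is shown only as a comment.

```java
class JoinDemo {
    // Hypothetical helper: runs a counting task on a worker thread
    // and blocks until it has finished.
    static int countOnWorker(final int iterations) {
        final int[] result = new int[1];
        Thread worker = new Thread(new Runnable() {
            @Override
            public void run() {
                int local = 0;
                for (int n = 0; n < iterations; n++) {
                    local++;
                }
                result[0] = local;
            }
        });
        worker.start();
        // Avoid: while (worker.isAlive()) { }
        // That loop spins on a CPU while it waits.
        try {
            worker.join(); // sleeps until the worker has terminated
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
        // A successful join() also makes the worker's write to result[0] visible here.
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(countOnWorker(1000000));
    }
}
```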
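Also, it's probably not a good idea to have the threads increment the result arrays directly, what with cache lines and all. Update local variables, and do a single store when the computation is done.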
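A rough sketch of that pattern, with illustrative names (countInParallel, results) that are assumptions rather than anything from the question: each worker counts in a local variable and touches the shared array exactly once. The variant to avoid is left as a comment, since neighbouring array slots are likely to sit on the same cache line.

```java
class LocalCounterDemo {
    // Hypothetical helper: two workers each count to perThread and
    // publish their totals in adjacent array slots.
    static int[] countInParallel(final int perThread) {
        final int[] results = new int[2];
        Thread[] workers = new Thread[2];
        for (int t = 0; t < workers.length; t++) {
            final int slot = t;
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    // Avoid: for (...) results[slot]++;
                    // Both threads would then write to neighbouring memory on every iteration.
                    int local = 0;
                    for (int n = 0; n < perThread; n++) {
                        local++;
                    }
                    results[slot] = local; // single store at the end
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        int[] totals = countInParallel(50000000);
        System.out.println(totals[0] + totals[1]);
    }
}
```

Note that a loop this trivial may be simplified heavily by the JIT compiler, so timings taken from it should be read with caution.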
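I totally overlooked that you're using a Pentium 4. As far as I know, there are no multicore versions of the P4. To give the illusion of multicore, it has Hyper-Threading: two logical cores share the execution units of one physical core. If your threads depend on the same execution units, your performance will be the same as (or worse than!) single-threaded performance. You'd need, for instance, floating-point calculations in one thread and integer calculations in another to gain a performance improvement.
The P4 HT implementation has been criticized a lot; newer implementations (the recent Core 2) should be better.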
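One quick check, offered as a sketch rather than a diagnosis: Runtime.availableProcessors() reports the processors available to the JVM, which as far as I know are logical processors. A single-core chip with Hyper-Threading would therefore be expected to report 2, even though only one physical core is doing the work. The class and method names here are illustrative.

```java
class LogicalCpuCheck {
    // Returns the number of processors the JVM can see. With Hyper-Threading
    // this is expected to count logical processors, not physical cores.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Processors visible to the JVM: " + logicalProcessors());
    }
}
```

If this prints 2 on a Pentium 4, the "2 CPUs" shown by the operating system are most likely the two logical halves of one core.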