Concurrency: Java Map

Phi*_*ixz 9 java multithreading java.util.concurrent

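What is the best way to push 20 million entities into a Java Map object?

  1. Without multithreading, it takes about 40 seconds.
  2. With a ForkJoinPool it takes about 25 seconds. I created 2 tasks, each of which pushes 10 million entities.

I believe the two tasks run on two different cores. Question: when I create a single task that pushes 10 million entries, it takes about 9 seconds, so why does it take about 26 seconds when I run 2 tasks that each push 10 million entries? Am I doing something wrong?

Is there a different solution that inserts 20 million entries in under 10 seconds?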

Lol*_*olo 1

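Without seeing your code, the most likely cause of these poor performance results is garbage collection activity. To demonstrate this, I wrote the following program: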

import java.lang.management.ManagementFactory;
import java.util.*;
import java.util.concurrent.*;

public class TestMap {
  // we assume NB_ENTITIES is divisible by NB_TASKS
  static final int NB_ENTITIES = 20_000_000, NB_TASKS = 2;
  static Map<String, String> map = new ConcurrentHashMap<>();

  public static void main(String[] args) {
    try {
      System.out.printf("running with nb entities = %,d, nb tasks = %,d, VM args = %s%n", NB_ENTITIES, NB_TASKS, ManagementFactory.getRuntimeMXBean().getInputArguments());
      ExecutorService executor = Executors.newFixedThreadPool(NB_TASKS);
      int entitiesPerTask = NB_ENTITIES / NB_TASKS;
      List<Future<?>> futures = new ArrayList<>(NB_TASKS);
      long startTime = System.nanoTime();
      for (int i=0; i<NB_TASKS; i++) {
        MyTask task = new MyTask(i * entitiesPerTask, (i + 1) * entitiesPerTask - 1);
        futures.add(executor.submit(task));
      }
      for (Future<?> f: futures) {
        f.get();
      }
      long elapsed = System.nanoTime() - startTime;
      executor.shutdownNow();
      System.gc();
      Runtime rt = Runtime.getRuntime();
      long usedMemory = rt.maxMemory() - rt.freeMemory();
      System.out.printf("processing completed in %,d ms, usedMemory after GC = %,d bytes%n", elapsed/1_000_000L, usedMemory);
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

  static class MyTask implements Runnable {
    private final int startIdx, endIdx;

    public MyTask(final int startIdx, final int endIdx) {
      this.startIdx = startIdx;
      this.endIdx = endIdx;
    }

    @Override
    public void run() {
      long startTime = System.nanoTime();
      for (int i=startIdx; i<=endIdx; i++) {
        map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
      }
      long elapsed = System.nanoTime() - startTime;
      System.out.printf("task[%,d - %,d], completed in %,d ms%n", startIdx, endIdx, elapsed/1_000_000L);
    }
  }
}

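At the end of processing, this code computes an approximation of the used memory by calling System.gc() immediately followed by Runtime.maxMemory() - Runtime.freeMemory(). (This shortcut is only meaningful because -Xms and -Xmx are set to the same value in every run, so the committed heap should be close to the maximum; in general the usual formula is Runtime.totalMemory() - Runtime.freeMemory().) It shows that the map with 20 million entries takes approximately 2.2 GB, which is considerable. I ran it with 1 and 2 threads for various values of the -Xmx and -Xms JVM arguments. Here are the resulting outputs (to be clear: 2560m = 2.5g):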

running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 19,999,999], completed in 11,781 ms
processing completed in 11,782 ms, usedMemory after GC = 2,379,068,760 bytes

running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms2560m, -Xmx2560m]
task[0 - 9,999,999], completed in 8,269 ms
task[10,000,000 - 19,999,999], completed in 12,385 ms
processing completed in 12,386 ms, usedMemory after GC = 2,379,069,480 bytes

running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms3g, -Xmx3g]
task[0 - 19,999,999], completed in 12,525 ms
processing completed in 12,527 ms, usedMemory after GC = 2,398,339,944 bytes

running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms3g, -Xmx3g]
task[0 - 9,999,999], completed in 12,220 ms
task[10,000,000 - 19,999,999], completed in 12,264 ms
processing completed in 12,265 ms, usedMemory after GC = 2,382,777,776 bytes

running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms4g, -Xmx4g]
task[0 - 19,999,999], completed in 7,363 ms
processing completed in 7,364 ms, usedMemory after GC = 2,402,467,040 bytes

running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms4g, -Xmx4g]
task[0 - 9,999,999], completed in 5,466 ms
task[10,000,000 - 19,999,999], completed in 5,511 ms
processing completed in 5,512 ms, usedMemory after GC = 2,381,821,576 bytes

running with nb entities = 20,000,000, nb tasks = 1, VM args = [-Xms8g, -Xmx8g]
task[0 - 19,999,999], completed in 7,778 ms
processing completed in 7,779 ms, usedMemory after GC = 2,438,159,312 bytes

running with nb entities = 20,000,000, nb tasks = 2, VM args = [-Xms8g, -Xmx8g]
task[0 - 9,999,999], completed in 5,739 ms
task[10,000,000 - 19,999,999], completed in 5,784 ms
processing completed in 5,785 ms, usedMemory after GC = 2,396,478,680 bytes

These results can be summarized in the following table:

--------------------------------
heap      | exec time (ms) for: 
size (gb) | 1 thread | 2 threads
--------------------------------
2.5       |    11782 |     12386
3.0       |    12527 |     12265
4.0       |     7364 |      5512
8.0       |     7779 |      5785
--------------------------------

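I also observed that for the 2.5g and 3g heap sizes CPU activity was high because of GC activity, peaking at 100% throughout the processing time, whereas for 4g and 8g this was only observed at the very end, because of the System.gc() call.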
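One way to check whether GC is really what eats the time is to read the collectors' accumulated time before and after the insert loop. The sketch below is not part of the test program above: the class name GcTimeProbe and the helper totalGcTimeMillis() are made up for illustration, and the 1,000,000 entry count is an arbitrary small value. It relies on GarbageCollectorMXBean.getCollectionTime(), which reports approximate accumulated collection time and, for concurrent collectors, is not necessarily pure pause time, so treat the number as an indication only.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

public class GcTimeProbe {
  // Sums the approximate accumulated collection time (ms) over all collectors.
  // getCollectionTime() returns -1 when the value is undefined; those are skipped.
  static long totalGcTimeMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long t = gc.getCollectionTime();
      if (t > 0) {
        total += t;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    long gcBefore = totalGcTimeMillis();
    long start = System.nanoTime();
    // Hypothetical small workload; raise the count to match your own test.
    Map<String, String> map = new HashMap<>();
    for (int i = 0; i < 1_000_000; i++) {
      map.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
    }
    long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
    long gcMs = totalGcTimeMillis() - gcBefore;
    System.out.printf("inserted %,d entries in %,d ms, about %,d ms reported as GC time%n",
        map.size(), elapsedMs, gcMs);
  }
}
```

If the reported GC time is a large fraction of the elapsed time, that points to a heap that is too small for the data set, which is what the table above suggests for the 2.5g and 3g runs.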

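To summarize:

  1. If your heap is not sized appropriately, garbage collection will wipe out any performance gain you hope to obtain. You should make it large enough to avoid the side effects of long GC pauses.

  2. You must also be aware that using a concurrent collection such as ConcurrentHashMap carries a significant performance overhead. To illustrate this, I modified the code slightly so that each task uses its own HashMap, and at the end all the maps are aggregated (using Map.putAll()) into the first task's map. The processing time dropped to about 3200 ms.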
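The modified code for the second point is not shown in the answer, so the following is only a rough reconstruction of the idea under stated assumptions: each task fills a private HashMap, and the results are merged with Map.putAll() into the first task's map. The class name PerTaskMaps, the method build() and its parameters are hypothetical, and the small entity count in main() is just for a quick run.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerTaskMaps {
  // Assumes nbEntities is divisible by nbTasks, as in the program above.
  static Map<String, String> build(int nbEntities, int nbTasks) {
    ExecutorService executor = Executors.newFixedThreadPool(nbTasks);
    try {
      int perTask = nbEntities / nbTasks;
      List<Future<Map<String, String>>> futures = new ArrayList<>(nbTasks);
      for (int t = 0; t < nbTasks; t++) {
        final int start = t * perTask;
        final int end = (t + 1) * perTask; // exclusive
        // Each task writes only to its own HashMap, so no synchronization is needed.
        Callable<Map<String, String>> task = () -> {
          Map<String, String> local = new HashMap<>();
          for (int i = start; i < end; i++) {
            local.put("sambit:rout:" + i, "C:\\Images\\Provision_Images");
          }
          return local;
        };
        futures.add(executor.submit(task));
      }
      // Future.get() waits for each task; merge everything into the first task's map.
      Map<String, String> result = futures.get(0).get();
      for (int t = 1; t < nbTasks; t++) {
        result.putAll(futures.get(t).get());
      }
      return result;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException("map construction failed", e);
    } finally {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    long startTime = System.nanoTime();
    Map<String, String> map = build(200_000, 2);
    long elapsed = System.nanoTime() - startTime;
    System.out.printf("built map with %,d entries in %,d ms%n", map.size(), elapsed / 1_000_000L);
  }
}
```

Note that the merge step runs on a single thread and the resulting HashMap is not thread-safe, so this only fits cases where the map is populated once and then read, or where later access is synchronized some other way.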