Nic*_*ick · 1 · tags: java, multithreading, http, file
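I have a file with millions of lines that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to approach the problem.
I could obviously just read the file and make the calls sequentially, but that would be incredibly slow. I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory (something I'm not a big fan of) or try to parallelize the reading of the file as well (which I'm not sure even makes sense).
Just looking for some ideas on the best way to approach this. If there is an existing framework or library that does something similar, I'd be happy to use that too.
Thanks.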
I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory
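You should use an ExecutorService with a bounded BlockingQueue. As you read through the million lines, you submit jobs to the thread pool until the BlockingQueue is full. This way you can have 100 (or whatever number turns out to be optimal) HTTP requests in flight at once without reading all of the file's lines up front.
You'll need to set a RejectedExecutionHandler that blocks when the queue is full. This is better than the caller-runs handler, because with caller-runs the thread reading the file would execute the HTTP call itself and stop reading until that call finishes.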
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// assumes nThreads (the pool size) and urlReader (a BufferedReader over the
// input file) are defined in the enclosing method
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(100);
// NOTE: you want the min and max thread numbers here to be the same value
ThreadPoolExecutor threadPool =
        new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
// we need our RejectedExecutionHandler to block if the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            // this will block the producer until there's room in the queue
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new RejectedExecutionException(
                    "Unexpected InterruptedException", e);
        }
    }
});
// now read in the urls, one line at a time
String url;
while ((url = urlReader.readLine()) != null) {
    // submit them to the thread-pool. this may block.
    threadPool.submit(new DownloadUrlRunnable(url));
}
// after we submit we have to shutdown the pool
threadPool.shutdown();
// wait for them to complete
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
...
private class DownloadUrlRunnable implements Runnable {
    private final String url;

    public DownloadUrlRunnable(String url) {
        this.url = url;
    }

    public void run() {
        // download the URL
    }
}
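For completeness, here is a self-contained, runnable sketch of the same pattern (bounded ArrayBlockingQueue plus a blocking RejectedExecutionHandler), assuming Java 8+ for the lambdas. The class name BoundedUrlProcessor and the method processAll are hypothetical, and the task body is a placeholder that only counts non-blank lines where a real program would make its HTTP call. It also adds an isShutdown() check so the handler never parks a task in a pool that has already been shut down.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedUrlProcessor {

    // Reads lines one at a time and hands each one to the pool. At most
    // nThreads tasks run while at most queueCapacity more wait in the queue,
    // so the whole file is never held in memory.
    static int processAll(BufferedReader reader, int nThreads, int queueCapacity)
            throws IOException, InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(queueCapacity);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);

        // Block the reading thread instead of throwing when the queue is full.
        pool.setRejectedExecutionHandler((r, executor) -> {
            if (executor.isShutdown()) {
                throw new RejectedExecutionException("pool is shut down");
            }
            try {
                executor.getQueue().put(r); // waits until a worker frees a slot
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting", e);
            }
        });

        String line;
        while ((line = reader.readLine()) != null) {
            final String url = line;
            pool.execute(() -> {
                // Placeholder for the HTTP call (hypothetical): skip blank
                // lines and count the rest.
                if (!url.isEmpty()) {
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        return processed.get();
    }

    public static void main(String[] args) throws Exception {
        StringBuilder input = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            input.append("http://example.com/item/").append(i).append('\n');
        }
        int count = processAll(
                new BufferedReader(new StringReader(input.toString())), 4, 10);
        System.out.println("processed " + count); // prints "processed 1000"
    }
}
```

In a real program the StringReader would be replaced by a reader over the input file (for example via Files.newBufferedReader), and queueCapacity bounds how many lines sit in memory waiting for a free worker.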