Spark调用ShuffleBlockFetcherIterator时发生了什么?

Ins*_*nct 12 apache-spark apache-spark-sql

我的火花工作似乎花了很多时间来获得积木.有时它会在一小时或2小时内执行此操作.我的数据集有1个分区,所以我不确定为什么它会这么多洗牌.谁知道这到底发生了什么?

15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:27 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:40 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:05:59 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:06:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:33 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:46 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:47 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 200 non-empty blocks out of 200 blocks
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Getting 4 non-empty blocks out of 4 blocks
15/12/16 18:07:58 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
Run Code Online (Sandbox Code Playgroud)

小智 1

ShuffleBlockFetcherIterator是一个 Scala 迭代器,它从本地和远程 BlockManager 获取多个随机块(也称为随机映射输出)。

它允许以(BlockId,InputStream)对的形式迭代一系列块,以便调用者可以在接收到洗牌块时以管道方式处理洗牌块。

为了性能——您需要调整您的操作;或配置。