为了测试spark中的序列化异常,我用2种方式编写了一个任务.
第一种方式:
package examples
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object dd {
def main(args: Array[String]):Unit = {
val sparkConf = new SparkConf
val sc = new SparkContext(sparkConf)
val data = List(1,2,3,4,5)
val rdd = sc.makeRDD(data)
val result = rdd.map(elem => {
funcs.func_1(elem)
})
println(result.count())
}
}
object funcs{
def func_1(i:Int): Int = {
i + 1
}
}
Run Code Online (Sandbox Code Playgroud)
这种方式火花效果很好.
当我将其更改为以下方式时,它不起作用并抛出NotSerializableException.
第二种方式:
package examples
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
object dd {
def main(args: Array[String]):Unit = {
val sparkConf = new SparkConf
val …Run Code Online (Sandbox Code Playgroud) 我们在Spark上使用Redis来缓存我们的键值对.这是代码:
import com.redis.RedisClient
val r = new RedisClient("192.168.1.101", 6379)
val perhit = perhitFile.map(x => {
val arr = x.split(" ")
val readId = arr(0).toInt
val refId = arr(1).toInt
val start = arr(2).toInt
val end = arr(3).toInt
val refStr = r.hmget("refStr", refId).get(refId).split(",")(1)
val readStr = r.hmget("readStr", readId).get(readId)
val realend = if(end > refStr.length - 1) refStr.length - 1 else end
val refOneStr = refStr.substring(start, realend)
(readStr, refOneStr, refId, start, realend, readId)
})
Run Code Online (Sandbox Code Playgroud)
但编译器给了我这样的反馈:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable …Run Code Online (Sandbox Code Playgroud)