Posts by Use*_*130

Get a list of data types from a schema in Apache Spark

I have the following Spark-Python code to get the list of column names from a DataFrame's schema. It works fine, but how can I get the list of data types?

columnNames = df.schema.names

For example, something like:

columnTypes = df.schema.types

Is there a way to get a separate list of the data types contained in a DataFrame schema?

python schema types apache-spark spark-dataframe

15 votes · 2 answers · 30k views

How do I register classes with the Kryo serializer in Apache Spark?

I am using Spark 1.6.1 with Python. How do I enable Kryo serialization when using PySpark?

I have the following settings in my spark-defaults.conf file:

spark.eventLog.enabled             true
spark.eventLog.dir                 //local_drive/sparkLogs
spark.default.parallelism          8
spark.locality.wait.node           5s
spark.executor.extraJavaOptions    -XX:+UseCompressedOops
spark.serializer                   org.apache.spark.serializer.KryoSerializer
spark.kryo.classesToRegister       Timing, Join, Select, Predicate, Timeliness, Project, Query2, ScanSelect
spark.shuffle.compress             true

And I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling o35.load.
: org.apache.spark.SparkException: Failed to register classes with Kryo
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:258)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:174)

Caused by: java.lang.ClassNotFoundException: Timing
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:120)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$4.apply(KryoSerializer.scala:120)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) …

serialization kryo apache-spark pyspark

4 votes · 1 answer · 3,237 views