Related troubleshooting questions (0)

(null) entry in command string exception in saveAsTextFile() on PySpark

I am working in PySpark in a Jupyter notebook (Python 2.7) on Windows 7. I have a pyspark.rdd.PipelinedRDD named idSums. When I try to execute idSums.saveAsTextFile("Output"), I receive the following error:

Py4JJavaError: An error occurred while calling o834.saveAsTextFile.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 33.0 failed 1 times, most recent failure: Lost task 1.0 in stage 33.0 (TID 131, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\Users\seride\Desktop\Experiments\PySpark\Output\_temporary\0\_temporary\attempt_201611231307_0033_m_000001_131\part-00001

It seems to me that there should not be any problem with the RDD object itself, since I can perform other actions on it without error; for example, idSums.collect() produces the correct output.

Furthermore, the Output directory is created (with all subdirectories) and the part-00001 file is created, but it is 0 bytes.
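This error typically means Hadoop's Windows shim cannot find winutils.exe, so the chmod it shells out to has a null command path. A minimal sketch of the usual workaround, assuming winutils.exe has been extracted to a hypothetical C:\hadoop\bin (adjust the path to your setup):

```python
import os

# Assumed install location -- winutils.exe must sit in a "bin"
# subdirectory of whatever HADOOP_HOME points at.
hadoop_home = r"C:\hadoop"

# Both variables must be set before the SparkContext (and thus the JVM)
# starts, e.g. in the first cell of the notebook; otherwise Hadoop's
# NativeIO still builds the "(null) ... chmod" command when writing
# the _temporary task output files.
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = os.path.join(hadoop_home, "bin") + os.pathsep + os.environ.get("PATH", "")
```

This explains why collect() works while saveAsTextFile() fails: only the write path needs to chmod local output files through winutils.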

apache-spark pyspark jupyter-notebook

13 votes · 1 answer · 10k views

Scala and Spark: Dataframe.write._ on Windows

Has anyone managed to write files (especially CSV) with Spark's DataFrame on Windows?

Since Spark has had a native (and unified) way of writing CSV via write() since version 2.0, many answers on SO are outdated (e.g. this one). I have also downloaded winutils.exe and added it as suggested here.

Code

// reading works just fine
val df = spark.read
             .option("header", true)
             .option("inferSchema", true)
             .csv("file:///C:/tmp/in.csv")
// writing fails, none of these work
df.write.csv("file:///C:/tmp/out.csv")
df.write.csv("C:/tmp/out.csv")
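Reads succeeding while writes abort is the classic symptom of a missing or undiscovered winutils.exe: only the write path needs Hadoop's Windows shim to chmod the _temporary output files. A minimal sketch, assuming winutils.exe was extracted to a hypothetical C:\hadoop\bin (the path is an assumption, not part of the question):

```scala
object HadoopHomeSetup {
  def main(args: Array[String]): Unit = {
    // Assumed location -- hadoop.home.dir must point at a directory
    // whose bin\ subfolder contains winutils.exe.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    // This must run before the first Hadoop class loads, i.e. before
    // SparkSession.builder()...getOrCreate(); setting it afterwards
    // has no effect because the shim caches the lookup result.
    println(System.getProperty("hadoop.home.dir"))
  }
}
```

Equivalently, HADOOP_HOME can be set as an environment variable before launching spark-submit, which avoids hard-coding the path in the application.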

Error

Exception in thread "main" org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:149)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) …

windows csv scala apache-spark

5 votes · 1 answer · 1751 views

Tag statistics

apache-spark ×2

csv ×1

jupyter-notebook ×1

pyspark ×1

scala ×1

windows ×1