相关疑难解决方法(0)

使用spark-csv编写单个CSV文件

我正在使用https://github.com/databricks/spark-csv,我正在尝试编写单个CSV,但不能,它正在创建一个文件夹.

需要一个Scala函数,它将获取路径和文件名等参数并写入该CSV文件.

csv scala apache-spark spark-csv

92
推荐指数
8
解决办法
17万
查看次数

使用Apache Spark将RDD写为文本文件

我正在探索Spark进行批处理.我使用独立模式在本地计算机上运行spark.

我试图使用saveTextFile()方法将Spark RDD转换为单个文件[最终输出],但它不起作用.

例如,如果我有多个分区,我们可以将一个文件作为最终输出.

更新:

我尝试了以下方法,但我得到空指针异常.

person.coalesce(1).toJavaRDD().saveAsTextFile("C://Java_All//output");
person.repartition(1).toJavaRDD().saveAsTextFile("C://Java_All//output");
Run Code Online (Sandbox Code Playgroud)

例外是:

    15/06/23 18:25:27 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/06/23 18:25:27 INFO deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
15/06/23 18:25:27 INFO deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
15/06/23 18:25:27 INFO deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
15/06/23 18:25:27 INFO deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
15/06/23 18:25:27 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NullPointerException
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:404)
    at …
Run Code Online (Sandbox Code Playgroud)

java apache-spark apache-spark-sql

9
推荐指数
2
解决办法
3万
查看次数

为什么Spark应用程序在命令字符串中出现"IOException:(null)条目失败:null chmod 0644"?

我正在尝试使用JAVA将数据集结果写入单个CSV

dataset.write().mode(SaveMode.Overwrite).option("header",true).csv("C:\\tmp\\csvs");
Run Code Online (Sandbox Code Playgroud)

但它超时,文件没有被写入.

抛出 org.apache.spark.SparkException: Job aborted.

错误:

org.apache.spark.SparkException: Job aborted due to stage failure:

Task 0 in stage 13.0 failed 1 times, most recent failure: Lost task 0.0 in stage 13.0 (TID 16, localhost): java.io.IOException: (null) entry in command string: null chmod 0644 C:\tmp\12333333testSpark\_temporary\0\_temporary\attempt_201712282255_0013_m_000000_0\part-r-00000-229fd1b6-ffb9-4ba1-9dc9-89dfdbd0be43.csv
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:770)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:866)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:849)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:733)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:225)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:132)
at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVRelation.scala:200)
at …
Run Code Online (Sandbox Code Playgroud)

java apache-spark apache-spark-sql

3
推荐指数
1
解决办法
1万
查看次数

标签 统计

apache-spark ×3

apache-spark-sql ×2

java ×2

csv ×1

scala ×1

spark-csv ×1