Related troubleshooting solutions (0)

java.lang.OutOfMemoryError: Unable to acquire 100 bytes of memory, got 0

I invoke Pyspark with Spark 2.0 in local mode using the following command:

pyspark --executor-memory 4g --driver-memory 4g

The input dataframe is read from a tsv file and has 580K rows x 28 columns. I perform some operations on the dataframe, and when I try to export it to a tsv file I get this error.

df.coalesce(1).write.save("sample.tsv", format="csv", header='true', delimiter='\t')

Any pointers on how to get rid of this error? I can display the df or count its rows without any problem.

The output dataframe is 3100 rows by 23 columns.

Error:

Job aborted due to stage failure: Task 0 in stage 70.0 failed 1 times, most recent failure: Lost task 0.0 in stage 70.0 (TID 1073, localhost): org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Unable to acquire 100 bytes …
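A workaround often suggested for `coalesce(1)` OOMs (not from the original post) is to drop the `coalesce(1)`, let Spark write one part file per partition, and merge the pieces afterwards outside Spark. A minimal sketch in plain Python, assuming the directory layout Spark produces with `header='true'` (a `part-*` file per partition, each starting with the header row); the function name and paths are illustrative:

```python
import glob
import shutil

def merge_parts(parts_dir, out_path):
    """Concatenate Spark part-* files into a single file,
    keeping the header only from the first part."""
    paths = sorted(glob.glob(f"{parts_dir}/part-*"))
    with open(out_path, "wb") as out:
        for i, p in enumerate(paths):
            with open(p, "rb") as f:
                if i > 0:
                    f.readline()  # skip the repeated header in later parts
                shutil.copyfileobj(f, out)
```

This avoids forcing every row through a single task (which is what `coalesce(1)` does, and what typically triggers the `Unable to acquire ... bytes` failure), at the cost of a post-processing step on the driver machine.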

python memory hadoop apache-spark pyspark

16 votes · 4 answers · 10k views

Session isn't active in AWS EMR cluster (Pyspark)

I spun up an AWS EMR cluster and ran the following code in a pyspark3 Jupyter notebook:

"..
textRdd = sparkDF.select(textColName).rdd.flatMap(lambda x: x)
textRdd.collect().show()
.."

I get this error:

An error was encountered:
Invalid status code '400' from http://..../sessions/4/statements/7 with error payload: {"msg":"requirement failed: Session isn't active."}

Running the line:

sparkDF.show()

works!

I also created a small subset of the file, and all of my code ran fine on it.

What is the problem?
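A likely culprit (not stated in the post) is that `collect()` ships the entire RDD to the driver; on a large file this can exceed the driver's resources and cause Livy to report the session as no longer active, while `sparkDF.show()` and the small subset only move a handful of rows. One commonly suggested mitigation is to request more driver memory and a larger result-size limit via sparkmagic's `%%configure` cell magic before the session starts; the values below are assumptions, not tested settings:

```
%%configure -f
{ "driverMemory": "4g",
  "conf": { "spark.driver.maxResultSize": "3g" } }
```

Alternatively, replacing `collect()` with a bounded `take(n)` keeps the amount of data pulled to the driver small. Note also that `collect()` returns a plain Python list, so chaining `.show()` on it would fail even with enough memory.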

amazon-emr pyspark

14 votes · 3 answers · 8667 views

Tag statistics

pyspark ×2

amazon-emr ×1

apache-spark ×1

hadoop ×1

memory ×1

python ×1