Apache Spark Stderr and Stdout

use*_*843 9 apache-spark

I am running Spark 1.0.0, connected to a standalone Spark cluster with one master and two slaves. I submit wordcount.py through spark-submit; it reads its input from HDFS and writes the result back to HDFS. So far everything works, and the result is written to HDFS correctly. What worries me is that when I check stdout for each worker, it is empty (is it supposed to be empty?), while in stderr I get the following:

stderr log page for app-20140704174955-0002:

Spark Executor Command: "java" "-cp" "::/usr/local/spark-1.0.0/conf:/usr/local/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.2.1.jar:/usr/local/hadoop/conf" "-XX:MaxPermSize=128m" "-Xms512M" "-Xmx512M" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "akka.tcp://spark@master:54477/user/CoarseGrainedScheduler" "0" "slave2" "1" "akka.tcp://sparkWorker@slave2:41483/user/Worker" "app-20140704174955-0002"
========================================


14/07/04 17:50:14 ERROR CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@slave2:33758] -> [akka.tcp://spark@master:54477] disassociated! Shutting down.
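
For reference, wordcount.py here is essentially the standard minimal word count (a sketch - the HDFS paths are placeholders):

from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

# Read from HDFS, count words, write the counts back to HDFS.
lines = sc.textFile("hdfs://master:9000/input/words.txt")
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.saveAsTextFile("hdfs://master:9000/output/wordcount")
sc.stop()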

sam*_*est 8

Spark always writes everything, even INFO-level messages, to stderr. People seem to do this to stop stdout from buffering messages, which would make logging less predictable. It is considered acceptable practice when an application is known to never be used from bash scripts (i.e. its stdout is never piped anywhere), and it is therefore particularly common for logging.

  • @samthebest - are you saying that all Spark output goes to stderr? I did a simple print() in my Spark map function, and when I look at the log files on my slave machines under work/app-<APPNUMBER>/0/, I see the printed output in stderr but not in stdout. My stdout is empty. I find that strange - if it is always empty, what is stdout even for? (2 upvotes)
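
A quick way to check where each stream ends up is to run a tiny probe job like the one below (a sketch - the app name and marker strings are made up), then look at each worker's work/<app-id>/<executor-id>/stdout and stderr files to see which marker landed where:

import sys
from pyspark import SparkContext

sc = SparkContext(appName="StdStreamProbe")

def probe(x):
    # Write one marker to each stream from inside a task.
    print("stdout marker: %d" % x)
    sys.stderr.write("stderr marker: %d\n" % x)
    return x

sc.parallelize(range(4)).map(probe).collect()
sc.stop()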

小智 6

Try this in the log4j.properties you pass to Spark (or modify the default configuration under Spark's conf/ directory):

# Log to stdout and stderr
log4j.rootLogger=INFO, stdout, stderr

# Send TRACE - INFO level to stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=TRACE
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
log4j.appender.stdout.filter.filter1=org.apache.log4j.varia.LevelRangeFilter
log4j.appender.stdout.filter.filter1.levelMin=TRACE
log4j.appender.stdout.filter.filter1.levelMax=INFO

# Send WARN or higher to stderr
log4j.appender.stderr=org.apache.log4j.ConsoleAppender
log4j.appender.stderr.Threshold=WARN
log4j.appender.stderr.Target=System.err
log4j.appender.stderr.layout=org.apache.log4j.PatternLayout
log4j.appender.stderr.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Change this to set Spark log level
log4j.logger.org.apache.spark=WARN
log4j.logger.org.apache.spark.util=ERROR
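
Spark also needs to be told to use this file. One common recipe (the flags below exist in current Spark releases - double-check them against your version, and the file name is just an example) is to ship the file with the job and point both the driver and executor JVMs at it:

spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  wordcount.py

On a standalone cluster it may be simpler to just edit conf/log4j.properties on every node, since that default location is picked up automatically.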

Also note that the progress bar displayed at INFO level is written to stderr.

To disable it:

spark.ui.showConsoleProgress=false
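
This can go into conf/spark-defaults.conf, or (in PySpark, for example) be set on the SparkConf before the context is created - a minimal sketch:

from pyspark import SparkConf, SparkContext

# Disable the console progress bar that is otherwise drawn on stderr.
conf = SparkConf().set("spark.ui.showConsoleProgress", "false")
sc = SparkContext(conf=conf)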