Dik*_*owa · Tags: foreach, log4j, apache-spark, spark-streaming
I am trying to log the output of a Spark Streaming job, as shown in the code below:
dStream.foreachRDD { rdd =>
  if (rdd.count() > 0) {
    @transient lazy val log = Logger.getLogger(getClass.getName)
    log.info("Process Starting")
    rdd.foreach { item =>
      log.info("Output :: " + item._1 + "," + item._2 + "," + System.currentTimeMillis())
    }
  }
}
The code is submitted to a YARN cluster with the following command:
./bin/spark-submit --class "StreamingApp" --files file:/home/user/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/home/user/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/home/user/log4j.properties" --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 /home/user/Abc.jar
When I look at the YARN cluster logs, I can find the log line written before the foreach, log.info("Process Starting"), but the logs inside the foreach are never printed.
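One thing worth checking: `rdd.foreach` runs in the executor JVMs on the worker nodes, so its log lines go to each container's own Log4j output (with this configuration, each executor's own `/tmp/Rt.log` on its worker node), not to the driver's file. With YARN log aggregation enabled, all container logs can be pulled together once the application finishes. A sketch of the command (the application id below is a placeholder; spark-submit prints the real one at launch, and the command is echoed here so the snippet runs without a cluster):

```shell
# Placeholder application id -- use the one printed by spark-submit.
APP_ID="application_0000000000000_0001"
# Pull aggregated driver + executor container logs from YARN
# (echoed here so the sketch is runnable without a cluster):
echo "yarn logs -applicationId $APP_ID"
```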
I also tried creating a separate serializable object, as shown below:
object LoggerObj extends Serializable {
  @transient lazy val log = Logger.getLogger(getClass.getName)
}
and used it inside the foreach, like this:
dStream.foreachRDD { rdd =>
  if (rdd.count() > 0) {
    LoggerObj.log.info("Process Starting")
    rdd.foreach { item =>
      LoggerObj.log.info("Output :: " + item._1 + "," + item._2 + "," + System.currentTimeMillis())
    }
  }
}
But the problem is still the same: only the logs outside the foreach are printed.
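The behavior of the `@transient lazy val` pattern can be demonstrated without Spark. The sketch below (my own illustration, not the original code; it assumes Spark ships the closure and `LoggerObj` to executors via plain Java serialization) round-trips an object through serialization: the transient lazy field is not serialized and is rebuilt on first access in the receiving JVM. Applied to the question, this means each executor does get a working logger, but it logs into that executor's own Log4j configuration and local filesystem, not into the driver's `/tmp/Rt.log`:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Stand-in for LoggerObj: a serializable holder whose field, like a Logger,
// is @transient lazy and therefore rebuilt after deserialization.
class Holder extends Serializable {
  @transient lazy val whereBuilt: String = "rebuilt in " + Thread.currentThread().getName
}

object TransientDemo {
  // Simulate shipping an object to an executor JVM: serialize, then deserialize.
  def roundTrip[T <: Serializable](obj: T): T = {
    val buf = new ByteArrayOutputStream()
    new ObjectOutputStream(buf).writeObject(obj)
    new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray)).readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val original = new Holder
    original.whereBuilt                 // force initialization on "the driver"
    val shipped = roundTrip(original)   // simulate sending to an executor
    // The transient field did not travel; it is re-initialized lazily here.
    println(shipped.whereBuilt.startsWith("rebuilt"))
  }
}
```

So the serializable-object attempt is not wrong as such; the log lines simply end up on the worker nodes rather than in the driver's log file.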
My log4j.properties is as follows:
log4j.rootLogger=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
log4j.appender.FILE.File=/tmp/Rt.log
log4j.appender.FILE.ImmediateFlush=true
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.Append=true
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.Holder=INFO,FILE
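One more note on the file path: `/tmp/Rt.log` resolves against each executor's local disk, so even when the appender works, the output is scattered across worker nodes. Spark's "Running on YARN" documentation exposes `${spark.yarn.app.container.log.dir}` for use in log4j.properties so that file appenders write into the container's log directory, which YARN log aggregation then collects. A minimal sketch along those lines (keeping the same rolling appender; this is an alternative configuration, not the original):

```properties
log4j.rootLogger=INFO,FILE
log4j.appender.FILE=org.apache.log4j.RollingFileAppender
# Spark on YARN substitutes each container's log directory here:
log4j.appender.FILE.File=${spark.yarn.app.container.log.dir}/Rt.log
log4j.appender.FILE.MaxFileSize=500MB
log4j.appender.FILE.MaxBackupIndex=10
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```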
Viewed: 2007 times