How do I suppress the INFO messages from spark-sql running on EMR?

ron*_*nre 14 log4j emr apache-spark

I am running Spark on EMR as described in Running Spark and Spark SQL on Amazon Elastic MapReduce:

This tutorial walks you through installing and operating Spark, a fast and general engine for large-scale data processing, on an Amazon EMR cluster. You will also use Spark SQL to create and query data sets in Amazon S3, and learn how to use Amazon CloudWatch to monitor Spark on an Amazon EMR cluster.

I have tried to suppress the INFO logging by editing $HOME/spark/conf/log4j.properties, to no avail.
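For reference, the change I tried was along these lines (a minimal sketch; Spark's stock log4j.properties template sets log4j.rootCategory=INFO, console, so the edit just raises that threshold):

# Raise the root logger threshold so INFO messages are dropped
log4j.rootCategory=WARN, console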

The output is as follows:

$ ./spark/bin/spark-sql
Spark assembly has been built with Hive, including Datanucleus jars on classpath
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/.versions/spark-1.1.1.e/lib/spark-assembly-1.1.1-hadoop2.4.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2014-12-14 20:59:01,819 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
2014-12-14 20:59:01,825 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1009)) - mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack

How can I suppress the INFO messages above?

Pey*_*ton 17

If you know up front that you want to suppress logging for a new EMR cluster, you can also add a configuration option at cluster-creation time.

EMR accepts configuration options as JSON, which you can enter directly in the AWS console, or pass in via a file when using the CLI.

In this case, to change the log level to WARN, here is the JSON:

[
  {
    "classification": "spark-log4j",
    "properties": {"log4j.rootCategory": "WARN, console"}
  }
]

In the console, you add it in the first step of cluster creation:

[Image: the configuration section in the AWS console]

Alternatively, if you are creating the cluster with the CLI:

aws emr create-cluster <options here> --configurations file://config_file.json
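A fuller invocation might look like this (a hedged sketch: the cluster name, release label, and instance settings below are illustrative placeholders, not values from the question; note the file:// prefix, which the AWS CLI requires to read the JSON from a local file):

aws emr create-cluster \
  --name "quiet-spark" \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m4.large \
  --instance-count 3 \
  --configurations file://config_file.json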

You can read more in the EMR documentation.


ron*_*nre 13

I was able to do this by editing $HOME/spark/conf/log4j.properties as needed, and calling spark-sql with --driver-java-options as follows:

./spark/bin/spark-sql --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"

I was able to verify that the correct file was being picked up by adding the -Dlog4j.debug option:

./spark/bin/spark-sql --driver-java-options "-Dlog4j.debug -Dlog4j.configuration=file:///home/hadoop/spark/conf/log4j.properties"

  • In Spark 3, the log configuration file is located at `/etc/spark/conf.dist/log4j.properties` (2 upvotes)
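If so, the same override should carry over by pointing at that path (a hedged adaptation for Spark 3 on EMR, not tested here):

spark-sql --driver-java-options "-Dlog4j.debug -Dlog4j.configuration=file:///etc/spark/conf.dist/log4j.properties"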

Vad*_*rov 5

spark-sql --driver-java-options "-Dlog4j.configuration=file:///home/hadoop/conf/log4j.properties"

cat conf/log4j.properties

# Set everything to be logged to the console
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Settings to quiet third party logs that are too verbose
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN