How to enable the Spark history server for a standalone cluster in non-HDFS mode

Ros*_*hta 2 apache-spark pyspark

I set up a Spark 2.1.1 cluster in standalone mode (1 master, 2 slaves) following http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/. There is no pre-existing Hadoop setup on my machines. I want to start the Spark history server. I run it as follows:

roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh

In spark-defaults.conf I set:

spark.eventLog.enabled           true

But it fails with the following error:

17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(roshan); groups with view permissions: Set(); users  with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
    at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
    at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)

What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?

Update 1:

spark.eventLog.enabled           true
spark.history.fs.logDirectory   file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir               file:////home/roshan/spark/spark_home/logs

But I still keep getting this error:

java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co

Ram*_*jan 6

By default Spark uses file:/tmp/spark-events as the log directory for the history server, and your log clearly says that spark.history.fs.logDirectory is not configured.

First, you could create the spark-events folder in /tmp (not a good idea, since /tmp is wiped every time the machine restarts) and then add spark.history.fs.logDirectory in spark-defaults.conf pointing to that directory. But I would suggest you create a different folder to which the spark user has access, and update the spark-defaults.conf file accordingly.
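As a concrete sketch of that suggestion (the directory path is an example, not something mandated by Spark; adjust it and the ownership to your setup — a path like /opt/spark-events would additionally need sudo for mkdir and chown):

```shell
# Create a dedicated event-log directory outside /tmp so it survives reboots.
# EVENT_DIR is an example path; for /opt/spark-events you would need sudo
# and a chown to the user that runs Spark.
EVENT_DIR="${EVENT_DIR:-$HOME/spark-events}"
mkdir -p "$EVENT_DIR"
chmod 775 "$EVENT_DIR"
echo "event log dir: $EVENT_DIR"
```

The directory must exist and be writable by the Spark user before any application starts, because the driver writes the event log there while the job runs.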

You need to define two more variables in the spark-defaults.conf file:

spark.eventLog.dir              file:<path where you want to store your logs>
spark.history.fs.logDirectory   file:<same path as above>

Suppose you want to store the logs in /opt/spark-events, to which the spark user has access; then the parameters in spark-defaults.conf would be:

spark.eventLog.enabled          true
spark.eventLog.dir              file:/opt/spark-events
spark.history.fs.logDirectory   file:/opt/spark-events
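After changing spark-defaults.conf, the history server has to be restarted so it re-reads the configuration. A minimal sketch, assuming SPARK_HOME points at the installation from the question (the block is a no-op if the scripts are not found at that path):

```shell
# Restart the history server so it picks up the new log directory.
# SPARK_HOME is an assumption; adjust to your installation path.
SPARK_HOME="${SPARK_HOME:-$HOME/spark/spark_home}"
if [ -x "$SPARK_HOME/sbin/start-history-server.sh" ]; then
    "$SPARK_HOME/sbin/stop-history-server.sh" 2>/dev/null || true
    "$SPARK_HOME/sbin/start-history-server.sh"
fi
# The web UI is served on port 18080 by default
echo "history server UI: http://localhost:18080"
```

Once an application runs with spark.eventLog.enabled set to true, a per-application event-log file appears under the configured directory and the finished application shows up in the UI on port 18080.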

You can find more information in the Monitoring and Instrumentation documentation.

  • Try restarting the machine — does it work then? (2 upvotes)