JDe*_*Dev · 5 · apache-spark, kubernetes
I launch applications on a Kubernetes cluster via spark-submit, and I can only see the Spark UI by accessing http://driver-pod:port while the job is running.
How can I start a Spark History Server on the cluster, and how can I make all running Spark jobs register with it?
Is this possible?
Yes, it is possible. In short, you need to ensure the following: your Spark jobs must write their event logs to a location that the history server can also read (filesystem, S3, HDFS, etc.). Spark (by default) reads only from a filesystem path, so I will walk through that case using the spark operator.

First, create a PVC backed by a volume type that supports the ReadWriteMany access mode, for example an NFS volume. The snippet below assumes you already have a storage class configured for NFS (nfs-volume):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spark-pvc
  namespace: spark-apps
spec:
  accessModes:
    - ReadWriteMany
  volumeMode: Filesystem
  resources:
    requests:
      storage: 5Gi
  storageClassName: nfs-volume
```
Second, enable event logging in your application and point it at the path where the volume will be mounted:

```yaml
sparkConf:
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "file:/mnt"
```
Here is a complete SparkApplication with the PVC mounted at /mnt on both the driver and the executors:

```yaml
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-java-pi
  namespace: spark-apps
spec:
  type: Java
  mode: cluster
  image: gcr.io/spark-operator/spark:v2.4.4
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.4.jar"
  imagePullPolicy: Always
  sparkVersion: "2.4.4"
  sparkConf:
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "file:/mnt"
  restartPolicy:
    type: Never
  volumes:
    - name: spark-data
      persistentVolumeClaim:
        claimName: spark-pvc
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: "2.4.4"
    serviceAccount: spark
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: "2.4.4"
    volumeMounts:
      - name: spark-data
        mountPath: /mnt
```
Finally, deploy the history server itself, reading from the same PVC. Note that the `-Dspark.history.fs.logDirectory` flag has to reach the JVM, so it is passed via `SPARK_HISTORY_OPTS`, which spark-class picks up for the HistoryServer class:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-history-server
  template:
    metadata:
      name: spark-history-server
      labels:
        app: spark-history-server
    spec:
      containers:
        - name: spark-history-server
          image: gcr.io/spark-operator/spark:v2.4.0
          resources:
            requests:
              memory: "512Mi"
              cpu: "100m"
          env:
            - name: SPARK_HISTORY_OPTS
              value: "-Dspark.history.fs.logDirectory=/data/"
          command:
            - /sbin/tini
            - -s
            - --
            - /opt/spark/bin/spark-class
            - org.apache.spark.deploy.history.HistoryServer
          ports:
            - name: http
              protocol: TCP
              containerPort: 18080
          readinessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          livenessProbe:
            timeoutSeconds: 4
            httpGet:
              path: /
              port: http
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: spark-pvc
            readOnly: true
```
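To reach the UI on port 18080 you can put a Service in front of the Deployment. A minimal sketch, assuming the `app: spark-history-server` label and the `spark-apps` namespace from the manifest above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: spark-history-server
  namespace: spark-apps
spec:
  type: ClusterIP
  selector:
    app: spark-history-server
  ports:
    - name: http
      port: 18080
      targetPort: http
```

You can then, for example, port-forward to it with `kubectl port-forward -n spark-apps svc/spark-history-server 18080:18080` and open http://localhost:18080.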
You can also use Google Cloud Storage, Azure Blob Storage, or AWS S3 as the event log location. To do that you will need to add some extra jars to the images, so I recommend taking a look at the lightbend spark history server image and charts.
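For example, with the S3A connector jars (hadoop-aws plus the AWS SDK) on the classpath, the application-side settings might look like this sketch; the bucket name is a hypothetical placeholder, and credentials would still need to be supplied separately:

```yaml
sparkConf:
  "spark.eventLog.enabled": "true"
  # Hypothetical bucket; requires hadoop-aws + AWS SDK jars and AWS credentials
  "spark.eventLog.dir": "s3a://my-spark-event-logs/"
```

The history server would then point `spark.history.fs.logDirectory` at the same `s3a://` URI instead of a mounted volume.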