Small question regarding how to get the Spark executor ID in an Apache Spark job, please.
I have a very straightforward piece of code:
final Dataset<Row> rowDataSet = sparkSession.read()[...].load();
final Dataset<String> stringDataSet = rowDataSet
        .map(
                (MapFunction<Row, String>) row ->
                        doSomeTransformationFromRowToStringUsingSparkExecutorID(row, SparkEnv.executorId()),
                Encoders.STRING());
stringDataSet.show();
And the question is regarding the doSomeTransformationFromRowToStringUsingSparkExecutorID method.
This method needs the ID of the Spark executor on which the row is being processed, in order to do some transformation.
As I need the Spark executor ID, I tried SparkEnv.executorId(), which I found in the official documentation.
Unfortunately, the above does not compile, failing with: Non-static method 'executorId()' cannot be referenced from a static context.
Is SparkEnv.executorId() even the right way to get the executor ID in this scenario?
If yes, how can I work around this Non-static method 'executorId()' issue, please?
If not, what is the best alternative to get the executor ID, please?
Thank you
According to the documentation, the SparkEnv class has a static get method that returns the current instance. On that instance you can then call the (non-static) executorId() method:
import org.apache.spark.SparkEnv;

// Grab the environment of the JVM this code runs in and ask it for its executor ID.
SparkEnv sparkEnv = SparkEnv.get();
String executorId = sparkEnv.executorId();
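Just so the next snippet reads on its own: the body of doSomeTransformationFromRowToStringUsingSparkExecutorID is not shown in the question, but a hypothetical version that simply tags a row with the executor that processed it could look like this (illustration only, not your actual logic):

// Hypothetical helper: the real implementation is yours; this only shows how the ID parameter is consumed.
private static String doSomeTransformationFromRowToStringUsingSparkExecutorID(Row row, String executorId) {
    // Prefix the row's content with the ID of the executor that processed it.
    return executorId + " -> " + row.mkString(",");
}

Plugged into your original job, the working version becomes: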
final Dataset<Row> rowDataSet = sparkSession.read()[...].load();
final Dataset<String> stringDataSet = rowDataSet
        .map(
                (MapFunction<Row, String>) row ->
                        doSomeTransformationFromRowToStringUsingSparkExecutorID(row, SparkEnv.get().executorId()),
                Encoders.STRING());
stringDataSet.show();
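If you want to verify the behaviour outside of your own pipeline, here is a minimal, self-contained sketch. Everything in it is an assumption made for the demo: a local[*] master, a small in-memory dataset instead of your real source, and made-up class/app names. Keep in mind that in local mode the tasks run in the driver JVM, so the ID printed is "driver"; on a real cluster you will see the executors' IDs.

import java.util.Arrays;

import org.apache.spark.SparkEnv;
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class ExecutorIdDemo {
    public static void main(String[] args) {
        // Assumption: local[*] master just for the demo; on a cluster, drop .master(...).
        final SparkSession sparkSession = SparkSession.builder()
                .appName("executor-id-demo")
                .master("local[*]")
                .getOrCreate();

        // Toy dataset standing in for the real source read with sparkSession.read()[...].load().
        final Dataset<String> input = sparkSession.createDataset(
                Arrays.asList("a", "b", "c"), Encoders.STRING());

        // SparkEnv.get() is evaluated inside the lambda, i.e. in the JVM running the task.
        // In local mode that is the driver JVM, so this prints "driver";
        // on a cluster you will see the executors' IDs instead.
        final Dataset<String> tagged = input.map(
                (MapFunction<String, String>) value ->
                        SparkEnv.get().executorId() + " processed " + value,
                Encoders.STRING());

        tagged.show(false);
        sparkSession.stop();
    }
}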