Suppose I have this code:
    public static Dataset<Row> getData(SparkSession sparkSession,
            StructType schema, String delimiter, String pathToData) {
        final Dataset<Row> dataset = sparkSession
                .read()
                .option("delimiter", "\\t")
                .csv(pathToData);
        StructType nSchema = newSchema(schema, schema.size(), dataset.columns().length);
        ...
    }
Is it best practice to declare the values as final local variables before passing them to the newSchema method?
    public static Dataset<Row> getData(SparkSession sparkSession,
            StructType schema, String delimiter, String pathToData) {
        final Dataset<Row> dataset = sparkSession
                .read()
                .option("delimiter", "\\t")
                .csv(pathToData);
        final int dataSize = dataset.columns().length;
        final int schemaSize = schema.size();
        StructType nSchema = newSchema(schema, schemaSize, dataSize);
        ...
    }
Thanks.