I am trying to run a foreach over a DataFrame inside a Spark job. I submit the Spark job with the following command:
spark-submit --class Hive_Cis.DataAnalyze --master local --deploy-mode client --executor-memory 1g --name DataAnalyze --conf "spark.app.id=DataAnalyze" Hive_CIS-1.0-SNAPSHOT-jar-with-dependencies.jar
Now, the class that serves as the driver for the Spark job looks like this:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.hive.HiveContext;

public class DataAnalyze {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("DataAnalyze").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        HiveContext hiveContext = new HiveContext(sc);

        DataFrame dataSrcsTable = hiveContext.sql("SELECT * FROM default.data_tables_metrics");
        dataSrcsTable.show();
        dataSrcsTable.foreach(new DataTableReader(hiveContext));
    }//END OF MAIN
}
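For comparison, here is a minimal sketch of the Java-friendly route I have seen used, assuming Spark 1.x: converting the DataFrame to a JavaRDD&lt;Row&gt; lets foreach take a VoidFunction instead of a Scala Function1. The per-row println is just placeholder work.

import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.sql.Row;

// Inside main(), after dataSrcsTable is built: toJavaRDD() exposes the
// DataFrame's rows as a JavaRDD<Row>, whose foreach accepts the
// Java-friendly VoidFunction rather than a Scala Function1.
dataSrcsTable.toJavaRDD().foreach(new VoidFunction<Row>() {
    @Override
    public void call(Row row) throws Exception {
        System.out.println(row.mkString(", ")); // placeholder per-row work
    }
});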
The class that extends AbstractFunction1 is then:
public class DataTableReader extends AbstractFunction1 implements Serializable {

    private HiveContext hiveConnection;
    private static final long serialVersionUID = 1919222653470174456L;

    public DataTableReader(HiveContext hiveData) {
        this.hiveConnection = hiveData;
    }

    @Override …
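For reference, a minimal sketch of how such a class is typically parameterized for DataFrame.foreach, assuming Spark 1.x; the class name DataTableReaderSketch and the method body are my own illustration, not the original code:

import java.io.Serializable;

import org.apache.spark.sql.Row;
import scala.runtime.AbstractFunction1;
import scala.runtime.BoxedUnit;

// DataFrame.foreach expects a scala.Function1<Row, BoxedUnit>, so the
// subclass is usually parameterized explicitly rather than left raw.
public class DataTableReaderSketch extends AbstractFunction1<Row, BoxedUnit>
        implements Serializable {

    private static final long serialVersionUID = 1L;

    @Override
    public BoxedUnit apply(Row row) {
        // Per-row work goes here. Note that a HiveContext field generally
        // cannot be used inside a function shipped to executors, since it
        // wraps the non-serializable SparkContext.
        System.out.println(row.mkString(", "));
        return BoxedUnit.UNIT; // Scala's Unit value
    }
}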