How to create a Row from a List or Array in Spark using Java

use*_*706 6 java apache-spark apache-spark-mllib

In Java, I use RowFactory.create() to create a Row:

Row row = RowFactory.create(record.getLong(1), record.getInt(2), record.getString(3));

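Here "record" is a record from a database, but I cannot know the length of "record" in advance, so I want to create the Row from a List or an Array. In Scala, I can use Row.fromSeq() to create a Row from a List or an Array, but how can I do the same in Java?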

aba*_*hel 10

I'm not sure I understood your question correctly, but you can use RowFactory to create a Row from an ArrayList in Java.

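// d1 and d2 are the values to place in the row
List<MyData> mlist = new ArrayList<MyData>();
mlist.add(d1);
mlist.add(d2);

// toArray() returns an Object[], which is passed as the varargs of RowFactory.create()
Row row = RowFactory.create(mlist.toArray());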
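
The reason toArray() works here is that RowFactory.create() is declared with a varargs parameter (Object... values), so an Object[] is used directly as the list of values. The sketch below illustrates only that Java mechanism and does not touch Spark: the create() method in it is a hypothetical stand-in with the same parameter shape, and the class name and sample values are made up for illustration. It also shows two cases that are easy to get wrong, assuming the real method behaves like any other varargs method: passing the List itself, or a primitive array, yields a single value rather than one value per element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarargsRowSketch {

    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values); it simply returns what it received.
    public static Object[] create(Object... values) {
        return values;
    }

    public static void main(String[] args) {
        // A record whose length is not known at compile time
        List<Object> fields = new ArrayList<>();
        fields.add(42L);
        fields.add(7);
        fields.add("abc");

        // An Object[] is used as the varargs array itself: one value per element
        Object[] spread = create(fields.toArray());
        System.out.println(spread.length + " " + Arrays.toString(spread));

        // A List is a single Object, so it is wrapped into a one-element array
        Object[] wrapped = create(fields);
        System.out.println(wrapped.length);

        // A primitive array is also a single Object; box it to Object[] first
        int[] ints = {1, 2, 3};
        System.out.println(create(ints).length);
        Object[] boxed = Arrays.stream(ints).boxed().toArray();
        System.out.println(create(boxed).length);
    }
}
```

So for a record of unknown length, collecting the column values into a List<Object> and calling RowFactory.create(list.toArray()) should give one Row field per element, which is roughly what Row.fromSeq() does in Scala.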


And*_*der 7

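We often need to create Datasets or DataFrames in real-world applications. Here is an example of how to create Rows and a Dataset in a Java application: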

// Assumes: import static org.apache.spark.sql.types.DataTypes.*;
// (provides createStructField, StringType and IntegerType); ImmutableList is from Guava.

// First, initialize the SQLContext
SQLContext sqlContext = ...
// The schema lists one field per value in each Row, in the same order
StructType schemata = DataTypes.createStructType(
        new StructField[]{
                createStructField("NAME", StringType, false),
                createStructField("STRING_VALUE", StringType, false),
                createStructField("NUM_VALUE", IntegerType, false),
        });
Row r1 = RowFactory.create("name1", "value1", 1);
Row r2 = RowFactory.create("name2", "value2", 2);
List<Row> rowList = ImmutableList.of(r1, r2);
Dataset<Row> data = sqlContext.createDataFrame(rowList, schemata);