Pra*_*wal (8) · tags: hive, apache-spark, apache-spark-sql, spark-structured-streaming
Using Apache Spark 2.2 Structured Streaming, I am building a program that reads data from Kafka and writes it to Hive. Data arrives on the Kafka topic in batches at roughly 100 records/second.
Hive table creation:
CREATE TABLE demo_user (
  timeaa    BIGINT,
  numberbb  INT,
  decimalcc DOUBLE,
  stringdd  STRING,
  booleanee BOOLEAN
) STORED AS ORC;
Insert via a manual Hive query:
INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true);
Insert via Spark Structured Streaming code:
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkConf conf = new SparkConf();
conf.setAppName("testing");
conf.setMaster("local[2]");
conf.set("hive.metastore.uris", "thrift://localhost:9083");

SparkSession session = SparkSession.builder()
    .config(conf)
    .enableHiveSupport()
    .getOrCreate();

// workaround START: insert static data into Hive
String insertQuery = "INSERT INTO TABLE demo_user VALUES (1514133139123, 14, 26.4, 'pravin', true)";
session.sql(insertQuery);
// workaround END

// Solution START
Dataset<Row> dataset = readFromKafka(session); // private method reading data from Kafka's 'xyz' topic

// **My question here:**
// some code which writes 'dataset' into the Hive table demo_user
// Solution END
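Spark 2.2 has no built-in Hive sink for Structured Streaming, so one common workaround is to stream ORC files directly into the table's storage directory with the file sink; since demo_user is a non-partitioned ORC table, Hive picks up new files in its location automatically. The following is a minimal sketch under stated assumptions: the warehouse path, checkpoint path, and trigger interval are placeholders, and 'dataset' is assumed to already be projected onto demo_user's schema (timeaa, numberbb, decimalcc, stringdd, booleanee).

import org.apache.spark.sql.streaming.OutputMode;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;

// Sketch only: stream ORC files into the table's location so Hive can read them.
StreamingQuery query = dataset
    .writeStream()
    .format("orc")
    // assumed default warehouse location of the managed table demo_user
    .option("path", "/apps/hive/warehouse/demo_user")
    // any durable checkpoint directory works; this path is an assumption
    .option("checkpointLocation", "/tmp/checkpoints/demo_user")
    .outputMode(OutputMode.Append())             // file sinks require Append mode
    .trigger(Trigger.ProcessingTime("10 seconds"))
    .start();

query.awaitTermination(); // throws StreamingQueryException; declare or handle it

At ~100 records/second, a 10-second trigger yields files of roughly 1,000 rows each; a longer interval reduces the number of small ORC files in the table directory.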
小智 (score: -1)
You do not need to create the Hive table beforehand when using the following; the table is created automatically:

dataset.write().jdbc(String url, String table, java.util.Properties connectionProperties)

or use

dataset.write().saveAsTable(String tableName)
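For completeness, a minimal batch-mode sketch of the saveAsTable approach. Note that these DataFrameWriter methods only work on a non-streaming Dataset: calling write() on a dataset produced by readStream throws an AnalysisException, which is likely why this answer does not apply to the streaming case in the question.

import org.apache.spark.sql.SaveMode;

// Batch write only: appends rows and creates demo_user if it does not exist.
dataset.write()
    .mode(SaveMode.Append)
    .format("orc")
    .saveAsTable("demo_user");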