printSchema() in Apache Spark

Question


rus*_*hak 0 apache-spark spark-dataframe apache-spark-dataset

Dataset<Tweet> ds = sc.read().json("/path").as(Encoders.bean(Tweet.class));



Tweet class:
long id;
String user;
String text;


ds.printSchema();


Output:

root
  |-- id: string (nullable = true)
  |-- text: string (nullable = true)  
  |-- user: string (nullable = true)


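All fields in the JSON file are of string type.

My problem is that I read the input and encode it as Tweet.class. The type declared for id in the class is long, but it shows up as string when I print the schema.

Does printSchema() report the schema according to how the file was read, or according to the encoding we applied (here Tweet.class)?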

Answer 1

Sat*_*uri 5

I don't know the exact reason why your code is not working, but if you want to change a field's type, you can write a custom schema.

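import org.apache.spark.sql.types._

// Custom schema: read id as a long instead of the inferred string
val schema = StructType(List(
  StructField("id", LongType, nullable = true),
  StructField("text", StringType, nullable = true),
  StructField("user", StringType, nullable = true)
))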
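
Since the question uses the Java API, the same schema can probably be built in Java with the DataTypes factory methods. This is only a sketch; the class name TweetSchema and the method build() are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class TweetSchema {
    // Hypothetical helper: builds the same three-field schema as the Scala snippet
    public static StructType build() {
        return DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("id", DataTypes.LongType, true),
            DataTypes.createStructField("text", DataTypes.StringType, true),
            DataTypes.createStructField("user", DataTypes.StringType, true)
        ));
    }
}
```

The result of TweetSchema.build() could then be passed to read().schema(...) in place of the Scala value.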


You can apply the schema to the DataFrame as follows:

Dataset<Tweet> ds = sc.read().schema(schema).json("/path").as(Encoders.bean(Tweet.class));

ds.printSchema();
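
As far as I understand the Dataset.as documentation, as(...) only changes the typed view that operations such as map() see and does not rewrite the underlying columns, so printSchema() would report the schema inferred when the file was read. If the JSON values really are quoted strings, reading them with a LongType schema may produce nulls, so an explicit cast before as(...) is another option. A minimal sketch, assuming a SparkSession is available; TweetCastExample and readTweets are hypothetical names, and Tweet is filled in here as a bean with getters and setters, which the question only outlines:

```java
import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

public class TweetCastExample {

    // Assumed bean layout for the Tweet class described in the question
    public static class Tweet implements Serializable {
        private long id;
        private String user;
        private String text;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getUser() { return user; }
        public void setUser(String user) { this.user = user; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    // Hypothetical helper: read the JSON, cast id to long, then apply the bean encoder
    public static Dataset<Tweet> readTweets(SparkSession spark, String path) {
        Dataset<Row> raw = spark.read().json(path);           // id is inferred as string here
        Dataset<Row> typed = raw.withColumn("id", col("id").cast("long"));
        return typed.as(Encoders.bean(Tweet.class));
    }
}
```

Calling printSchema() on the returned Dataset should then list id as long, because the cast changes the column itself rather than only the typed view.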


Archived:	7 years, 6 months ago
Views:	17968
Last recorded:	7 years, 6 months ago