How to save the result of printSchema to a file in PySpark

Question

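I used df.printSchema() in PySpark, and it prints the schema as a tree. Now I need to save that output in a variable or a text file.

I tried the following ways of saving it, but neither worked.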

v = str(df.printSchema())  
print(v) 
#and
df.printSchema().saveAsTextFile(<path>)

I need the schema saved in the following format:

 |-- COVERSHEET: struct (nullable = true)
 |    |-- ADDRESSES: struct (nullable = true)
 |    |    |-- ADDRESS: struct (nullable = true)
 |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |-- _city: string (nullable = true)
 |    |    |    |-- _primary: long (nullable = true)
 |    |    |    |-- _state: string (nullable = true)
 |    |    |    |-- _street: string (nullable = true)
 |    |    |    |-- _type: string (nullable = true)
 |    |    |    |-- _zip: long (nullable = true)
 |    |-- CONTACTS: struct (nullable = true)
 |    |    |-- CONTACT: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |    |-- _name: string (nullable = true)
 |    |    |    |    |-- _type: string (nullable = true)

Answer 1

phi*_*ert 8

You need treeString (for some reason, I couldn't find it in the Python API):

# v will be a string; _jdf is the underlying JVM DataFrame (a private PySpark attribute)
v = df._jdf.schema().treeString()
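
If you would rather not reach into the private _jdf attribute, one alternative (not part of the original answer) is to capture what printSchema prints. The sketch below assumes printSchema writes through Python's print, so that contextlib.redirect_stdout can intercept it. The names capture_printed and fake_print_schema are hypothetical, and the stand-in function exists only so the example runs without Spark.

```python
import io
from contextlib import redirect_stdout


def capture_printed(print_fn):
    """Call print_fn() and return the text it wrote to Python's sys.stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        print_fn()
    return buffer.getvalue()


# Hypothetical stand-in for df.printSchema, so this runs without a Spark session.
def fake_print_schema():
    print("root")
    print(" |-- name: string (nullable = true)")


schema_text = capture_printed(fake_print_schema)
# With a real DataFrame (assumption: printSchema uses Python's print):
# schema_text = capture_printed(df.printSchema)
```

This also shows why str(df.printSchema()) does not help: the method prints the tree and returns None, so there is nothing useful to convert or to call saveAsTextFile on.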

You can convert the string to an RDD and use saveAsTextFile:

sc.parallelize([v]).saveAsTextFile(...)

Or use Python's own file API to write the string to a file.
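
The plain-Python route could look roughly like this. It is a minimal sketch that assumes the schema string is already in a variable such as v from above. The helper name save_schema_text and the file name schema.txt are hypothetical, and the file is written on the driver's local filesystem, not to HDFS or other distributed storage.

```python
from pathlib import Path


def save_schema_text(schema_text, path):
    """Write the schema string to a local text file and return its Path."""
    target = Path(path)
    # Create the parent directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(schema_text, encoding="utf-8")
    return target


# Usage with the string from the answer (requires a Spark DataFrame named df):
# v = df._jdf.schema().treeString()
# save_schema_text(v, "schema.txt")
```

Unlike saveAsTextFile, which produces a directory of part files, this writes a single file containing the tree exactly as it was printed.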
