Posts by Ahi*_*ito

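How to save the result of printSchema to a file in PySpark

I used df.printSchema() in PySpark, and it gives me the schema as a tree structure. Now I need to save that output in a variable or a text file.

I tried the following two ways of saving it, but neither worked.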

v = str(df.printSchema())
print(v)
# and
df.printSchema().saveAsTextFile(<path>)

I need the saved schema in the following format:

 |-- COVERSHEET: struct (nullable = true)
 |    |-- ADDRESSES: struct (nullable = true)
 |    |    |-- ADDRESS: struct (nullable = true)
 |    |    |    |-- _VALUE: string (nullable = true)
 |    |    |    |-- _city: string (nullable = true)
 |    |    |    |-- _primary: long (nullable = true)
 |    |    |    |-- _state: string (nullable = true)
 |    |    |    |-- _street: string (nullable = true)
 | …
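
Both attempts above fail for the same reason: printSchema() prints the tree and returns None, so str(...) yields the text "None" and there is no object to call saveAsTextFile on. One possible workaround is to capture what gets printed. The sketch below assumes that df.printSchema() writes through Python's standard output (so that contextlib.redirect_stdout can see it) and that the target path is on the driver's local filesystem. The helper names capture_printed and save_schema_text are hypothetical and not part of PySpark.

```python
import contextlib
import io


def capture_printed(print_fn):
    """Run a zero-argument callable and return everything it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_fn()
    return buffer.getvalue()


def save_schema_text(df, path):
    """Write the text printed by df.printSchema() to a local file and return it.

    Assumption: df.printSchema() prints from the Python side, and `path`
    points to the local filesystem of the driver (not HDFS or S3).
    """
    schema_text = capture_printed(df.printSchema)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(schema_text)
    return schema_text
```

With a real DataFrame this would be called as v = save_schema_text(df, "schema.txt"), which leaves the tree both in the variable v and in the file. On classic (non-Connect) PySpark, df._jdf.schema().treeString() should return the same tree as a string without any capturing, but it goes through a private attribute, so treat it as version-dependent.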


python apache-spark pyspark

Ahi*_*ito

2018-06-12

7 votes, 1 answer, 8117 views

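How do I cast a column in a DataFrame?

I am fetching data from HBase and converting it into a DataFrame. One column in the DataFrame has the string data type, but I need to convert it to Int.

I tried the code below, but it gives me an error: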

df.withColumn("order", 'order.cast(int)')

The error I get is:

error:col should be column

I have given the correct column name here. Do I need to change the syntax of the code above for PySpark?
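
The error message points at the second argument: withColumn expects a Column object, while 'order.cast(int)' is an ordinary Python string. A minimal sketch of one way to write the cast follows. It assumes df is a PySpark DataFrame that already contains the named column. The helper name cast_column is hypothetical and only wraps two existing DataFrame/Column methods.

```python
def cast_column(df, name, type_name):
    """Return a new DataFrame in which column `name` is cast to `type_name`.

    df[name] looks the column up as a Column object, and Column.cast
    accepts a type name given as a string, for example "int".
    """
    return df.withColumn(name, df[name].cast(type_name))
```

Usage would look like df = cast_column(df, "order", "int"). The result has to be assigned because withColumn returns a new DataFrame and does not modify df in place. The same expression can also be written inline as df.withColumn("order", col("order").cast("int")) after from pyspark.sql.functions import col.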

dataframe apache-spark apache-spark-sql pyspark

Ahi*_*ito

lucky-day

-2 votes, 1 answer, 30k views

Tag statistics

apache-spark ×2

pyspark ×2

apache-spark-sql ×1

dataframe ×1

python ×1
