我在pyspark(2.1.0)中有一个SparkDataFrame,我希望只获取数字列的名称或仅获取字符串列.
例如,这是我的DF的架构:
root
|-- Gender: string (nullable = true)
|-- SeniorCitizen: string (nullable = true)
|-- MonthlyCharges: double (nullable = true)
|-- TotalCharges: double (nullable = true)
|-- Churn: string (nullable = true)
Run Code Online (Sandbox Code Playgroud)
这就是我需要的:
num_cols = [MonthlyCharges, TotalCharges]
str_cols = [Gender, SeniorCitizen, Churn]
Run Code Online (Sandbox Code Playgroud)
我该怎么做?谢谢!