How to handle spaces in DataFrame column names in Spark

ben*_*ben 3 apache-spark apache-spark-sql pyspark

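I registered a temp table from a DataFrame whose column headers contain spaces. How can I select such a column when running a SQL query through sqlContext? I tried using backticks, but it does not work: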

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)

him*_*ian 5

You only need to put the column name inside the backticks, not the `as Z_Score` alias:

Without a table alias

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")

With a table alias

df1 =  sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
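If the query string is assembled in code, the quoting can be done by a small helper so the alias never ends up inside the backticks. This is only a sketch: `quote_identifier` and `build_select` are hypothetical names, not part of PySpark, and it relies on the Spark SQL rule that a backtick inside a quoted identifier is escaped by doubling it.

```python
def quote_identifier(name):
    """Wrap a name in backticks for Spark SQL, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def build_select(table, columns):
    """Build a SELECT statement.

    `columns` maps each source column name to its alias, or to None
    when the column should keep its original name.
    """
    parts = []
    for source, alias in columns.items():
        expr = quote_identifier(source)
        if alias is not None:
            # The alias is quoted separately, outside the column's backticks.
            expr += " AS " + quote_identifier(alias)
        parts.append(expr)
    return "SELECT " + ", ".join(parts) + " FROM " + quote_identifier(table)


query = build_select(
    "tmp1",
    {"Company": None, "Sector": None, "Industry": None, "Altman Z-score": "Z_Score"},
)
print(query)
```

The resulting string can then be passed to `sqlContext.sql(query)` as in the snippets above.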


Rak*_*mar 5

There is a problem in the query: `as Z_Score` was wrapped inside the backticks. The corrected query is:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)

Another alternative:

import pyspark.sql.functions as F
df1 =  sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
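A further option, not taken from the answers above, is to rename the columns before registering the temp table so that no backticks are needed at all. The sketch below only computes the new names in plain Python; `sanitize_column_names` is a hypothetical helper, and the choice of underscore as the replacement character is an assumption.

```python
import re


def sanitize_column_names(names):
    """Replace each run of non-word characters with one underscore.

    If two names collapse to the same result, a numeric suffix is
    appended to the later one so every output name stays unique.
    """
    cleaned = []
    seen = set()
    for name in names:
        candidate = re.sub(r"\W+", "_", name.strip()).strip("_")
        unique = candidate
        suffix = 1
        while unique in seen:
            suffix += 1
            unique = "{}_{}".format(candidate, suffix)
        seen.add(unique)
        cleaned.append(unique)
    return cleaned


new_names = sanitize_column_names(["Company", "Sector", "Industry", "Altman Z-score"])
print(new_names)
```

With a real DataFrame the names could be applied with `df.toDF(*sanitize_column_names(df.columns))` before the table is registered, after which the column can be selected as plain `Altman_Z_score`.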