Why doesn't SparkSession.sql("set hive.support.quoted.identifiers=None") work?

Question

yuc*_*ang 2 hive apache-spark pyspark

I want to use a regular expression in SparkSession.sql, but it does not work whether I use:

SparkSession.builder.enableHiveSupport().config("hive.support.quoted.identifiers", None)

or

SparkSession.sql("set hive.support.quoted.identifiers=None")

Please tell me how to make this work.

Code:

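import pyspark.sql

# Attempt 1: pass the Hive setting through the session builder
ss = (pyspark.sql.SparkSession
      .builder
      .enableHiveSupport()
      .config("hive.support.quoted.identifiers", None)
      .getOrCreate())
# Attempt 2: set it with a SQL statement instead
#ss.sql("set hive.support.quoted.identifiers=None")
ss.sql("SELECT `(col)?+.+` FROM table")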

Output:

pyspark.sql.utils.AnalysisException: "cannot resolve '`(col)?+.+`' given input columns: ... ...

Answer 1

小智 5

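Can you try enabling regular-expression column names? This behavior is disabled by default, so you need to set the following property to true before running a query that uses regex column names.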

spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true")
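
For reference, here is a rough end-to-end sketch of that setting in use. The helper `exclude_column_pattern` and the `demo_table` view with columns `col`, `name`, and `score` are made up for illustration and do not come from the question. It assumes a local PySpark install and a plain column name without regex metacharacters.

```python
def exclude_column_pattern(column: str) -> str:
    """Build a backtick-quoted regex that matches every column except `column`.

    Hypothetical helper, not part of the Spark API.
    """
    if not column.isidentifier():
        raise ValueError("expected a plain column name without regex metacharacters")
    # `(name)?+` is a possessive optional group (Java regex syntax): once it has
    # consumed `name` it does not give it back, so `.+` has nothing left to match
    # when the column is exactly `name`, and that column is skipped.
    return "`(" + column + ")?+.+`"


def demo():
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("quoted-regex-demo")
             .getOrCreate())

    # Enable regex column names before running the query, as described above.
    spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true")

    # Made-up sample data registered as a temporary view.
    (spark.createDataFrame([(1, "a", 2.0)], ["col", "name", "score"])
          .createOrReplaceTempView("demo_table"))

    # The regex excludes `col`, so only the other columns are expected back.
    result = spark.sql(f"SELECT {exclude_column_pattern('col')} FROM demo_table")
    print(result.columns)

    spark.stop()
```

Call `demo()` to try it. Note that once this property is on, any backtick-quoted name in a SELECT list is treated as a regular expression, so it may be worth setting it back to false afterwards if other queries rely on quoted column names.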
