MrE*_*MrE 12 scala apache-spark
试图在DataFrame中删除一列,但我有一些带有点的列名,我将其转义.
在我逃避之前,我的架构看起来像这样:
root
|-- user_id: long (nullable = true)
|-- hourOfWeek: string (nullable = true)
|-- observed: string (nullable = true)
|-- raw.hourOfDay: long (nullable = true)
|-- raw.minOfDay: long (nullable = true)
|-- raw.dayOfWeek: long (nullable = true)
|-- raw.sensor2: long (nullable = true)
Run Code Online (Sandbox Code Playgroud)
如果我尝试删除列,我得到:
df = df.drop("hourOfWeek")
org.apache.spark.sql.AnalysisException: cannot resolve 'raw.hourOfDay' given input columns raw.dayOfWeek, raw.sensor2, observed, raw.hourOfDay, hourOfWeek, raw.minOfDay, user_id;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:57)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:319)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:53)
Run Code Online (Sandbox Code Playgroud)
请注意,我甚至没有试图在名称上删除带有点的列.因为在没有转义列名的情况下似乎无法做很多事情,所以我将模式转换为:
root
|-- user_id: long (nullable = true)
|-- hourOfWeek: string (nullable = true)
|-- observed: string (nullable = true)
|-- `raw.hourOfDay`: long (nullable = true)
|-- `raw.minOfDay`: long (nullable = true)
|-- `raw.dayOfWeek`: long (nullable = true)
|-- `raw.sensor2`: long (nullable = true)
Run Code Online (Sandbox Code Playgroud)
但这似乎没有帮助.我仍然得到同样的错误.
我尝试转义所有列名称,并使用转义名称删除,但这也不起作用.
root
|-- `user_id`: long (nullable = true)
|-- `hourOfWeek`: string (nullable = true)
|-- `observed`: string (nullable = true)
|-- `raw.hourOfDay`: long (nullable = true)
|-- `raw.minOfDay`: long (nullable = true)
|-- `raw.dayOfWeek`: long (nullable = true)
|-- `raw.sensor2`: long (nullable = true)
df.drop("`hourOfWeek`")
org.apache.spark.sql.AnalysisException: cannot resolve 'user_id' given input columns `user_id`, `raw.dayOfWeek`, `observed`, `raw.minOfDay`, `raw.hourOfDay`, `raw.sensor2`, `hourOfWeek`;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:60)
Run Code Online (Sandbox Code Playgroud)
是否有另一种方法可以删除在此类数据上不会失败的列?
小智 5
val data = df.drop("Customers");
Run Code Online (Sandbox Code Playgroud)
对于普通列可以正常工作
val new = df.drop(df.col("old.column"));
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
41808 次 |
最近记录: |