I am trying to execute the following query in Hive:
SELECT
regexp_replace('2016-08-05_11:29:46', '\\_', ' ') as tmstmp,
distinct(P.name)
FROM table P;
It throws an exception saying it cannot recognize the input near 'distinct' '(' 'P' in the select target.
When I run the query with the columns swapped, like this:
SELECT
distinct(P.name),
regexp_replace('2016-08-05_11:29:46', '\\_', ' ') as tmstmp
FROM table P;
it works fine. Any thoughts on what is going on here?
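In HiveQL, as in standard SQL, DISTINCT is a qualifier on the entire SELECT list rather than a function, and it must appear immediately after SELECT; placed mid-list it fails to parse, which is why swapping the columns made the query work. A minimal sketch, assuming the goal is one row per distinct name paired with the reformatted timestamp (here "table" stands in for the real table name, as in the question):

-- DISTINCT must come right after SELECT; it applies to the whole row.
-- Since the regexp_replace expression is a constant, the result is
-- still one row per distinct P.name.
SELECT DISTINCT
    P.name,
    regexp_replace('2016-08-05_11:29:46', '\\_', ' ') AS tmstmp
FROM table P;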
I recently ran into a scenario where I needed to read input from an HDFS directory:
/user/project/jsonFile
and write the results back to the same directory:
/user/project/jsonFile
After reading jsonFile, several joins are performed, and the result is written back to /user/project/jsonFile using:
result.write().mode(SaveMode.Overwrite).json("/user/project/jsonFile");
Below is the error I see:
[task-result-getter-0]o.a.s.s.TaskSetManager: Lost task 10.0 in stage 7.0 (TID 2508, hddev1db015dxc1.dev.oclc.org, executor 3): java.io.FileNotFoundException: File does not exist: /user/project/jsonFile
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
Why does it throw java.io.FileNotFoundException: File does not exist? result is the dataset holding the join output that is written back to HDFS; once the result dataset is available, spark …
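The likely cause: Spark evaluates result lazily, and SaveMode.Overwrite deletes /user/project/jsonFile before all tasks have finished reading it, so tasks that go back to the source files find them gone. A minimal workaround sketch, assuming Java and the path from the question (jsonFile_tmp is a hypothetical staging directory, and the identity assignment stands in for the joins): write to a staging directory first, then swap it into place with the Hadoop FileSystem API.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class RewriteInPlace {
    public static void main(String[] args) throws Exception {
        String src = "/user/project/jsonFile";     // source directory from the question
        String tmp = "/user/project/jsonFile_tmp"; // hypothetical staging directory

        SparkSession spark = SparkSession.builder().appName("rewrite-json").getOrCreate();
        Dataset<Row> input = spark.read().json(src);
        Dataset<Row> result = input;               // stand-in for the joins from the question

        // Write to the staging directory first, so the lazily-read source
        // files are still on disk while the job runs.
        result.write().mode(SaveMode.Overwrite).json(tmp);

        // Only after the write has fully succeeded, swap the directories.
        FileSystem fs = FileSystem.get(spark.sparkContext().hadoopConfiguration());
        fs.delete(new Path(src), true);
        fs.rename(new Path(tmp), new Path(src));
    }
}

Note that caching result alone is not a reliable fix: evicted or lost partitions are recomputed from the source files, which by then have been deleted by the overwrite.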
How do I get the current timestamp minus x weeks using java.sql.Timestamp?
This is my current timestamp: Timestamp.from(Instant.now(clock));
x can be any number between 0 and 5.
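A minimal sketch, assuming clock is the java.time.Clock already used in the question and weeksAgo is a hypothetical helper name: Instant does not support week-based arithmetic directly, so subtract a Duration of 7 * x days (or use ZonedDateTime.minusWeeks(x) if calendar-aware weeks are needed).

import java.sql.Timestamp;
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

public class WeeksAgo {
    // Hypothetical helper: current timestamp from the given clock, minus x weeks.
    // Instant.minus(x, ChronoUnit.WEEKS) throws UnsupportedTemporalTypeException,
    // so a fixed Duration of 7 * x days is used instead.
    static Timestamp weeksAgo(Clock clock, int x) {
        return Timestamp.from(Instant.now(clock).minus(Duration.ofDays(7L * x)));
    }

    public static void main(String[] args) {
        Clock clock = Clock.systemUTC();       // stand-in for the clock in the question
        System.out.println(weeksAgo(clock, 3)); // x = 3; any value from 0 to 5 works
    }
}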