Posts by van*_*d39

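Can unix_timestamp() return Unix time in milliseconds in Apache Spark?

I am trying to get the Unix time from a timestamp field in milliseconds (13 digits), but it is currently returned in seconds (10 digits).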

scala> var df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.123", "2017-01-18 11:00:00.882", "2017-01-18 11:00:02.432").toDF()
df: org.apache.spark.sql.DataFrame = [value: string]

scala> df = df.selectExpr("value timeString", "cast(value as timestamp) time")
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp]


scala> df = df.withColumn("unix_time", unix_timestamp(df("time")))
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp ... 1 more field]

scala> df.take(4)
res63: Array[org.apache.spark.sql.Row] = Array(
[2017-01-18 11:00:00.000,2017-01-18 11:00:00.0,1484758800], 
[2017-01-18 11:00:00.123,2017-01-18 11:00:00.123,1484758800], 
[2017-01-18 11:00:00.882,2017-01-18 11:00:00.882,1484758800], 
[2017-01-18 11:00:02.432,2017-01-18 11:00:02.432,1484758802])

Even though 2017-01-18 11:00:00.123 and 2017-01-18 11:00:00.000 are different, I get the same Unix time, 1484758800, for both.

What am I missing?
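
A minimal sketch of the arithmetic behind this question, written in plain Python rather than Spark so that it runs without a cluster. It assumes the strings are interpreted as UTC (the session above evidently used a different time zone, so the absolute values differ from the output shown), and `epoch_millis` and `epoch_seconds` are hypothetical helper names. A second-resolution Unix time drops the fractional part, which is why .000 and .123 collapse to the same value; keeping milliseconds means carrying the fraction through explicitly. In Spark itself, a commonly suggested route is to cast the timestamp column to double and multiply by 1000; treat that as an assumption to verify against your Spark version.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(ts: str) -> int:
    """Return epoch milliseconds for a 'YYYY-MM-DD HH:MM:SS.fff' string.

    Assumption: the string is interpreted as UTC.
    """
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=timezone.utc)
    delta = dt - EPOCH
    # Integer arithmetic keeps the fraction exact (no float rounding).
    whole_seconds = delta.days * 86400 + delta.seconds
    return whole_seconds * 1000 + delta.microseconds // 1000


def epoch_seconds(ts: str) -> int:
    """Second-resolution value: the fractional part is dropped."""
    return epoch_millis(ts) // 1000


if __name__ == "__main__":
    for s in ("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.123"):
        print(s, epoch_seconds(s), epoch_millis(s))
```

The two sample strings share the same second-resolution value and differ by 123 in the millisecond value, which mirrors the behaviour described in the question.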

timestamp unix-timestamp apache-spark

van*_*d39

lucky-day

14
Score

4
Answers

6684
Views

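Reading pretty-printed JSON files in Apache Spark

I have many JSON files in my S3 bucket, and I want to be able to read and query them. The problem is that they are pretty-printed. Each JSON file contains a single large dictionary, but it is not on one line. According to this thread, a dictionary in a JSON file should be on a single line, which is a limitation of Apache Spark. My files are not structured that way.

My JSON schema looks like this -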

{
    "dataset": [
        {
            "key1": [
                {
                    "range": "range1", 
                    "value": 0.0
                }, 
                {
                    "range": "range2", 
                    "value": 0.23
                }
             ]
        }, {..}, {..}
    ],
    "last_refreshed_time": "2016/09/08 15:05:31"
}

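Here are my questions -

Can I avoid converting these files to match the layout Apache Spark requires (one dictionary per line in the file) and still read them?
If not, what is the best way to do the conversion in Python? I get a batch of these files every day. The bucket is partitioned by day.
Is there any tool other than Apache Spark that is better suited to querying these files? I am on the AWS stack, so I can try any other suggested tool with a Zeppelin notebook.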
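
For the second question, a minimal sketch of the conversion step in plain Python, assuming each file fits in memory and holds exactly one JSON document. `compact_json` is a hypothetical helper name, and reading from and writing back to S3 is left out. On Spark versions that support it, the multiLine read option used in a later post on this page may make the conversion unnecessary.

```python
import json


def compact_json(pretty_text: str) -> str:
    """Re-serialize a pretty-printed JSON document onto a single line."""
    document = json.loads(pretty_text)
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    return json.dumps(document, separators=(",", ":"))


if __name__ == "__main__":
    pretty = """{
        "dataset": [
            {"key1": [{"range": "range1", "value": 0.0}]}
        ],
        "last_refreshed_time": "2016/09/08 15:05:31"
    }"""
    print(compact_json(pretty))
```

The output is one line per input document, which is the layout the question says Spark expects.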

python json amazon-s3 apache-spark

van*_*d39

2017-05-23

6
Score

1
Answers

850
Views

IN clause for an Oracle prepared statement in Python cx_Oracle

I want to use an IN clause with a prepared Oracle statement using cx_Oracle in Python.

Example query - select name from employee where id in ('101', '102', '103')

On the Python side, I have a list [101, 102, 103], which I convert to a string like ('101', '102', '103') and use in the following Python code -

import cx_Oracle
ids = [101, 102, 103]
ALL_IDS = "('{0}')".format("','".join(map(str, ids)))
conn = cx_Oracle.connect('username', 'pass', 'schema')
cursor = conn.cursor()
results = cursor.execute('select name from employee where id in :id_list', id_list=ALL_IDS)
names = [x[0] for x in cursor.description]
rows = results.fetchall()

This does not work. Am I doing something wrong?
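
A bind variable stands for a single value, so binding the whole string ('101','102','103') to :id_list cannot expand into a list of values. One common workaround, sketched here under assumptions, generates one named placeholder per element. `build_in_query` and the bind names `id0`, `id1`, ... are hypothetical; the sketch only builds the SQL text and the bind dictionary and does not connect to Oracle.

```python
def build_in_query(ids):
    """Build an IN-clause query with one named bind placeholder per id.

    Returns the SQL text and the matching dictionary of bind values.
    """
    if not ids:
        # An empty IN list is not valid SQL, so fail early.
        raise ValueError("ids must not be empty")
    names = [f"id{i}" for i in range(len(ids))]
    placeholders = ", ".join(f":{name}" for name in names)
    sql = f"select name from employee where id in ({placeholders})"
    # Values are converted to strings to mirror the quoted ids in the example query.
    binds = {name: str(value) for name, value in zip(names, ids)}
    return sql, binds


if __name__ == "__main__":
    sql, binds = build_in_query([101, 102, 103])
    print(sql)
    print(binds)
```

With an open cursor, the pair would then be passed as cursor.execute(sql, binds), so the values travel as bind parameters instead of being formatted into the SQL string.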

python oracle cx-oracle prepared-statement

van*_*d39

lucky-day

5
Score

1
Answers

4554
Views

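Interpreting timestamp fields in Spark while reading JSON

I am trying to read a pretty-printed JSON file that contains time fields. I want the timestamp columns to be interpreted as timestamp fields while the JSON itself is being read. However, when I call printSchema, they are still read as strings.

Example input JSON file -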

[{
    "time_field" : "2017-09-30 04:53:39.412496Z"
}]

Code -

df = spark.read.option("multiLine", "true").option("timestampFormat","yyyy-MM-dd HH:mm:ss.SSSSSS'Z'").json('path_to_json_file')

Output of df.printSchema() -

root
 |-- time_field: string (nullable = true)

What am I missing here?

json timestamp apache-spark

van*_*d39

lucky-day

5
Score

1
Answers

4119
Views

Zeppelin dynamic form drop-down value in SQL

I have a drop-down element in my Zeppelin notebook

val instrument = z.select("Select Item", Seq(("A", "1"),("B", "2"),("C", "3")))

I want to use the value of this variable, instrument, in my SQL. For example, the next paragraph in my notebook contains

%sql select * from table_name where item='<<instrument selected above>>'

Is this possible? If so, what would the syntax look like?

dynamic-forms apache-spark apache-spark-sql apache-zeppelin

van*_*d39

lucky-day

3
Score

1
Answers

7057
Views

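Adding a Maven dependency to an SBT project

I am trying to reference a Maven project dependency in my build.sbt file. I know I need to add an additional resolver to my file because the project is hosted in an internal artifact store.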

build.sbt

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.4.4",
  "org.apache.spark" %% "spark-sql" % "2.4.4",
  "com.<companyname>" %% "<libraryname>" % "2.3.0"
)

resolvers += "<library name>" at "http://artifactory.<internal url>.io:80/dsc-mvn"

However, it turns out that SBT ends up searching a path with the _2.11 version suffix appended. This is the error message I see in IntelliJ

[info] Loading settings for project sbt-demo from build.sbt ...
[warn] Discarding 1 session setting.  Use 'session save' to persist session settings.
[info] Set current project to SparkExample (in build file:<project_path>)
[info] Defining Global / sbtStructureOptions
[info] The new value will …

scala maven sbt sbt-plugin sbt-revolver

van*_*d39

lucky-day

2
Score

1
Answers

6368
Views

ACCEPT statement in Oracle PL SQL

I am a beginner in PL SQL.

I am trying to accept a string into a variable using the PL SQL ACCEPT statement. Here is the code -

ACCEPT lastname CHAR FORMAT 'A20' PROMPT 'Enter employee lastname:  '
DECLARE
BEGIN

DBMS_OUTPUT.PUT_LINE(lastname);  
END;

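I do not get any error or any output in SQL Developer. I cannot figure out what I am missing here.

Basically, what I want to do is read a value (a string) from the user and use it in my query against a table.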

oracle sqlplus oracle-sqldeveloper

van*_*d39

2021-01-20

1
Score

1
Answers

10k
Views

Tag statistics

apache-spark ×4

json ×2

oracle ×2

python ×2

timestamp ×2

amazon-s3 ×1

apache-spark-sql ×1

apache-zeppelin ×1

cx-oracle ×1

dynamic-forms ×1

maven ×1

oracle-sqldeveloper ×1

prepared-statement ×1

sbt ×1

sbt-plugin ×1

sbt-revolver ×1

scala ×1

sqlplus ×1

unix-timestamp ×1
