opu*_*111 14 sql mysqli scala mysql-connector apache-spark
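I'm sure this is a simple SQLContext question, but I can't find an answer in the Spark docs or on Stack Overflow.
I want to create a Spark DataFrame from a SQL query on MySQL.
For example, I have a complicated MySQL query like this: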
SELECT a.X,b.Y,c.Z FROM FOO as a JOIN BAR as b ON ... JOIN ZOT as c ON ... WHERE ...
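and I want a DataFrame with columns X, Y and Z.
I figured out how to load entire tables into Spark, so I could load them all and then do the joining and selecting there. However, that is very inefficient. I only want to load the table generated by my SQL query.
Here is my current approximation of the code, which doesn't work. Mysql-connector has a "dbtable" option that can be used to load a whole table. I am hoping there is some way to specify a query instead.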
val df = sqlContext.format("jdbc").
option("url", "jdbc:mysql://localhost:3306/local_content").
option("driver", "com.mysql.jdbc.Driver").
option("useUnicode", "true").
option("continueBatchOnError","true").
option("useSSL", "false").
option("user", "root").
option("password", "").
sql(
"""
select dl.DialogLineID, dlwim.Sequence, wi.WordRootID from Dialog as d
join DialogLine as dl on dl.DialogID=d.DialogID
join DialogLineWordInstanceMatch as dlwim on dlwim.DialogLineID=dl.DialogLineID
join WordInstance as wi on wi.WordInstanceID=dlwim.WordInstanceID
join WordRoot as wr on wr.WordRootID=wi.WordRootID
where d.InSite=1 and dl.Active=1
limit 100
"""
).load()
Thanks, Peter
opu*_*111 34
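OK, here is the answer...
I found it here: Bulk data migration through Spark SQL
The dbtable parameter can be any query wrapped in parentheses and given an alias. So in my case, I need to do this...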
val query = """
(select dl.DialogLineID, dlwim.Sequence, wi.WordRootID from Dialog as d
join DialogLine as dl on dl.DialogID=d.DialogID
join DialogLineWordInstanceMatch as dlwim on dlwim.DialogLineID=dl.DialogLineID
join WordInstance as wi on wi.WordInstanceID=dlwim.WordInstanceID
join WordRoot as wr on wr.WordRootID=wi.WordRootID
where d.InSite=1 and dl.Active=1
limit 100) foo
"""
val df = sqlContext.read.format("jdbc").
option("url", "jdbc:mysql://localhost:3306/local_content").
option("driver", "com.mysql.jdbc.Driver").
option("useUnicode", "true").
option("continueBatchOnError","true").
option("useSSL", "false").
option("user", "root").
option("password", "").
option("dbtable",query).
load()
As expected, loading each table as its own DataFrame and joining them in Spark was very inefficient.
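The wrapping step can be factored into a small helper so that any SELECT statement can be passed to the "dbtable" option. This is only a minimal sketch: `JdbcSubquery`, `asDbtable` and the default alias `subq` are hypothetical names made up for illustration and are not part of Spark or the MySQL connector.

```scala
// Hypothetical helper for illustration; not part of Spark or the MySQL connector.
object JdbcSubquery {
  /** Wraps a SELECT statement as an aliased derived table for the "dbtable" option. */
  def asDbtable(query: String, alias: String = "subq"): String = {
    // Drop surrounding whitespace and a trailing semicolon,
    // since a semicolon is not valid inside the parentheses.
    val body = query.trim.stripSuffix(";").trim
    require(body.nonEmpty, "query must not be empty")
    // MySQL requires every derived table to have an alias.
    s"($body) AS $alias"
  }
}
```

It would then be used as `option("dbtable", JdbcSubquery.asDbtable(sql, "foo"))`. The alias matters because Spark places the "dbtable" value in the FROM clause of the statements it sends to the database. Newer Spark releases (2.4 and later) also accept a "query" option on the JDBC source, which takes the bare SELECT and cannot be combined with "dbtable".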