Posts by no1*_*3ff

My Spark SQL LIMIT is very slow

I use Spark to read from Elasticsearch.

select col from index limit 10;


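The problem is that the index is very large (100 billion rows), and Spark generates thousands of tasks to do the job.
I only need 10 rows; a single task could return 10 rows and finish the job, so I don't need that many tasks.
LIMIT, even LIMIT 1, is very slow.
Code: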

sql = "select col from index limit 10"
sqlExecListener.sparkSession.sql(sql).createOrReplaceTempView(tempTable)

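Whether a LIMIT costs one task or thousands depends on how the limit is executed. The toy model below (not Spark's actual code) contrasts two strategies over a list of hypothetical partition sizes: running one task per partition with a per-partition limit and then merging, versus scanning one partition first and only scanning more (growing by a scale-up factor) when too few rows came back. The second strategy is roughly what Spark does when collecting a small result with `take()`/`show()` (its growth is governed by `spark.sql.limit.scaleUpFactor`, default 4, though details vary by version); a limited DataFrame that is registered as a view and consumed by later queries may instead be planned with a local limit on every partition. The assumption that the Elasticsearch connector yields one partition per shard or slice, and the function names here, are illustrative.

```python
from typing import List, Tuple


def full_scan_limit(partition_sizes: List[int], n: int) -> Tuple[int, int]:
    """One task per partition, each keeping at most n rows, then a merge.
    Returns (rows_returned, tasks_launched)."""
    collected = sum(min(size, n) for size in partition_sizes)
    return min(collected, n), len(partition_sizes)


def incremental_take(partition_sizes: List[int], n: int,
                     scale_up_factor: int = 4) -> Tuple[int, int]:
    """Scan one partition first; while fewer than n rows were found, scan
    scale_up_factor times as many partitions as were scanned so far.
    Returns (rows_returned, tasks_launched)."""
    collected = 0
    scanned = 0
    parts_to_try = 1
    while collected < n and scanned < len(partition_sizes):
        batch = partition_sizes[scanned:scanned + parts_to_try]
        # One task per partition in this batch, each returning at most n rows.
        collected += sum(min(size, n) for size in batch)
        scanned += len(batch)
        parts_to_try = scanned * scale_up_factor
    return min(collected, n), scanned


if __name__ == "__main__":
    # Hypothetical index split into 3000 partitions, each with plenty of rows.
    sizes = [1_000_000] * 3000
    print(full_scan_limit(sizes, 10))   # (10, 3000)
    print(incremental_take(sizes, 10))  # (10, 1)
```

In this model both strategies return the same 10 rows; only the number of tasks differs, which matches the symptom described above.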

elasticsearch apache-spark apache-spark-sql spark-submit

no1*_*3ff

2017-11-30

6
Score

1
Answer

840
Views

What is the relationship between Spark executors and YARN containers when running Spark on YARN

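What is the relationship between a Spark executor and a YARN container when running Spark on YARN?
For example, if I set executor-memory=20G and the YARN container memory to 10G, does one executor consist of two containers?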
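As far as I know, Spark on YARN runs each executor inside exactly one YARN container: the container request is the executor memory plus an off-heap overhead, the scheduler rounds it up, and if it exceeds the largest container YARN will grant (`yarn.scheduler.maximum-allocation-mb`), Spark refuses to start rather than splitting the executor across containers. The arithmetic below is a sketch assuming the default overhead of max(384 MB, 10% of executor memory), a rounding increment equal to `yarn.scheduler.minimum-allocation-mb`, and that "YARN container memory = 10G" means that maximum allocation; the function and parameter names are hypothetical, and some schedulers round differently.

```python
import math
from typing import Optional

MIN_OVERHEAD_MB = 384    # assumed default floor for executor memory overhead
OVERHEAD_FACTOR = 0.10   # assumed default overhead fraction of executor memory


def executor_container_mb(executor_memory_mb: int,
                          min_allocation_mb: int = 1024,
                          max_allocation_mb: Optional[int] = None) -> int:
    """Size in MB of the single YARN container that would host one executor."""
    overhead_mb = max(MIN_OVERHEAD_MB, int(executor_memory_mb * OVERHEAD_FACTOR))
    requested_mb = executor_memory_mb + overhead_mb
    if max_allocation_mb is not None and requested_mb > max_allocation_mb:
        # The executor is never split across containers; the request is refused.
        raise ValueError(
            f"executor needs {requested_mb} MB, above the "
            f"{max_allocation_mb} MB container limit")
    # Round the request up to a multiple of the minimum allocation.
    return math.ceil(requested_mb / min_allocation_mb) * min_allocation_mb


if __name__ == "__main__":
    print(executor_container_mb(4096))  # 5120
    try:
        executor_container_mb(20 * 1024, max_allocation_mb=10 * 1024)
    except ValueError as err:
        print(err)
```

Under these assumptions, the 20G executor from the question needs a container of roughly 22 GB, so with a 10 GB ceiling the application fails to launch instead of using two containers.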

hadoop-yarn apache-spark

no1*_*3ff

lucky-day

5
Score

2
Answers

5936
Views

How can I get predicate pushdown with array_contains and the Elasticsearch data source?

I'm trying to query an array in Elasticsearch:

data: "names":[{"name":"allen"},{"name":"bill"},{"name":"dave"},{"name":"poter"}]
goal: select names from table where array_contains(names.name, "bill")


But if the SQL statement uses the array_contains function, Spark does not push the predicate down.
Note: names.name evaluates to ["allen","bill","dave","poter"].
I have tried:

select * from table where array_contains(names.name,"bill") 
-- and  
select * from (select explode(names.name) as name from table) t1 where name = "bill"
-- and  
select * from table where cast(names.name as string) like '%bill%'


None of these get pushed down. Is there any other way to do this?
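One possible workaround, sketched under assumptions: as far as I know, Spark's data source filter API (`org.apache.spark.sql.sources.Filter`) has no array-contains filter, so `array_contains` cannot be handed to the connector as a pushed filter. The elasticsearch-hadoop connector does accept a native query through its `es.query` setting, which Elasticsearch evaluates before any rows reach Spark. The helper below only builds that query string; it assumes `names` is mapped as a plain object array (not `nested`) and that `names.name` is a `keyword` field or analyzed so the token `bill` matches. The function name is hypothetical.

```python
import json


def es_array_contains_query(field: str, value: str) -> str:
    """Build an Elasticsearch query DSL string keeping documents whose
    array field `field` contains `value`.

    Elasticsearch indexes an array of values (or of non-nested objects) as a
    flat set of values for the field, so a term filter on "names.name"
    matches a document if any element equals the term.
    """
    query = {"query": {"bool": {"filter": [{"term": {field: value}}]}}}
    return json.dumps(query)


if __name__ == "__main__":
    print(es_array_contains_query("names.name", "bill"))
```

The resulting string would then be passed when reading, e.g. `spark.read.format("org.elasticsearch.spark.sql").option("es.query", q).load("index")`, so only matching documents are scrolled into Spark; any remaining Spark-side `array_contains` filter then runs on that reduced set.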

elasticsearch apache-spark apache-spark-sql

no1*_*3ff

2018-07-21

4
Score

1
Answer

414
Views