How can I get predicate pushdown with array_contains and the Elasticsearch data source?

no1*_*3ff 4 elasticsearch apache-spark apache-spark-sql

I'm trying to query an array field in Elasticsearch:

data: "names":[{"name":"allen"},{"name":"bill"},{"name":"dave"},{"name":"poter"}]
goal: "select names from table where array_contains(names.name, "bill")"

But when the SQL statement uses the array_contains function, Spark does not push the predicate down to Elasticsearch.
Note: names.name evaluates to ["allen","bill","dave","poter"].
I've tried:

select * from table where array_contains(names.name, "bill")
-- and
select * from (select explode(names.name) as name from table) t1 where name = "bill"
-- and
select * from table where cast(names.name as string) like '%bill%'

None of these get the predicate pushed down. Is there any other way to do this?

小智 6

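The missing pushdown is expected. For a predicate to be delegated, the data source has to support it, and array_contains is not among the operations the Elasticsearch connector pushes down. These currently include: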

  • =, >, <, >=, <=
  • is_null / is_not_null
  • in
  • String[Starts|Ends]With, StringContains
  • null-safe equality
  • the boolean operators AND / OR / NOT applied to the above.

In addition, wrapping the column in any other transformation (including CAST) prevents the predicate from being pushed down.
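The rule above can be illustrated with a toy Python model. It is a simplification and not Spark's or the elasticsearch-hadoop connector's actual code. The classes Col, Lit and Call and the function is_pushable are hypothetical.

The model treats a predicate as pushable only when both of the following hold:
  • Every node is one of the listed operators.
  • Each operator is applied to a bare column from the source and compared against literal values.

Real Spark splits top-level AND conjuncts and may push the translatable ones individually. The sketch ignores that and treats each predicate as all-or-nothing. The like '%bill%' attempt is modelled as a contains check, on the assumption that Spark's optimizer rewrites a simple pattern of that form into a contains predicate. Even so, the CAST around the column still blocks it.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical toy expression tree; NOT Spark's real Catalyst classes.
@dataclass(frozen=True)
class Col:
    name: str
    from_source: bool = True  # False for columns produced by explode() and similar

@dataclass(frozen=True)
class Lit:
    value: object

@dataclass(frozen=True)
class Call:
    op: str                   # operator/function name, e.g. "=", "cast", "array_contains"
    args: Tuple["Expr", ...]

Expr = Union[Col, Lit, Call]

# Leaf operators from the list above ("starts_with"/"ends_with"/"contains" stand in
# for StringStartsWith/StringEndsWith/StringContains, "<=>" for null-safe equality).
LEAF_OPS = {"=", ">", "<", ">=", "<=", "<=>", "is_null", "is_not_null", "in",
            "starts_with", "ends_with", "contains"}
BOOL_OPS = {"and", "or", "not"}

def is_pushable(e: Expr) -> bool:
    """Return True if the whole predicate could be handed to the data source (toy rule)."""
    if not isinstance(e, Call) or not e.args:
        return False
    if e.op in BOOL_OPS:
        # Simplification: every child must be pushable.
        return all(is_pushable(a) for a in e.args)
    if e.op in LEAF_OPS:
        target, *values = e.args
        # The operator must apply to a bare source column (no CAST or other wrapper)
        # and compare it only against literals.
        return (isinstance(target, Col) and target.from_source
                and all(isinstance(v, Lit) for v in values))
    # Anything else (array_contains, cast, ...) has no translation.
    return False

# The three attempts from the question, expressed in the toy tree:
q_array_contains = Call("array_contains", (Col("names.name"), Lit("bill")))
q_explode = Call("=", (Col("name", from_source=False), Lit("bill")))
q_cast_like = Call("contains",
                   (Call("cast", (Col("names.name"), Lit("string"))), Lit("bill")))

# A predicate that fits the list (hypothetical scalar columns "age" and "title"):
q_ok = Call("and", (Call(">=", (Col("age"), Lit(30))),
                    Call("starts_with", (Col("title"), Lit("bi")))))

for label, q in [("array_contains", q_array_contains), ("explode", q_explode),
                 ("cast + like", q_cast_like), ("age/title", q_ok)]:
    print(f"{label}: {is_pushable(q)}")
```

Running it prints False for the three attempts from the question and True for the last predicate. In a real job, the physical plan printed by explain() usually lists the filters that actually reached the source under PushedFilters.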