
PySpark: retain null values when using collect_list

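According to the accepted answer to "pyspark collect_set or collect_list with groupby", when you call collect_list on a column, the null values in that column are dropped. I have checked, and this is true.

In my case, however, I need to keep the null values. How can I achieve this?

I did not find any information about a variant of collect_list that behaves this way.

Background on why I want to keep the nulls:

I have a dataframe df as follows: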

cId   |  eId  |  amount  |  city
1     |  2    |   20.0   |  Paris
1     |  2    |   30.0   |  Seoul
1     |  3    |   10.0   |  Phoenix
1     |  3    |   5.0    |  null


I want to write it to an Elasticsearch index with the following mapping:

"mappings": {
    "doc": {
        "properties": {
            "eId": { "type": "keyword" },
            "cId": { "type": "keyword" },
            "transactions": {
                "type": "nested", 
                "properties": {
                    "amount": { "type": …


nested collect elasticsearch-mapping elasticsearch-hadoop pyspark-sql

act*_*ner

2018-03-21

5
votes

1
answer

1740
views