Posts by Fal*_*kra

Count occurrences of a list of substrings in a pyspark df column

I have a PySpark DataFrame with a column that contains a long string. I want to count how many substrings from a given list occur in that column and store the result in a new column.

Input:          
       ID    History

       1     USA|UK|IND|DEN|MAL|SWE|AUS
       2     USA|UK|PAK|NOR
       3     NOR|NZE
       4     IND|PAK|NOR

 lst=['USA','IND','DEN']


Output :
       ID    History                      Count

       1     USA|UK|IND|DEN|MAL|SWE|AUS    3
       2     USA|UK|PAK|NOR                1
       3     NOR|NZE                       0
       4     IND|PAK|NOR                   1
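
One possible approach, sketched under a few assumptions: the entries in `History` are treated as whole `|`-separated tokens (not arbitrary substrings), and each item of `lst` is counted at most once per row, which matches the expected output above. The names `count_matches` and `with_count` are made up for this illustration. The first function is a plain-Python reference for the rule; the second builds the same rule from the PySpark functions `split` and `array_contains`.

```python
def count_matches(history, lst):
    """Plain-Python reference: how many items of lst appear as tokens in history."""
    tokens = set(history.split("|"))
    return sum(1 for s in lst if s in tokens)


def with_count(df, lst, col="History"):
    """Return df with a 'Count' column (hypothetical helper, not from the post)."""
    # Imported here so count_matches can be used without Spark installed.
    from pyspark.sql import functions as F

    # split() takes a Java regex, so the pipe has to be escaped.
    tokens = F.split(F.col(col), r"\|")
    # One 0/1 flag per list item: 1 if the token array contains that item.
    hits = [F.array_contains(tokens, s).cast("int") for s in lst]
    # Starting from lit(0) keeps the result a Column even when lst is empty.
    return df.withColumn("Count", sum(hits, F.lit(0)))


# Usage sketch (assumes an existing SparkSession called `spark`):
# df = spark.createDataFrame(
#     [(1, "USA|UK|IND|DEN|MAL|SWE|AUS"), (2, "USA|UK|PAK|NOR"),
#      (3, "NOR|NZE"), (4, "IND|PAK|NOR")],
#     ["ID", "History"],
# )
# with_count(df, ["USA", "IND", "DEN"]).show(truncate=False)
```

If repeated tokens in a row should each be counted, or if true substring matching is wanted, the rule would need to change; this sketch only covers the token-presence reading of the question.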

python hive pyspark pyspark-sql

5 votes
2 answers
4907 views
