小编Fal*_*kra的帖子

Count occurrences of a list of substrings in a pyspark df column

I want to count the occurrences of list of substrings and create a column based on a column in the pyspark df which contains a long string.

Input:          
       ID    History

       1     USA|UK|IND|DEN|MAL|SWE|AUS
       2     USA|UK|PAK|NOR
       3     NOR|NZE
       4     IND|PAK|NOR

 lst=['USA','IND','DEN']


Output :
       ID    History                      Count

       1     USA|UK|IND|DEN|MAL|SWE|AUS    3
       2     USA|UK|PAK|NOR                1
       3     NOR|NZE                       0
       4     IND|PAK|NOR                   1

Run Code Online (Sandbox Code Playgroud)

python hive pyspark pyspark-sql

Fal*_*kra

lucky-day

5
推荐指数

2
解决办法

4907
查看次数