小编And*_*aki的帖子

使用 Pyspark 分解包含 Json 列表的列

我试图分解一个名为 phone 的列,如下所示的架构和内容:

(customer_external_id,StringType
phones,StringType)

customer_id    phones
x8x46x5        [{"phone" : "(xx) 35xx4x80"},{"phone" : "(xx) xxxx46605"}]
xx44xx5        [{"phone" : "(xx) xxx3x8443"}]
4xxxxx5        [{"phone" : "(xx) x6xx083x3"},{"areaCode" : "xx"},{"phone" : "(xx) 3xxx83x3"}]
xx6564x        [{"phone" : "(x3) x88xx344x"}]
xx8x4x0        [{"phone" : "(xx) x83x5x8xx"}]
xx0434x        [{"phone" : "(0x) 3x6x4080"},{"areaCode" : "xx"}]
x465x40        [{"phone" : "(6x) x6x445xx"}]
x0684x8        [{"phone" : "(xx) x8x88x4x4"},{"phone" : "(xx) x8x88x4x4"}]
x84x850        [{"phone" : "(xx) 55x56xx4"}]
x0604xx        [{"phone" : "(xx) x8x4xxx68"}]
4x6xxx0        [{"phone" : "(xx) x588x43xx"},{"phone" : "(xx) 5x6465xx"},{"phone" : "(xx) …
Run Code Online (Sandbox Code Playgroud)

python etl apache-spark pyspark

2
推荐指数
1
解决办法
4460
查看次数

标签 统计

apache-spark ×1

etl ×1

pyspark ×1

python ×1