我有一个Cassandra表,为简单起见,看起来像:
key: text
jsonData: text
blobData: blob
Run Code Online (Sandbox Code Playgroud)
我可以使用spark和spark-cassandra-connector为此创建一个基本数据框:
val df = sqlContext.read
.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "mytable", "keyspace" -> "ks1"))
.load()
Run Code Online (Sandbox Code Playgroud)
我正在努力将JSON数据扩展到其底层结构中.我最终希望能够根据json字符串中的属性进行过滤并返回blob数据.像jsonData.foo ="bar"之类的东西并返回blobData.这目前可能吗?
scala dataframe apache-spark apache-spark-sql spark-cassandra-connector
我使用Spark 2.4.3 和 Scala 2.11
下面是 DataFrame 列中我当前的 JSON 字符串。我尝试使用函数将其模式存储JSON string在另一列中schema_of_json。但它的抛出低于错误。我该如何解决这个问题?
{
"company": {
"companyId": "123",
"companyName": "ABC"
},
"customer": {
"customerDetails": {
"customerId": "CUST-100",
"customerName": "CUST-AAA",
"status": "ACTIVE",
"phone": {
"phoneDetails": {
"home": {
"phoneno": "666-777-9999"
},
"mobile": {
"phoneno": "333-444-5555"
}
}
}
},
"address": {
"loc": "NORTH",
"adressDetails": [
{
"street": "BBB",
"city": "YYYYY",
"province": "AB",
"country": "US"
},
{
"street": "UUU",
"city": "GGGGG",
"province": "NB",
"country": "US"
}
]
}
} …Run Code Online (Sandbox Code Playgroud)