Apache Spark Scala CosmosDB connector: writing a DataFrame back to the database

Question

dev*_*v53 2 scala apache-spark azure-cosmosdb

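I'm using Apache Spark in Scala with the Azure CosmosDB connector, and I was wondering whether anyone has examples or insight into how to write a DataFrame back to a collection in CosmosDB. Currently I'm able to connect to one of my collections, return the data, and manipulate it, but I'd like to write the results back to another collection within the same database.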

I created a writeConfig that contains my EndPoint, MasterKey, Database, and the Collection to write to.
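For reference, a writeConfig like the one described above can be built with the connector's `Config` class, which takes a map of string options. This is a minimal sketch, assuming the azure-cosmosdb-spark 2.x package is on the classpath; every value in angle brackets is a placeholder, and the optional `Upsert` setting is included on the assumption that you want repeated runs to replace documents that share an `id` rather than insert them.

```scala
// Sketch only: assumes the azure-cosmosdb-spark connector is on the classpath.
import com.microsoft.azure.cosmosdb.spark.config.Config

// Placeholder values (hypothetical); replace them with your own account details.
// The Masterkey must be a read-write key, not a read-only key.
val writeConfig = Config(Map(
  "Endpoint"   -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"  -> "<your-read-write-key>",
  "Database"   -> "<your-database>",
  "Collection" -> "<target-collection>",
  "Upsert"     -> "true" // optional: upsert documents instead of plain inserts
))
```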

Then I tried writing it to the collection using the following line:

manipulatedData.toJSON.write.mode(SaveMode.Overwrite).cosmosDB(writeConfig)

This runs fine and doesn't show any errors, but nothing shows up in my collection.

I went through the documentation at https://github.com/Azure/azure-cosmosdb-spark, but I didn't find any examples of writing data back to the database.

If there is an easier way to write to DocumentDB/CosmosDB than what I'm doing, I'm open to any option.

Thanks for any help.

Answer 1

Den*_*Lee 5

As you mentioned, you can save directly from a Spark DataFrame to Cosmos DB. You probably don't need to use toJSON; write the DataFrame itself, for example:

// Import SaveMode so you can Overwrite, Append, ErrorIfExists, Ignore
import org.apache.spark.sql.SaveMode
// Brings the connector's .cosmosDB(...) methods into scope
import com.microsoft.azure.cosmosdb.spark.schema._

// Create new DataFrame `df` which has slightly modified flight information,
// i.e. change the delay value to -999
// (assumes the source collection was registered earlier as the temp view `c`)
val df = spark.sql("select -999 as delay, distance, origin, date, destination from c limit 5")

// Save to Cosmos DB (using Append in this case)
//    baseConfig is the connector Config from the earlier examples
//    Ensure the baseConfig contains a Read-Write Key
//    The key provided in our examples is a Read-Only Key
df.write.mode(SaveMode.Append).cosmosDB(baseConfig)

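As for the documentation, you're correct that the save functionality should have been called out better. I've created issue #91, "Include in User Guide / sample scripts how to save to Cosmos DB", to address this.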

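As for saving without seeing an error: is there any chance your configuration is using a read-only key instead of a read-write key? I just created issue #92, "Saving to CosmosDB using read-only key has no error", which calls out the same problem.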
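Because a write with the wrong key can appear to succeed, it can help to read the target collection back right after saving. This is a minimal sketch, assuming a SparkSession named `spark` (as in spark-shell or a notebook), the azure-cosmosdb-spark connector on the classpath, the `manipulatedData` DataFrame from the question, and a `writeConfig` that points at the target collection with a read-write key.

```scala
// Sketch only: assumes `spark`, `manipulatedData`, and `writeConfig` already exist
// and that the azure-cosmosdb-spark connector is on the classpath.
import org.apache.spark.sql.SaveMode
import com.microsoft.azure.cosmosdb.spark.schema._

// Write the DataFrame itself rather than its toJSON representation.
manipulatedData.write.mode(SaveMode.Append).cosmosDB(writeConfig)

// Read the target collection back with the same config to confirm the documents arrived.
val written = spark.read.cosmosDB(writeConfig)
println(s"Documents now in target collection: ${written.count()}")
```

If the count does not change after the write, the key in the config is the first thing to check.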

Archived:	8 years, 7 months ago
Views:	767
Last activity:	8 years, 7 months ago