ash*_*ley 4 apache-spark pyspark databricks dbutils databricks-community-edition
I am trying to read a Delta log file on a Databricks Community Edition cluster (databricks-7.2 version).
df=spark.range(100).toDF("id")
df.show()
df.repartition(1).write.mode("append").format("delta").save("/user/delta_test")
with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
    for l in f:
        print(l)
Getting file not found error:
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<command-1759925981994211> in <module>
----> 1 with open('/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
      2     for l in f:
      3         print(l)
FileNotFoundError: [Errno 2] No such file or directory: '/user/delta_test/_delta_log/00000000000000000000.json'
I tried adding the /dbfs/ and dbfs:/ prefixes, but neither worked; I still get the same error.
with open('/dbfs/user/delta_test/_delta_log/00000000000000000000.json','r') as f:
    for l in f:
        print(l)
But with dbutils.fs.head I am able to read the file.
dbutils.fs.head("/user/delta_test/_delta_log/00000000000000000000.json")
'{"commitInfo":{"timestamp":1598224183331,"userId":"284520831744638","userName":"","operation":"WRITE","operationParameters":{"mode":"Append","partitionBy":"[]"},"notebook":{"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"1","numOutputBytes":"1171","numOutputRows":"100"}}}\n{"protocol":{"minReaderVersi...etc
How can we read (cat) a DBFS file in Databricks with Python's open method?
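By default, this data is on DBFS, and your code needs to know how to access it. Python's open does not know about DBFS, which is why it fails.
But there is a workaround: DBFS is mounted on the nodes at /dbfs, so you just need to prepend that to your file name. Instead of /user/delta_test/_delta_log/00000000000000000000.json, use /dbfs/user/delta_test/_delta_log/00000000000000000000.json
Update: on Community Edition, in DBR 7+, this mount is disabled. The workaround is to use the dbutils.fs.cp command to copy the file from DBFS to a local directory, such as /tmp or /var/tmp, and then read it from there: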
dbutils.fs.cp("/file_on_dbfs", "file:///tmp/local_file")
Note that if you don't specify a URI scheme, the path refers to DBFS by default; to refer to a local file you need to use the file:// prefix (see the documentation).
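Putting the copy step and the read together, here is a minimal sketch of the workaround. The helper name `read_delta_log_actions` and its `copy_fn` parameter are hypothetical, introduced only for illustration; in a notebook you would pass `dbutils.fs.cp` as `copy_fn`. It assumes the driver's local file system is writable and that each line of the log file is one JSON object, as the `dbutils.fs.head` output in the question suggests.

```python
import json
import os
import tempfile


def read_delta_log_actions(dbfs_path, copy_fn, local_dir=None):
    """Copy a DBFS file to the driver's local disk, then read it with open().

    copy_fn is assumed to behave like dbutils.fs.cp(source, destination).
    """
    # Use a fresh temporary directory unless the caller supplies one.
    local_dir = local_dir or tempfile.mkdtemp()
    local_path = os.path.join(local_dir, os.path.basename(dbfs_path))

    # The file:// scheme marks the destination as a local path;
    # without a scheme the path would be interpreted as DBFS.
    copy_fn(dbfs_path, "file://" + local_path)

    # The local copy can now be opened with plain Python.
    with open(local_path, "r") as f:
        return [json.loads(line) for line in f if line.strip()]
```

In a Databricks notebook this could be called as `read_delta_log_actions("/user/delta_test/_delta_log/00000000000000000000.json", dbutils.fs.cp)`. Passing the copy function in, rather than calling `dbutils` directly, keeps the helper usable outside a notebook, where `dbutils` is not defined.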