Hey*_*Man 5 python pyspark databricks pynacl databricks-connect
I am trying to load PyNaCl into a PySpark UDF that runs on Windows.
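from nacl import bindings as c
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType
def verify_signature(msg, keys):
    c.crypto_sign_ed25519ph_update(...)
    ...
# public_keys is captured in the lambda's closure and shipped with the UDF
verify_signature_udf = udf(lambda x: verify_signature(x, public_keys), BooleanType())
data_signed = data.withColumn(
    "is_signature_valid", verify_signature_udf("state_values")
)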
PyNaCl is installed locally (I am using databricks-connect), but as far as I understand it is not installed on the executors, so I get this:
File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 679, in subimport
__import__(name)
ModuleNotFoundError: No module named 'nacl'
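One way to confirm the suspicion that the package is missing on the executors rather than locally is to ask the interpreter that actually runs the code. The following is a minimal sketch; `module_report` is a hypothetical helper name that does not come from the question, and it only uses the standard library.

```python
import importlib.util
import sys


def module_report(name):
    """Return (importable, interpreter_path) for the process running this call."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for some dotted names or half-initialised modules; treat as missing.
        spec = None
    return spec is not None, sys.executable


# Run locally this describes the client; run inside a UDF it would describe an executor.
print(module_report("nacl"))
```

Wrapping `module_report` in a UDF and applying it to a one-row DataFrame should, under the assumption that the UDF itself can be shipped, show whether `nacl` is visible to the executor's Python and which interpreter it is using.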
As described in Python Packaging, I tried to load it like this:
import os
from pyspark.sql import SparkSession
os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"
spark = SparkSession.builder.config(
    "spark.archives",
    "pyspark_venv.tar.gz#environment").getOrCreate()
Nothing changed; I get the same message. If I instead extract only the nacl package from the tar.gz, store it as a zip file, and load it like this:
spark.sparkContext.addPyFile(path="nacl.zip")
then it is loaded, but now I get this error:
File "/local_disk0/spark-xxx8db3a-5436-4ce8-8ff5-19eaeb4397b4/executor-xxxb7a74-4e1b-40bf-aae2-fc3553155f91/spark-xxx70cb9-482d-42a9-901a-c36f66a42a19/isolatedSparkFiles/0e10cb02-db69-4d63-b7ea-6c2b415fb5d9/nacl.zip/nacl/bindings/crypto_aead.py", line 17, in <module>
from nacl._sodium import ffi, lib
ModuleNotFoundError: No module named 'nacl._sodium'
Any ideas? Could this work with dbx? Or is there a way to achieve this without using a UDF?
Edit: The zip file does contain the sodium components, and the tar.gz holds no additional sodium files beyond what is in the zip.
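A likely explanation for the `nacl._sodium` error, even though the sodium files are in the zip: `nacl._sodium` is a compiled extension module, and Python's zip importer does not load compiled extension modules (`.so`, `.pyd`) from inside an archive. If the zip was built from the local Windows install, the binary would also not match the Linux executors. The sketch below lists the archive members that look like native binaries; the helper name and the file names in the demo archive are hypothetical.

```python
import io
import zipfile

# Suffixes that usually indicate a compiled module rather than pure Python.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def native_members(zip_source):
    """List archive members that look like compiled extension modules."""
    with zipfile.ZipFile(zip_source) as zf:
        return [n for n in zf.namelist() if n.lower().endswith(NATIVE_SUFFIXES)]


# Demo on an in-memory archive; the member names are made up for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("nacl/__init__.py", "")
    zf.writestr("nacl/_sodium.pyd", b"")
buf.seek(0)
print(native_members(buf))
```

Any name this reports for the real `nacl.zip` is a file that `addPyFile` can ship but the zip importer cannot import, which would point towards installing PyNaCl on the cluster itself instead of shipping it as a zip.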
Edit 2: When I move the import into the for loop, databricks-connect runs without errors locally, but the job still raises the same ModuleNotFoundError for nacl._sodium once it executes on the executors, so it does not work this way either (that was my misunderstanding).
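The behaviour in Edit 2 is what a deferred import would be expected to produce: the import statement only runs when the function body runs, so nothing fails while the function is defined and serialised on the client, and the failure surfaces wherever the call happens. A minimal sketch with standard-library modules only; `make_lazy_caller` and the missing module name are hypothetical.

```python
import importlib


def make_lazy_caller(module_name, attr):
    """Build a function that imports module_name only when it is called."""
    def call(*args):
        # The import is resolved here, in whichever process executes the call.
        module = importlib.import_module(module_name)
        return getattr(module, attr)(*args)
    return call


sqrt_lazy = make_lazy_caller("math", "sqrt")
missing = make_lazy_caller("no_such_module_xyz123", "anything")  # defining it succeeds

print(sqrt_lazy(9.0))
try:
    missing()
except ModuleNotFoundError as exc:
    print("fails only when called:", exc)
```

In the UDF case the call happens on the executors, so moving the import only moves the error; the package still has to be importable there.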