I am trying to load PyNaCl into a pyspark UDF, with the driver running on Windows.
from nacl import bindings as c
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def verify_signature(msg, keys):
    c.crypto_sign_ed25519ph_update(...)
    ...

verify_signature_udf = udf(lambda x: verify_signature(x, public_keys), BooleanType())
data_signed = data.withColumn(
    "is_signature_valid", verify_signature_udf("state_values")
)
PyNaCl is installed locally (I am using databricks-connect), but as far as I understand it is not installed on the executors. As a result I get this:
  File "/databricks/spark/python/pyspark/cloudpickle/cloudpickle.py", line 679, in subimport
    __import__(name)
ModuleNotFoundError: No module named 'nacl'
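One way to check the suspicion that the module is missing on the executors, rather than misconfigured, is to probe for it from inside a UDF. The following is a minimal sketch, not a confirmed fix: `module_available` is a hypothetical helper name, and the commented-out UDF wiring assumes the same `udf` and `BooleanType` imports used in the question.

```python
import importlib.util


def module_available(name):
    """Return True if the top-level module `name` can be located.

    Only top-level names are handled here; find_spec raises for a dotted
    name whose parent package is missing.
    """
    return importlib.util.find_spec(name) is not None


# Local check against a module that always exists in the standard library.
print(module_available("json"))

# Hypothetical use on the cluster (assumes an existing DataFrame `data`):
# probe_udf = udf(lambda _: module_available("nacl"), BooleanType())
# data.select(probe_udf("state_values")).distinct().show()
```

If the probe returns False on the executors while the local call returns True, the package is only present on the driver side.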
As described in Python Packaging, I tried loading it like this:
import os
from pyspark.sql import SparkSession

os.environ['PYSPARK_PYTHON'] = "./environment/bin/python"
spark = SparkSession.builder.config(
    "spark.archives",
    "pyspark_venv.tar.gz#environment").getOrCreate()
No change, same message. If I instead extract just the nacl package from the tar.gz, store it as a zip file, and load that zip, it loads, but I now get this error: …
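The error text is truncated above, so what follows is only a guess at one possible cause. Python can import pure-Python packages straight from a zip file placed on `sys.path`, but the standard `zipimport` mechanism cannot load compiled extension modules from a zip, and PyNaCl ships one (`nacl._sodium`). The sketch below shows the part that does work, using a hypothetical pure-Python package named `demo_pkg` that is created on the fly purely for illustration.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_zip(zip_path):
    """Write a tiny pure-Python package (hypothetical `demo_pkg`) into a zip."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "ANSWER = 42\n")


tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "demo_pkg.zip")
build_zip(zip_path)

# Putting the zip itself on sys.path lets zipimport resolve the package.
sys.path.insert(0, zip_path)
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
print(demo_pkg.ANSWER)
```

A package that depends on a compiled binary would import its Python files this way and then fail when it reaches the extension module, which would be consistent with "it loads, but then errors".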