I am running Databricks Runtime 8.1 (which includes Apache Spark 3.1.1 and Scala 2.12) and am trying to get hyperopt working as documented. I get the error
py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist
when I try
spark_trials = SparkTrials()
Is there anything special I need to do to get this working?
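For context, a minimal sketch of the usual hyperopt + SparkTrials flow I am following (the objective function, search space, and parallelism value are placeholders, not my real code); the exception is raised on the SparkTrials() line itself, before fmin is ever reached:

# Minimal sketch of the standard hyperopt + SparkTrials pattern on Databricks;
# the objective, search space, and parallelism below are placeholder assumptions.
from hyperopt import fmin, tpe, hp, SparkTrials

def objective(x):
    # Toy loss: minimize a simple quadratic.
    return (x - 3) ** 2

search_space = hp.uniform("x", -10, 10)

# This is the line that raises
# py4j.Py4JException: Method maxNumConcurrentTasks([]) does not exist
spark_trials = SparkTrials(parallelism=2)

best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=20,
    trials=spark_trials,
)
print(best)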
Here is the cluster I am using:
{
  "autoscale": {
    "min_workers": 1,
    "max_workers": 2
  },
  "cluster_name": "mlops_tiny_ml",
  "spark_version": "8.2.x-cpu-ml-scala2.12",
  "spark_conf": {},
  "aws_attributes": {
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "zone_id": "us-west-2b",
    "instance_profile_arn": "arn:aws:iam::112437402463:instance-profile/databricks_instance_role_s3",
    "spot_bid_price_percent": 100,
    "ebs_volume_type": "GENERAL_PURPOSE_SSD",
    "ebs_volume_count": 3,
    "ebs_volume_size": 100
  },
  "node_type_id": "m4.large",
  "driver_node_type_id": "m4.large",
  "ssh_public_keys": [],
  "custom_tags": {},
  "spark_env_vars": {},
  "autotermination_minutes": 120,
  "enable_elastic_disk": false,
  "cluster_source": "UI",
  "init_scripts": [],
  "cluster_id": "0xxxxxt404"
}
I have the following schema:
{
  "name": "AgentRecommendationList",
  "type": "record",
  "fields": [
    {
      "name": "userid",
      "type": "string"
    },
    {
      "name": "friends",
      "type": {
        "type": "array",
        "items": {
          "name": "SchoolFriends",
          "type": "record",
          "fields": [
            {
              "name": "Name",
              "type": "string"
            },
            {
              "name": "phoneNumber",
              "type": "string"
            },
            {
              "name": "email",
              "type": "string"
            }
          ]
        }
      }
    }
  ]
}
I am using GenericRecord, and I want to put in an array of SchoolFriends records for the friends field.
import io.confluent.kafka.schemaregistry.client.rest.RestService
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData
val avschema = new RestService(URL).getLatestVersion(name)
val schema = new Schema.Parser().parse(avschema.getSchema)
val record = new GenericData.Record(schema)
I want to do something like record.put(x).
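Something along these lines is what I am after, assuming the element schema can be pulled from the friends field of the parsed schema (the names, phone numbers, and emails are made-up values):

import org.apache.avro.generic.{GenericData, GenericRecord}

// Schema of the "friends" array field and of its SchoolFriends elements.
val friendsSchema = schema.getField("friends").schema()
val friendSchema = friendsSchema.getElementType

// Build one SchoolFriends element (values are made up for illustration).
val friend: GenericRecord = new GenericData.Record(friendSchema)
friend.put("Name", "Jane Doe")
friend.put("phoneNumber", "555-0100")
friend.put("email", "jane@example.com")

// Wrap the elements in a GenericData.Array typed by the array schema.
val friends = new GenericData.Array[GenericRecord](friendsSchema, java.util.Arrays.asList(friend))

// Populate the top-level AgentRecommendationList record.
record.put("userid", "user-123")
record.put("friends", friends)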
I am considering Snowflake for a customer, but I cannot tell from the documentation where they store the data. It appears to be S3, but then why is the storage cost so high? Is the data kept in the customer's own S3 or in Snowflake's S3?