Unable to load a logistic regression model in Spark 2.x

Question

I am trying out the save and load options available in Spark 2.x. I built a LogisticRegression model and saved it successfully, but loading it fails with the error shown below.

Code snippet:

from pyspark.ml.classification import LogisticRegressionModel
LogisticRegressionModel.load("lrmodel")

Error message:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/jars/hadoop-auth-2.7.3.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
18/10/03 16:26:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/03 16:26:20 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
Traceback (most recent call last):
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o25.load.
: java.lang.IllegalArgumentException: requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.classification.LogisticRegressionModel but found class name org.apache.spark.ml.PipelineModel
    at scala.Predef$.require(Predef.scala:224)
    at org.apache.spark.ml.util.DefaultParamsReader$.parseMetadata(ReadWrite.scala:404)
    at org.apache.spark.ml.util.DefaultParamsReader$.loadMetadata(ReadWrite.scala:383)
    at org.apache.spark.ml.classification.LogisticRegressionModel$LogisticRegressionModelReader.load(LogisticRegression.scala:1197)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:564)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.base/java.lang.Thread.run(Thread.java:844)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Volumes/Data/Innominds/WorkSpace/SparkIncrementalLearning/src/PipilineBasedModelling.py", line 59, in <module>
    loadAndRetrainModel(spark)
  File "/Volumes/Data/Innominds/WorkSpace/SparkIncrementalLearning/src/PipilineBasedModelling.py", line 51, in loadAndRetrainModel
    LogisticRegressionModel.load("lrmodel")
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/pyspark/ml/util.py", line 257, in load
    return cls.read().load(path)
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/pyspark/ml/util.py", line 197, in load
    java_obj = self._jread.load(path)
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/Volumes/Data/Innominds/spark-2.2.0-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.classification.LogisticRegressionModel but found class name org.apache.spark.ml.PipelineModel'

Am I missing something here?

Answer 1

Aar*_*uya 5

That's because the model saved at that path is not a LogisticRegressionModel. If you read the stack trace, you'll see this line, which names both the class the loader expected and the class it actually found:

pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Error loading metadata: Expected class name org.apache.spark.ml.classification.LogisticRegressionModel but found class name org.apache.spark.ml.PipelineModel'
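
The class name in that message is read from the metadata Spark ML writes next to the model. As a rough sketch, you can inspect it without starting Spark. This assumes the model was saved to the local filesystem (not HDFS) and that the metadata is a one-line JSON text file under `<path>/metadata/`; `saved_model_class` is a hypothetical helper name, not part of PySpark.

```python
import json
from pathlib import Path


def saved_model_class(model_dir):
    """Return the 'class' field recorded in a saved Spark ML model's metadata.

    Assumption: model_dir is a local directory containing metadata/part-*.
    """
    metadata_dir = Path(model_dir) / "metadata"
    # The metadata is stored as a text file whose first line is a JSON object.
    for part in sorted(metadata_dir.glob("part-*")):
        text = part.read_text(encoding="utf-8").strip()
        if text:
            return json.loads(text.splitlines()[0])["class"]
    raise FileNotFoundError(f"no metadata part file under {metadata_dir}")
```

Calling `saved_model_class("lrmodel")` on the directory from the question should return the same class name the error reports, which tells you which loader to use.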

The saved metadata records a PipelineModel, so load it with PipelineModel instead:

from pyspark.ml import PipelineModel

PipelineModel.load("lrmodel")
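
If you still need the LogisticRegressionModel itself (for example, to read its coefficients), one option is to pull it out of the loaded pipeline's `stages` list. The sketch below assumes the logistic regression is one of the fitted stages; `first_stage_of_type` and `load_lr_from_pipeline` are hypothetical helper names introduced for illustration.

```python
def first_stage_of_type(pipeline_model, stage_type):
    """Return the first fitted stage that is an instance of stage_type, or None."""
    for stage in pipeline_model.stages:
        if isinstance(stage, stage_type):
            return stage
    return None


def load_lr_from_pipeline(path):
    """Load a saved PipelineModel and return its LogisticRegressionModel stage, if any."""
    # Imported here so the helper above stays usable without a Spark install.
    from pyspark.ml import PipelineModel
    from pyspark.ml.classification import LogisticRegressionModel

    pipeline_model = PipelineModel.load(path)
    return first_stage_of_type(pipeline_model, LogisticRegressionModel)
```

With an active SparkSession, `load_lr_from_pipeline("lrmodel")` would return the fitted logistic regression stage, or None if the pipeline contains no such stage. For scoring new data, calling `transform` on the loaded PipelineModel itself is usually enough.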
