Using pyspark:
from pyspark.sql import SparkSession

spark = SparkSession\
    .builder\
    .appName("spark play")\
    .getOrCreate()

df = spark.read\
    .format("jdbc")\
    .option("url", "jdbc:mysql://localhost:port")\
    .option("dbtable", "schema.tablename")\
    .option("user", "username")\
    .option("password", "password")\
    .load()
I would rather fetch the result set of a query than load the whole "schema.tablename" table.
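One common approach, sketched here under assumptions, is to pass a parenthesized subquery with an alias as the "dbtable" option, so that the database runs the query and Spark only sees its result set. The helper names `as_jdbc_subquery` and `read_query` and the alias `q` are hypothetical, chosen for illustration only. The `spark` argument is assumed to be the SparkSession created above.

```python
def as_jdbc_subquery(query, alias="q"):
    """Wrap a SELECT statement so it can be used as the JDBC "dbtable" option.

    The JDBC source places this value in a FROM clause, so a query has to be
    parenthesized and given an alias. The alias name is arbitrary.
    """
    return "({}) AS {}".format(query.strip().rstrip(";"), alias)


def read_query(spark, url, query, user, password):
    """Load only the result set of `query` (hypothetical helper).

    `spark` is an existing SparkSession; `url`, `user` and `password` are the
    same placeholders used in the question.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", as_jdbc_subquery(query))
        .option("user", user)
        .option("password", password)
        .load()
    )
```

For example, `read_query(spark, "jdbc:mysql://localhost:port", "SELECT id FROM schema.tablename WHERE id > 10", "username", "password")` would return a DataFrame holding only the matching rows, assuming the MySQL JDBC driver is on the classpath.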
I have a LIBSVM scaling model (generated with svm-scale) that I would like to port to PySpark. I naively tried the following:
scaler_path = "path to model"
a = MinMaxScaler().load(scaler_path)
But I get an error saying that a metadata directory is expected:
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-22-1942e7522174> in <module>()
----> 1 a = MinMaxScaler().load(scaler_path)
/srv/data/spark/spark-2.0.0-bin-hadoop2.6/python/pyspark/ml/util.pyc in load(cls, path)
226 def load(cls, path):
227 """Reads an ML instance from the input path, a shortcut of `read().load(path)`."""
--> 228 return cls.read().load(path)
229
230
/srv/data/spark/spark-2.0.0-bin-hadoop2.6/python/pyspark/ml/util.pyc in load(self, path)
174 if not isinstance(path, basestring):
175 raise TypeError("path should be a basestring, got type %s" % type(path))
--> 176 java_obj = self._jread.load(path)
177 if not hasattr(self._clazz, …
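The error suggests that `load` expects a directory previously written by Spark's own `save`, not an svm-scale range file. One possible workaround, sketched here under assumptions, is to parse the range file by hand and reuse its per-feature minimum and maximum. The sketch assumes the usual svm-scale layout: a line `x`, then a line with the lower and upper target bounds, then one `index min max` line per feature. The function names `parse_svm_scale_ranges` and `scale_value` are hypothetical.

```python
def parse_svm_scale_ranges(text):
    """Parse the "x" section of an svm-scale range file.

    Returns (lower, upper, {feature_index: (feature_min, feature_max)}).
    Feature indices are 1-based, as in LIBSVM files.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    start = lines.index("x")  # skips an optional leading "y" section
    lower, upper = map(float, lines[start + 1].split())
    ranges = {}
    for ln in lines[start + 2:]:
        idx, fmin, fmax = ln.split()
        ranges[int(idx)] = (float(fmin), float(fmax))
    return lower, upper, ranges


def scale_value(value, fmin, fmax, lower, upper):
    """Linearly rescale `value` from [fmin, fmax] to [lower, upper]."""
    if fmax == fmin:
        # Constant feature: nothing to rescale, return it unchanged.
        return value
    return lower + (upper - lower) * (value - fmin) / (fmax - fmin)
```

The parsed ranges could then be applied to a DataFrame column by column, for instance inside a UDF. Whether the result matches svm-scale exactly for sparse or missing features would still need to be checked against the original data.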