Post by Des*_* pv

How to load a tuple from a Cassandra table?

I'm running into a problem when loading data into Spark from a Cassandra table that contains a tuple-typed column. My system specs are as follows.

  • Spark:1.6.0
  • Spark Cassandra Connector: 1.6.0-M1
  • Cassandra: 2.1.8

Code snippet:

val myDataFrame = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "test3", "keyspace" -> "pa", "cluster" -> "ClusterOne"))
  .load
  .select($"id")

"test3" is the name of a table created under the keyspace "pa" in Cassandra.

Table definition of "test3":

 CREATE TABLE pa.test3 (
     id int,
     m1 tuple<text, int>,
     PRIMARY KEY (id)
 );

I got the following error:

java.util.NoSuchElementException: key not found: TupleType(Vector(TupleFieldDef(0,VarCharType), TupleFieldDef(1,IntType)))
    at scala.collection.MapLike$class.default(MapLike.scala:228)
    at scala.collection.AbstractMap.default(Map.scala:58)
    at scala.collection.MapLike$class.apply(MapLike.scala:141)
    at scala.collection.AbstractMap.apply(Map.scala:58)
    at org.apache.spark.sql.cassandra.DataTypeConverter$.catalystDataType(DataTypeConverter.scala:55)
    at org.apache.spark.sql.cassandra.DataTypeConverter$.toStructField(DataTypeConverter.scala:61)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1$$anonfun$apply$1.apply(CassandraSourceRelation.scala:64)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1$$anonfun$apply$1.apply(CassandraSourceRelation.scala:64)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1.apply(CassandraSourceRelation.scala:64)
    at org.apache.spark.sql.cassandra.CassandraSourceRelation$$anonfun$schema$1.apply(CassandraSourceRelation.scala:64)
    at scala.Option.getOrElse(Option.scala:120) …

apache-spark apache-spark-sql spark-cassandra-connector

6 votes · 1 answer · 370 views

How to plot the ROC curve and Precision-Recall curve from BinaryClassificationMetrics

I'm trying to plot the ROC curve and the Precision-Recall curve. The points were generated from Spark MLlib's BinaryClassificationMetrics, following the Spark guide at https://spark.apache.org/docs/latest/mllib-evaluation-metrics.html. The output I get is:

[(1.0,1.0), (0.0,0.4444444444444444)] - Precision
[(1.0,1.0), (0.0,1.0)] - Recall
[(1.0,1.0), (0.0,0.6153846153846153)] - F1Measure
[(0.0,1.0), (1.0,1.0), (1.0,0.4444444444444444)] - Precision-Recall curve
[(0.0,0.0), (0.0,1.0), (1.0,1.0), (1.0,1.0)] - ROC curve
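Since only a handful of points come back on small data, one option is to collect them to the driver and feed them to any charting tool; a quick sanity check, such as the trapezoidal area under the quoted ROC points, can be done in plain Scala first. A minimal sketch (the `RocAuc` object is illustrative, not part of Spark; the hard-coded points are the ROC points quoted above):

```scala
object RocAuc {
  // Trapezoidal area under a curve given as (x, y) points sorted by x.
  // This is the same rule BinaryClassificationMetrics uses internally for AUC.
  def trapezoidAuc(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq((x1, y1), (x2, y2)) =>
      (x2 - x1) * (y1 + y2) / 2.0
    }.sum

  def main(args: Array[String]): Unit = {
    // ROC points exactly as printed in the question (false positive rate, recall).
    val roc = Seq((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 1.0))
    println(f"AUC = ${trapezoidAuc(roc)}%.2f") // prints "AUC = 1.00"
  }
}
```

An AUC of 1.0 here just reflects that the tiny example dataset is perfectly separated, which is also why the curves have so few points to plot.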

machine-learning apache-spark apache-spark-mllib

6 votes · 1 answer · 7279 views

How to supply the prediction and label columns to BinaryClassificationMetrics when evaluating a Naive Bayes model

I'm confused about the input to BinaryClassificationMetrics (MLlib). According to the Apache Spark 1.6.0 documentation, we need to pass an RDD[(Double, Double)] of (prediction, label) pairs taken from the transformed DataFrame, which contains prediction, probability (Vector), and rawPrediction (Vector) columns.

I created an RDD[(Double, Double)] from the prediction and label columns. After evaluating the NaiveBayesModel with BinaryClassificationMetrics, I was able to retrieve the ROC, PR, and so on. But the values are limited and I cannot plot a curve from them: the ROC contains 4 points and the PR curve contains 3 points.

Is preparing the (prediction, label) pairs this way correct, or do I need to use the rawPrediction column or the probability column instead of the prediction column?
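One way to see why the curves come out so coarse: BinaryClassificationMetrics derives one candidate threshold per distinct score value, so feeding it the hard 0/1 prediction column can only ever yield two thresholds (hence the 3-4 points observed), while the positive-class probability gives one threshold per distinct probability. A plain-Scala sketch of that counting argument (the `ThresholdPoints` object and the sample data are illustrative, not taken from the question):

```scala
object ThresholdPoints {
  // Number of distinct score values in (score, label) pairs. Each distinct
  // score is a candidate threshold the metrics class can evaluate, so this
  // bounds how many ROC/PR points (beyond the fixed endpoints) you can get.
  def distinctThresholds(scoreAndLabel: Seq[(Double, Double)]): Int =
    scoreAndLabel.map(_._1).distinct.size

  def main(args: Array[String]): Unit = {
    // Hard 0/1 predictions: only two distinct scores, so only two thresholds.
    val hard = Seq((1.0, 1.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    // Positive-class probabilities: one candidate threshold per distinct score.
    val probs = Seq((0.9, 1.0), (0.4, 0.0), (0.7, 0.0), (0.2, 1.0))
    println(distinctThresholds(hard))  // prints 2
    println(distinctThresholds(probs)) // prints 4
  }
}
```

So for a plottable curve, the (score, label) pairs should use a continuous score, such as the positive-class entry of the probability column, rather than the thresholded prediction column.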

scala machine-learning apache-spark apache-spark-ml apache-spark-mllib

6 votes · 1 answer · 2134 views