Sud*_*van 6 machine-learning outliers apache-spark apache-spark-mllib
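Is there a pre-built outlier detection algorithm or interquartile range (IQR) identification method in Spark 2.0.0? I found some code here, but I don't think it is available in Spark 2.0.0.
Thanks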
If you didn't find a pre-built method, you can do something like this:

Example of detecting outliers using a box-and-whisker plot:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Assumes an existing SparkSession named `sparkSession`.
val sampleData = List(10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7,
  14.7, 14.7, 14.9, 15.1, 15.9, 16.4)
val rowRDD = sparkSession.sparkContext.makeRDD(sampleData.map(value => Row(value)))
val schema = StructType(Array(StructField("value", DoubleType)))
val df = sparkSession.createDataFrame(rowRDD, schema)

// A relativeError of 0.0 requests exact quantiles (can be expensive on large data).
val quantiles = df.stat.approxQuantile("value", Array(0.25, 0.75), 0.0)
val Q1 = quantiles(0)
val Q3 = quantiles(1)
val IQR = Q3 - Q1
val lowerRange = Q1 - 1.5 * IQR
val upperRange = Q3 + 1.5 * IQR

// Keep only the rows that fall outside the [lowerRange, upperRange] fences.
val outliers = df.filter(s"value < $lowerRange or value > $upperRange")
outliers.show()
```
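To sanity-check the fence arithmetic without a cluster, the same 1.5 × IQR rule can be sketched in plain Scala. This is only an illustration: the object `IqrFences` and its methods are hypothetical names, and the nearest-rank quantile used here is an assumption that may not match the values Spark's `approxQuantile` picks on other data.

```scala
// Plain-Scala sketch of the 1.5 * IQR (Tukey fence) rule; no Spark required.
// IqrFences and its method names are hypothetical, chosen for illustration.
object IqrFences {
  // Nearest-rank quantile: the value at 1-based rank ceil(p * n) in sorted data.
  // Assumption: Spark's approxQuantile may select a different element.
  def nearestRank(sorted: Vector[Double], p: Double): Double = {
    require(sorted.nonEmpty, "need at least one value")
    val rank = math.ceil(p * sorted.length).toInt
    sorted(math.max(rank, 1) - 1)
  }

  // Returns (lowerFence, upperFence) for multiplier k; 1.5 is the usual choice.
  def fences(data: Seq[Double], k: Double = 1.5): (Double, Double) = {
    val sorted = data.sorted.toVector
    val q1 = nearestRank(sorted, 0.25)
    val q3 = nearestRank(sorted, 0.75)
    val iqr = q3 - q1
    (q1 - k * iqr, q3 + k * iqr)
  }

  // Values strictly below the lower fence or strictly above the upper fence.
  def outliers(data: Seq[Double], k: Double = 1.5): Seq[Double] = {
    val (lower, upper) = fences(data, k)
    data.filter(v => v < lower || v > upper)
  }

  def main(args: Array[String]): Unit = {
    val sampleData = List(10.2, 14.1, 14.4, 14.4, 14.4, 14.5, 14.5, 14.6, 14.7,
      14.7, 14.7, 14.9, 15.1, 15.9, 16.4)
    println(outliers(sampleData).mkString(", "))
  }
}
```

With the sample data from the answer, nearest-rank gives Q1 = 14.4 and Q3 = 14.9, so the fences sit at roughly 13.65 and 15.65, and the values flagged are 10.2, 15.9 and 16.4.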