Is there a way to implement the `.combinations` method from Scala collections on a Spark RDD?
/** Iterates over combinations.
*
* @return An Iterator which traverses the possible n-element combinations of this $coll.
* @example `"abbbc".combinations(2) = Iterator(ab, ac, bb, bc)`
*/
For example, for combinations of size = 2, how would I go from RDD[X] to RDD[List[X]] or RDD[(X, X)]? Assume all values in the RDD are unique.
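One way to sketch this (the function name is my own, not a Spark API): `cartesian` produces all ordered pairs, and since the values are unique, keeping only the pairs with `x < y` under some `Ordering` leaves each unordered size-2 combination exactly once.

```scala
import org.apache.spark.rdd.RDD

// Sketch: size-2 combinations of an RDD with unique values.
// `cartesian` shuffles O(n^2) records, so this only scales to
// moderately sized RDDs.
def combinations2[X: Ordering](rdd: RDD[X]): RDD[(X, X)] = {
  val ord = implicitly[Ordering[X]]
  rdd.cartesian(rdd).filter { case (x, y) => ord.lt(x, y) }
}
```

For instance, `combinations2(sc.parallelize(Seq(1, 2, 3))).collect()` should contain the pairs (1,2), (1,3) and (2,3), in some partition-dependent order.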
How can I implement the following behavior with scalaz:
"Fail1".failNel[Int] and "Fail2".failNel[Int] to Failure("Fail1", "Fail2")
"Fail1".failNel[Int] and 100.successNel[String] to Success(100)
My solution looks overly complicated; I suspect there is a more succinct way to do this:
def aggregateErrorsOrSuccess(v1: ValidationNEL[String, Int],
                             v2: ValidationNEL[String, Int]) = {
  v2.fold(
    nl => (nl.fail[Int] |@| v1) { (i1, i2) => (/* actually should never happen */) },
    res => res.successNel[String]
  )
}
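A shorter alternative, assuming scalaz 7.x: `Validation#findSuccess` returns the first `Success` it finds, and when both sides fail it appends the two failure sides with their `Semigroup` (for `NonEmptyList` that is concatenation), which is exactly the behavior asked for above.

```scala
import scalaz._, Scalaz._

// Sketch: first Success wins; two Failures accumulate into one NEL.
def aggregateErrorsOrSuccess(v1: ValidationNEL[String, Int],
                             v2: ValidationNEL[String, Int]): ValidationNEL[String, Int] =
  v1 findSuccess v2
```

With this, `"Fail1".failNel[Int] findSuccess "Fail2".failNel[Int]` is a `Failure` carrying both messages, while `"Fail1".failNel[Int] findSuccess 100.successNel[String]` is `Success(100)`.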
=====================
Second solution:
implicit def nel2list[T](nl: NonEmptyList[T]) = nl.head :: nl.tail

implicit def ValidationNELPlus[X]: Plus[({type λ[α] = ValidationNEL[X, α]})#λ] =
  new Plus[({type λ[α] = ValidationNEL[X, α]})#λ] {
    def plus[A](a1: ValidationNEL[X, A], a2: => ValidationNEL[X, A]) = a1 match {
      case Success(_) => a1
      case Failure(f1) …

I have an example based on hyperloglog. I am trying to parameterize my Container with a size parameter and to use that parameter, via the reflection package, in functions on the container.
import Data.Proxy
import Data.Reflection
newtype Container p = Container { runContainer :: [Int] }
  deriving (Eq, Show)
instance Reifies p Integer => Monoid (Container p) where
mempty = Container $ replicate (fromIntegral (reflect (Proxy :: Proxy p))) 0
mappend (Container l) (Container r) = undefined
My lame Monoid instance defines mempty based on the known size parameter and does a somewhat "type-safe" mappend. It works perfectly: when I try to sum containers of different sizes, I get a type error.
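For reference, the same size index can be expressed with GHC.TypeLits instead of Data.Reflection. A sketch (my own variant, not the original code); the `nominal` role annotation additionally makes coercion between different sizes a compile-time error, as long as the `Container` constructor is not exported from its defining module:

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE RoleAnnotations #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

newtype Container (p :: Nat) = Container { runContainer :: [Int] }
  deriving (Eq, Show)

-- With the constructor kept abstract, `nominal` forbids `coerce`
-- between Container n and Container m for different n, m.
type role Container nominal

instance Semigroup (Container p) where
  Container l <> Container r = Container (zipWith (+) l r)

instance KnownNat p => Monoid (Container p) where
  mempty = Container (replicate (fromIntegral (natVal (Proxy :: Proxy p))) 0)
```

Here `mempty :: Container 3` is `Container {runContainer = [0,0,0]}`, and `(mempty :: Container 3) <> (mempty :: Container 5)` is rejected by the type checker.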
But it can still be fooled with `coerce`, and I'm looking for a way to prevent that at compile time:
ghci> :set -XDataKinds
ghci> :m +Data.Coerce
ghci> let c3 = mempty :: Container 3
ghci> c3
Container {runContainer = [0,0,0]}
…

I'm trying to launch bin/spark-shell and bin/pyspark from my laptop, connecting to a Yarn cluster in yarn-client mode, and I get the same error with both:
WARN ScriptBasedMapping: Exception running
/etc/hadoop/conf.cloudera.yarn1/topology.py 10.0.240.71
java.io.IOException: Cannot run program "/etc/hadoop/conf.cloudera.yarn1/topology.py"
(in directory "/Users/eugenezhulenev/projects/cloudera/spark"): error=2,
No such file or directory
Spark tries to run /etc/hadoop/conf.cloudera.yarn1/topology.py on my laptop, not on the Yarn worker nodes.
This problem appeared after updating from Spark 1.2.0 to 1.3.0 (CDH 5.4.2).
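One workaround to try (an assumption on my part, not a verified fix): the Hadoop client configuration copied to the laptop still references the cluster's rack-topology script, and the driver-side `ScriptBasedMapping` tries to execute that path locally. Blanking the property in the laptop's copy of the config stops it from doing so:

```xml
<!-- laptop copy of core-site.xml in the yarn1 client config:
     blank out the topology script so the local driver does not try
     to execute a path that only exists on the cluster nodes -->
<property>
  <name>net.topology.script.file.name</name>
  <value></value>
</property>
```

This only affects the client-side copy; the resource manager and node managers keep using their own configuration.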
Here is a short example. I'd like to know why in the type-class instance I don't have to state `forall` explicitly, while the function definition does not compile without it:
Couldn't match kind ‘Nat’ with ‘*’
When matching types
proxy0 :: Nat -> *
Proxy :: * -> *
Expected type: proxy0 n0
Actual type: Proxy p0
In the first argument of ‘natVal’, namely ‘(Proxy :: Proxy p)’
In the second argument of ‘($)’, namely ‘natVal (Proxy :: Proxy p)’
In the first argument of ‘(++)’, namely
‘(show $ natVal (Proxy :: Proxy p))’
Code:
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE …
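Since the original code is truncated here, a self-contained sketch of the issue (function names are my own): `ScopedTypeVariables` only brings a signature's type variables into scope over the body when they are bound by an explicit `forall`. Without it, the `n` in the body annotation is a fresh variable that defaults to kind `*`, producing exactly the "Couldn't match kind 'Nat' with '*'" error above.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, natVal)

-- The explicit forall scopes `n` over the body, so `Proxy :: Proxy n`
-- refers to the same type variable as the signature. Dropping the
-- forall makes the inner `n` a fresh variable of kind *.
showNat :: forall n. KnownNat n => Proxy n -> String
showNat _ = show (natVal (Proxy :: Proxy n))

main :: IO ()
main = putStrLn (showNat (Proxy :: Proxy 5))  -- prints "5"
```

In an instance declaration, by contrast, the variables of the instance head are already scoped over the method bodies under `ScopedTypeVariables`, which is why the type-class version needs no explicit `forall`.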