Posts by Bip*_*was

Apache Apex vs Apache Flink

Since both are streaming frameworks that process events one at a time, what are the core architectural differences between these two technologies/streaming frameworks?

And for which specific use cases is one a better fit than the other?

stream-processing apache-flink apache-apex

6 votes · 1 answer · 2085 views

Cannot access the returned h5py object instance

I have a very strange problem here. I have two functions: one reads an HDF5 file created with h5py, and the other creates a new HDF5 file that concatenates the content returned by the first function.

def read_file(filename):
    with h5py.File(filename + ".hdf5", 'r') as hf:

        # These are h5py proxy objects backed by the open file
        group1 = hf.get('group1')
        group2 = hf.get('group2')
        dataset1 = hf.get('dataset1')
        dataset2 = hf.get('dataset2')
        print group1.attrs['w'] # Works here

        return dataset1, dataset2, group1, group2

And the function that creates the file:

def create_chunk(start_index, end_index):

    for i in range(start_index, end_index):
        if i == start_index:
            mergedhf = h5py.File("output.hdf5", 'w')
            mergedhf.create_dataset("dataset1", dtype='float64')
            mergedhf.create_dataset("dataset2", dtype='float64')

            g1 = mergedhf.create_group('group1')
            g2 = mergedhf.create_group('group2')

    rd1, rd2, rg1, rg2 = read_file(filename)

    print rg1.attrs['w'] # gives me <Closed HDF5 group> message

    g1.attrs['w'] = "content"
    g1.attrs['x'] = "content"
    g2.attrs['y'] = "content"
    g2.attrs['z'] …
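The <Closed HDF5 group> message points at a well-known h5py pitfall: the objects read_file returns are lightweight proxies backed by the open file, and the with block closes the file the moment the function returns, invalidating them. Below is a minimal sketch of one common workaround, copying the data out while the file is still open; the names mirror the question's code, and the eager-copy approach is an assumption on my part, not something confirmed in the question.

import h5py

def read_file(filename):
    # Copy everything out of the file before the with block closes it;
    # the returned numpy arrays and plain dicts stay valid afterwards.
    with h5py.File(filename + ".hdf5", 'r') as hf:
        dataset1 = hf['dataset1'][:]             # eager read into a numpy array
        dataset2 = hf['dataset2'][:]
        group1_attrs = dict(hf['group1'].attrs)  # plain-dict copy of the attributes
        group2_attrs = dict(hf['group2'].attrs)
    return dataset1, dataset2, group1_attrs, group2_attrs

Alternatively, the file could be opened without a with block and closed explicitly only after the caller is done with the returned objects.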

python hdf5 python-2.7 h5py

5 votes · 1 answer · 1727 views

Error when using the "spark.executor.userClassPathFirst" configuration

I am trying to run a Spark job via spark-submit, but because of certain dependencies (specifically jackson-databind) Spark does not pick up the newer version of the dependency and uses its own instead. This leads to a NoSuchMethodError, since the method is apparently not available in the older version that ships with the Spark distribution.

To work around the NoSuchMethodError, I passed the dependency's jar file through spark-submit and, as suggested in some other answers, added the following to my spark-submit invocation:

--conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true

But when I try to run it, I get the following exception in the logs from the YARN cluster:

17/05/10 11:21:27 ERROR TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 5822357421500897232
java.lang.ClassCastException: cannot assign instance of scala.concurrent.duration.FiniteDuration to field org.apache.spark.rpc.RpcTimeout.duration of type scala.concurrent.duration.FiniteDuration in instance of org.apache.spark.rpc.RpcTimeout
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2237)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:109)
    at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:262)
    at …

java hortonworks-data-platform apache-spark spark-streaming

5 votes · 0 answers · 6268 views