为什么Kafka基于拉力而不是推基础?我同意卡夫卡给了我经历过的高通量,但我没有看到卡夫卡吞吐量将如何往下走,如果它是基于推.关于基于推送的任何想法都会降低性能?
我正在使用测试容器库来启动容器。它工作了一段时间,但目前遇到了这个
java.lang.IllegalStateException: Could not connect to Ryuk at localhost:49167
at org.testcontainers.utility.ResourceReaper.start(ResourceReaper.java:201)
at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:205)
at org.testcontainers.LazyDockerClient.getDockerClient(LazyDockerClient.java:14)
at org.testcontainers.LazyDockerClient.authConfig(LazyDockerClient.java:12)
at org.testcontainers.containers.GenericContainer.start(GenericContainer.java:310)
Run Code Online (Sandbox Code Playgroud)
我查看了资源收割机代码,它似乎在这里失败了
public synchronized void performCleanup() {
this.registeredContainers.forEach(this::stopContainer);
this.registeredNetworks.forEach(this::removeNetwork); //FAILS HERE
this.registeredImages.forEach(this::removeImage);
}
Run Code Online (Sandbox Code Playgroud) Apache Mesos做什么Kubernetes不能做,反之亦然?
Mesos是一个两级调度程序.当然它从每台机器上抓取资源信息并将其提供给顶级调度程序,以便像kubernetes这样的框架可以用来跨机器调度容器,但Kubernetes本身可以跨机器调度容器(从这方面不需要Mesos).那么,Apache Mesos可以做些什么,Kubernetes不能做,反之亦然?
我经常不断得到以下异常,我想知道为什么会发生这种情况?经过研究,我发现我能做到,.set("spark.submit.deployMode", "nio");但这也不起作用,我使用的是火花2.0.0
WARN TransportChannelHandler: Exception in connection from /172.31.3.245:46014
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:898)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
Run Code Online (Sandbox Code Playgroud) 在实践中(非理论),小批量与实时流之间有什么区别?从理论上讲,我理解迷你批量是在给定的时间范围内批量生成的,而实时流式更像是在数据到达时做某事但是我最大的问题是为什么不使用epsilon时间框架(比如说一毫秒)或我想了解为什么一个人比其他人更有效的解决方案?
我最近遇到了一个例子,其中迷你批处理(Apache Spark)用于欺诈检测,实时流(Apache Flink)用于欺诈预防.有人还评论说小批量不是防止欺诈的有效解决方案(因为目标是防止交易发生)现在我想知道为什么这对迷你批次(Spark)不会那么有效?为什么以1毫秒的延迟运行迷你批处理无效?批处理是一种在任何地方使用的技术,包括操作系统和内核TCP/IP堆栈,其中磁盘或网络的数据确实被缓冲,那么说一个比其他更有效的令人信服的因素是什么?
data-processing stream-processing batch-processing apache-spark apache-flink
如何降低数据框列名的大小写而不是其值?使用 RAW Spark SQL 和 Dataframe 方法?
输入数据框(假设我有 100 个大写的这些列)
NAME | COUNTRY | SRC | CITY | DEBIT
---------------------------------------------
"foo"| "NZ" | salary | "Auckland" | 15.0
"bar"| "Aus" | investment | "Melbourne"| 12.5
Run Code Online (Sandbox Code Playgroud)
目标数据框
name | country | src | city | debit
------------------------------------------------
"foo"| "NZ" | salary | "Auckland" | 15.0
"bar"| "Aus" | investment | "Melbourne"| 12.5
Run Code Online (Sandbox Code Playgroud) 这个错误是最难追查的.我不确定发生了什么.我正在我的位置机器上运行Spark集群.所以整个火花集群都在一个主机下127.0.0.1,我在独立模式下运行
JavaPairRDD<byte[], Iterable<CassandraRow>> cassandraRowsRDD= javaFunctions(sc).cassandraTable("test", "hello" )
.select("rowkey", "col1", "col2", "col3", )
.spanBy(new Function<CassandraRow, byte[]>() {
@Override
public byte[] call(CassandraRow v1) {
return v1.getBytes("rowkey").array();
}
}, byte[].class);
Iterable<Tuple2<byte[], Iterable<CassandraRow>>> listOftuples = cassandraRowsRDD.collect(); //ERROR HAPPENS HERE
Tuple2<byte[], Iterable<CassandraRow>> tuple = listOftuples.iterator().next();
byte[] partitionKey = tuple._1();
for(CassandraRow cassandraRow: tuple._2()) {
System.out.println("************START************");
System.out.println(new String(partitionKey));
System.out.println("************END************");
}
Run Code Online (Sandbox Code Playgroud)
这个错误是最难追查的.它显然发生在cassandraRowsRDD.collect(),我不知道为什么?
16/10/09 23:36:21 ERROR Executor: Exception in task 2.3 in stage 0.0 (TID 21)
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ …Run Code Online (Sandbox Code Playgroud) 运行5-6小时后,我从火花驱动程序中得到以下错误.我使用的是Ubuntu 16.04 LTS和open-jdk-8.
Exception in thread "ForkJoinPool-50-worker-11" Exception in thread "dag-scheduler-event-loop" Exception in thread "ForkJoinPool-50-worker-13" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1795)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:117)
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.push(ForkJoinPool.java:1072)
at scala.concurrent.forkjoin.ForkJoinTask.fork(ForkJoinTask.java:654)
at scala.collection.parallel.ForkJoinTasks$WrappedTask$class.start(Tasks.scala:377)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.start(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$$anonfun$spawnSubtasks$1.apply(Tasks.scala:189)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$$anonfun$spawnSubtasks$1.apply(Tasks.scala:186)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.spawnSubtasks(Tasks.scala:186)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.spawnSubtasks(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:157)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
at scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
at scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
at scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
at …Run Code Online (Sandbox Code Playgroud) 我有一个以以下行开头的dockerfile
FROM java:8
我认为这应该从docker容器注册表中提取图像并安装.没有?
当我在容器中运行java命令时,我收到以下错误
ERROR: JAVA_HOME is not set and no 'java' command could be found in your PATH.
Run Code Online (Sandbox Code Playgroud)
使用docker安装java 8(openjdk版本)最简单,最好的方法是什么?
更新:
RUN apt-get install -y --no-install-recommends software-properties-common
RUN add-apt-repository -y ppa:openjdk-r/ppa
RUN apt-get update
RUN apt-get install -y openjdk-8-jdk
RUN apt-get install -y openjdk-8-jre
RUN update-alternatives --config java
RUN update-alternatives --config javac
Run Code Online (Sandbox Code Playgroud) runsequence是下面的代码是不是很有效?
var gulp = require('gulp');
var del = require('del');
var browserify = require('gulp-browserify');
var concat = require('gulp-concat');
var runSequence = require('run-sequence');
var nodemon = require('gulp-nodemon');
gulp.task('clean', function(cb) {
console.log('YOLO1');
del(['build/*'], cb);
});
gulp.task('copy', function() {
console.log('YOLO2')
return gulp.src('client/www/index.html')
.pipe(gulp.dest('build'));
});
gulp.task('browserify', function() {
console.log('YOLO3')
return gulp.src('client/index.js')
.pipe(browserify({transform: 'reactify'}))
.pipe(concat('app.js'))
.pipe(gulp.dest('build'));
});
gulp.task('build', function(cb) {
console.log('YOLO4')
runSequence('clean', 'browserify', 'copy', cb);
});
gulp.task('default', ['build'], function() {
gulp.watch('client/*/*', ['build']);
nodemon({ script: './bin/www', ignore: ['gulpfile.js', 'build', 'client', 'dist'] });
});
Run Code Online (Sandbox Code Playgroud)
电流输出:
YOLO4,
YOLO1
Run Code Online (Sandbox Code Playgroud)
期望的输出: …
apache-spark ×5
docker ×2
java ×2
apache-flink ×1
apache-kafka ×1
gulp ×1
java-8 ×1
javascript ×1
kubernetes ×1
mesos ×1
mesosphere ×1
reactjs ×1