Posts by kos*_*ios

hadoop: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit

I am seeing this in the datanode logs, probably because I copied roughly 5 million files into HDFS:

java.lang.IllegalStateException: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:332)
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:310)
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder.getBlockListAsLongs(BlockListAsLongs.java:288)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.blockReport(DatanodeProtocolClientSideTranslatorPB.java:190)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:507)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:738)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:874)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large.  May be malicious.  Use CodedInputStream.setSizeLimit() to increase the size limit.
at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
at com.google.protobuf.CodedInputStream.readRawByte(CodedInputStream.java:769)
at com.google.protobuf.CodedInputStream.readRawVarint64(CodedInputStream.java:462)
at com.google.protobuf.CodedInputStream.readSInt64(CodedInputStream.java:363)
at org.apache.hadoop.hdfs.protocol.BlockListAsLongs$BufferDecoder$1.next(BlockListAsLongs.java:326)
... 7 more

I am simply copying the files into HDFS with hadoop fs -put .... Recently I have started getting messages like this on the client side:

15/06/30 15:00:58 INFO hdfs.DFSClient: Could not complete /pdf-nxml/file1.nxml.COPYING retrying ... 15/06/30 15:01:05 …

hadoop hadoop2

9 votes · 1 answer · 3761 views

How do I define the groupId of my artifact?

I have an sbt project that I need to publish locally and then pull into other projects via libraryDependencies. That all works fine, except that the artifact's groupId ends up being the same as its name. Can I specify the groupId somewhere in build.sbt?
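
For context, a minimal build.sbt sketch of how the groupId is usually controlled in sbt via the organization setting (the organization and version values below are hypothetical):

organization := "com.example"        // maps to the Maven groupId (hypothetical value)
name         := "my-library"         // artifactId stays the project name
version      := "0.1.0-SNAPSHOT"

// consumed from another project as:
// libraryDependencies += "com.example" %% "my-library" % "0.1.0-SNAPSHOT"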

sbt

7 votes · 1 answer · 2718 views

Why doesn't SBT use the resolvers from the root project (in a multi-module project)?

In a multi-module project, SBT does not seem to use the resolvers when building the modules. The resolvers are declared in the root project's build.sbt like this:

resolvers += "SpringSource Milestone Repository" at "http://repo.springsource.org/milestone"

and the projects are declared like this:

lazy val core = project.settings(
    libraryDependencies ++= { ... }
)

But at compile time the resolvers are not used, and I get:

[info] Resolving org.springframework.scala#spring-scala;1.0.0.BUILD-SNAPSHOT ...
[warn]  module not found: org.springframework.scala#spring-scala;1.0.0.BUILD-SNAPSHOT
[warn] ==== local: tried
[warn]   /home/ariskk/.ivy2/local/org.springframework.scala/spring-scala/1.0.0.BUILD-SNAPSHOT/ivys/ivy.xml
[warn] ==== public: tried
[warn]   http://repo1.maven.org/maven2/org/springframework/scala/spring-scala/1.0.0.BUILD-SNAPSHOT/spring-scala-1.0.0.BUILD-SNAPSHOT.pom
[info] Resolving org.fusesource.jansi#jansi;1.4 ...
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  ::          UNRESOLVED DEPENDENCIES         ::
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::
[warn]  :: org.springframework.scala#spring-scala;1.0.0.BUILD-SNAPSHOT: not found
[warn]  ::::::::::::::::::::::::::::::::::::::::::::::

Any ideas what might be wrong?
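
For reference, a minimal sketch (an assumption, not part of the original question) of one common way to make a resolver visible to every module: scope it to the whole build instead of declaring it only on the root project:

// build.sbt — scoping the resolver to the whole build so submodules also see it
resolvers in ThisBuild += "SpringSource Milestone Repository" at
  "http://repo.springsource.org/milestone"

lazy val core = project.settings(
  libraryDependencies ++= Seq(/* ... */)
)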

sbt

6 votes · 1 answer · 1434 views

spark + hadoop data locality

I have an RDD of file names, i.e. an RDD[String]. I get it by parallelizing a list of file names (files in HDFS).

I then map over this RDD, and my code opens a Hadoop stream with FileSystem.open(path) and processes it.

When I run the job and look at the Spark UI / Stages, I see "Locality Level" = PROCESS_LOCAL for every task. I don't see how Spark could achieve data locality the way I am running the job (on a cluster of 4 data nodes), so how is this possible?
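
A minimal sketch of the pattern described above, assuming an existing SparkContext sc (paths are hypothetical); since the RDD is built from plain strings, Spark has no HDFS block locations to schedule against:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val paths = sc.parallelize(Seq("/pdf-nxml/file1.nxml", "/pdf-nxml/file2.nxml")) // hypothetical paths
val contents = paths.map { p =>
  val fs = FileSystem.get(new Configuration())      // each task opens its own HDFS stream
  val in = fs.open(new Path(p))
  try scala.io.Source.fromInputStream(in).mkString
  finally in.close()
}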

hadoop hdfs apache-spark

6 votes · 2 answers · 2602 views

apache calcite, querying without the jdbc api

I want to use the Apache Calcite API directly, without going through a JDBC connection. I can use the JDBC API just fine, but I get a null pointer exception when trying to use the API directly. What I have done so far:

package calcite.examples

import java.util.Properties

import calcite.demo.DemoSchema
import org.apache.calcite.DataContext
import org.apache.calcite.config.CalciteConnectionConfigImpl
import org.apache.calcite.jdbc.CalcitePrepare.Query
import org.apache.calcite.jdbc.{CalcitePrepare, CalciteSchema, JavaTypeFactoryImpl}
import org.apache.calcite.prepare.CalcitePrepareImpl

import scala.collection.JavaConverters._

object TryIt extends App
{
    val ctx = new AdapterContext
    val sql = Query.of[Any]("SELECT * FROM dep")
    //  assert(sql.rel != null)

    val elementType = classOf[Array[Object]]
    val prepared = new CalcitePrepareImpl().prepareSql(ctx, sql, elementType, -1)
    val enumerable = prepared.enumerable(new MyDataContext)
}

class AdapterContext extends CalcitePrepare.Context
{
    private val properties …

apache-calcite

6 votes · 1 answer · 1243 views

cassandra counters, atomic get and set

Does Cassandra support an atomic get-and-set on counters? I.e.

create table c(id int, value counter, primary key (id));

update c set value=value+1 where id=1;

OK, now my counter is ready. But I want to be able to read a unique value from multiple servers, with each server getting a different value, e.g.

select value from c where id=1; 

atomic cassandra

5 votes · 0 answers · 2762 views

SparkContext.clean java.util.zip.ZipException: invalid LOC header (bad signature)

This strange exception keeps killing my Spark job; any ideas?

I am "submitting" lots of small tasks to the Spark context via sc.parallelize(... seq of 256 items ...). (Don't ask me why, but that's what I need.)
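
A minimal sketch of that submission pattern, assuming an existing SparkContext sc and a hypothetical per-item function (the item payload and extractText are illustrative, not from the original post):

val items = (1 to 256).map(i => s"item-$i")          // hypothetical payload of 256 items
val results = sc.parallelize(items).flatMap { item =>
  extractText(item)                                   // hypothetical per-item work returning zero or more records
}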

Exception in thread "main" java.util.zip.ZipException: invalid LOC header (bad signature)
at java.util.zip.ZipFile.read(Native Method)
at java.util.zip.ZipFile.access$1400(ZipFile.java:56)
at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:679)
at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:415)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.Utils$.copyStream(Utils.scala:347)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$getClassReader(ClosureCleaner.scala:40)
at org.apache.spark.util.ClosureCleaner$.getInnerClasses(ClosureCleaner.scala:84)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:107)
at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
at org.apache.spark.rdd.RDD.flatMap(RDD.scala:295)
at com.stratified.pdfingestion.CermineJob$.extractPdfText(CermineJob.scala:53)
at com.stratified.pdfingestion.CermineJob$.execute(CermineJob.scala:41)
at com.stratified.pdfingestion.CermineJob$$anonfun$main$1.apply(CermineJob.scala:31)
at com.stratified.pdfingestion.CermineJob$$anonfun$main$1.apply(CermineJob.scala:29)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at com.stratified.pdfingestion.CermineJob$.main(CermineJob.scala:29)
at com.stratified.pdfingestion.CermineJob.main(CermineJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

apache-spark

3 votes · 1 answer · 1861 views

Tag statistics

apache-spark ×2

hadoop ×2

sbt ×2

apache-calcite ×1

atomic ×1

cassandra ×1

hadoop2 ×1

hdfs ×1