I have a JSON document:
{
"itemType": {"food":22,"electrical":2},
"itemCount":{"NA":211}
}
Here itemType and itemCount will always be present, but the values inside them (food, NA, electrical) will not; those will keep changing, but they will always follow the same format: a map of string keys to integer counts.
How do I define a JSON Schema for this generic structure?
I tried:
"itemCount":{
"type": "object"
"additionalProperties": {"string", "integer"}
}
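For a map with arbitrary keys, additionalProperties takes a schema object, not a list of type names; a sketch of what the fragment could look like (draft-04 style, property name taken from the question):

"itemCount": {
  "type": "object",
  "additionalProperties": { "type": "integer" }
}

The same pattern would cover itemType: any property name is accepted, as long as each value validates as an integer.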
I am getting:
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
... 8 more
Caused by: org.apache.spark.executor.CommitDeniedException: attempt_201611091630_0009_m_000131_1: Not committed because the driver did not authorize commit
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
at org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
... 9 more
This happens while using write.parquet. Is there any way to solve this? I am using Spark 1.5.
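A CommitDeniedException like this usually means two attempts of the same task raced to commit their output and the driver authorized only one of them; on Spark 1.5 (for example on EMR) this is commonly triggered by speculative execution. A minimal sketch of the usual workaround, assuming you build the SparkConf yourself (the app name is illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Turn off speculative task attempts so only a single attempt
// per task ever asks the driver for permission to commit.
val conf = new SparkConf()
  .setAppName("parquet-writer")
  .set("spark.speculation", "false")
val sc = new SparkContext(conf)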
I have two dataframes: df1 and df2
DF1
|---id---|---value---|
|   1    |    23     |
|   2    |    23     |
|   3    |    23     |
|   2    |    25     |
|   5    |    25     |
DF2
|---idValue---|---count---|
|      1      |    33     |
|      2      |    23     |
|      3      |    34     |
|     13      |    34     |
|     23      |    34     |
How do I get this?
|---id---|---value---|---count---|
|   1    |    23     |    33     |
|   2    |    23     |    23     |
|   3    |    23     |    34     | …
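The expected output is df1's rows enriched with df2's count wherever id matches idValue, which is a plain equi-join; a minimal sketch (the default join type is inner; if ids absent from df2, such as 5, should survive with a null count, pass "left_outer" as a third argument):

// Join on df1.id == df2.idValue, then drop the redundant key column.
val result = df1
  .join(df2, df1("id") === df2("idValue"))
  .drop("idValue")
result.show()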
I want to simplify this:

var countA: Int = 0
var countB: Int = 0
if (validItem) {
if (region.equalsIgnoreCase("US")) {
if (itemList > 0) {
countB = 1
} else {
countA = 1
}
} else {
countB = 1
}
} else {
countA = 1
}
How can I do this with a ternary operator in Scala?
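Scala has no separate ternary operator: if/else is itself an expression that returns a value, so the whole nest can collapse into a single condition (a sketch that preserves the original logic):

// countB is 1 exactly when the item is valid and either the region
// is not "US" or itemList is non-empty; countA covers every other case.
val countB = if (validItem && (!region.equalsIgnoreCase("US") || itemList > 0)) 1 else 0
val countA = 1 - countB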