I would like to know whether it is possible to have an optional array. Let's assume a schema like this:
{
  "type": "record",
  "name": "test_avro",
  "fields": [
    {"name": "test_field_1", "type": "long"},
    {"name": "subrecord", "type": [{
        "type": "record",
        "name": "subrecord_type",
        "fields": [{"name": "field_1", "type": "long"}]
      }, "null"]
    },
    {"name": "simple_array",
     "type": {
       "type": "array",
       "items": "string"
     }
    }
  ]
}
Trying to write an Avro record without "simple_array" results in an NPE in the datum writer. For the subrecord it works fine, but when I try to define the array as optional:
{"name": "simple_array",
 "type": [{
   "type": "array",
   "items": "string"
 }, "null"]
}
it does not result in an NPE, but in a runtime exception:
AvroRuntimeException: Not an array schema: [{"type":"array","items":"string"},"null"]
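Thanks.
I have a fairly large data frame, and I need a good way (explained below) to extract the indices of the rows that have the maximum value of a given field within each group of labels. To explain this better, here is a sample 10-row data frame: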
      value label
1  5.531637     D
2  5.826498     A
3  8.866210     A
4  1.387978     C
5  8.128505     C
6  7.391311     B
7  1.829392     A
8  4.373273     D
9  7.380244     A
10 6.157304     D
Generated with:
structure(list(value = c(5.531637, 5.826498, 8.86621, 1.387978, 8.128505,
7.391311, 1.829392, 4.373273, 7.380244, 6.157304),
label = c("D", "A", "A", "C", "C", "B", "A", "D", "A", "D")),
.Names = c("value", "label"), class = "data.frame", row.names = c(NA, -10L))
If I want to know the index of the row with the maximum value for each label, I currently use the following code:
idx <- sapply(split(1:nrow(d), d$label), function(x) {
  x[which.max(d[x, "value"])]
})
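A self-contained sketch of the same per-label lookup, written with `tapply` instead of `sapply(split(...))`. It assumes the sample data frame above is stored in a variable named `d` (the name the snippet above already uses); the `tapply` spelling is only one possible alternative and is not claimed to be faster on a large data frame.

```r
# Sample data from the question, assigned to `d` (assumed variable name).
d <- structure(list(value = c(5.531637, 5.826498, 8.86621, 1.387978, 8.128505,
                              7.391311, 1.829392, 4.373273, 7.380244, 6.157304),
                    label = c("D", "A", "A", "C", "C", "B", "A", "D", "A", "D")),
               .Names = c("value", "label"), class = "data.frame",
               row.names = c(NA, -10L))

# For each label, take the row numbers in that group and keep the one
# whose `value` is largest. which.max() returns the first maximum on ties.
idx <- tapply(seq_len(nrow(d)), d$label,
              function(i) i[which.max(d$value[i])])

print(idx)
```

Both spellings return one row index per label, named by label.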
This produces the following answer:
A …
Is it possible to check for multicollinearity in a model with dummy variables? Assume the following example:
treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2), labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)), levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 5)+1
healthvalue <- rpois(84,5)
y <- data.frame(healthvalue,numberofdrugs, treatment, improved)
test <- lm(healthvalue~numberofdrugs+treatment+improved, y)
How can I check whether multicollinearity is present in this kind of model?
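One common way to look at this is the variance inflation factor (VIF). The sketch below uses only base R: it expands the factors into dummy columns with `model.matrix()` and reads the VIFs off the diagonal of the inverse correlation matrix. The `set.seed()` call and the names `X` and `vif_by_column` are assumptions added for illustration; the data are the random values from the example, so the numbers themselves carry no meaning.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible

treatment <- factor(rep(c(1, 2), c(43, 41)), levels = c(1, 2),
                    labels = c("placebo", "treated"))
improved <- factor(rep(c(1, 2, 3, 1, 2, 3), c(29, 7, 7, 13, 7, 21)),
                   levels = c(1, 2, 3), labels = c("none", "some", "marked"))
numberofdrugs <- rpois(84, 5) + 1
healthvalue <- rpois(84, 5)
y <- data.frame(healthvalue, numberofdrugs, treatment, improved)
test <- lm(healthvalue ~ numberofdrugs + treatment + improved, y)

# Design matrix with the factors expanded to dummy columns;
# the intercept column is dropped.
X <- model.matrix(test)[, -1]

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on
# the other columns. This equals the j-th diagonal entry of the inverse
# of the predictors' correlation matrix.
vif_by_column <- diag(solve(cor(X)))

print(vif_by_column)
```

Per-column VIFs for a factor with more than two levels depend on which level is the reference. If the `car` package is available, `car::vif(test)` reports a generalized VIF per model term instead, which may be the more suitable summary for `improved`.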