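I am using MongoDB with Ruby through the mongo gem.
I have the following scenario:
1. For each document in coll1, look at key1 and key2.
2. Search coll2 for a document with matching values for key1 and key2.
3. If there is a match, set key3 on that coll2 document to the value of key3 in the document referenced in #1.
4. Insert the result into coll3.
The general guideline with MongoDB is to handle cross-collection operations in application code.
So I did the following: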
client = Mongo::Client.new([ '127.0.0.1:27017' ], :database => some_db,
                           :server_selection_timeout => 5)
cursor = client[:coll1].find({}, { :projection => {:_id => 0} }) # exclude _id
cursor.each do |doc|
  doc_coll2 = client[:coll2].find('$and' => [{:key1 => doc[:key1]}, {:key2 => doc[:key2] }]).limit(1).first # no find_one method
  if(doc_coll2 && doc[:key3])
    doc_coll2[:key3] = doc[:key3]
    doc_coll2.delete(:_id) # remove key :_id
    client[:coll3].insert_one(doc_coll2)
…

Using the data frame df below:
df <- data.frame(
model=c(rep('corolla',3),rep( 'accord',3), rep('sunny',3)),
variable=c('urban_mileage', 'rural_mileage', 'highway_mileage'),
rescale=rnorm(9),
year=c(rep(1998,3),rep( 1997,3), rep(2003,3)),
kmdone=sample(10:100,9)*1e3
)
> df
model variable rescale year kmdone
1 corolla urban_mileage -1.03675182 1998 56000
2 corolla rural_mileage 1.06079162 1998 83000
3 corolla highway_mileage -0.18808551 1998 19000
4 accord urban_mileage -0.05151496 1997 69000
5 accord rural_mileage 0.05219512 1997 54000
6 accord highway_mileage -2.03139240 1997 21000
7 sunny urban_mileage -0.06225862 2003 40000
8 sunny rural_mileage 1.38191440 2003 96000
9 sunny highway_mileage -1.02367124 2003 55000
> …

I have a data frame like the one below:
+--------+-----------+-----+
| make | model | cnt |
+--------+-----------+-----+
| toyota | camry | 10 |
| toyota | corolla | 4 |
| honda | city | 8 |
| honda | accord | 13 |
| jeep | compass | 3 |
| jeep | wrangler | 5 |
| jeep | renegade | 1 |
| accura | x1 | 2 |
| accura | x3 | 1 |
+--------+-----------+-----+
I need to create a pie chart (yes, really) showing each make's percentage share.
This is what I do at the moment:
library(ggplot2)
library(dplyr)
df <- …

Given a data frame like the one below:
colVals = [['05:17:55.703', '', '', '', '', '', '21', '', '3', '89', '891', '11', ''], ['05:17:55.703', '', '', '', '', '', '21', '', '3', '217', '891', '12', ''], ['05:17:55.703', '', '', '', '', '', '21', '', '3', '217', '891', '13', ''], ['05:17:55.703', '', '', '', '', '', '21', '', '3', '217', '891', '15', ''], ['05:17:55.703', '', '', '', '', '', '21', '', '3', '217', '891', '16', ''], ['05:17:55.703', '', '', '', '', '', '21', '', '3', '217', '891', '17', …

Ruby 2.0
Why does the code below raise "unexpected return (LocalJumpError)"?
# some code here
puts "Scanning for xml files .."
zip_files = Dir.entries(directory).select { |f| File.extname(f) == '.zip' }
if(zip_files.count == 0)
  puts "No files found, exiting..."
  return
end
# more code here ( if files found)
Error: unexpected return (LocalJumpError)
No files found, exiting...
[Finished in 0.9s with exit code 1]
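
On Ruby 2.0, a `return` outside any method body raises LocalJumpError (a top-level `return` only became legal in Ruby 2.4). One possible workaround, sketched below, is to move the check into a method so that `return` has a method to return from. The method name `find_zip_files` and the temporary directory are hypothetical and used only for illustration.

```ruby
require 'tmpdir'

# Hypothetical helper: inside a method body, `return` is valid,
# so the early exit no longer raises LocalJumpError.
def find_zip_files(directory)
  zip_files = Dir.entries(directory).select { |f| File.extname(f) == '.zip' }
  if zip_files.empty?
    puts "No files found, exiting..."
    return []
  end
  zip_files
end

# Demo against a throwaway directory (an assumption for this sketch).
Dir.mktmpdir do |dir|
  puts find_zip_files(dir).inspect   # directory is empty at this point
  File.write(File.join(dir, 'a.zip'), '')
  puts find_zip_files(dir).inspect   # now contains one .zip file
end
```

If the check has to stay at the top level of the script, calling `exit` (or `abort` with a message) instead of `return` ends the program without needing a method.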

Given the table shown below:
make | model | engine | cars_checked | avg_mileage
---------------------------------------|--------
suzuki | sx4 | petrol | 11 | 12
suzuki | sx4 | diesel | 150 | 16
suzuki | swift | petrol | 140 | 15
suzuki | swift | diesel | 18 | 19
toyota | prius | petrol | 16 | 17
toyota | prius | hybrid | 250 | 24
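The desired output is
A simple group by is not enough, because the number of samples in each record (cars_checked) has to be used as a weight to avoid the average-of-averages problem.
What is the right way to do this? Is there a way to take the sample count into account and get a weighted average with group by?
Update …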
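
The arithmetic behind a weighted average per group can be sketched as follows: for each group, divide the sum of `cars_checked * avg_mileage` by the sum of `cars_checked`. The rows are copied from the table above. Grouping by make and model is an assumption, since the desired output is not shown, and the method name `weighted_mileage` is hypothetical.

```ruby
# Rows copied from the table in the question.
ROWS = [
  { make: 'suzuki', model: 'sx4',   engine: 'petrol', cars_checked: 11,  avg_mileage: 12 },
  { make: 'suzuki', model: 'sx4',   engine: 'diesel', cars_checked: 150, avg_mileage: 16 },
  { make: 'suzuki', model: 'swift', engine: 'petrol', cars_checked: 140, avg_mileage: 15 },
  { make: 'suzuki', model: 'swift', engine: 'diesel', cars_checked: 18,  avg_mileage: 19 },
  { make: 'toyota', model: 'prius', engine: 'petrol', cars_checked: 16,  avg_mileage: 17 },
  { make: 'toyota', model: 'prius', engine: 'hybrid', cars_checked: 250, avg_mileage: 24 }
].freeze

# Weighted mean per group: sum(weight * value) / sum(weight).
# Assumption: groups are (make, model) pairs.
def weighted_mileage(rows)
  rows.group_by { |r| [r[:make], r[:model]] }.map do |key, group|
    total_cars = group.sum { |r| r[:cars_checked] }
    weighted_sum = group.sum { |r| r[:cars_checked] * r[:avg_mileage] }
    [key, weighted_sum.to_f / total_cars]
  end.to_h
end

weighted_mileage(ROWS).each do |(make, model), avg|
  puts format('%-7s %-6s %.2f', make, model, avg)
end
```

In SQL the same idea is usually written as `SUM(cars_checked * avg_mileage) / SUM(cars_checked)` together with a `GROUP BY` on the grouping columns.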

I am working with a data frame and generating a pie chart with ggplot.
df <- data.frame(Make=c('toyota','toyota','honda','honda','jeep','jeep','jeep','accura','accura'),
Model=c('camry','corolla','city','accord','compass', 'wrangler','renegade','x1', 'x3'),
Cnt=c(10, 4, 8, 13, 3, 5, 1, 2, 1))
row_threshold = 2
dfc <- df %>%
group_by(Make) %>%
summarise(volume = sum(Cnt)) %>%
mutate(share=volume/sum(volume)*100.0) %>%
arrange(desc(volume))
dfc$Make <- factor(dfc$Make, levels = rev(as.character(dfc$Make)))
pie <- ggplot(dfc[1:10, ], aes("", share, fill = Make)) +
geom_bar(width = 1, size = 1, color = "white", stat = "identity") +
coord_polar("y") +
geom_text(aes(label = paste0(round(share), "%")),
position = position_stack(vjust = 0.5)) +
labs(x = NULL, y = …

I have a data frame like the one shown below:
> data = data.frame(name = c('Mike', 'Tony', 'Carol', 'Tim', 'Joe'), veh = c('car', 'bike', 'car', 'car', 'cycle') )
> data
name veh
1 Mike car
2 Tony bike
3 Carol car
4 Tim car
5 Joe cycle
> str(data$name)
Factor w/ 5 levels "Carol","Joe",..: 3 5 1 4 2
> str(data$veh)
Factor w/ 3 levels "bike","car","cycle": 2 1 2 2 3
> levels(data$veh)
[1] "bike" "car" "cycle"
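By default, the factor level for bike is 1, for car it is 2, and for cycle it is 3. I need to change this so that car is level 1, bike is level 2, and cycle is level 3. How do I go about it?

I am on Windows 10, RStudio 1.0.136, R 3.3.1.
I am creating an Rmd file and want to set the working directory to the location of the Rmd file.

I have a data frame like the following: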
text <- "
brand a b c d e f
nissan 99.21 99.78 6496 1.28 216 0.63
toyota 99.03 99.78 7652 1.39 205 0.60
"
df <- read.table(textConnection(text), sep="\t", header = T)
I am trying to plot all variables for the two groups in a single ggplot using facet_wrap, as follows:
library(reshape2)
library(ggplot2)
library(ggthemes)
library(RColorBrewer)
ggplot(melt(df, id = "brand")) +
aes(brand, value, fill = brand) +
geom_bar(stat = "identity", position='dodge') +
geom_text(data=melt(df, id = "brand"), angle = 0,
aes(brand, value,
label = ifelse(value > 100, round(value, 0), value) ) ) +
facet_wrap(~ variable, scales = …