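I have been trying to remove whitespace from a data frame in R. The data frame is large (> 1 GB) and has multiple columns, and every data entry contains whitespace.
Is there a quick way to remove the whitespace from the whole data frame? I have been trying to do this on a subset of the first 10 rows, using the following: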
gsub( " ", "", mydata)
This doesn't seem to work, although R returns output that I can't interpret.
str_replace( " ", "", mydata)
R returned 47 warnings and did not remove the whitespace.
erase_all(mydata, " ")
R returns an error: Error: could not find function "erase_all"
I would really appreciate some help, as I have spent the last 24 hours trying to solve this problem.
Thanks!
I have the following dataset:
observation <- c(1:10)
pop.d.rank <- c(1:10)
cost.1 <- c(101:110)
cost.2 <- c(102:111)
cost.3 <- c(103:112)
all <- data.frame(observation,pop.d.rank,cost.1, cost.2, cost.3)
I want to allocate the following amount over three years:
annual.investment <- 500
I can do this for the first year with the following script:
library(dplyr)
all <- all %>%
mutate(capital_allocated.5G = diff(c(0, pmin(cumsum(cost), annual.investment)))) %>%
mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%
mutate(year = ifelse(capital_percentage.5G >= 50, "Year.1",0))
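But when I try to do this for the second year, taking the previous year's investment into account, the code doesn't work. Here is my attempt at putting an ifelse statement inside the mutate call so that it doesn't overwrite the money allocated in the previous year: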
all <- all %>%
mutate(capital_allocated.5G = ifelse(year == 0, diff(c(0, pmin(cumsum(cost), annual.investment))), 0) %>%
mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%
mutate(year = ifelse(capital_percentage.5G >= 50, "Year.2",0)) …
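I have read a lot about this, but I have not yet found an answer that works.
I have been using the setdiff function in R to see how many values match between two data frames. I know that 71 of the 200 observations match and the rest do not.
So far, I have only done this to get the number of matching and non-matching values: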
check = setdiff(dataset1$variable1, dataset2$variable1)
How can I return lists of the matching and non-matching values?
Thanks,
Ed
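The set logic being asked for can be sketched in Python as an intersection (the matches) plus a difference in each direction (the non-matches). The values below are made up for illustration and the variable names are hypothetical. Base R has intersect() alongside setdiff(), which should cover the same cases.

```python
# Hypothetical stand-ins for dataset1$variable1 and dataset2$variable1.
dataset1_values = ["a", "b", "c", "d"]
dataset2_values = ["c", "d", "e"]

first, second = set(dataset1_values), set(dataset2_values)

matching = sorted(first & second)         # values present in both
only_in_first = sorted(first - second)    # what setdiff(x, y) reports
only_in_second = sorted(second - first)   # the reverse difference

print(matching, only_in_first, only_in_second)
```

Note that a set difference is one-directional, so values that appear only in the second dataset are missed unless the difference is also taken the other way round.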
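I have been working on a three-level RStan model in which repeated broadband measurements (year ID = yrid) are nested within local authorities (LA ID = laid), which are in turn nested within regions (region ID = rnid). The (logged) dependent variable is speed, and the (logged) predictors are population density (pd) and superfast broadband penetration (sfbb). There are currently random intercepts at the local authority and region levels (levels 2 and 3).
How can I extend the model to have random slopes at level 1 or level 2?
Here is a subset of the data, the RStan model and the full R code. Any help would be much appreciated.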
library(rstan)
###Data
yrid = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,
21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,
61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,
81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,
101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,
121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,
141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,
161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,
181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199)
laid <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,
6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9,10,10,10,10,
11,11,11,11,12,12,12,12,13,13,13,13,14,14,14,14,15,15,15,15,
16,16,16,16,17,17,17,17,18,18,18,18,19,19,19,19,20,20,20,20,
21,21,21,21,22,22,22,22,23,23,23,23,24,24,24,24,25,25,25,25,
26,26,26,26,27,27,27,27,28,28,28,28,29,29,29,29,30,30,30,30,
31,31,31,31,32,32,32,32,33,33,33,33,34,34,34,34,35,35,35,35,
36,36,36,36,37,37,37,37,38,38,38,38,39,39,39,39,40,40,40,40,
41,41,41,41,42,42,42,42,43,43,43,43,44,44,44,44,45,45,45,45,
46,46,46,46,47,47,47,47,48,48,48,48,49,49,49,49,50,50,50)
rnid <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,
3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,
4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,
4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,
5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
6,6,6,6,6,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8)
pd <- c(7.59262,7.59875,7.6027,7.60375,7.5301,7.53444,7.53604,7.54136,8.378,8.3936,
8.40061,8.41183,7.36682,7.36992,7.37607,7.38268,7.20065,7.2011,7.20162,7.20578,
7.78846,7.79947,7.80743,7.81992,7.71797,7.72011,7.72396,7.73026,7.66336,7.66561,
7.66744,7.66833,7.66973,7.67587,7.68327,7.69321,7.4326,7.43449,7.43762,7.44167,
7.43053,7.43053,7.43189,7.43396,8.33459,8.34315,8.34548,8.35036,7.15921,7.16325,
7.16379,7.16943,7.4898,7.48869,7.48689,7.48796,7.61918,7.62046,7.62075,7.62261,
6.55763,6.56541,6.57438,6.58286,6.27777,6.27833,6.28133,6.28339,6.80184,6.8045,
6.80572,6.81113,7.31315,7.32324,7.32804,7.33446,7.24893,7.24843,7.24744,7.24993,
7.80751,7.81927,7.83475,7.84514,7.80045,7.80147,7.80543,7.80792,7.74119,7.74253,
7.74323,7.74457,7.6027,7.6042,7.60564,7.60852,8.29695,8.30721,8.31356,8.32186,
8.07527,8.09465,8.11516,8.13795,8.06994,8.07091,8.07347,8.07788,8.19141,8.19883,
8.20841,8.21603,7.05652,7.05893,7.06613,7.07089,7.85991,7.86511,7.8699,7.87721,
8.18894,8.19332,8.19572,8.20125,7.26382,7.26669,7.2701,7.27351,6.32972,6.33505, …
I want to merge two lists of dictionaries on a key when the two lists have different lengths (using Python 3.6). For example, suppose we have a list of dicts called l1:
l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
{'pcd_sector': 'DEFG', 'coverage_2014': '0'}]
and another list of dicts called l2:
l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
{'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
{'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
{'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
{'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]
How can I merge them on pcd_sector to get this?
result = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'},
{'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd', 'coverage_2014': '100'},
{'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je', …
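One possible approach, sketched below, is to index l1 by pcd_sector and then extend each dict in l2 with the matching entry. Because the expected result above is truncated, it is an assumption here that rows of l2 with no match in l1 (such as 'CDEF') are kept unchanged; the name coverage_by_sector is hypothetical.

```python
l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
      {'pcd_sector': 'DEFG', 'coverage_2014': '0'}]

l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
      {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
      {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
      {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
      {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

# Lookup table: one l1 record per sector (assumes pcd_sector is unique in l1).
coverage_by_sector = {d['pcd_sector']: d for d in l1}

# Copy each l2 record and add the l1 fields for its sector, if there are any.
result = [{**asset, **coverage_by_sector.get(asset['pcd_sector'], {})}
          for asset in l2]

for row in result:
    print(row)
```

To drop unmatched rows instead, add `if asset['pcd_sector'] in coverage_by_sector` to the comprehension.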
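I have a column containing speed measurements that I need to convert to numeric so that I can use the mean and sum functions. However, when I convert them, the values change substantially.
Why is this?
This is what my data looked like originally:
This is the structure of the data frame: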
'data.frame': 1899571 obs. of 20 variables:
$ pcd : Factor w/ 1736958 levels "AB101AA","AB101AB",..: 1 2 3 4 5 6 6 7 7 8
$ pcdstatus : Factor w/ 5 levels "Insufficient Data",..: 4 4 4 4 4 2 3 2 3 3 ...
$ mbps2 : Factor w/ 3 levels "N","N/A","Y": 2 2 2 2 2 2 2 2 2 2 ...
$ averagesp : Factor w/ 301 levels ">=30","0","0.2",..: 301 301 301 301 301 …
I have a data.frame that looks like this:
Geotype <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3)
Strategy <- c("Demand", "Strategy 1", "Strategy 2", "Strategy 3", "Strategy 4", "Strategy 5", "Strategy 6")
Year.1 <- c(1:21)
Year.2 <- c(1:21)
Year.3 <- c(1:21)
Year.4 <- c(1:21)
mydata <- data.frame(Geotype,Strategy,Year.1, Year.2, Year.3, Year.4)
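I want to sum the strategies for each year.
This means I need to add up 6 rows down each column of the data frame and skip the Demand row. I then want to repeat this for all columns (40 years).
I want the output data frame to look like this: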
Geotype.output <- c(1, 2, 3)
Year.1.output <- c(27, 69, 111)
Year.2.output <- c(27, 69, 111)
Year.3.output <- c(27, 69, 111) …
I have this:
A B C
1 4 string1
2 11 string2
1 13 string3
2 43 string4
And I want to sort by both A and B at the same time, to get:
A B C
1 4 string1
1 13 string3
2 11 string2
2 43 string4
Using the following did not sort the data:
data = data.sort_values(by=['A','B'], ascending=[True,True])
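For reference, the intended two-key ascending sort can be sketched in plain Python (not pandas) with a tuple key. The second sort below illustrates one thing worth checking, offered as an assumption rather than a diagnosis: if the values are stored as text, they are compared character by character, so '13' sorts before '4'.

```python
rows = [
    {"A": 1, "B": 4, "C": "string1"},
    {"A": 2, "B": 11, "C": "string2"},
    {"A": 1, "B": 13, "C": "string3"},
    {"A": 2, "B": 43, "C": "string4"},
]

# Numeric keys: sort by A first, then by B within each A.
numeric_order = [r["C"] for r in sorted(rows, key=lambda r: (r["A"], r["B"]))]

# Text keys: the same sort, but comparing the values as strings.
text_order = [r["C"] for r in sorted(rows, key=lambda r: (str(r["A"]), str(r["B"])))]

print(numeric_order)
print(text_order)
```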
I have run into a problem with incorrectly ordered data labels in ggplot2.
Unfortunately, the other SE Q&As on this topic are not very insightful (example), so I am having to reach out with a reprex. I have the following data:
df = as.data.frame(structure(list(geotype = c('urban','urban','urban','urban','suburban','suburban','suburban','suburban'),
limitations = c('all','some','all','some','all','some','all','some'),
metric = c('lte','lte','5g','5g','lte','lte','5g','5g'),
capacity=c(12,11,5,4,14,10,5,3))))
If I then try to plot this data using this code:
ggplot(df, aes(x = geotype, y = capacity, fill=metric)) + geom_bar(stat="identity") +
facet_grid(~limitations) +
geom_text(data = df, aes(geotype, capacity + 2, label=capacity), size = 3)
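I get this incorrect label ordering:
I have played with the ordering of the variables for ages (e.g. rev(capacity)), but I can't solve this. Can anyone provide a more comprehensive answer for the whole SE community on how to deal with label ordering?
I have the following csv files, and I only want to select those whose 'pop' and 'throughput' values match within the file name: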
example_pop_high_throughput_high_strategy.csv
example_pop_high_throughput_base_strategy.csv
example_pop_high_throughput_low_strategy.csv
example_pop_base_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_base_throughput_low_strategy.csv
example_pop_low_throughput_high_strategy.csv
example_pop_low_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv
I only want these:
example_pop_high_throughput_high_strategy.csv
example_pop_base_throughput_base_strategy.csv
example_pop_low_throughput_low_strategy.csv
I can use list.files to select all files containing, for example, 'high':
file_names <- list.files("made/up/path", pattern = c("high"))
However, trying to do this twice to match 'high' and 'high' did not work:
file_names <- list.files("made/up/path", pattern = c("high", "high"))
Is there a way to select the files with matching 'pop' and 'throughput' values, ideally in a single expression?
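One way to express "the same word in both places" in a single pattern is a regular-expression backreference: capture the 'pop' value in a group and require the same text again after 'throughput'. The sketch below shows the idea in Python using the file names listed above. A similar pattern may work as the pattern argument of list.files, but that is an assumption and has not been tested here.

```python
import re

file_names = [
    "example_pop_high_throughput_high_strategy.csv",
    "example_pop_high_throughput_base_strategy.csv",
    "example_pop_high_throughput_low_strategy.csv",
    "example_pop_base_throughput_high_strategy.csv",
    "example_pop_base_throughput_base_strategy.csv",
    "example_pop_base_throughput_low_strategy.csv",
    "example_pop_low_throughput_high_strategy.csv",
    "example_pop_low_throughput_base_strategy.csv",
    "example_pop_low_throughput_low_strategy.csv",
]

# ([a-z]+) captures the pop value; \1 requires that exact text again.
pattern = re.compile(r"pop_([a-z]+)_throughput_\1_strategy\.csv$")

matching = [name for name in file_names if pattern.search(name)]
print(matching)
```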