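I'm trying to load my data files (there are a dozen or so) into tables in SQLite. Each file has a header row, and I'll be receiving them several times over the next few years, so I'd like to:
I define my table and import the data...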
> .separator "\t"
> .headers on
> CREATE TABLE clinical(
patid VARCHAR(20),
eventdate CHAR(10),
sysdate CHAR(10),
constype INT,
consid INT,
medcode INT,
staffid VARCHAR(20),
textid INT,
episode INT,
enttype INT,
adid INT);
> .import "Sample_Clinical001.txt" clinical
> SELECT * FROM clinical LIMIT 10;
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
patid eventdate sysdate constype consid medcode staffid textid episode enttype adid
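471001 30/01/1997 09/03/1997 4 68093 180 0 0 0 20 11484 …

I'm trying to write a relatively simple wrapper to produce some plots, but I can't work out how to specify tidy evaluation of grouping variables that are passed in via the dots (...). Here is an example function that facets by variable but does not distinguish by group...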
my_plot <- function(df = starwars,
                    select = c(height, mass),
                    ...){
    results <- list()
    ## Tidyeval arguments
    quo_select <- enquo(select)
    quo_group <- quos(...)
    ## Filter, reshape and plot
    results$df <- df %>%
        dplyr::filter(!is.na(!!!quo_group)) %>%
        dplyr::select(!!quo_select, !!!quo_group) %>%
        gather(key = variable, value = value, !!!quo_select) %>%
        ## Specify what to plot
        ggplot(aes(value)) +
        geom_histogram(stat = 'count') +
        facet_wrap(~variable, scales = 'free', strip.position = 'bottom')
    return(results)
}
## Plot height and mass as facets but colour histograms by hair_color
my_plot(df …

I'm setting up a new laptop running Gentoo and want to install R (as I do on all my computers!).
However, I've run into some problems installing packages.
I first tried:
> install.packages(c("ggplot2", "plyr", "reshape2"))
It downloaded all of the packages and their dependencies as expected, but they then failed to install, reporting:
Error in library(data.table) : there is no package called ‘data.table’
Calls: .First -> library
Execution halted
Error in library(data.table) : there is no package called ‘data.table’
Calls: .First -> library
Execution halted
Error in library(data.table) : there is no package called ‘data.table’
Calls: .First -> library
Execution halted
Error in library(data.table) : there is no package called ‘data.table’
Calls: .First -> library
Execution halted
Error in library(data.table) : there is no package …

I want to convert Markdown files to HTML using Sphinx, but I'm having trouble getting [links](another.md) translated to <a href="another.html">links</a>; instead the target keeps its original .md extension and appears as <a href="another.md">links</a>.
I've created a simple example...
test.md
[Test link](https://www.stackoverflow.com)
[Another Markdown doc](another.md)
another.md
# Another test markdown
Both files reside in the top-level directory, and I ran sphinx-quickstart to create conf.py, accepting the defaults. I then modified conf.py to...
from recommonmark.parser import CommonMarkParser
extensions = [
    'sphinx.ext.autodoc',
]
source_suffix = ['.rst', '.md']
source_parsers = {
    '.md': CommonMarkParser,
}
The HTML files are generated, but the link from test.html to another.html is incorrect and appears as...
test.html
...
<p><a class="reference external" href="https://thefloow.com">Test link</a></p>
<p><a class="reference external" href="another.md">A real test</a></p>
...
...and points to another.md rather than another.html. I asked about this a few days ago and was pointed towards recommonmark's AutoStructify (see here …
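As a stopgap while the Sphinx-side configuration is unresolved, the generated HTML could be post-processed so that relative links ending in .md point at .html instead. The sketch below is only that: it is not how Sphinx or recommonmark resolve links, and the function name rewrite_md_links is hypothetical. It assumes links are written with double-quoted href attributes, as in the output above, and it leaves absolute URLs alone.

```python
import re

# Matches href="<relative target>.md" with an optional #fragment.
# The negative lookahead skips anything that starts with a URL scheme
# (https:, mailto:, ...), so external links are left untouched.
_MD_HREF = re.compile(
    r'href="(?![a-z][a-z0-9+.-]*:)([^"#]+)\.md(#[^"]*)?"',
    re.IGNORECASE,
)


def rewrite_md_links(html: str) -> str:
    """Return html with relative .md link targets changed to .html."""
    return _MD_HREF.sub(
        lambda m: 'href="{}.html{}"'.format(m.group(1), m.group(2) or ""),
        html,
    )


if __name__ == "__main__":
    page = '<p><a class="reference external" href="another.md">A real test</a></p>'
    print(rewrite_md_links(page))
```

Running this over each file in the build directory after sphinx-build would patch the links, but it treats the symptom rather than the cause, so a proper fix in conf.py is still preferable.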
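
I'm trying to wrap some dplyr magic in a function to produce a data.frame that I then print with xtable.
The ultimate goal is to get a dplyr version of this working, and while reading around I came across the very useful summarise_each() function which, after subsetting with regroup() (since this is inside a function), I can use to parse all of the columns.
The problem I've run into (so far) is when calling is.na() inside summarise_each(funs(is.na)), where I'm told Error: expecting a single value.
I've deliberately not posted my function, but a minimal example follows (note: this uses group_by(), which in my function I replace with regroup())...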
library(dplyr)
library(magrittr)
> t <- data.frame(grp = rbinom(10, 1, 0.5),
                  a = as.factor(round(rnorm(10))),
                  b = rnorm(10),
                  c = rnorm(10))
t %>%
    group_by(grp) %>% ## This is replaced with regroup() in my function
    summarise_each(funs(is.na))
Error: expecting a single value
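Running this fails, and it's the call to is.na() that is the problem, because if I instead work out the number of observations in each (needed to derive the proportion missing) it works...
> t %>%
    group_by(grp) %>% ## This is replaced with regroup() …

I'm making use of the secondary axis label feature recently added to ggplot2. I want to rotate only the secondary axis labels, but I can't find any documentation on this or work out how to do it.
It's simple enough to rotate all of the text using...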
ggplot(mtcars, aes(x = wt, y = mpg, colour = mpg)) +
    geom_point() +
    scale_x_continuous(name = 'Bottom Axis',
                       sec.axis = sec_axis(trans = ~ .,
                                           name = 'Top Axis',
                                           breaks = c(2:5),
                                           labels = c('Two Two', 'Three Three Three', 'Four Four Four Four', 'Five Five Five Five Five'))) +
    ## Rotate text of x-axis
    theme(axis.text.x = element_text(angle = 90))
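None of the documentation I've read (e.g. scale_continuous and themes) mentions how to rotate just one axis.
My motivation for asking is that some of the labels I want to apply to my data are long and overlap when placed horizontally. Rotating them avoids this, but I want to keep the horizontal orientation on the bottom axis.

I'm trying to use the na.approx() function from the zoo library (in conjunction with xts) to interpolate missing values in repeated-measures data, with multiple measurements on multiple individuals.
Sample data...
event.date <- c("2010-05-25", "2010-09-10", "2011-05-13", "2012-03-28", "2013-03-07",
                "2014-02-13", "2010-06-11", "2010-09-10", "2011-05-13", "2012-03-28",
                "2013-03-07", "2014-02-13")
variable <- c("neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd", "neck.bmd",
              "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd", "wbody.bmd")
value <- c(0.7490, 0.7615, 0.7900, 0.7730, NA, 0.7420, 1.0520, 1.0665, 1.0760,
           1.0870, NA, 1.0550)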
## Bind into a data frame
df <- data.frame(event.date, variable, value)
rm(event.date, variable, value)
## Convert date
df$event.date <- as.Date(df$event.date)
## Load libraries
library(magrittr)
library(xts)
library(zoo)
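I can use xts() and na.approx() to interpolate a single missing data point for a single outcome for one given individual …

Three days ago I could install packages on three different systems without any problems. R has since been rebuilt on all three systems (Gentoo forced the rebuild), and now I can't download and install packages from CRAN on any of them....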
> install.packages('rmarkdown')
Warning: unable to access index for repository https://r-forge.r-project.org/src/contrib:
internet routines cannot be loaded
Warning: unable to access index for repository https://cran.rstudio.com/src/contrib:
internet routines cannot be loaded
Warning: unable to access index for repository https://cran.uk.r-project.org/src/contrib:
internet routines cannot be loaded
These are the three repositories I define in .Rprofile...
## Set CRAN mirrors
local({r <- getOption("repos"); r["CRAN"] <- "https://cran.uk.r-project.org"; options(repos=r)})
options(repos=c(RStudio='https://rstudio.org/_packages', getOption('repos')))
options(repos=c(RStudio='https://cran.rstudio.com/', getOption('repos')))
And for completeness, the session info..
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C
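[3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8 …

I'm trying to get my head around the non-standard evaluation used by dplyr, without success. I'd like a short function that returns summary statistics (N, mean, sd, median, IQR, min, max) for a specified set of variables.
A simplified version of my function...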
my_summarise <- function(df = temp,
                         to.sum = 'eg1',
                         ...){
    ## Summarise
    results <- summarise_(df,
                          n = ~n(),
                          mean = mean(~to.sum, na.rm = TRUE))
    return(results)
}
and running it with some dummy data...
set.seed(43290)
temp <- cbind(rnorm(n = 100, mean = 2, sd = 4),
              rnorm(n = 100, mean = 3, sd = 6)) %>% as.data.frame()
names(temp) <- c('eg1', 'eg2')
mean(temp$eg1)
[1] 1.881721
mean(temp$eg2)
[1] 3.575819
my_summarise(df = temp, to.sum = 'eg1')
n mean
1 100 NA
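N is calculated but the mean is not, and I can't figure out why.
Ultimately I'd like my function to be more general, along the lines of...
my_summarise <- function(df = …

I'm trying to write a simple wrapper around summarise() for arbitrary variables by arbitrary groups. I've made progress now that I have the correct library version loaded, but I'm confused (again) about how to unquote arguments that take multiple values.
I currently have the following function...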
table_summary <- function(df = .,
                          id = individual_id,
                          select = c(),
                          group = site,
                          ...){
    ## Quote all arguments (see http://dplyr.tidyverse.org/articles/programming.html)
    quo_id <- enquo(id)
    quo_select <- enquo(select)
    quo_group <- enquo(group)
    ## Subset the data
    df <- df %>%
        dplyr::select(!!quo_id, !!quo_select, !!quo_group) %>%
        unique()
    ## gather() data, just in case there is > 1 variable selected to be summarised
    df <- df %>%
        gather(key = variable, value = value, !!quo_select)
    ## Summarise selected …

I've a repository for my dotfiles and went to push changes from a branch, only to encounter...
$ git push
Enumerating objects: 46, done.
Counting objects: 100% (46/46), done.
Writing objects: 100% (46/46), 3.20 MiB | 1.52 MiB/s, done.
Total 46 (delta 0), reused 0 (delta 0)
To gitlab.com:auser/dotfiles.git
! [remote rejected] kimura -> origin/kimura (deny updating a hidden ref)
! [remote rejected] master -> origin/master (deny updating a hidden ref)
error: failed to push some refs to 'git@gitlab.com:auser/dotfiles.git'
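
I'm trying to understand nested list comprehensions and have read the excellent explanation here.
The problem I have in translating my own code is that there is an if clause in my inner loop, and I can't see how to apply it at the func() step, because I lose the enumerate() counter when going from nested loops to a list comprehension.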
nested_list = [[{'a': 1, 'b': 2}, {'c': 3, 'd': 4}], [{'a': 5, 'b': 6}, {'c': 7, 'd': 8}]]
new_list = []
for c, x in enumerate(nested_list):
    for d, y in enumerate(x):
        if d == 1:
            new_list.append(y)
print(new_list)
[{'c': 3, 'd': 4}, {'c': 7, 'd': 8}]
The nested list comprehension might look something like
new_list = [if ??? y
for x in nested_list
for y in x]
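...but I can't see how to get the if clause in, as I have no counter in the nested list comprehension.
Is there a way to achieve this, or should I stick with the nested-loop approach?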
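One possible approach, offered as a sketch rather than the definitive answer: enumerate() can be used directly in the comprehension's inner for clause, with the if filter placed last. The for clauses keep the same order as the nested loops, and the outer counter c is dropped because the loop version never uses it.

```python
nested_list = [[{'a': 1, 'b': 2}, {'c': 3, 'd': 4}],
               [{'a': 5, 'b': 6}, {'c': 7, 'd': 8}]]

# The for clauses read left to right in the same order as the nested loops;
# the trailing if clause can see the inner counter d supplied by enumerate().
new_list = [y
            for x in nested_list
            for d, y in enumerate(x)
            if d == 1]

print(new_list)
# [{'c': 3, 'd': 4}, {'c': 7, 'd': 8}]
```

If each kept element needed transforming, a call such as func(y) (func being the placeholder function mentioned in the question) would replace the leading y. Whether this reads better than the explicit loops is a matter of taste; with more conditions, the loop version may be easier to follow.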
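
I'm trying to get summary statistics within subsets of my dataset and naturally turned to the plyr package and, since I'm working with a data frame, ddply(). I don't understand why this isn't working....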
t <- as.data.frame(cbind(1, seq(1:20)))
t2 <- as.data.frame(cbind(2, seq(21:40)))
t <- rbind(t, t2)
rm(t2)
is.data.frame(t)
[1] TRUE
ddply(t, .(V1), function(x) c(missing = sum(is.na(t$V2)),
                              n = sum(!is.na(t$V2)),
                              mean = mean(t$V2, na.rm = TRUE),
                              sd = sd(t$V2, na.rm = TRUE)))
V1 missing n mean sd
1 1 0 40 10.5 5.83974
2 2 0 40 10.5 5.83974
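I've read a few quick overviews like this one, searched a number of threads on Stack Overflow and found similar questions, and thought I was doing it right, but clearly I'm not. I'd be very grateful to learn what I'm doing wrong or what I've misunderstood.
Thanks in advance,
slackline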
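A likely reading of the output above, offered as an interpretation rather than a confirmed diagnosis: the anonymous function receives each group's rows as x but computes everything on the full data frame t, which would explain why both groups report n = 40 and identical statistics. The Python sketch below reproduces the same pitfall outside plyr. The data and the names rows, summarise, wrong and right are hypothetical, chosen so that the two groups really do differ.

```python
from statistics import mean, stdev

# Hypothetical (group, value) rows: group 1 holds 1..20, group 2 holds 21..40.
rows = [(1, v) for v in range(1, 21)] + [(2, v) for v in range(21, 41)]


def summarise(values):
    """Summary statistics for one list of values; None counts as missing."""
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "n": len(present),
        "mean": mean(present),
        "sd": stdev(present),
    }


# Split the values by group, as ddply() does before calling the function.
groups = {}
for g, v in rows:
    groups.setdefault(g, []).append(v)

all_values = [v for _, v in rows]

# Pitfall: the per-group subset is ignored, so every group gets the same answer.
wrong = {g: summarise(all_values) for g in groups}

# Using the subset that was handed in gives per-group results.
right = {g: summarise(vals) for g, vals in groups.items()}

print(wrong)
print(right)
```

In the R code the analogous change would be to refer to x$V2 rather than t$V2 inside the function. Separately, seq(21:40) returns 1 to 20 rather than 21 to 40, so even then both groups in the example data would share the same mean.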