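I'm trying to write a report using R's brew package. I started with some code from this page: http://learnr.wordpress.com/2009/09/09/brew-creating-repetitive-reports/
I can use brew to produce a PDF-able TeX file for something simple like this: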
\documentclass[11pt]{amsart}
\begin{document}
<% library(xtable); library(ggplot2) %>
<% for (i in 1:2) { %>
<%=print(i) %>
<% } -%>
\end{document}
But if I try to insert a simple cat command:
\documentclass[11pt]{amsart}
\begin{document}
<% library(xtable); library(ggplot2) %>
<% for (i in 1:2) { %>
<%=cat("\section{", i, "}", sep="") %>
<% } -%>
\end{document}
I get the following error:
brew("Brew/test_brew3.brew", "Brew/test_brew2.tex")
Error: '\s' is an unrecognized escape in character string starting "\s"
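What could be going wrong? The post above calls the \section command the same way, so I wonder whether it has something to do with my R environment?

I have a data frame of values and I'd like to explore the rows that are outliers. Below I wrote a function that can be called with groupby().apply(); it works for either the high or the low values, but when I try to combine the two I get an error. I'm somehow messing up the boolean OR selection, but I can only find examples that use &. Any suggestions would be appreciated.
Zach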
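For the combined high-or-low selection described above, here is a minimal sketch of one possible approach. It assumes pandas is imported as pd and uses the element-wise | operator on two boolean Series rather than Python's or. The n_std parameter and the intermediate names are illustrative, and ddof=0 is chosen only to mirror numpy's default std.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   "b": [5, 5, 6, 9, 9, 9, 9, 9, 9, 20]})

def get_all_outliers(group, n_std=2):
    """Return rows of `group` whose b lies more than n_std deviations from the mean."""
    center = group["b"].mean()
    spread = group["b"].std(ddof=0)  # population std, as numpy.std computes by default
    too_high = group["b"] > center + n_std * spread
    too_low = group["b"] < center - n_std * spread
    # `|` combines the two boolean Series element by element
    return group[too_high | too_low]

# Apply the function to each group of `a` and stack the results
outliers = pd.concat([get_all_outliers(g) for _, g in df.groupby("a")])
print(outliers)
```

Each comparison is kept in its own variable here; if the comparisons are written inline, they need parentheses around them because | binds more tightly than > and <.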
from numpy import mean, std
from pandas import DataFrame

df = DataFrame({'a': [1,1,1,2,2,2,2,2,2,2], 'b': [5,5,6,9,9,9,9,9,9,20]})

# this works fine
def get_outliers(group):
    x = mean(group.b)
    y = std(group.b)
    top_cutoff = x + 2*y
    bottom_cutoff = x - 2*y
    cutoffs = group[group.b > top_cutoff]
    return cutoffs

# this will trigger an error
def get_all_outliers(group):
    x = mean(group.b)
    y = std(group.b)
    top_cutoff = x + 2*y
    bottom_cutoff = x - 2*y
    cutoffs = group[(group.b > top_cutoff) or (group.b < bottom_cutoff)]
    return cutoffs
…

I'd like dplyr to return a character vector instead of a data frame. Is there a shortcut?
#example data frame
df <- data.frame( x=c('a','b','c','d','e','f','g','h'),
y=c('a','a','b','b','c','c','d','d'),
z=c('a','a','a','a','a','a','d','d'),
stringsAsFactors = FALSE)
#desired output
unique(df$z)
[1] "a" "d"
#dplyr's output
df %>%
select(z) %>%
unique()
z
1 a
7 d

I have two categorical columns (A, B) and a numeric column (C). I want to get the value of A where C is at its maximum within the groups defined by B. I'm looking for a data.table solution.
library(data.table)
dt <- data.table( A = c("a","b","c"),
B = c("d","d","d"),
C = c(1,2,3))
dt
A B C
1: a d 1
2: b d 2
3: c d 3
# I want to find the value of A for the maximum value
# of C when grouped by B
dt[,max(C), by=c("B")]
B V1
1: d 3
#how can I get the A column, value = "c"
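
I have a FASTA file with a large number of entries. Although all of the DNA sequences are different, some of the FASTA names are identical. Where there are multiple copies of a name, I'd like to append a number so that they become unique names. For example: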
>NAME
ATTTTTGGGGGGTGTGTG
>NAME
ATTTTTTTTCGCGCGC
>NAME
AAACCCTTTGTG
would become:
>NAME_1
ATTTTTGGGGGGTGTGTG
>NAME_2
ATTTTTTTTCGCGCGC
>NAME_3
AAACCCTTTGTG
Thanks.
Update: because I plan to use this in R, I imported the FASTA sequences into R as a data frame, df. I can then rename as needed with the following lines:
library(plyr)
ddply(df, "Name_Column", transform, Column = paste(Name_Column, seq_along(Name_Column), sep="_"))
The code was inspired by this post.
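For renaming the file directly, without going through a data frame, a minimal Python sketch might look like the following. It uses only the standard library; the function name make_names_unique and the choice to leave names that occur only once untouched are assumptions made for illustration, not part of the original question.

```python
from collections import Counter

def make_names_unique(lines):
    """Append _1, _2, ... to FASTA header names that occur more than once."""
    lines = list(lines)
    # First pass: count how often each header line occurs
    totals = Counter(line.strip() for line in lines if line.startswith(">"))
    seen = Counter()
    renamed = []
    # Second pass: number the repeated headers in order of appearance
    for line in lines:
        header = line.strip()
        if line.startswith(">") and totals[header] > 1:
            seen[header] += 1
            renamed.append(f"{header}_{seen[header]}")
        else:
            renamed.append(line.rstrip("\n"))
    return renamed

example = [">NAME", "ATTTTTGGGGGGTGTGTG",
           ">NAME", "ATTTTTTTTCGCGCGC",
           ">NAME", "AAACCCTTTGTG"]
print("\n".join(make_names_unique(example)))
```

For a real file, the lines of an open file handle can be passed in place of the example list.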
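
I have a data frame that I'd like to plot as a bar chart, but I want the categorical x values to appear in a specific order that I specify with a list. I'll show an example using the mtcars dataset.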
#get a small version of the mtcars dataset and add a named column
mtcars2 <- mtcars
mtcars2[["car"]] <- rownames(mtcars2)
mtcars2 <- mtcars2[1:5,]
# I would like to plot this using the following
p = ggplot(mtcars2, aes(x=car, y=mpg))+ geom_bar(stat="identity")
The x-axis values are sorted alphabetically. But what if I have a list of cars and want ggplot to preserve that order:
#list out of alphabetical order
orderlist = c("Hornet 4 Drive", "Mazda RX4 Wag", "Mazda RX4",
"Datsun 710", "Hornet Sportabout")
# I would like to plot the bar graph as above but preserve the plot order
# something like this:
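p …

I want to generate a vector of identical items given an item and a count. This seems like something that should be easier to do than with a loop. Any ideas for making the function more compact/streamlined?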
; take an object and a number n and return a vector of those objects that is n-long
(defn return_multiple_items [item number-of-items]
  (loop [x 0
         items []]
    (if (= x number-of-items)
      items
      (recur (+ x 1)
             (conj items item)))))
>(return_multiple_items "A" 5 )
>["A" "A" "A" "A" "A"]
>(return_multiple_items {:years 3} 3)
>[{:years 3} {:years 3} {:years 3}]

I'm calling a script inside a for loop and have run into a problem with variable expansion: the first of the two variables is not included in the output. (Note: code adapted from here)
LIST1 := a b c
LIST2 := 1 2 3
all:
	@for x in $(LIST1); do \
		for y in $(LIST2); do \
			echo $$x $$y; \
			echo $$x_$$y.txt; \
		done \
	done
#This will output:
a 1
1.txt
a 2
2.txt ....
#Where I expect
a 1
a_1.txt
a 2
a_2.txt
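Any ideas on how to fix this?
Thanks, zach cp

I'd like to subset a large data frame and create a ggplot for each group. It sounds like a perfect candidate for dplyr, but I'm running into problems calling a function on the group_by results. Any tips would be appreciated.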
# what I want to do using base functions: "groupby" the elements in a column
# and create/save a plot for each group
for (i in levels(iris$Species)){
df = iris[iris$Species == i,]
p <- ggplot(df, aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
ggsave(p, filename=paste(i,".pdf",sep=""))
}
# I'm trying to get something like this using dplyr
library(dplyr)
iris %>%
group_by(Species) %>%
do({
p <- ggplot(., aes(x=Sepal.Length, y=Sepal.Width)) + geom_point()
ggsave(p, filename=paste(quote(Species),".pdf",sep=""))
})
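
As a follow-up to this post, I'd like to concatenate multiple columns based on their index, but I'm running into some problems. In this example I get an AttributeError related to the map function. Help understanding this error would be appreciated, as would code that performs the equivalent column concatenation.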
from pandas import DataFrame

#data
df = DataFrame({'A':['a','b','c'], 'B':['d','e','f'], 'C':['concat','me','yo'], 'D':['me','too','tambien']})

#row function to concat rows with index greater than 2
def cnc(row):
    temp = []
    for x in range(2, len(row)):
        if row[x] != None:
            temp.append(row[x])
    return map(concat, temp)

#apply function per row
new = df.apply(cnc, axis=1)
#Expected Output
new
concat me
me too
yo tambien
Thanks, zach cp
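One way to get the expected output shown above is to select the columns by position and join each row's values as strings. This is only a sketch: the helper name join_from and its start and sep parameters are hypothetical, and it assumes pandas is imported as pd and that missing values should simply be skipped.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"],
                   "B": ["d", "e", "f"],
                   "C": ["concat", "me", "yo"],
                   "D": ["me", "too", "tambien"]})

def join_from(frame, start=2, sep=" "):
    """Join the values of every column from position `start` onward, row by row."""
    tail = frame.iloc[:, start:]  # positional slice: columns C and D in this example
    return tail.apply(lambda row: sep.join(row.dropna().astype(str)), axis=1)

new = join_from(df)
print(new.tolist())
# ['concat me', 'me too', 'yo tambien']
```

Selecting with iloc keeps the "index greater than or equal to 2" logic in one place, so the row function no longer needs its own loop over positions.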