I have the following data:
head(df2,20)
LAF_Date variable value pos
1 2001-04-03 Net -1470 FALSE
2 2001-04-04 Net -17675 FALSE
3 2001-04-09 Net -13820 FALSE
4 2001-04-10 Net -16885 FALSE
5 2001-04-11 Net -18160 FALSE
6 2001-04-12 Net -19170 FALSE
7 2001-04-16 Net -13715 FALSE
8 2001-04-17 Net -17265 FALSE
9 2001-04-18 Net -11115 FALSE
10 2001-04-19 Net -600 FALSE
11 2001-04-20 Net -11375 FALSE
12 2001-04-23 Net -8200 FALSE
13 2001-04-24 Net -5600 FALSE
14 2001-04-25 Net -5300 FALSE
15 2001-04-26 Net 0 …
I created a custom function to plot regression diagnostics with ggplot2, using these versions of ggplot2 and gridExtra:
ggplot2 * 1.0.1 2015-03-17 CRAN (R 3.2.1)
gridExtra * 2.0.0 2015-07-14 CRAN (R 3.2.1)
head(dadHospital)
##   SL. BODY.WEIGHT TOTAL.COST.TO.HOSPITAL
## 1 1 49 660293
## 2 2 41 809130
## 3 3 47 362231
## 4 4 80 629990
## 5 5 58 444876
## 6 6 45 372357
fit1<-lm(TOTAL.COST.TO.HOSPITAL~BODY.WEIGHT,data=dadHospital)
#custom function of plotting model diagnostics using ggplot2
library(ggplot2)
diagPlot<-function(model){
p1<-ggplot(model, aes(.fitted, .resid))+geom_point()
p1<-p1+stat_smooth(method="loess")+geom_hline(yintercept=0, col="red", linetype="dashed")
p1<-p1+xlab("Fitted values")+ylab("Residuals")
p1<-p1+ggtitle("Residual vs Fitted Plot")+theme_bw()
p2<-ggplot(model, aes(qqnorm(.stdresid)[[1]], .stdresid))+geom_point(na.rm = TRUE)
p2<-p2+geom_abline(aes(qqline(.stdresid)))+xlab("Theoretical Quantiles")+ylab("Standardized Residuals") …
I have the following data:
df
rowname repo
1 revrepo 0.888
2 bankrate 0.402
3 CRR 0.250
4 Callrate 0.723
5 WPI 0.049
6 GDP -0.318
7 FED 0.110
8 width 0.209
9 nse 0.059
10 usd 0.185
I am plotting the bars as follows:
library(dplyr)    # provides %>% and mutate()
library(ggplot2)
df %>% mutate(rowname = factor(rowname, levels = rowname[order(repo)])) %>%
ggplot(aes(x = rowname, y = repo)) +
geom_bar(stat = "identity") +
ylab("Correlation with repo") +
xlab("Independent Variable")
I want to color the negative bars red and all the positive bars grey.
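A common ggplot2 approach (not run here) is to map the fill colour to the sign of the value, e.g. `geom_bar(aes(fill = repo < 0), stat = "identity")` combined with `scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "grey"))`. The Python sketch below only illustrates that logic on the ten correlations printed above: sort the variables in ascending order of `repo`, as `rowname[order(repo)]` does, then pick each bar's colour from its sign. The helper name `bar_colours` is hypothetical, and treating a value of exactly 0 as grey is an assumption.

```python
# Correlations with repo, copied from the data frame printed above.
repo = {
    "revrepo": 0.888, "bankrate": 0.402, "CRR": 0.250, "Callrate": 0.723,
    "WPI": 0.049, "GDP": -0.318, "FED": 0.110, "width": 0.209,
    "nse": 0.059, "usd": 0.185,
}


def bar_colours(values, negative="red", non_negative="grey"):
    """Hypothetical helper: order bars by value (ascending, like order(repo)
    in R) and choose a colour for each bar from the sign of its value."""
    ordered = sorted(values.items(), key=lambda item: item[1])
    # Values below zero get the negative colour; zero and above get grey.
    return [(name, value, negative if value < 0 else non_negative)
            for name, value in ordered]


for name, value, colour in bar_colours(repo):
    print(f"{name:>9} {value:7.3f} {colour}")
```

Only GDP is negative in this data, so it is the single red bar and appears first in the ascending order.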
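I am trying a Shiny app in which I load a CSV file from a local directory, then select specific columns from the data frame and use this subsetted data frame for further data analysis.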
library(shiny)
# Define UI for application that draws a histogram
ui <- fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
fileInput("dataset", "Choose CSV File",
multiple = TRUE,
accept = c("text/csv",
"text/comma-separated-values,text/plain",
".csv")),
# Include clarifying text ----
#helpText(em("Note: This app requires file in csv format only!!")),
helpText(em("Note: Select all the inputs and click on the button below to execute the app")),
# Input: Checkbox if file has …
I have the following df:
head(vardata)
Month repo Callrate WPI GDP FED nse usd
1 2001-04-01 9.00 7.49 5.41 4.6 4.50 1125.2 46.79
2 2001-05-01 8.75 8.03 5.60 4.6 4.00 1167.9 46.92
3 2001-06-01 8.50 7.24 5.30 4.6 3.75 1107.9 47.00
4 2001-07-01 8.50 7.19 5.23 5.3 3.75 1072.8 47.14
5 2001-08-01 8.50 6.94 5.41 5.3 3.50 1053.8 47.13
6 2001-09-01 8.50 7.30 4.52 5.3 3.00 913.9 47.65
I want to run Box.test, adf.test and kpss.test on all 7 variables using the following set of rules:
Suppose I set the significance level to 5%. Then the rules are:
1) For Box.test, if the p-value < 0.05 …
I have a data frame:
import pandas as pd
df = pd.DataFrame(
{'title':['a1','a2','a3','a4','a5'],
'genre_name':[
['family', 'animation'],
['action', 'family', 'comedy'],
['family', 'comedy'],
['horror','action'],
['family', 'animation','comedy']]}
)
df
title genre_name
0 a1 ['family', 'animation']
1 a2 ['action', 'family', 'comedy']
2 a3 ['family', 'comedy']
3 a4 ['horror', 'action']
4 a5 ['family', 'animation','comedy']
I have a dictionary:
dict={'1':'family','2':'animation','3':'action','4':'comedy','5':'horror'}
I want to create a new column called "genre_ids" that maps every genre name to its key in the dictionary "dict".
The desired df is:
df
title genre_name genre_ids
0 a1 ['family', 'animation'] [1,2]
1 a2 ['action', 'family', 'comedy'] [3,1,4]
2 a3 ['family', 'comedy'] [1,4]
3 a4 ['horror', 'action'] [5,3]
4 a5 ['family', 'animation','comedy'] [1,2,4]
How can I do this?
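One possible pandas approach is to invert the dictionary (name to id) and look up every element of each list. The sketch assumes each genre name appears only once among the dictionary's values, and that the integer ids in the desired output come from converting the string keys with `int()`. The dictionary is renamed to the hypothetical `genre_dict` so it does not shadow Python's built-in `dict`; `name_to_id` is likewise an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {"title": ["a1", "a2", "a3", "a4", "a5"],
     "genre_name": [["family", "animation"],
                    ["action", "family", "comedy"],
                    ["family", "comedy"],
                    ["horror", "action"],
                    ["family", "animation", "comedy"]]}
)

# The question's id -> name dictionary, renamed to avoid shadowing built-in dict.
genre_dict = {"1": "family", "2": "animation", "3": "action",
              "4": "comedy", "5": "horror"}

# Invert it to name -> id; int() turns the string keys into the integers
# shown in the desired output.
name_to_id = {name: int(key) for key, name in genre_dict.items()}

# Look up each genre in every row's list, keeping the original order.
df["genre_ids"] = df["genre_name"].apply(lambda names: [name_to_id[n] for n in names])
print(df)
```

A genre that is missing from the dictionary raises `KeyError` here; using `name_to_id.get(n)` instead would put `None` in its place.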
I have the following df:
df<-data.frame(geo_num=c(11,12,22,41,42,43,77,71),
cust_id=c("A","A","B","C","C","C","D","D"),
sales=c(2,3,2,1,2,4,6,3))
> df
geo_num cust_id sales
1 11 A 2
2 12 A 3
3 22 B 2
4 41 C 1
5 42 C 2
6 43 C 4
7 77 D 6
8 71 D 3
I need to create a new column "geo_num_new" in which every row of each "cust_id" group gets the first value of "geo_num" in that group, as shown below:
> df_new
geo_num cust_id sales geo_num_new
1 11 A 2 11
2 12 A 3 11
3 22 B 2 22
4 41 C 1 41
5 42 C 2 41
6 43 C 4 41
7 …
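With dplyr this is typically written as `df %>% group_by(cust_id) %>% mutate(geo_num_new = first(geo_num))`. The sketch below shows the same group-wise "first value" idea in pandas, rebuilding the data frame from the `data.frame()` call above; `groupby(...).transform("first")` broadcasts each group's first `geo_num` back to every row of that group. It assumes the rows are already in the intended order within each `cust_id`, since "first" means first as stored.

```python
import pandas as pd

# Same data as the R data.frame() call above.
df = pd.DataFrame({
    "geo_num": [11, 12, 22, 41, 42, 43, 77, 71],
    "cust_id": ["A", "A", "B", "C", "C", "C", "D", "D"],
    "sales": [2, 3, 2, 1, 2, 4, 6, 3],
})

# transform("first") returns one value per row (the first geo_num of that
# row's cust_id group), aligned with the original index.
df["geo_num_new"] = df.groupby("cust_id")["geo_num"].transform("first")
print(df)
```

One difference worth knowing: pandas' `"first"` skips missing values and returns the first non-null entry, whereas dplyr's `first()` returns the literal first element.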