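I am trying to draw two different geom_vlines, in two different colours, in two different facets of a dataset. I am doing this to highlight the means of the two facets.
Here is the dataset: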
Pclass  Sex     Age  SibSp  Parch  Fare     Cabin  Embarked  Survived
3       male    22   1      0      7.25            S         0
1       female  38   1      0      71.2833  C85    C         1
3       female  26   0      0      7.925           S         1
1       female  35   1      0      53.1     C123   S         1
3       male    35   0      0      8.05            S         0
1       male    54   0      0      51.8625  E46    S         0
Here is the code:
g<-ggplot(data = train3, aes(x = Age, y = Survived, colour = factor(Pclass)))
g<-g+facet_wrap(~Sex)
g<-g+geom_point(size = 4, alpha …

I am reading An Introduction to Statistical Learning with Applications in R by Hastie and Tibshirani. I came across two concepts: RSE and MSE. My understanding is this:
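RSE = sqrt(RSS/(N-2))
MSE = RSS/N
Now I am building 3 models for a problem and need to compare them. While MSE is intuitive to me, I also wonder whether computing RSS/(N-2), which by the above is RSE^2, is of any use.
I guess I am not sure which one to use where?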
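One way to see how the two quantities relate is to compute both for a fitted model. The sketch below is only illustrative: it uses R's built-in `cars` dataset and a simple linear regression (one predictor, hence the N-2), neither of which comes from the question, and it reads RSS/N-2 as RSS/(N-2).

```r
# Illustrative only: the built-in `cars` data stands in for the real problem.
fit <- lm(dist ~ speed, data = cars)

rss <- sum(residuals(fit)^2)   # residual sum of squares
n   <- nrow(cars)              # number of observations

mse <- rss / n                 # mean squared error on the training data
rse <- sqrt(rss / (n - 2))     # residual standard error (2 estimated coefficients)

# R reports the same RSE as the "Residual standard error" of the fit
all.equal(rse, summary(fit)$sigma)

# RSE^2 and MSE differ only by the factor n / (n - 2)
all.equal(rse^2, mse * n / (n - 2))
```

So RSE^2 is the MSE with a degrees-of-freedom correction. With p predictors the divisor becomes N - p - 1, which means the two measures can rank models of different sizes slightly differently on small samples.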
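
I am trying to build a Shiny app from historic New York City crime data. I am using the single-file Shiny approach. Here is the data: https://data.world/data-society/nyc-crime-data
For some reason, when I select a year to display the crime statistics, the output appears only in RStudio's Viewer, not in the main panel of the Shiny pop-up window. Here is the full code: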
# Shiny App exploring New York City Crime Data between 2006-2016
# Data Source: https://data.world/data-society/nyc-crime-data
#########################Global Data######################
# Data Reading
set.seed(123)
library("shiny")
library("lubridate")
library("plotly")
nypd<-read.csv("NYPD_Complaint_Data_Historic.csv")
#Data Massaging
nypd$year<-year(as.Date(nypd$RPT_DT,'%m/%d/%Y'))
nypd$month<-month(as.Date(nypd$RPT_DT,'%m/%d/%Y'))
nypd<-nypd[nypd$OFNS_DESC != "",]
nypd2<-nypd[,c(1,6,8,14,16,17,22,23,25,26)]
ui<-fluidPage(
  titlePanel("New York City Crime Data from 2006-2016"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("year","Year of Crime",min=2006,max=2016,value=2008,step = 1)
    ),
    mainPanel(plotOutput("crimeplot"))
  )
)
server<-function(input,output){
  output$crimeplot<-renderPlot({
    nypd_yr_sorted<-nypd2[nypd2$year==input$year,]
    agg_data<- aggregate(nypd_yr_sorted$CMPLNT_NUM,by=list(nypd_yr_sorted$OFNS_DESC),FUN=function(x)length(unique(x)))
    colnames(agg_data)<-c("Crime","Crime count")
    bar_data<-agg_data[order(agg_data$`Crime count`, decreasing = TRUE),][1:5,]
    plot_ly(bar_data,x=~Crime,y=~`Crime count`,type="bar",color = ~Crime) %>% layout(xaxis= list(showticklabels …

After being stuck on something similar for a few hours, I came up with less messy code, in a single Teradata query, that outputs the 25%, 50% and 75% percentiles. It can be extended further to produce a "five-number summary". For the minimum and maximum, change the static values according to your population estimates.
Someone asked for an elegant approach. Sharing mine.
Here is the code:
SELECT MAX(PER_MIN) AS PER_MIN,
MAX(PER_25) AS PER_25,
MAX(PER_50) AS PER_50,
MAX(PER_75) AS PER_75,
MAX(PER_MAX) AS PER_MAX
FROM (SELECT CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.01 AS INT) THEN DURATION_MACRO_CURR END AS PER_MIN,
CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.25 AS INT) THEN DURATION_MACRO_CURR END AS PER_25,
CASE WHEN ROW_NUMBER() OVER(ORDER BY DURATION_MACRO_CURR ASC) = CAST(COUNT(*) OVER() * 0.50 AS INT) THEN DURATION_MACRO_CURR END AS PER_50 …
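
The rank rule used in the query can be checked on a small vector outside the database. The R sketch below is only an illustration under stated assumptions: `rank_percentile` is a hypothetical helper, the sample durations are made up, and `CAST(... AS INT)` is assumed to truncate toward zero.

```r
# Hypothetical helper (not part of the original query): return the value whose
# 1-based rank equals trunc(n * p), mirroring
# ROW_NUMBER() OVER(ORDER BY ...) = CAST(COUNT(*) OVER() * p AS INT).
rank_percentile <- function(x, p) {
  sorted <- sort(x)                    # ORDER BY ... ASC
  idx <- trunc(length(sorted) * p)     # assumption: the cast truncates
  if (idx < 1) return(NA_real_)        # no row has rank 0, so the SQL gives NULL
  sorted[idx]
}

durations <- c(12, 7, 3, 25, 18, 9, 30, 15)   # made-up sample values
quartiles <- sapply(c(0.25, 0.50, 0.75),
                    function(p) rank_percentile(durations, p))
quartiles
```

On small inputs a low static value such as 0.01 maps to rank 0 and yields no match, which is one reason the minimum and maximum thresholds need adjusting to the size of the population.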