I can't figure out what's going on - everything seems to work, but my app is not generating a file - even though it looks like it does. I am running it on Windows, in RStudio 0.98.125, and I launch it with the line: runApp(). Below is a very simple reproducible example:
shinyUI(pageWithSidebar(
  headerPanel("My App"),
  sidebarPanel(
    numericInput('NumRuns', 'Number of runs', value = 3, min = 3, max = 10, step = 1),
    actionButton(inputId = "goButton", "Run!"),
    textInput("downloadData", "Save My Data Frame:", value = "Data Frame 1"),
    downloadButton('downloadData', 'Save my file!')
  ),
  mainPanel(
    tabPanel("Some Text",
      h4(textOutput("caption2")),
      tableOutput("mydf"),
      value = 3))
))
shinyServer(function(input, output){
  # Creating files for download at the end
  myout <- reactive({
    if (input$goButton == 0) return(NULL)
    nrruns <- input$NumRuns
    mylist <- NULL
    for (i in 1:nrruns) {
      mylist[[i]] <- data.frame(a = rnorm(10), b = runif(10))
      names(mylist)[i] <- paste("dataframe", i, sep = "")
    }
    return(mylist)
  })
  output$mydf <- renderTable({
    if (input$goButton == 0) return(NULL)
    input$goButton
    isolate(
      myout()$dataframe1
    )
  })
  output$downloadData <- downloadHandler(
    filename = function() { paste(input$downloadData, " ", Sys.Date(), ".csv", sep = "") },
    content = …
I have a Spark data frame 'mydataframe' with many columns. I am trying to run kmeans on just two of them: lat and long (latitude and longitude), using them as plain values. I want to extract 7 clusters based on those two columns, and then attach the cluster assignment to my original data frame. I tried:
from numpy import array
from math import sqrt
from pyspark.mllib.clustering import KMeans, KMeansModel
# Prepare a data frame with just 2 columns:
data = mydataframe.select('lat', 'long')
data_rdd = data.rdd # needs to be an RDD
data_rdd.cache()
# Build the model (cluster the data)
clusters = KMeans.train(data_rdd, 7, maxIterations=15, initializationMode="random")
But after a while I get an error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 5191.0 failed 4 times, most recent failure: Lost task 1.3 in stage 5191.0 (TID 260738, 10.19.211.69, executor 1): org.apache.spark.api.python.PythonException: Traceback (most recent call last)
I tried detaching and reconnecting to the cluster. Same result. What am I doing wrong?
Thank you very much!
machine-learning k-means pyspark apache-spark-ml apache-spark-mllib
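A frequent cause of a PythonException like this is that `data.rdd` yields `Row` objects, possibly containing nulls, rather than the numeric vectors `KMeans.train` expects, so the conversion crashes inside the Python workers. Below is a minimal plain-Python sketch of the per-row cleanup you would map over the RDD before training; the sample rows and the `to_point` helper are hypothetical, not from the original post:

```python
# Rows as they might come out of data.rdd, including a null coordinate:
rows = [
    {"lat": 40.71, "long": -74.00},
    {"lat": None,  "long": -73.94},   # a null that would crash float()
    {"lat": 34.05, "long": -118.24},
]

def to_point(row):
    """Convert one row to a (lat, long) float pair, or None if unusable."""
    if row["lat"] is None or row["long"] is None:
        return None
    return (float(row["lat"]), float(row["long"]))

# In Spark this would be data.rdd.map(to_point) followed by a filter,
# applied before KMeans.train is called:
points = [p for p in (to_point(r) for r in rows) if p is not None]
```

With an actual RDD the same logic would be a `map` that builds a numeric array per row after dropping nulls; the cluster assignments can then be computed with `clusters.predict` and joined back to the original data frame.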
A question about inconsistent Spark computations. Is this a thing? For example, I run exactly the same command twice, e.g.:
imp_sample.where(col("location").isNotNull()).count()
and each time I run it I get slightly different results (141,830, then 142,314)! Or this:
imp_sample.where(col("location").isNull()).count()
getting 2,587,013, then 2,586,943. How is that possible? Thanks!
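The usual explanation is that the lineage of `imp_sample` contains a nondeterministic step (such as a `sample()` call or a `rand()` column) and the DataFrame is not persisted, so every action recomputes the whole lineage with fresh randomness. A plain-Python sketch of the effect, with the lazy recomputation modeled as re-calling a sampling function (the function and sizes are hypothetical):

```python
import random

def build_sample(seed=None):
    """Model a lazy, nondeterministic lineage: re-running it re-samples."""
    rng = random.Random(seed)
    return [i for i in range(100_000) if rng.random() < 0.5]

# Unseeded and recomputed on every action: counts can differ between runs.
count_a = len(build_sample())
count_b = len(build_sample())

# Fixing the randomness (or, in Spark, persisting/checkpointing the
# DataFrame before counting) makes repeated counts agree:
count_c = len(build_sample(seed=42))
count_d = len(build_sample(seed=42))
```

In Spark the typical fix is to call `imp_sample.persist()` (followed by an action to materialize it), or to write the sampled data out and read it back, so that both `count()` calls see the same materialized rows.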
Below is my code. It may look a bit long, but it is actually a very simple app.
The user is supposed to upload a very small data frame (x.csv if you are in the US, x_Europe.csv if you are in Europe). Then the user should click a button to start the calculations. Finally, the user should be able to download the results of those calculations as a data frame.
My problem: after uploading the file, nothing happens when I click the 'do_it' action button. I can tell because nothing is printed to my console. Why? After all, my function 'main_calc' is supposed to be an eventReactive on input$do_it? And why do all the calculations in main_calc only start happening after the user tries to download the results?
Important: it matters to me that the 'data' function stays separate from main_calc.
Thank you very much!
First, generate one of the following 2 files in your working directory:
# generate file 'x.csv' to read in later in the app:
write.csv(data.frame(a = 1:4, b = 2:5), "x.csv", row.names = F) # US file
write.csv2(data.frame(a = 1:4, b = 2:5), "x_Europe.csv", row.names = F) # European file
Here is the code for the shiny app:
library(shiny)
ui <- fluidPage(
  # User should upload file x here:
  fileInput("file_x", label = h5("Upload file 'x.csv'!")),
  br(),
  actionButton("do_it", "Click Here First:"),
  br(),
  br(),
  textInput("user_filename", "Save your …
I am trying to select only the rows with no NA:
library(dplyr)
x = data.frame(a = c(NA, 2, 3, 4))
var_a <- "a"
# This works:
x %>% filter(!is.na(a))
# That works too:
var_a <- quo(a)
x %>% filter(!is.na(!!var_a))
# But this doesn't work:
var_a <- "a"
x %>% filter(!is.na(!!var_a))
What should I change in the last line to make it work? Given that I have to use var_a <- "a". Thank you very much!
Below is a description of the traditional workflow in git.
Is it possible to somehow write a script in R that makes git execute all of it? Would it be wise to do so? Thank you very much!
cd <path_to_local_repository>. Enter git add --all at the command line to stage the changes. Enter git commit -m '<commit_message>' at the command line to commit the changes to the local repository. Enter git push at the command line to push the changes to the remote repository (e.g., on Bitbucket).
Can anyone explain why, in the first loop, each element of my date vector is a Date, while in the second loop each element of my date vector is numeric? Thanks!
x <- as.Date(c("2018-01-01", "2018-01-02", "2018-01-02", "2018-05-06"))
class(x)
# Loop 1 - each element is a Date:
for (i in seq_along(x)) print(class(x[i]))
# Loop 2 - each element is numeric:
for (i in x) print(class(i))
I have estimated a logistic regression using a pipeline.
Here are my last few lines before fitting the logistic regression:
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(featuresCol="lr_features", labelCol = "targetvar")
# create assembler to include encoded features
lr_assembler = VectorAssembler(
    inputCols = numericColumns +
        [categoricalCol + "ClassVec" for categoricalCol in categoricalColumns],
    outputCol = "lr_features")
from pyspark.ml.classification import LogisticRegression
from pyspark.ml import Pipeline
# Model definition:
lr = LogisticRegression(featuresCol = "lr_features", labelCol = "targetvar")
# Pipeline definition:
lr_pipeline = Pipeline(stages = indexStages + encodeStages +[lr_assembler, lr])
# Fit the logistic regression model:
lrModel = lr_pipeline.fit(train_train)
Then I try to run a summary of the model. However, the line of code below:
trainingSummary …
Below is an example of "label position optimization" using ggplot2 and ggrepel:
x = c(0.8846, 1.1554, 0.9317, 0.9703, 0.9053, 0.9454, 1.0146, 0.9012,
0.9055, 1.3307)
y = c(0.9828, 1.0329, 0.931, 1.3794, 0.9273, 0.9605, 1.0259, 0.9542,
0.9717, 0.9357)
ShortSci = c("MotAlb", "PruMod", "EriRub", "LusMeg", "PhoOch", "PhoPho",
"SaxRub", "TurMer", "TurPil", "TurPhi")
df <- data.frame(x = x, y = y, z = ShortSci)
library(ggplot2)
library(ggrepel)
ggplot(data = df, aes(x = x, y = y)) + theme_bw() +
geom_text_repel(aes(label = z),
box.padding = unit(0.45, "lines")) +
  geom_point(colour = "green", size = 3) …
I am trying to standardize (mean = 0, std = 1) one column ('age') in my data frame. Below is my code in Spark (Python):
from pyspark.ml.feature import StandardScaler
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
# Make my 'age' column an assembler type:
age_assembler = VectorAssembler(inputCols= ['age'], outputCol = "age_feature")
# Create a scaler that takes 'age_feature' as an input column:
scaler = StandardScaler(inputCol="age_feature", outputCol="age_scaled",
withStd=True, withMean=True)
# Creating a mini-pipeline for those 2 steps:
age_pipeline = Pipeline(stages=[age_assembler, …
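As a plain-Python cross-check of what `StandardScaler(withMean=True, withStd=True)` computes per column, here is the standardization done by hand. Note that Spark's `StandardScaler` divides by the corrected *sample* standard deviation (the N-1 denominator), which is why its output can differ slightly from tools that default to the population standard deviation. The sample ages below are made up for illustration:

```python
from statistics import mean, stdev  # stdev uses the N-1 (sample) denominator

ages = [23.0, 35.0, 41.0, 29.0, 52.0]

m = mean(ages)
s = stdev(ages)

# (x - mean) / sample_std, matching withMean=True, withStd=True:
age_scaled = [(a - m) / s for a in ages]
```

After the pipeline runs, the `age_scaled` vector column should have mean 0 and sample standard deviation 1, which the hand computation above reproduces.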