I have a very simple Java + Gradle project. It builds fine, and it runs fine from the shell via "gradle run". However, if I try to run it from inside IntelliJ, I get:
Cannot start compilation: the output path is not specified for module "xyz" Specify the output path in Configure Project.
My "Compiler output" is set to "Inherit project compile output path". I don't want a custom output path or anything of the sort; I just want it to do a normal Gradle build and run.
Spark 2.0 (final release) with Scala 2.11.8. The following super-simple code produces the compile error Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    val dataset = sparkSession.createDataset(dataList)
  }
}
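The error message itself points at the usual fix: the implicit encoders that live on the SparkSession have to be in scope before createDataset is called, and the case class has to be defined at the top level (which SimpleTuple already is here). A minimal sketch of what that looks like; the object name is just changed to distinguish it from the code above:

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTestFixed {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder.master("local").appName("example").getOrCreate()

    // The session-bound implicits provide Encoder[SimpleTuple] (a Product type);
    // this is the "spark.implicits._" import the error message refers to.
    import sparkSession.implicits._

    val dataset = sparkSession.createDataset(List(SimpleTuple(5, "abc"), SimpleTuple(6, "bcd")))
    dataset.show()
  }
}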
I'm running a Docker image on an ECS cluster so that I can shell into it and run a few simple tests. But when I run this:
aws ecs execute-command \
--cluster MyEcsCluster \
--task $ECS_TASK_ARN \
--container MainContainer \
--command "/bin/bash" \
--interactive
I get the error:
The Session Manager plugin was installed successfully. Use the AWS CLI to start a session.
An error occurred (TargetNotConnectedException) when calling the ExecuteCommand operation: The execute command failed due to an internal error. Try again later.
I can confirm the task, container, and agent are all running:
aws ecs describe-tasks \
--cluster MyEcsCluster \
--tasks $ECS_TASK_ARN \
| jq '.'
aws ecs execute-command \
--cluster …
I have some basic Kafka Streams code that reads records from one topic, does some processing, and outputs the records to another topic.
How does Kafka Streams handle concurrency? Does everything run in a single thread? I don't see this addressed in the documentation.
If it's single-threaded, I'd like an option for multi-threaded processing so it can handle large volumes of data.
If it's multi-threaded, I need to understand how that works and how to handle resources, for example whether SQL database connections should be shared across the different processing threads.
Compared to the other options (Spark, Akka, Samza, Storm, etc.), is Kafka's built-in Streams API not recommended for high-volume scenarios?
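For reference, Kafka Streams runs its processing on one or more StreamThreads per application instance, controlled by num.stream.threads (default 1); work is split into tasks, one per input partition, and spread across those threads and instances. A hedged Scala sketch of where that knob lives, assuming a reasonably recent Kafka Streams version and hypothetical broker/topic names:

import java.util.Properties
import org.apache.kafka.common.serialization.Serdes
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder, StreamsConfig}

val props = new Properties()
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app")     // hypothetical app id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")  // hypothetical broker
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass)
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass)
// Each instance runs this many StreamThreads; tasks (one per input partition) are
// distributed across them, so total parallelism is capped by the partition count.
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, Integer.valueOf(4))

val builder = new StreamsBuilder()
builder.stream[String, String]("input-topic").to("output-topic")     // hypothetical topics

val streams = new KafkaStreams(builder.build(), props)
streams.start()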
I want a Spark driver program written in Python to output some basic logging information. There are three ways I can see to do this:

1. Using the PySpark py4j bridge to get at the JVM-side log4j logger used by Spark:

log4jLogger = sc._jvm.org.apache.log4j
LOGGER = log4jLogger.LogManager.getLogger(__name__)
LOGGER.info("pyspark script logger initialized")

2. Just using standard console print.

3. The logging Python standard-library module. This seems ideal and the most Pythonic approach, but, at least out of the box, it doesn't work and the logged messages don't seem to be recoverable. Of course, it can be configured to log to py4j->log4j and/or to the console.
Meanwhile, the official programming guide (https://spark.apache.org/docs/1.6.1/programming-guide.html) doesn't mention logging at all, which is disappointing. There should be a standard, documented, recommended way to log from a Spark driver program.
I searched around and found this question: How do I log from my Python Spark script
But the contents of that thread weren't satisfying.
Specifically, I have the following questions:
C# has using with the IDisposable interface. Java 7+ has the same functionality with try-with-resources and the AutoCloseable interface. Scala lets you choose your own implementation for this problem.
scala-arm seems to be the popular choice, and it's maintained by one of the Typesafe employees. However, it seems very complicated for such simple behavior. To clarify: the usage instructions are simple, but understanding how all of that code works internally is fairly complex.
I just wrote the following super-simple ARM solution:
object SimpleARM {
  def apply[T, Q](c: T {def close(): Unit})(f: (T) => Q): Q = {
    try {
      f(c)
    } finally {
      c.close()
    }
  }
}
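For context, a minimal usage sketch of the SimpleARM above; the BufferedReader resource and the data.txt path are just hypothetical stand-ins, and on Scala 2.13+ the standard library's scala.util.Using covers the same pattern:

import java.io.{BufferedReader, FileReader}

// We construct the resource, SimpleARM runs the body with it, and the finally
// block calls close() whether or not the body throws.
val firstLine: String = SimpleARM(new BufferedReader(new FileReader("data.txt"))) { reader =>
  reader.readLine()
}
println(firstLine)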
I'm running a Spark job on a small three-node Amazon EMR 5 (Spark 2.0) cluster. My job runs for an hour or so and then fails with the error below. I can manually restart it, it runs again and processes more data, and eventually fails again.
My Spark code is very simple and doesn't use any Amazon or S3 APIs directly. It passes S3 paths as text strings to Spark, and Spark uses S3 internally.
My Spark program does the following in a loop: load data from S3 -> process -> write the data to a different location on S3.
My first suspicion is that some internal Amazon or Spark code isn't releasing connections properly and the connection pool is getting exhausted.
com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.AmazonClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:618)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1015)
at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:991)
at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:212)
at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy44.retrieveMetadata(Unknown Source)
at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:780)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1428)
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.exists(EmrFileSystem.java:313)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:85)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) …
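One mitigation commonly suggested for this exact "Timeout waiting for connection from pool" error on EMR is to raise the EMRFS S3 connection-pool size (the shaded com.amazon.ws.emr classes in the trace indicate EMRFS is in use). A hedged Scala sketch, with a hypothetical app name and an illustrative pool value; the same property can also be set cluster-wide through the emrfs-site configuration classification:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("s3-loop-job").getOrCreate()  // hypothetical app name
// fs.s3.maxConnections controls how many pooled S3 connections EMRFS will open;
// a job that loads from and writes to S3 in a loop can exhaust the default pool.
// 200 is just an illustrative value.
spark.sparkContext.hadoopConfiguration.set("fs.s3.maxConnections", "200")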
If I'm using Alpine 3.8, how do I add a specific package from the Alpine Edge repository? Is that even supported? As far as I can tell, there is no equivalent of backports.
I want to add the newer version: https://pkgs.alpinelinux.org/package/edge/community/armhf/librdkafka
rather than the older version in 3.8: https://pkgs.alpinelinux.org/package/v3.8/community/s390x/librdkafka
Specifically, kubernetes-cli. I have 1.12.0 installed; I need 1.11.x, which I don't have installed.
I've reviewed and tried every answer in this post, and nothing works: Homebrew install specific version of formula?
I tried brew search, but there are no tapped versions:
~ brew search kubernetes-cli
==> Formulae
kubernetes-cli ✔
I've also tried brew versions, but that command has been removed:
~ brew versions
Error: Unknown command: versions
I tried brew install kubernetes-cli@1.11.0, .1, and .2:
~ brew install kubernetes-cli@1.11.0
Error: No available formula with the name "kubernetes-cli@1.11.0"
==> Searching for a previously deleted formula (in the last month)...
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps …
This should be simple. I want to write an Ansible task that creates a Postgres user with connect privileges on a specific database and select/insert/update/delete privileges on all tables within that database. I tried the following:
- name: Create postgres user for my app
  become: yes
  become_user: postgres
  postgresql_user:
    db: "mydatabase"
    name: "myappuser"
    password: "supersecretpassword"
    priv: CONNECT/ALL:SELECT,INSERT,UPDATE,DELETE
I get relation \"ALL\" does not exist
If I remove the ALL:, I get Invalid privs specified for database: INSERT UPDATE SELECT DELETE