I'm working through Spring Boot configuration and test setup with a basic application.
I'm trying to run a Spring Boot web application from IntelliJ:
package com.springbootdemoweb1.demoweb1;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class DemoWeb1Application {

    public static void main(String[] args) {
        SpringApplication.run(DemoWeb1Application.class, args);
    }
}
This is a very basic application, but it shows an error:
/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/bin/java -XX:TieredStopAtLevel=1 -noverify -Dspring.output.ansi.enabled=always -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=53736 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=127.0.0.1 -Dspring.liveBeansView.mbeanDomain -Dspring.application.admin.enabled=true "-javaagent:/Applications/IntelliJ IDEA.app/Contents/lib/idea_rt.jar=53737:/Applications/IntelliJ IDEA.app/Contents/bin" -Dfile.encoding=UTF-8 -classpath /Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/charsets.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/deploy.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/cldrdata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/dnsns.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/jaccess.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/jfxrt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/localedata.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/nashorn.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/sunec.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/sunjce_provider.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/sunpkcs11.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/ext/zipfs.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/javaws.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/jce.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/jfr.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/jfxswt.jar:/Library/Java/JavaVirtualMachi
nes/jdk1.8.0_161.jdk/Contents/Home/jre/lib/jsse.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/management-agent.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/plugin.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/resources.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/jre/lib/rt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/ant-javafx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/dt.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/javafx-mx.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/jconsole.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/packager.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/sa-jdi.jar:/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home/lib/tools.jar:/Users/dnk306/IdeaProjects/demo-web1/target/classes:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-starter-web/2.0.0.RELEASE/spring-boot-starter-web-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-starter/2.0.0.RELEASE/spring-boot-starter-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot/2.0.0.RELEASE/spring-boot-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-autoconfigure/2.0.0.RELEASE/spring-boot-autoconfigure-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-starter-logging/2.0.0.RELEASE/spring-boot-starter-logging-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar:/Users/dnk306/.m2/repository/ch/qos/logback/logback-core/1.2.3/logback-core-1.2.3.jar:/Users/dnk306/.m2/repository/org/apache/logging/log4j/log4j-to-slf4j/2.10.0/log4j-to-slf4j-2.10.0.jar:/Users/dnk306/.m2/repository/org/apache/logging/log4j/l
og4j-api/2.10.0/log4j-api-2.10.0.jar:/Users/dnk306/.m2/repository/org/slf4j/jul-to-slf4j/1.7.25/jul-to-slf4j-1.7.25.jar:/Users/dnk306/.m2/repository/javax/annotation/javax.annotation-api/1.3.2/javax.annotation-api-1.3.2.jar:/Users/dnk306/.m2/repository/org/yaml/snakeyaml/1.19/snakeyaml-1.19.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-starter-json/2.0.0.RELEASE/spring-boot-starter-json-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/core/jackson-databind/2.9.4/jackson-databind-2.9.4.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/core/jackson-annotations/2.9.0/jackson-annotations-2.9.0.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/core/jackson-core/2.9.4/jackson-core-2.9.4.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jdk8/2.9.4/jackson-datatype-jdk8-2.9.4.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/datatype/jackson-datatype-jsr310/2.9.4/jackson-datatype-jsr310-2.9.4.jar:/Users/dnk306/.m2/repository/com/fasterxml/jackson/module/jackson-module-parameter-names/2.9.4/jackson-module-parameter-names-2.9.4.jar:/Users/dnk306/.m2/repository/org/springframework/boot/spring-boot-starter-tomcat/2.0.0.RELEASE/spring-boot-starter-tomcat-2.0.0.RELEASE.jar:/Users/dnk306/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/8.5.28/tomcat-embed-core-8.5.28.jar:/Users/dnk306/.m2/repository/org/apache/tomcat/embed/tomcat-embed-el/8.5.28/tomcat-embed-el-8.5.28.jar:/Users/dnk306/.m2/repository/org/apache/tomcat/embed/tomcat-embed-websocket/8.5.28/tomcat-embed-websocket-8.5.28.jar:/Users/dnk306/.m2/repository/org/hibernate/validator/hibernate-validator/6.0.7.Final/hibernate-validator-6.0.7.Final.jar:/Users/dnk306/.m2/repository/javax/validation/validation-api/2.0.1.Final/validation-api-2.0.1.Final.jar:/Users/dnk306/.m2/repository/org/jboss/logging/jboss-logging/3.3.2.Final/jboss-logging-3.3.2.Final.jar:/Users/dnk306/.m2/repository/com/fasterxml/classmate/1.3.4/classmate-1.3.4.ja
r:/Users/dnk306/.m2/repository/org/springframework/spring-web/5.0.4.RELEASE/spring-web-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-beans/5.0.4.RELEASE/spring-beans-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-webmvc/5.0.4.RELEASE/spring-webmvc-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-aop/5.0.4.RELEASE/spring-aop-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-context/5.0.4.RELEASE/spring-context-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-expression/5.0.4.RELEASE/spring-expression-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/slf4j/slf4j-api/1.7.25/slf4j-api-1.7.25.jar:/Users/dnk306/.m2/repository/org/springframework/spring-core/5.0.4.RELEASE/spring-core-5.0.4.RELEASE.jar:/Users/dnk306/.m2/repository/org/springframework/spring-jcl/5.0.4.RELEASE/spring-jcl-5.0.4.RELEASE.jar com.springbootdemoweb1.demoweb1.DemoWeb1Application
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
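( ( )\___ | '_ | '_| | '_ \/ _` …
I'm trying to get the data type of every column in a CSV file.
The file comes with no documentation of its data types, and checking by hand would take a long time (it has 150 columns).
I started with this approach: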
>>> import pandas as pd
>>> df = pd.read_csv('/tmp/file.csv')
>>> df.dtypes
a      int64
b      int64
c     object
d    float64
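Is the approach above good enough, or is there a better way to determine the data types?
Also, the file has 150 columns. When I type df.dtypes I can only see 15 or so of them. How can I see them all?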
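One possible way to see every column's type is sketched below. It assumes pandas is installed and uses a small made-up DataFrame in place of /tmp/file.csv; the column names are hypothetical. `Series.to_string()` renders every entry, and `pd.option_context` lifts the display limit only temporarily.

```python
import pandas as pd

# Hypothetical stand-in for the real 150-column file.
df = pd.DataFrame({f"col_{i}": [i, i + 1] for i in range(150)})
df["name"] = ["a", "b"]

# Series.to_string() renders every entry, so nothing is elided with "...".
all_dtypes = df.dtypes.to_string()
print(all_dtypes)

# Alternative: lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(df.dtypes)

# A compact summary: how many columns ended up with each dtype.
dtype_counts = df.dtypes.value_counts()
print(dtype_counts)
```

Note that read_csv infers these types from the values it reads, so columns reported as object are usually worth a closer look by hand.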
I changed config/elasticsearch.yml to:
xpack.security.enabled: true
Now, after starting Elasticsearch (./bin/elasticsearch) and then running:
curl localhost:9200
I get:
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
Then I tried these two:
curl localhost:9200 -u elastic:elastic
curl localhost:9200 -u elastic:changeme
and got:
{"error":{"root_cause":[{"type":"security_exception","reason":"failed to authenticate user [elastic]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"failed to authenticate user [elastic]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
What is the default username/password for Elasticsearch 7.2.0?
I have a DataFrame:
DF:
1,2016-10-12 18:24:25
1,2016-11-18 14:47:05
2,2016-10-12 21:24:25
2,2016-10-12 20:24:25
2,2016-10-12 22:24:25
3,2016-10-12 17:24:25
How can I keep only the latest record for each group? (There are 3 groups above: 1, 2, 3.)
The result should be:
1,2016-11-18 14:47:05
2,2016-10-12 22:24:25
3,2016-10-12 17:24:25
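I'm also trying to make this efficient (e.g. finishing within a few minutes on a medium-sized cluster with 100 million records), so any sorting or ordering, if needed at all, should be done in the most efficient and correct way.
I have the schema of DF1: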
df1 = spark.read.parquet(load_path1)
df1.printSchema()
root
|-- PRODUCT_OFFERING_ID: string (nullable = true)
|-- CREATED_BY: string (nullable = true)
|-- CREATION_DATE: string (nullable = true)
and of DF2:
df2 = spark.read.parquet(load_path2)
df2.printSchema()
root
|-- PRODUCT_OFFERING_ID: decimal(38,10) (nullable = true)
|-- CREATED_BY: decimal(38,10) (nullable = true)
|-- CREATION_DATE: timestamp (nullable = true)
Now I want to union these two DataFrames.
Sometimes, when I try to union the two DFs, I get an error because the schemas differ.
How can I make DF2 have exactly the same schema as DF1 (at load time)?
I tried:
df2 = spark.read.parquet(load_path2).schema(df1.schema)
and got an error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'StructType' object is …
I have a Kinesis stream in AWS. I can send data (JSON) to it with the kinesis command, and I can read the data back from the stream with the following commands:
SHARD_ITERATOR=$(aws kinesis get-shard-iterator --shard-id shardId-000000000000 --shard-iterator-type TRIM_HORIZON --stream-name mystream --query 'ShardIterator' --profile myprofile)
aws kinesis get-records --shard-iterator $SHARD_ITERATOR --profile myprofile
The output looks like this:
HsKCQkidmlkZW9Tb3VyY2UiOiBbCgkJCXsKCQkJCSJicmFuZGluZyI6IHt9LAoJCQkJInByb21vUG9vbCI6IFtdLAoJCQkJImlkIjogbnVsbAoJCQl9CgkJXSwKCQkiaW1hZ2VTb3VyY2UiOiB7fSwKCQkibWV0YWRhdGFBcHByb3ZlZCI6IHRydWUsCgkJImR1ZURhdGUiOiAxNTgzMzEyNTA0ODAzLAoJCSJwcm9maWxlIjogewoJCQkiY29tcG9uZW50Q291bnQiOiAwLAoJCQkibmFtZSI6ICJTUUVfQVRfUFJPRklMRSIsCgkJCSJpZCI6ICJTUUVfQVRfUFJPRklMRV9JRCIsCgkJCSJwYWNrYWdlQ291bnQiOiAwLAoJCQkicGFja2FnZXMiOiBbCgkJCQl7CgkJCQkJIm5hbWUiOiAiUEVBQ09DSy1MVEEiLAoJCQkJCSJpZCI6ICJmZDk5NTRmZC03NDYwLTRjZjItOTU5Ni05YzBhMjcxNTViODgiCgkJCQl9CgkJCV0KCQl9LAoJCSJ3b3JrT3JkZXJJZCI6ICJTUUVfQVRfSk9CX1NVQk1JU1
How can I get the actual JSON message in its original format (so that it looks like JSON), the same as when I sent it?
Thanks
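The Data field of each record returned by get-records is base64-encoded, so one way to recover the JSON is to decode it. A minimal sketch, assuming the payload was sent as UTF-8 JSON; the sample message and the decode_record helper are made up for illustration:

```python
import base64
import json


def decode_record(data_b64: str):
    """Turn the base64 'Data' field of a Kinesis record back into a Python object."""
    raw = base64.b64decode(data_b64)
    return json.loads(raw.decode("utf-8"))


# Hypothetical message, encoded the same way get-records would return it.
sample = {"workOrderId": "example", "metadataApproved": True}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")

decoded = decode_record(encoded)
print(json.dumps(decoded, indent=2))
```

In practice each Records[i].Data string from the CLI output would be passed to decode_record.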
I have a CSV file with 10 columns. Half are Strings and half are Integers.
What is the Scala code to:
So far I have:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read
.format("com.databricks.spark.csv")
.option("header", "true") // Use first line of all files as header
.option("inferSchema", "true") // Automatically infer data types
.load("cars.csv")
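What is the best file format for saving that schema? Is it JSON?
The goal: I want to create the schema only once and load it from a file next time, instead of re-creating it on the fly.
Thanks.
I can get a single key/value from Redis using Python: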
import redis
r = redis.StrictRedis(host='localhost', port=6379, db=0)
data = r.get('12345')
How can I get the values of 2 keys at the same time (in a single call)?
I tried data = r.get('12345', '54321'), but that does not work.
Also, how can I get all values based on a partial key? For example data = r.get('123*')
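A rough sketch of one way to do both with redis-py, which provides mget for several keys in one round trip and scan_iter for glob-style key matching. The helper names get_many and get_by_pattern are made up here, and the client is passed in so the functions work with any redis-py client object:

```python
def get_many(client, keys):
    """Fetch several keys with one MGET call; missing keys map to None."""
    keys = list(keys)
    return dict(zip(keys, client.mget(keys)))


def get_by_pattern(client, pattern, batch=500):
    """Collect the values of all keys matching a glob pattern such as '123*'.

    SCAN walks the keyspace incrementally, so this takes several round trips
    and is not an atomic snapshot.
    """
    keys = list(client.scan_iter(match=pattern, count=batch))
    # MGET needs at least one key, so skip the call when nothing matched.
    return get_many(client, keys) if keys else {}


# Usage with redis-py (assumes a server on localhost:6379):
#   import redis
#   r = redis.StrictRedis(host='localhost', port=6379, db=0)
#   get_many(r, ['12345', '54321'])
#   get_by_pattern(r, '123*')
```

scan_iter is used instead of KEYS because KEYS blocks the server while it walks the whole keyspace.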
I'm using a Snowflake database and running this query to find the total count, the distinct record count, and the difference:
select
(select count(*) from mytable) as total_count,
(select count(*) from (select distinct * from mytable)) as distinct_count,
(select count(*) from mytable) - (select count(*) from (select distinct * from mytable)) as duplicate_count
from mytable limit 1;
Result:
1,759,867
1,738,924
20,943 (duplicate_count)
But when I try another approach (grouping by all columns and finding where the count is > 1):
select count(*) from (
SELECT
a, b, c, d, e,
COUNT(*)
FROM
mytable
GROUP BY
a, b, c, d, e
HAVING
COUNT(*) > 1
)
I get 5,436.
Why do the two duplicate counts differ? (20,943 vs. 5,436 …
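One possible reason for the gap, assuming a, b, c, d, e are all of the table's columns, is that the two queries measure different things. The first counts surplus rows (total minus distinct). The second counts how many distinct groups occur more than once. A small sketch with made-up rows shows that these numbers need not agree:

```python
from collections import Counter

# Made-up rows: one group with 4 copies, one with 2 copies, one unique row.
rows = [
    ("x", 1), ("x", 1), ("x", 1), ("x", 1),
    ("y", 2), ("y", 2),
    ("z", 3),
]

counts = Counter(rows)

total_count = len(rows)
distinct_count = len(counts)

# What the first query reports: rows beyond the first copy of each group.
surplus_rows = total_count - distinct_count

# What the GROUP BY ... HAVING COUNT(*) > 1 query reports: groups with duplicates.
duplicated_groups = sum(1 for n in counts.values() if n > 1)

# Summing (count - 1) over those groups reproduces the first number.
surplus_from_groups = sum(n - 1 for n in counts.values() if n > 1)

print(surplus_rows, duplicated_groups, surplus_from_groups)
```

Under that assumption, summing COUNT(*) - 1 over the grouped result, rather than counting the groups, would be expected to match the first figure.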
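I'm using an AWS Glue crawler to read S3 zip files (with no header) and populate the Glue catalog.
By default the columns are named: col_0, col_1...
How can I use the Python boto3 module to change these column names, interacting directly with the AWS Glue catalog?
Is there a sample snippet for doing this?
Thanks.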
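A minimal sketch of one approach: read the table definition with the Glue client's get_table, rename the entries under StorageDescriptor.Columns, and write it back with update_table. The helper names, the rename mapping, and the database and table names in the usage comment are made up. The client is passed in, so it would normally be created with boto3.client("glue").

```python
import copy

# get_table returns read-only fields (CreateTime, DatabaseName, ...) that
# update_table's TableInput rejects, so only a known subset is copied over.
ALLOWED_KEYS = (
    "Name", "Description", "Owner", "Retention", "StorageDescriptor",
    "PartitionKeys", "TableType", "Parameters",
)


def build_table_input(table, renames):
    """Return a TableInput dict with columns renamed per the `renames` mapping."""
    table_input = {
        key: copy.deepcopy(value)
        for key, value in table.items()
        if key in ALLOWED_KEYS
    }
    for column in table_input["StorageDescriptor"]["Columns"]:
        column["Name"] = renames.get(column["Name"], column["Name"])
    return table_input


def rename_columns(glue, database, table_name, renames):
    """Fetch the table from the Glue catalog, rename its columns, and save it."""
    table = glue.get_table(DatabaseName=database, Name=table_name)["Table"]
    table_input = build_table_input(table, renames)
    glue.update_table(DatabaseName=database, TableInput=table_input)
    return table_input


# Usage (hypothetical names):
#   import boto3
#   glue = boto3.client("glue")
#   rename_columns(glue, "mydb", "mytable", {"col_0": "id", "col_1": "created_at"})
```

Depending on the crawler's schema change policy, a later crawler run may overwrite the renamed columns, so that setting is worth checking.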