I have loaded CSV data into a Spark DataFrame.
I need to slice this DataFrame into two different DataFrames, each containing a subset of the columns of the original.
How do I select a column-based subset of a Spark DataFrame?
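A minimal sketch of the usual approach, assuming PySpark; the DataFrame and the column names col1 through col4 below are placeholders. select() returns a new DataFrame containing only the listed columns, so calling it twice yields two independent column subsets:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("split-columns").getOrCreate()

# Illustrative data; in practice this would be the CSV already loaded.
df = spark.createDataFrame([(1, "a", 2.0, True)], ["col1", "col2", "col3", "col4"])

# Each select() produces an independent DataFrame with its own column subset.
df_left = df.select("col1", "col2")
df_right = df.select("col3", "col4")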
I am trying to migrate my WPF (.NET Framework) project to WPF (.NET Core 3). I installed the Visual Studio extension and can now create a new WPF (.NET Core) project, but the problem starts as soon as I add a NuGet package; VS throws this error:
Unable to find package Microsoft.NETCore.App with version (>= 3.0.0-preview6-27730-01)
- Found 69 version(s) in nuget.org [ Nearest version: 3.0.0-preview5-27626-15 ]
- Found 0 version(s) in Microsoft Visual Studio Offline Packages TestwpfCore C:\Users\sintware\source\repos\TestwpfCore\TestwpfCore\TestwpfCore.csproj 1
I am new to Mapbox. I need to use Mapbox's supercluster project in order to plot 6 million GPS points on a map. I tried to run the demo on localhost, but I only get an empty map.
Here is my code in index.html:
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Supercluster Leaflet demo</title>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/1.0.3/leaflet.css" />
    <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/1.0.3/leaflet.js"></script>
    <link rel="stylesheet" href="cluster.css" />
    <style>
        html, body, #map {
            height: 100%;
            margin: 0;
        }
    </style>
</head>
<body>
    <div id="map"></div>
    <script src="index.js"></script>
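    <!-- Note: a <script> element that has a src attribute ignores any inline code placed in its body. -->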
    <script src="https://unpkg.com/supercluster@3.0.2/dist/supercluster.min.js">
        var index = supercluster({
            radius: 40,
            maxZoom: 16
        });
        index.load(GeoObs.features);
        index.getClusters([-180, -85, 180, 85], 2);
    </script>
</body>
</html>
Note: GeoObs is my GeoJSON file.
What is wrong? …
I am trying to run a Spark Maven Scala project in Eclipse.
When I run the Scala class, I get this error:
Exception in thread "main" java.lang.NumberFormatException: Not a version: 9
at scala.util.PropertiesTrait$class.parts$1(Properties.scala:184)
at scala.util.PropertiesTrait$class.isJavaAtLeast(Properties.scala:187)
at scala.util.Properties$.isJavaAtLeast(Properties.scala:17)
....
What is wrong? And what is "version 9"?
I am trying to test a Maven project in IntelliJ IDEA.
When I run test, I get this error:
Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution scala-compile of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed: For artifact {null:null:null:jar}: The groupId cannot be empty.
Here is my pom.xml, to which I just added this dependency:
<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.1.0-M4</version>
</dependency>
Here is the error log:
[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution scala-compile of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed: For artifact {null:null:null:jar}: The groupId cannot be empty. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution …

I plan to save my Spark DataFrames into a Hive table so that I can query them and extract the latitude and longitude from them, since Spark DataFrames are not iterable.
Using PySpark in Jupyter, I wrote the following code to create a Spark session:
import findspark
findspark.init()
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# read multiple csv with pyspark
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation=hive").enableHiveSupport() \
    .getOrCreate()

df = spark.read.csv("Desktop/train/train.csv", header=True)

Pickup_locations = df.select("pickup_datetime", "Pickup_latitude",
                             "Pickup_longitude")

print(Pickup_locations.count())

Then I run this HiveQL:
df.createOrReplaceTempView("mytempTable")
spark.sql("create table hive_table as select * from mytempTable")

I get this error:
Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `hive_table`, …
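For comparison, here is a minimal PySpark sketch of a session with Hive support enabled, reusing the path and table names from the snippet above purely as an illustration rather than as the poster's exact setup; note that config() takes the key and the value as two separate arguments:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("Python Spark SQL basic example")
         .config("spark.sql.catalogImplementation", "hive")  # key and value passed separately
         .enableHiveSupport()
         .getOrCreate())

df = spark.read.csv("Desktop/train/train.csv", header=True)
df.createOrReplaceTempView("mytempTable")
spark.sql("create table hive_table as select * from mytempTable")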
Dataframe.withColumns() only appends a new column at the end of the DataFrame, but I need a way to control where the column goes.
Is that possible?
Or is the only solution to create a DataFrame with my column first and then append the rest?
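One common workaround, sketched below assuming PySpark and placeholder data: add the column with withColumn(), then reorder with select() so the new column ends up at the desired position.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("column-position").getOrCreate()

# Placeholder data standing in for the real DataFrame.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

# withColumn() appends the new column at the end...
df2 = df.withColumn("new_col", F.lit(0))

# ...and select() reorders so it appears first instead.
df2 = df2.select(["new_col"] + df.columns)
df2.show()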
I have a Spark project that was recently working.
The project takes a CSV, adds two fields to it, and then writes the result out with JavaPairRDD's saveAsTextFile().
My Spark version is 2.3.0 and my Java version is 1.8.
The project runs in the Eclipse IDE on Windows 10.
Here is the error:
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2080)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
... 32 more
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) …