Posts by A.H*_*DAD

Selecting specific columns from a Spark DataFrame

I have loaded CSV data into a Spark DataFrame.

I need to slice this dataframe into two different dataframes, where each one contains a set of the columns from the original dataframe.

How do I select a subset of a Spark DataFrame based on columns?
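
A minimal sketch of one approach, assuming hypothetical column names in place of the CSV's real schema: select() returns a new DataFrame restricted to the listed columns, so two select() calls produce the two slices.

import org.apache.spark.sql.SparkSession

object ColumnSlices extends App {
  val spark = SparkSession.builder().appName("column-slices").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical columns standing in for the CSV's real schema.
  val df = Seq((1, "a", 10.0), (2, "b", 20.0)).toDF("id", "name", "score")

  // Each select() returns a new DataFrame limited to the named columns.
  val slice1 = df.select("id", "name")
  val slice2 = df.select("id", "score")

  slice1.show()
  slice2.show()
}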

scala apache-spark apache-spark-sql

9 votes · 3 answers · 60k views

Unable to find package Microsoft.NETCore.App with version (>= 3.0.0)

I'm trying to migrate my WPF (.NET Framework) project to WPF (.NET Core 3). I installed the Visual Studio extension, and I can now create a new WPF (.NET Core) project, but the trouble starts as soon as I add a NuGet package: VS throws this error:

Unable to find package Microsoft.NETCore.App with version (>= 3.0.0-preview6-27730-01)
- Found 69 version(s) in nuget.org [ Nearest version: 3.0.0-preview5-27626-15 ]
- Found 0 version(s) in Microsoft Visual Studio Offline Packages    TestwpfCore C:\Users\sintware\source\repos\TestwpfCore\TestwpfCore\TestwpfCore.csproj   1   

wpf nuget visual-studio-2019 .net-core-3.0

6 votes · 1 answer · 10k views

How to use supercluster

I'm new to mapbox. I need to use mapbox's supercluster project in order to plot 6 million GPS points on a map. I tried to run the demo on localhost, but I just get an empty map!?

Here is my code in index.html:

<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
        <title>Supercluster Leaflet demo</title>

        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/leaflet/1.0.3/leaflet.css" />
        <script src="https://cdnjs.cloudflare.com/ajax/libs/leaflet/1.0.3/leaflet.js"></script>

        <link rel="stylesheet" href="cluster.css" />

        <style>
            html, body, #map {
            height: 100%;
            margin: 0;
            }
        </style>
    </head>
    <body>
        <div id="map"></div>
         <script src="index.js"></script>
      <script src="https://unpkg.com/supercluster@3.0.2/dist/supercluster.min.js">
            var index = supercluster({
                radius: 40,
                maxZoom: 16
            });
            index.load(GeoObs.features);
            index.getClusters([-180, -85, 180, 85], 2);
        </script>
    </body>
</html>

Note that GeoObs is my GeoJSON file.

What's wrong? …

leaflet mapbox

5 votes · 2 answers · 5897 views

Exception in thread "main" java.lang.NumberFormatException: Not a version: 9

I'm trying to run a Spark Maven Scala project in Eclipse.

When I run the Scala class, I get this error:

Exception in thread "main" java.lang.NumberFormatException: Not a version: 9
at scala.util.PropertiesTrait$class.parts$1(Properties.scala:184)
at scala.util.PropertiesTrait$class.isJavaAtLeast(Properties.scala:187)
at scala.util.Properties$.isJavaAtLeast(Properties.scala:17)
....

What's wrong? And what is "version 9"?
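
For context, the stack trace points at Scala's own Java-version check: scala.util.Properties.isJavaAtLeast parses the java.specification.version system property, which JDK 9 reports as a bare "9" rather than the "1.x" form that older Scala releases (before roughly 2.12.4/2.11.12) understand. A minimal sketch that exercises the same code path, assuming one of those older Scala versions running on JDK 9:

object VersionProbe extends App {
  // JDK 8 reports "1.8"; JDK 9+ reports a bare "9", "10", "11", ...
  println(scala.util.Properties.javaSpecVersion)

  // Older Scala releases parse this property with a "1.x"-only grammar,
  // so under JDK 9 the call below throws
  // java.lang.NumberFormatException: Not a version: 9
  println(scala.util.Properties.isJavaAtLeast("1.8"))
}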

java eclipse scala eclipse-oxygen

5 votes · 1 answer · 4042 views

Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile)

I'm trying to test a Scala Maven project in IntelliJ IDEA.

When I run

test

I get this error:

 Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution scala-compile of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed: For artifact {null:null:null:jar}: The groupId cannot be empty.

Here is my pom.xml; I just added this dependency:

<dependency>
    <groupId>neo4j-contrib</groupId>
    <artifactId>neo4j-spark-connector</artifactId>
    <version>2.1.0-M4</version>
</dependency>

Here is the error log:

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution scala-compile of goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile failed: For artifact {null:null:null:jar}: The groupId cannot be empty. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.0:compile (scala-compile) on project neo4j-spark-connector: Execution …

intellij-idea scala-maven-plugin

5 votes · 1 answer · 10k views

CREATE Hive TABLE (AS SELECT) requires Hive support

I plan to save Spark dataframes into Hive tables so that I can query them and extract the latitudes and longitudes from them, since Spark dataframes are not iterable.

Using pyspark in Jupyter, I wrote the following code to create the Spark session:

import findspark
findspark.init()
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession

# read multiple csv with pyspark
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.sql.catalogImplementation=hive").enableHiveSupport() \
    .getOrCreate()

df = spark.read.csv("Desktop/train/train.csv", header=True)

Pickup_locations = df.select("pickup_datetime", "Pickup_latitude",
                             "Pickup_longitude")

print(Pickup_locations.count())

Then I run this HiveQL:

df.createOrReplaceTempView("mytempTable")
spark.sql("create table hive_table as select * from mytempTable")

And I get this error:

Py4JJavaError: An error occurred while calling o24.sql.
: org.apache.spark.sql.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);;
'CreateTable `hive_table`, …
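
Two details in the session code above are worth flagging, as possible causes rather than a confirmed diagnosis: the builder's config() call takes the key and the value as separate arguments, so the single string "spark.sql.catalogImplementation=hive" sets a bogus key instead of the intended option; and getOrCreate() can hand back a previously created session that was built without Hive support. A minimal sketch of a Hive-enabled session, shown in Scala with a placeholder app name and path:

import org.apache.spark.sql.SparkSession

object HiveCtas extends App {
  // Hive support must be enabled on the session that actually runs the CTAS.
  val spark = SparkSession.builder()
    .appName("hive-ctas-sketch")                       // placeholder name
    .config("spark.sql.catalogImplementation", "hive") // key and value as separate args
    .enableHiveSupport()
    .getOrCreate()

  val df = spark.read.option("header", "true").csv("train.csv") // placeholder path
  df.createOrReplaceTempView("mytempTable")
  spark.sql("create table hive_table as select * from mytempTable")
}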

hiveql pyspark jupyter-notebook

3 votes · 1 answer · 5805 views

How to add a column in a Spark DataFrame

Dataframe.withColumn() only appends the new column at the end of the dataframe, but I need a way to add it at a chosen position.

Is that possible?

Or is the only solution to create a dataframe with my column first and then append the rest?
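
A minimal sketch of one workaround, assuming hypothetical column names: let withColumn() append the column, then use select() to state the full column order explicitly.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

object ColumnOrder extends App {
  val spark = SparkSession.builder().appName("column-order").master("local[*]").getOrCreate()
  import spark.implicits._

  // Hypothetical starting columns a, b, c.
  val df = Seq((1, "x", true), (2, "y", false)).toDF("a", "b", "c")

  // withColumn always appends at the end ...
  val appended = df.withColumn("newCol", lit(0))

  // ... but a follow-up select fixes the final column order.
  val reordered = appended.select("a", "newCol", "b", "c")
  reordered.printSchema()
}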

apache-spark apache-spark-sql

2 votes · 1 answer · 1683 views

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor

I have a Spark project that was working until recently.

The project takes a CSV, adds two fields to it, and then writes the result out with JavaPairRDD's saveAsTextFile().

My Spark version is 2.3.0 and my Java version is 1.8.

The project runs in the Eclipse IDE under Windows 10.

Here is the error:

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1587)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1586)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1586)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2027)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2048)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2080)
at org.apache.spark.internal.io.SparkHadoopWriter$.write(SparkHadoopWriter.scala:78)
... 32 more
 Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileOutputStreamWithMode(NativeIO.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:219)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:209)
at org.apache.hadoop.fs.RawLocalFileSystem.createOutputStreamWithMode(RawLocalFileSystem.java:307)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:296)
at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.<init>(ChecksumFileSystem.java:398)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) …
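
For context, an UnsatisfiedLinkError in NativeIO$Windows usually means the Hadoop native Windows binaries (winutils.exe and hadoop.dll) are missing or do not match the Hadoop version that Spark was built against; hadoop.dll may also need to be on the JVM's java.library.path or in System32. A minimal sketch of the common workaround, assuming a hypothetical C:\hadoop layout with both binaries in its bin folder:

import org.apache.spark.sql.SparkSession

object CsvJob extends App {
  // Hypothetical path: C:\hadoop\bin must contain winutils.exe and hadoop.dll
  // built for the same Hadoop version that Spark 2.3.0 ships with.
  System.setProperty("hadoop.home.dir", "C:\\hadoop")

  val spark = SparkSession.builder()
    .appName("csv-job")
    .master("local[*]")
    .getOrCreate()

  // ... read the CSV, add the two fields, saveAsTextFile() as before ...

  spark.stop()
}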

java eclipse apache-spark

1 vote · 1 answer · 1054 views