Posts by bou*_*sse

Summing the values of a PairRDD

I have an RDD of the following type:

dataset :org.apache.spark.rdd.RDD[(String, Double)] = MapPartitionRDD[26]

Its contents look like (Pedro, 0.0833), (Hello, 0.001828) ...

I want to sum all the values (0.0833 + 0.001828 + ...), but I can't find a suitable solution.
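One possible approach, sketched here rather than taken from the original post: drop the keys with `values` and call `sum()` on the resulting `RDD[Double]`. The object name `SumPairValues`, the helper `sumValues`, and the local `SparkSession` setup are assumptions made for illustration (Spark 2.x or later is assumed); the sample data comes from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only to make the sketch self-contained.
object SumPairValues {

  // Sum the Double values of a pair RDD, ignoring the keys.
  // `values` comes from PairRDDFunctions, `sum()` from DoubleRDDFunctions;
  // both are available through Spark's implicit conversions.
  def sumValues(dataset: RDD[(String, Double)]): Double =
    dataset.values.sum()

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SumPairValues")
      .getOrCreate()

    // Sample pairs taken from the question.
    val dataset = spark.sparkContext.parallelize(
      Seq(("Pedro", 0.0833), ("Hello", 0.001828)))

    println(sumValues(dataset))
    spark.stop()
  }
}
```

`dataset.map(_._2).sum()` should be equivalent. `dataset.values.reduce(_ + _)` also works on a non-empty RDD, but `reduce` throws on an empty one, whereas `sum()` returns 0.0.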

scala apache-spark

Votes: 4 | Answers: 1 | Views: 9158

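IDLE and pandas - Python

I'm trying to run some code I found in an IPython notebook (I also added a few lines, such as interactive(True)). My problem is that when I use "Run Module" in IDLE, execution reaches data.plot, the program appears to load, and then nothing happens. data.plot doesn't seem to work.

Thanks if you have any ideas.

Note: without interactive(True), a "Runtime Error" box appears.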

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import interactive
interactive(True)

# read data into a DataFrame
data = pd.read_csv('http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv', index_col=0)
print(data.head())

# print the shape of the DataFrame
print(data.shape)

# visualize the relationship between the features and the response using scatterplots
fig, axs = plt.subplots(1, 3, sharey=True)
data.plot(kind='scatter', x='TV', y='Sales', ax=axs[0], figsize=(16, 8))
data.plot(kind='scatter', x='Radio', y='Sales', ax=axs[1])
data.plot(kind='scatter', x='Newspaper', y='Sales', ax=axs[2])

python

Votes: 3 | Answers: 1 | Views: 2801

Reading a file in Scala with Spark

I'm trying to read a file that looks like this:

you 0.0432052044116
i 0.0391075831328
the 0.0328010698268
to 0.0237549924919
a 0.0209682886489
it 0.0198104294359

I want to store it in an RDD of (key, value) pairs, for example (you, 0.0432). So far I have only written this:

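import java.io.{FileNotFoundException, IOException}
import scala.io.Source

val filename = "freq2.txt"
try {
  // Each line has the form "<word> <frequency>", separated by a single space.
  for (line <- Source.fromFile(filename).getLines()) {
    val tuple = line.split(" ")
    val key = tuple(0)    // the word
    val words = tuple(1)  // its frequency, still a String at this point
    println(s"${key}")
    println(s"${words}")
  }
} catch {
  case ex: FileNotFoundException => println("Couldn't find that file.")
  case ex: IOException => println("Had an IOException trying to read that file")
}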

But I don't know how to store the data...
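A minimal sketch of one way to get the pairs into an RDD, assuming the file is read with Spark's `textFile` instead of `scala.io.Source`. The object name `FreqFileToRdd`, the helper `parseLine`, and the local `SparkSession` setup are hypothetical and not part of the original post; the file name `freq2.txt` comes from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical wrapper object, used only to make the sketch self-contained.
object FreqFileToRdd {

  // Parse one "<word> <frequency>" line into a (key, value) pair.
  // Returns None for blank or malformed lines instead of throwing.
  def parseLine(line: String): Option[(String, Double)] =
    line.trim.split("\\s+") match {
      case Array(key, value) => Try(value.toDouble).toOption.map(v => (key, v))
      case _                 => None
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("FreqFileToRdd")
      .getOrCreate()

    // textFile yields an RDD[String] with one element per line;
    // flatMap keeps only the lines that parsed successfully.
    val pairs: RDD[(String, Double)] =
      spark.sparkContext
        .textFile("freq2.txt")
        .flatMap(line => parseLine(line).toSeq)

    pairs.take(5).foreach { case (word, freq) => println(s"$word -> $freq") }
    spark.stop()
  }
}
```

Keeping the parsing in a plain function means it can be checked without a cluster, and malformed lines are skipped silently. Whether skipping or failing fast is preferable depends on how much the input file is trusted.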

scala apache-spark

Votes: 1 | Answers: 1 | Views: 6499

Tag statistics

apache-spark ×2

scala ×2

python ×1