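I am trying to read a JSON file from Amazon S3 to create a Spark context and use it to process the data.
Spark runs inside a Docker container, so getting files onto the container's filesystem is a PITA. That is why I pushed the file to S3 instead.
The code below explains the rest.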
from pyspark import SparkContext, SparkConf

conf = SparkConf().setAppName("first")
sc = SparkContext(conf=conf)

config_dict = {"fs.s3n.awsAccessKeyId": "**",
               "fs.s3n.awsSecretAccessKey": "**"}
bucket = "nonamecpp"
prefix = "dataset.json"
filename = "s3n://{}/{}".format(bucket, prefix)
rdd = sc.hadoopFile(filename,
                    'org.apache.hadoop.mapred.TextInputFormat',
                    'org.apache.hadoop.io.Text',
                    'org.apache.hadoop.io.LongWritable',
                    conf=config_dict)
I get the following error:
Py4JJavaError Traceback (most recent call last)
<ipython-input-2-b94543fb0e8e> in <module>()
9 'org.apache.hadoop.io.Text',
10 'org.apache.hadoop.io.LongWritable',
---> 11 conf=config_dict)
12
/usr/local/spark/python/pyspark/context.pyc in hadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter, valueConverter, conf, batchSize)
558 jrdd = self._jvm.PythonRDD.hadoopFile(self._jsc, path, inputFormatClass, keyClass,
559 valueClass, keyConverter, valueConverter,
--> 560 jconf, …

I have Django 1.9 and django-debug-toolbar==1.3.0 installed. These are the relevant contents of my settings.py:
# debug_toolbar settings
if DEBUG:
    INTERNAL_IPS = ('127.0.0.1',)
    MIDDLEWARE_CLASSES += (
        'debug_toolbar.middleware.DebugToolbarMiddleware',
    )
    INSTALLED_APPS += (
        'debug_toolbar',
    )
    DEBUG_TOOLBAR_PANELS = [
        'debug_toolbar.panels.versions.VersionsPanel',
        'debug_toolbar.panels.timer.TimerPanel',
        'debug_toolbar.panels.settings.SettingsPanel',
        'debug_toolbar.panels.headers.HeadersPanel',
        'debug_toolbar.panels.request.RequestPanel',
        'debug_toolbar.panels.sql.SQLPanel',
        'debug_toolbar.panels.staticfiles.StaticFilesPanel',
        'debug_toolbar.panels.templates.TemplatesPanel',
        'debug_toolbar.panels.cache.CachePanel',
        'debug_toolbar.panels.signals.SignalsPanel',
        'debug_toolbar.panels.logging.LoggingPanel',
        'debug_toolbar.panels.redirects.RedirectsPanel',
    ]
    DEBUG_TOOLBAR_CONFIG = {
        'INTERCEPT_REDIRECTS': False,
    }
I have verified that DEBUG is set to True. When I run the server, I get the following error:
django.core.exceptions.ImproperlyConfigured: Error importing debug panel debug_toolbar.panels.versions: "cannot import name linebreak_iter"
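Since the message says a name could not be imported, one way to narrow this down is to check whether the installed packages still expose that name. The sketch below is only a diagnostic under assumptions: `module_has_name` is a hypothetical helper, not part of Django or the toolbar, and `django.views.debug` in the usage note is a guess at where `linebreak_iter` is expected to live.

```python
import importlib


def module_has_name(module_name, attr_name):
    """Return True if module_name imports cleanly and exposes attr_name.

    Hypothetical helper for diagnosing "cannot import name ..." errors;
    it returns False both when the module is missing and when the
    attribute is absent.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr_name)
```

Running `module_has_name("django.views.debug", "linebreak_iter")` in the project's virtualenv would show whether the installed Django still provides the name. If it does not, the two package versions are probably not compatible with each other.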
The full stack trace can be found here: https://gist.github.com/anonymous/7a48e7c24d530118e5dfc0a75b982be2
What went wrong? TIA.