Posts by Elo*_*usk

Connecting to S3 data from PySpark

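I am trying to read a JSON file from Amazon S3, create a Spark context, and use it to process the data.

Spark basically runs inside a Docker container, so putting the file on a path inside the container is also a PITA. That is why I pushed it to S3.

The code below explains the rest.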

from pyspark import SparkContext, SparkConf
conf = SparkConf().setAppName("first")
sc = SparkContext(conf=conf)

config_dict = {"fs.s3n.awsAccessKeyId":"**",
               "fs.s3n.awsSecretAccessKey":"**"}

bucket = "nonamecpp"
prefix = "dataset.json"
filename = "s3n://{}/{}".format(bucket, prefix)
rdd = sc.hadoopFile(filename,
                    'org.apache.hadoop.mapred.TextInputFormat',
                    'org.apache.hadoop.io.Text',
                    'org.apache.hadoop.io.LongWritable',
                    conf=config_dict)

I get the following error:

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-2-b94543fb0e8e> in <module>()
      9                     'org.apache.hadoop.io.Text',
     10                     'org.apache.hadoop.io.LongWritable',
---> 11                     conf=config_dict)
     12 

/usr/local/spark/python/pyspark/context.pyc in hadoopFile(self, path, inputFormatClass, keyClass, valueClass, keyConverter, valueConverter, conf, batchSize)
    558         jrdd = self._jvm.PythonRDD.hadoopFile(self._jsc, path, inputFormatClass, keyClass,
    559                                               valueClass, keyConverter, valueConverter,
--> 560                                               jconf, …
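A minimal sketch of the same read, under assumptions the question does not state: it switches from the older `s3n://` scheme to `s3a://` (which needs the `hadoop-aws` JAR and a matching AWS SDK on Spark's classpath), and it assumes the file holds one JSON object per line. It also passes the key and value classes in the order `TextInputFormat` actually produces them, `LongWritable` byte offsets as keys and `Text` lines as values, whereas the call above lists them the other way round. The helper names `s3a_conf`, `s3a_path` and `read_json_lines` are hypothetical.

```python
import json

# Old-API (mapred) input format used in the question. It yields
# (byte offset, line) pairs, so keys are LongWritable and values are Text.
TEXT_INPUT_FORMAT = "org.apache.hadoop.mapred.TextInputFormat"
KEY_CLASS = "org.apache.hadoop.io.LongWritable"
VALUE_CLASS = "org.apache.hadoop.io.Text"


def s3a_conf(access_key, secret_key):
    """Hadoop properties for the s3a connector (assumes hadoop-aws is on the classpath)."""
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }


def s3a_path(bucket, key):
    """Build an s3a:// URI from a bucket name and an object key."""
    return "s3a://{}/{}".format(bucket, key)


def read_json_lines(sc, bucket, key, access_key, secret_key):
    """Read a JSON Lines object from S3 and return an RDD of parsed records.

    `sc` is an existing pyspark.SparkContext.
    """
    rdd = sc.hadoopFile(
        s3a_path(bucket, key),
        TEXT_INPUT_FORMAT,
        KEY_CLASS,
        VALUE_CLASS,
        conf=s3a_conf(access_key, secret_key),
    )
    # PySpark converts the Writables to Python int/str; keep only the line
    # text, skip blank lines, and parse each remaining line as one JSON record.
    return (rdd.values()
               .filter(lambda line: line.strip())
               .map(json.loads))
```

For example, `read_json_lines(sc, "nonamecpp", "dataset.json", access_key, secret_key).take(5)`. If the file is a single multi-line JSON document rather than one object per line, a line-based reader will not work; `sc.wholeTextFiles(path)` returns each file as one `(path, content)` pair whose content could be passed to `json.loads` instead.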

python hadoop amazon-s3 apache-spark pyspark

10 votes, 1 answer, 20k views

Django debug toolbar throws an ImproperlyConfigured exception

I have installed Django 1.9 and django-debug-toolbar==1.3.0. Here is my settings.py content:

# debug_toolbar settings
if DEBUG:
    INTERNAL_IPS = ('127.0.0.1',)
    MIDDLEWARE_CLASSES += (
        'debug_toolbar.middleware.DebugToolbarMiddleware',
    )

    INSTALLED_APPS += (
        'debug_toolbar',
    )

    DEBUG_TOOLBAR_PANELS = [
        'debug_toolbar.panels.versions.VersionsPanel',
        'debug_toolbar.panels.timer.TimerPanel',
        'debug_toolbar.panels.settings.SettingsPanel',
        'debug_toolbar.panels.headers.HeadersPanel',
        'debug_toolbar.panels.request.RequestPanel',
        'debug_toolbar.panels.sql.SQLPanel',
        'debug_toolbar.panels.staticfiles.StaticFilesPanel',
        'debug_toolbar.panels.templates.TemplatesPanel',
        'debug_toolbar.panels.cache.CachePanel',
        'debug_toolbar.panels.signals.SignalsPanel',
        'debug_toolbar.panels.logging.LoggingPanel',
        'debug_toolbar.panels.redirects.RedirectsPanel',
    ]

    DEBUG_TOOLBAR_CONFIG = {
        'INTERCEPT_REDIRECTS': False,
    }

I have verified that DEBUG is set to True. When I run the server, I get the following error:

django.core.exceptions.ImproperlyConfigured: Error importing debug panel debug_toolbar.panels.versions: "cannot import name linebreak_iter"

The full stack trace can be found here: https://gist.github.com/anonymous/7a48e7c24d530118e5dfc0a75b982be2

What is going wrong? TIA.
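One way to narrow down an import error like this is to check, from `python manage.py shell`, whether the installed Django still provides the name the toolbar tries to import. A minimal sketch; the helper `module_has` is hypothetical, and the module path `django.views.debug` used in the example below is an assumption about where older Django releases defined `linebreak_iter`.

```python
import importlib


def module_has(module_name, attr):
    """Return True if `module_name` can be imported and defines `attr`.

    Only ImportError is treated as "not available"; other errors (for example
    Django's ImproperlyConfigured when settings are not loaded) propagate.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)
```

For example, `module_has("django.views.debug", "linebreak_iter")` returning `False` on the Django 1.9 install would be consistent with the error above, which points at a version mismatch between Django and django-debug-toolbar 1.3.0 worth checking against the toolbar's changelog.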

python django django-debug-toolbar

4 votes, 1 answer, 2171 views