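I am trying to extract the website name from a URL. For example, if the URL is www.google.com, the output should be "google". I tried the code below, and everything works except the last line, "websites.collect()".
I use a DataFrame to store the website names, then convert it to an RDD and apply a split function to the values to get the output I need.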
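For illustration, the splitting step described above might look roughly like the sketch below. It is plain Python and does not touch Spark; the helper name `site_name` is hypothetical, and the handling of a scheme or path is an assumption about what the input could contain.

```python
def site_name(url):
    """Return the first host label after an optional 'www.' prefix."""
    # Drop an optional scheme such as 'http://' (assumption: input may include one).
    host = url.split("://", 1)[-1]
    # Drop any path so that only the host part remains.
    host = host.split("/", 1)[0]
    parts = host.split(".")
    # Skip a leading 'www' label if present.
    if parts and parts[0].lower() == "www":
        parts = parts[1:]
    return parts[0] if parts else ""


print(site_name("www.google.com"))  # prints "google"
```

On an RDD of rows, a function like this could be applied with something like `rdd.map(lambda row: site_name(row[0]))`, assuming the URL is the first column of each row.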
The logic seems fine, but I suspect something is wrong with my package configuration or installation.
The error is shown below:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-11-a88287400951> in <module>()
----> 1 websites.collect()
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\rdd.py in collect(self)
822 """
823 with SCCallSiteSync(self.context) as css:
--> 824 port = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
825 return list(_load_from_socket(port, self._jrdd_deserializer))
826
C:\ProgramData\Anaconda3\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1158 answer = self.gateway_client.send_command(command)
1159 return_value = get_return_value(
-> 1160 answer, self.gateway_client, self.target_id, self.name)
1161
1162 for temp_arg in temp_args:
C:\ProgramData\Anaconda3\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
61 def deco(*a, **kw):
62 try:
---> 63 return f(*a, …