I need to compare two lists in order to create a new list of the specific elements that are found in one list but not in the other. For example:
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]
I want to loop over list_1 and append to main_list all the elements of list_2 that are not found in list_1.
The result should be:
main_list=["f", "m"]
How can I do this with Python?
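One possible approach, as a minimal sketch: build a set from list_1 for fast membership tests, then keep the elements of list_2 that are not in it. This assumes the elements are hashable and that the order of list_2 should be preserved; the name `seen` is introduced here only for illustration.

```python
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"]

# Set lookups are O(1) on average; this assumes the elements are hashable.
seen = set(list_1)

# Keep the elements of list_2 that do not occur in list_1, in list_2's order.
main_list = [item for item in list_2 if item not in seen]

print(main_list)  # ['f', 'm']
```

If order and duplicates do not matter, `set(list_2) - set(list_1)` gives the same elements as a set.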
I have a list of dicts like this:
list=[{u'hello':['001', 3], u'word':['003', 1], u'boy':['002', 2]},
{u'dad':['007', 3], u'mom':['005', 3], u'honey':['002', 2]} ]
What I need is to iterate over my list in order to create a list of tuples like this:
new_list=[('hello','001', 3), ('word','003',1), ('boy','002', 2),
('dad','007',3), ('mom', '005', 3), ('honey','002',2)]
Note: the numbers with leading zeros ('001', '003', and so on) must be treated as strings.
Can anyone help me?
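A minimal sketch of one way to build such a list: walk each dict's items and prepend the key to its value list. The variable `data` is a stand-in for the question's `list`, renamed here to avoid shadowing the built-in. The output order shown in the question is only guaranteed on Python 3.7+, where dicts preserve insertion order.

```python
# `data` stands in for the question's `list` (renamed to avoid shadowing the built-in).
data = [
    {u'hello': ['001', 3], u'word': ['003', 1], u'boy': ['002', 2]},
    {u'dad': ['007', 3], u'mom': ['005', 3], u'honey': ['002', 2]},
]

# For every dict, turn each (key, [code, count]) pair into a flat tuple.
# The codes are already strings, so '001' keeps its leading zeros.
new_list = [(key,) + tuple(values) for d in data for key, values in d.items()]

print(new_list)
```

Because the values are never converted, the zero-padded codes stay strings and the counts stay integers.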
I am using pyspark in a python3 environment. I have a dataframe and I am trying to split a column of dense vectors into multiple column values. My df looks like this:
df_vector = kmeansModel_2.transform(finalData).select(['scalaredFeatures',
'prediction'])
df_vector.show()
+--------------------+----------+
|    scalaredFeatures|prediction|
+--------------------+----------+
|[0.56785108466505...|         0|
|[1.41962771166263...|         0|
|[2.20042295307707...|         0|
|[0.14196277116626...|         0|
|[1.41962771166263...|         0|
+--------------------+----------+
To accomplish this, I used the following code:
def extract(row):
    return (row.prediction, ) + tuple(row.scalaredFeatures.toArray().tolist())
df = df_vector.rdd.map(extract).toDF(["prediction"])
Unfortunately, I get an error:
Py4JJavaError: An error occurred while calling
z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 52.0 failed 1 times, most recent failure: Lost task
0.0 in stage 52.0 (TID 434, localhost, executor driver):
org.apache.spark.api.python.PythonException: Traceback (most …
After web-scraping an e-commerce website, I saved all the data into a pandas dataframe. However, when I try to save the dataframe to an Excel file, I get the following error:
Traceback (most recent call last):
  File "<ipython-input-7-3dafdf6b87bd>", line 2, in <module>
    sheet_name='Dolci', encoding='iso-8859-1')
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\core\frame.py", line 1466, in to_excel
    excel_writer.save()
  File "C:\ProgramData\Anaconda2\lib\site-packages\pandas\io\excel.py", line 1502, in save
    return self.book.close()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\workbook.py", line 299, in close
    self._store_workbook()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\workbook.py", line 607, in _store_workbook
    xml_files = packager._create_package()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\packager.py", line 139, in _create_package
    self._write_shared_strings_file()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\packager.py", line 286, in _write_shared_strings_file
    sst._assemble_xml_file()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 53, in _assemble_xml_file
    self._write_sst_strings()
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 83, in _write_sst_strings
    self._write_si(string)
  File "C:\ProgramData\Anaconda2\lib\site-packages\xlsxwriter\sharedstrings.py", line 110, …