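Is there a simple way to flatten a list of iterables with a list comprehension, or failing that, what would you all consider to be the best way to flatten a shallow list like this, balancing performance and readability?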
I tried to flatten such a list with a nested list comprehension, like this:
[image for image in menuitem for menuitem in list_of_menuitems]
But I run into a NameError there, because name 'menuitem' is not defined. After googling and looking around on Stack Overflow, I got the desired results with a reduce statement:
reduce(list.__add__, map(lambda x: list(x), list_of_menuitems))
But this method is fairly unreadable, because I need that list(x) call there, since x is a Django QuerySet object.
Conclusion:
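Thanks to everyone who contributed to this question. Here is a summary of what I learned. I'm also making this a community wiki in case others want to add to or correct these observations.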
My original reduce statement is redundant and is better written this way:
>>> reduce(list.__add__, (list(mi) for mi in list_of_menuitems))
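On Python 3, `reduce` is no longer a builtin and has to be imported from `functools`. A minimal sketch of the same statement, assuming plain lists (or any other iterables) stand in for the QuerySets; the `[]` start value is an addition so that an empty outer list returns `[]` instead of raising `TypeError`, and `flatten_reduce` is just an illustrative name:

```python
from functools import reduce


def flatten_reduce(list_of_menuitems):
    # list(mi) materializes each inner iterable (e.g. a QuerySet);
    # list.__add__ then concatenates the lists pairwise.
    # The [] start value makes an empty outer list return [] instead of raising.
    return reduce(list.__add__, (list(mi) for mi in list_of_menuitems), [])


print(flatten_reduce([[1, 2], (3,), []]))  # [1, 2, 3]
```

Because `list.__add__` builds a brand-new list at every step, the accumulated result is copied again for each inner list, so the cost grows roughly quadratically with the total number of items.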
This is the correct syntax for a nested list comprehension (brilliant summary, dF!):
>>> [image for mi in list_of_menuitems for image in mi]
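The `for` clauses of a nested comprehension are read left to right, in the same order as the equivalent nested `for` loops; that is why the original attempt, which used `menuitem` in the first clause before the clause that defines it, raised `NameError`. A small sketch with both forms side by side (the function names and sample data are illustrative):

```python
def flatten_comprehension(list_of_menuitems):
    # Clauses run left to right: first pick mi, then pick image from that mi.
    return [image for mi in list_of_menuitems for image in mi]


def flatten_loops(list_of_menuitems):
    # The same thing written as explicit nested loops, in the same order.
    result = []
    for mi in list_of_menuitems:
        for image in mi:
            result.append(image)
    return result


data = [['a.png', 'b.png'], [], ['c.png']]
print(flatten_comprehension(data))  # ['a.png', 'b.png', 'c.png']
```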
But neither of these methods is as efficient as using itertools.chain:
>>> from itertools import chain
>>> list(chain(*list_of_menuitems))
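`chain(*list_of_menuitems)` has to unpack the entire outer iterable into arguments before `chain` starts, so it cannot cope with an endless (or very large) generator of batches. `itertools.chain.from_iterable` takes the outer iterable itself and pulls the inner iterables one at a time. A sketch, where the `endless_batches` generator is made up for illustration:

```python
from itertools import chain, islice


def flatten(list_of_iterables):
    # from_iterable consumes the outer iterable lazily, one inner iterable at a time.
    return list(chain.from_iterable(list_of_iterables))


def endless_batches():
    # Hypothetical infinite generator of two-item batches: [0, 1], [2, 3], ...
    n = 0
    while True:
        yield [n, n + 1]
        n += 2


# Works because nothing is unpacked up front; chain(*endless_batches())
# would never finish, since it tries to unpack infinitely many batches.
first_five = list(islice(chain.from_iterable(endless_batches()), 5))
print(flatten([[1, 2], [3], [5, 89], [], [6]]))  # [1, 2, 3, 5, 89, 6]
print(first_five)  # [0, 1, 2, 3, 4]
```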
As @cdleary pointed out, it is probably better style to avoid the * operator magic by using chain.from_iterable, like so:
>>> import itertools
>>> flat = itertools.chain.from_iterable([[1,2],[3],[5,89],[],[6]])
>>> print(list(flat))
[1, 2, …
I'm trying to use multiprocessing with a pandas dataframe, that is, split the dataframe into 8 parts and apply some function to each part using apply (with each part processed in a different process).
EDIT: Here's the solution I finally found:
import multiprocessing as mp
import numpy as np
import pandas as pd
import pandas.testing as pdt  # pandas.util.testing is deprecated
def process_apply(x):
    # do some stuff to data here
    return x  # placeholder: return the processed row
def process(df):
    res = df.apply(process_apply, axis=1)
    return res
if __name__ == '__main__':
    # big_df is the DataFrame to process (loaded elsewhere)
    p = mp.Pool(processes=8)
    split_dfs = np.array_split(big_df, 8)
    pool_results = p.map(process, split_dfs)
    p.close()
    p.join()
    # merging parts processed by different processes
    parts = pd.concat(pool_results, axis=0)
    # merging newly calculated parts to big_df
    big_df = pd.concat([big_df, parts], axis=1)
    # checking if the dfs were merged correctly
    pdt.assert_series_equal(parts['id'], big_df['id'])
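The same split / process / concatenate pattern can be packaged as a reusable helper. This is a hedged sketch, not the asker's code: `parallel_apply`, `_apply_rows`, and the `pool_factory` parameter are hypothetical names, and the chunks are cut with `iloc` rather than `np.array_split`, which some newer pandas versions warn about when given a DataFrame. With `mp.Pool`, both the worker function and `func` must be picklable, i.e. defined at module level.

```python
import multiprocessing as mp
from functools import partial

import pandas as pd


def _apply_rows(part, func):
    # Runs in a worker: row-wise apply on one contiguous chunk.
    return part.apply(func, axis=1)


def parallel_apply(df, func, n_parts=8, pool_factory=mp.Pool):
    """Apply func (returning one scalar per row) to every row of df, one chunk per worker."""
    n = max(1, min(n_parts, len(df)))
    # Contiguous, nearly equal-sized chunks, so row order is preserved.
    bounds = [len(df) * i // n for i in range(n + 1)]
    parts = [df.iloc[bounds[i]:bounds[i + 1]] for i in range(n)]
    with pool_factory(n) as pool:
        # map() returns results in input order, so concat restores the original order.
        results = pool.map(partial(_apply_rows, func=func), parts)
    return pd.concat(results)
```

Called as `big_df['new_col'] = parallel_apply(big_df, some_row_function)` under an `if __name__ == '__main__':` guard, the result aligns with `big_df` by index, so no separate merge step is needed. Passing `pool_factory=multiprocessing.dummy.Pool` swaps in threads with the same API, which suits I/O-bound row functions.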
See this question for more setup. I want to create a large number of instances of the class Toy in parallel. Then I want to write them into an XML tree.
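A caveat for this plan, sketched under assumptions: whatever a `multiprocessing` pool returns is pickled back to the parent process, and lxml elements generally cannot be pickled, so a common approach is to let the workers return plain data and build the XML tree only in the parent. In this sketch the `(name, number)` row layout, the `toys`/`toy` tag names, and the helper names are all hypothetical, and the stdlib `xml.etree.ElementTree` stands in for `lxml.etree` (the `Element`/`SubElement` calls used here exist in both).

```python
import multiprocessing as mp
import xml.etree.ElementTree as ET


def make_toy_records(rows):
    # Runs in a worker: expand each (name, number) row into `number` plain dicts.
    # Plain dicts pickle cheaply; XML elements are never sent between processes.
    records = []
    for name, number in rows:
        records.extend({'name': name} for _ in range(number))
    return records


def build_toys_xml(rows, n_parts=4, pool_factory=mp.Pool):
    n = max(1, min(n_parts, len(rows)))
    bounds = [len(rows) * i // n for i in range(n + 1)]
    chunks = [rows[bounds[i]:bounds[i + 1]] for i in range(n)]
    with pool_factory(n) as pool:
        batches = pool.map(make_toy_records, chunks)
    # Build the tree in the parent process only.
    root = ET.Element('toys')
    for batch in batches:
        for record in batch:
            ET.SubElement(root, 'toy', name=record['name'])
    return root
```

With a DataFrame, `rows` could come from `list(df[['name', 'number']].itertuples(index=False, name=None))`, and `ET.tostring(root)` serializes the finished tree.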
import itertools
import pandas as pd
import lxml.etree as et
import numpy as np
import sys
import multiprocessing as mp
def make_toys(df):
    l = []
    for index, row in df.iterrows():
        toys = [Toy(row) for _ in range(row['number'])]
        l += [x for x in toys if x is not None]
    return l
class Toy(object):
    def __new__(cls, *args, **kwargs):
        if np.random.uniform() <= 1:
            # object.__new__ rejects extra arguments when __new__ is overridden
            return super(Toy, cls).__new__(cls)
    def __init__(self, row):
        self.id = …
So I want to iterate over a pandas df in parallel: say I have 15 rows, then I want to process them in parallel instead of one after another.
df:
df = pd.DataFrame.from_records([
{'domain':'dnd','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'hrpd','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'blhp','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'rbswp','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'foxbp','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'rbsxbp','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'dnd','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' },
{'domain':'hrpd','duration':'90','media_file':'testfont.wav','user':'tester_food','channel':'confctl-2' }
])
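So I iterate over the df, build a command line for each row, store the output in a df, filter the data, and finally write it to InfluxDB. The problem is that I do all of this one row at a time. I want to process all rows in parallel.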
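Right now I have made 20 scripts and run them all in parallel with multiprocessing. This is a pain whenever I have to make a change, because it has to be made in all 20 scripts. My script looks like this: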
for index, row in dff.iterrows():
    domain = row['domain']
    duration = str(row['duration'])
    media_file = row['media_file']
    user = row['user']
    channel = row['channel']
    cmda = ('./vaa -s https://' + domain + '.www.vivox.com/api2/ -d ' +
            duration + ' -f ' + media_file + ' -u .' + user +
            '. -c sip:confctl-2@' + domain + '.localhost.com -ati 0ps-host …
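Because each row mostly waits on an external program, a single script with a thread pool is usually enough, instead of 20 copies of the script. A hedged sketch: `run_rows_in_parallel` and the `build_command` callback are hypothetical names, and since the real `./vaa` command line is truncated above, the callback has to rebuild it from the row. Passing the command as an argument list rather than one shell string also sidesteps quoting problems.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def run_rows_in_parallel(df, build_command, max_workers=8):
    """Run build_command(row) for every row concurrently; return df plus an 'output' column."""

    def run_one(row):
        # Each thread just waits on its own child process, so threads are sufficient.
        result = subprocess.run(build_command(row), capture_output=True, text=True)
        return result.stdout

    rows = [row for _, row in df.iterrows()]
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as the input rows.
        outputs = list(executor.map(run_one, rows))
    return df.assign(output=outputs)
```

`build_command` could return something like `['./vaa', '-s', 'https://' + row['domain'] + '.www.vivox.com/api2/', '-d', str(row['duration'])]` extended with the remaining flags; the filtering and the InfluxDB write can then run once on the returned DataFrame, so a change only has to be made in one place.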