Python 3.5. I have several hundred .mat files (version 7.3) in one directory. I am looping over all of them to extract two different parts of the data. I loop through and grab the first batch without any problem, but when I do exactly the same thing again, only extracting a different part of the data, I get the following error:
Traceback (most recent call last):
File "v73_test.py", line 43, in <module>
mrfs_data = extract.convert1simProteinComCountsIntoDataFrame(path2mats)
File "/home/oli/Downloads/PhD/wc/mg/version_73_stuff/functions_for_joshuas_matFiles/extract_matFile_data_v73.py", line 586, in convert1simProteinComCountsIntoDataFrame
raw_data = getMatureProteinComplexs(path2mats, state_no)
File "/home/oli/Downloads/PhD/wc/mg/version_73_stuff/functions_for_joshuas_matFiles/extract_matFile_data_v73.py", line 53, in getMatureProteinComplexs
if len(np.array(state_file['ProteinComplex']['counts']).shape) == 3:
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "/home/oli/virtualenvs/standard_python3.5/lib/python3.5/site-packages/h5py/_hl/dataset.py", line 696, in __array__
self.read_direct(arr)
File "/home/oli/virtualenvs/standard_python3.5/lib/python3.5/site-packages/h5py/_hl/dataset.py", line 657, in read_direct
self.id.read(mspace, fspace, dest, dxpl=self._dxpl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in …
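For context, version-7.3 .mat files are HDF5 containers, which is why h5py appears in the traceback. A minimal sketch of the kind of read done in getMatureProteinComplexs (the file-naming scheme and function name here are assumptions; only the ProteinComplex/counts dataset path comes from the traceback), using a context manager so every file handle is closed before the loop moves on to the next file:

```python
import os

import h5py
import numpy as np

def get_counts(path2mats, state_no):
    """Read the ProteinComplex/counts dataset from one v7.3 .mat file."""
    # hypothetical file-name pattern -- adapt to the real naming scheme
    mat_path = os.path.join(path2mats, 'state-%d.mat' % state_no)
    with h5py.File(mat_path, 'r') as state_file:
        # materialise the dataset into memory while the file is still open
        counts = np.array(state_file['ProteinComplex']['counts'])
    return counts
```

Closing each file per iteration matters when looping over hundreds of files, since leaked handles can exhaust the process's open-file limit partway through a long run.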
I have a complex model written in Matlab. The model was not written by us and is best treated as a "black box"; fixing the relevant problem from the inside would mean rewriting the entire model, which would take years.
If I had an "embarrassingly parallel" problem, I could use a job array to submit X variations of the same simulation with the option #SBATCH --array=1-X. However, clusters usually have a (frustratingly small) limit on the maximum array size.
When using a PBS/TORQUE cluster I got around this by forcing Matlab to run on a single thread, requesting multiple CPUs, and then running multiple Matlab instances in the background. An example submission script is:
#!/bin/bash
<OTHER PBS COMMANDS>
#PBS -l nodes=1:ppn=5,walltime=30:00:00
#PBS -t 1-600
<GATHER DYNAMIC ARGUMENTS FOR MATLAB FUNCTION CALLS BASED ON ARRAY NUMBER>
# define Matlab options
options="-nodesktop -noFigureWindows -nosplash -singleCompThread"
for sub_job in {1..5}
do
<GATHER DYNAMIC ARGUMENTS FOR MATLAB FUNCTION CALLS BASED ON LOOP NUMBER (i.e. sub_job)>
matlab ${options} -r "run_model(${arg1}, ${arg2}, ..., ${argN}); exit" &
done
wait
<TIDY UP AND FINISH COMMANDS> …
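The same work-around translates to SLURM almost line for line; a sketch, with placeholder resource limits and a hypothetical run_model argument scheme, that packs 5 single-threaded Matlab runs into each array task (the matlab call is echoed here for illustration; drop the echo in a real submission script):

```shell
#!/bin/bash
#SBATCH --array=1-120            # 120 tasks x 5 sub-jobs = 600 model runs
#SBATCH --cpus-per-task=5
#SBATCH --time=30:00:00

options="-nodesktop -noFigureWindows -nosplash -singleCompThread"
runs_per_task=5

for sub_job in $(seq 1 ${runs_per_task})
do
    # map (array task, sub-job) onto a unique global run id
    run_id=$(( (SLURM_ARRAY_TASK_ID - 1) * runs_per_task + sub_job ))
    # remove the leading 'echo' to actually launch Matlab in the background
    echo matlab ${options} -r "run_model(${run_id}); exit" &
done
wait
```

The id arithmetic means a 600-run sweep only needs an array of 120 tasks, which fits under a typical MaxArraySize limit while still giving each model run a unique argument.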
I have been enjoying the "tuple comprehension" added in Python 3.5:
In [128]: *(x for x in range(5)),
Out[128]: (0, 1, 2, 3, 4)
However, when I try to return a tuple comprehension directly, I get an error:
In [133]: def testFunc():
...: return *(x for x in range(5)),
...:
File "<ipython-input-133-e6dd0ba638b7>", line 2
return *(x for x in range(5)),
^
SyntaxError: invalid syntax
This is only a minor inconvenience, since I can simply assign the tuple comprehension to a variable and return the variable. However, if I try to use a tuple comprehension inside a dict comprehension, I get the same error:
In [130]: {idx: *(x for x in range(5)), for idx in range(5)}
File "<ipython-input-130-3e9a3eee879c>", line 1
{idx: *(x for x in range(5)), for idx in range(5)}
^
SyntaxError: invalid syntax
I find this somewhat problematic, because in some situations tuples matter for performance.
There is no problem using dict and list comprehensions in these positions, so how many other situations are there where a tuple comprehension cannot be used? Or am I perhaps using it wrong?
It makes me wonder what the point of it is if its use is so limited, or whether I am doing something wrong. If I am not doing anything wrong, what is the fastest / most Pythonic way to create a tuple that can be used in the same way as list and dict comprehensions?
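For reference, the star-unpacking form is not really a comprehension: the * only parses inside an explicit display such as ( ... ), so wrapping the return value in parentheses, or simply calling tuple() on the generator expression, avoids both syntax errors. A minimal sketch:

```python
def testFunc():
    # parenthesized tuple display: valid where the bare starred form is not
    return (*(x for x in range(5)),)

def testFunc2():
    # tuple() on a generator expression is the idiomatic spelling
    return tuple(x for x in range(5))

# tuple() also works as the value inside a dict comprehension
d = {idx: tuple(x for x in range(5)) for idx in range(5)}
```

Both spellings build the same tuple; tuple(gen) is usually the clearer choice, since it works unparenthesized in every position a normal expression can appear.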
If I create a class that imports a library and pickle it with dill, the library cannot be found when I unpickle it:
import dill
from sklearn.metrics.cluster import adjusted_rand_score
import pandas as pd
import random

class Test1():
    def __init__(self, df):
        self.genomes = df

    @staticmethod
    def percentageSimilarityDistance(genome1, genome2):
        if len(genome1) != len(genome2):
            raise ValueError('Genome1 and genome2 must have the same length!')
        is_gene_correct = [1 if genome1[idx] == genome2[idx] else 0 for idx in range(len(genome1))]
        return (1 - sum(is_gene_correct)/(len(is_gene_correct) * 1.0))

    def createDistanceMatrix(self, distance_function):
        """Takes a dictionary of KO sets and returns a distance (or similarity) matrix which is basically how many …
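Not the original code, but a trimmed, self-contained sketch (stdlib pickle standing in for dill, the sklearn/pandas module-level imports dropped, and the adjustedRandScore method added purely for illustration) showing the shape of one usual workaround: move library imports inside the methods that use them, so the name is resolved when the method runs rather than when the class is unpickled. With dill itself, enabling its recurse setting (dill.settings['recurse'] = True) before pickling is the other commonly suggested option.

```python
import pickle  # stand-in for dill in this sketch

class Test1():
    def __init__(self, df):
        self.genomes = df

    @staticmethod
    def percentageSimilarityDistance(genome1, genome2):
        if len(genome1) != len(genome2):
            raise ValueError('Genome1 and genome2 must have the same length!')
        is_gene_correct = [1 if genome1[idx] == genome2[idx] else 0
                           for idx in range(len(genome1))]
        return 1 - sum(is_gene_correct) / (len(is_gene_correct) * 1.0)

    def adjustedRandScore(self, labels_true, labels_pred):
        # importing inside the method: resolved at call time, not at unpickle time
        from sklearn.metrics.cluster import adjusted_rand_score
        return adjusted_rand_score(labels_true, labels_pred)

# round-trip an instance through the pickler
obj = pickle.loads(pickle.dumps(Test1([0, 1, 2])))
```

The trade-off is a small repeated import cost per call, which Python's module cache makes negligible after the first call.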