Posts by w.e*_*ric

How do I automatically requeue a SLURM srun job if it fails?

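I have to run 300 jobs of the same model (a black box). However, the model sometimes crashes internally with a segmentation fault and prints the following error message: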

srun: error: nodexyz: task 0: Segmentation fault


The cluster uses SLURM as its resource manager, and I want the job to be requeued automatically whenever it fails.
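If the segfault is transient, one option is to retry the model inside the same job rather than requeueing it through SLURM. The following is a minimal Python sketch of that retry-in-place approach. It assumes the model can be started as an ordinary command, and the names `run_with_retries`, `max_attempts` and `delay_s` are hypothetical:

```python
import signal
import subprocess
import sys
import time


def run_with_retries(cmd, max_attempts=3, delay_s=0.0):
    """Run `cmd` until it exits with status 0 or `max_attempts` runs are used up.

    Hypothetical helper. Returns (returncode, attempts_used). On POSIX, a child
    killed by a signal gets a negative returncode, e.g. -signal.SIGSEGV for a
    segmentation fault. If the command is itself `srun ...`, srun reports a
    non-negative exit code instead, so any non-zero code is treated as failure.
    """
    returncode = None
    for attempt in range(1, max_attempts + 1):
        returncode = subprocess.run(cmd).returncode
        if returncode == 0:
            return returncode, attempt
        reason = ("segmentation fault" if returncode == -signal.SIGSEGV
                  else f"exit status {returncode}")
        print(f"attempt {attempt}/{max_attempts} failed: {reason}", file=sys.stderr)
        if attempt < max_attempts:
            time.sleep(delay_s)  # optional pause before the next try
    return returncode, max_attempts
```

For example, a driver script inside the batch job could call `run_with_retries(["./model", "input.cfg"])`, where `./model` and `input.cfg` are placeholders, and exit non-zero if the final return code is not 0. To go through SLURM instead, the batch script can run `scontrol requeue $SLURM_JOB_ID` after a failure. This typically requires the job to be eligible for requeueing (see `sbatch --requeue`).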

slurm

w.e*_*ric

lucky-day

Score: 3 · Answers: 1 · Views: 2470

Spark Cartesian product

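I need to compare coordinates to compute the distances between them, so I load the data with sc.textFile() and build the Cartesian product. The text file has about 2,000,000 lines, so 2,000,000 x 2,000,000 coordinate pairs have to be compared.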
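For scale, here is a back-of-the-envelope count. N points give N² pairs in a Cartesian product, so the work grows quadratically. A file 1,000 times larger therefore means about a million times the work of the test run:

```latex
\begin{aligned}
\text{test run: } & (2\times10^{3})^{2} = 4\times10^{6} \text{ pairs} \\
\text{full run: } & (2\times10^{6})^{2} = 4\times10^{12} \text{ pairs} \\
\text{ratio: }    & \frac{4\times10^{12}}{4\times10^{6}} = 10^{6}
\end{aligned}
```

`rdd.cartesian(rdd)` yields every ordered pair, including each point paired with itself. Even counting only distinct unordered pairs, N(N-1)/2 ≈ 2×10^12, which is still about a million times the test workload.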

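I tested the code with about 2,000 coordinates and it ran fine within a few seconds. With the large file, however, it seems to stall at some point, and I don't know why. The code is as follows: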

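from math import asin, cos, sqrt

def concat(x, y):
    # Merge two values into one flat list; each value is either a single
    # tuple or a list of tuples (presumably the (id, distance) values per key)
    if isinstance(x, list) and isinstance(y, list):
        return x + y
    if isinstance(x, list) and isinstance(y, tuple):
        return x + [y]
    if isinstance(x, tuple) and isinstance(y, list):
        return [x] + y
    return [x, y]

def haversian_dist(pair):
    # pair = (point_a, point_b); in each parsed line, field 0 is latitude,
    # field 1 a numeric ID and field 2 longitude, all in degrees
    lat1 = float(pair[0][0])
    lat2 = float(pair[1][0])
    lon1 = float(pair[0][2])
    lon2 = float(pair[1][2])
    p = 0.017453292519943295  # pi / 180: degrees -> radians
    a = 0.5 - cos((lat2 - lat1) * p)/2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    print(pair[0][1])  # debug output; runs once for every pair
    # 12742 km = 2 * mean Earth radius (6371 km)
    return (int(float(pair[0][1])), (int(float(pair[1][1])), 12742 * asin(sqrt(a))))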

def sort_val(tuple):
    dtype = [("globalid", …

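The distance step inside haversian_dist is the haversine formula written with cosines. Here p is π/180, 12742 km is twice the mean Earth radius of 6371 km, and sin²(t/2) is expressed as (1 − cos t)/2. Below is a standalone sketch of the same formula. It can be handy for checking individual pairs outside Spark. It assumes latitude and longitude are given in degrees, and the name `haversine_km` is hypothetical:

```python
from math import asin, cos, pi, sqrt

EARTH_DIAMETER_KM = 12742.0  # 2 * mean Earth radius (6371 km), as in haversian_dist
DEG_TO_RAD = pi / 180        # equals p = 0.017453292519943295 in haversian_dist


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Same cosine form as haversian_dist:
    a = sin^2(dlat/2) + cos(lat1) cos(lat2) sin^2(dlon/2),
    using sin^2(t/2) = (1 - cos t) / 2.
    """
    a = (0.5 - cos((lat2 - lat1) * DEG_TO_RAD) / 2
         + cos(lat1 * DEG_TO_RAD) * cos(lat2 * DEG_TO_RAD)
         * (1 - cos((lon2 - lon1) * DEG_TO_RAD)) / 2)
    # Clamp against tiny floating-point overshoot before sqrt/asin
    a = min(1.0, max(0.0, a))
    return EARTH_DIAMETER_KM * asin(sqrt(a))
```

For example, `haversine_km(0, 0, 0, 90)` gives a quarter of the equator, 12742·π/4 ≈ 10007.5 km.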

python cartesian-product apache-spark

w.e*_*ric

2016-08-08

Score: 2 · Answers: 1 · Views: 1656