Tag: pipeline

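How to immediately submit all Snakemake jobs to a SLURM cluster

I am using snakemake to build a variant calling pipeline that can run on a SLURM cluster. The cluster has login nodes and compute nodes. Any real computation should be done on the compute nodes as an srun or sbatch job. Jobs are limited to 48 hours of runtime. My problem is that processing many samples, especially when the queue is busy, will take more than 48 hours to get through all the rules for every sample. The traditional cluster execution for snakemake keeps a master thread running that submits a rule to the queue only after all of that rule's dependencies have finished. I am supposed to run this master program on a compute node, so this limits the runtime of my entire pipeline to 48 hours.

I know SLURM jobs have dependency directives that tell a job to wait until other jobs have finished. Since a snakemake workflow is a DAG, is it possible to submit all jobs at once, with each job's dependencies defined by the rule dependencies in the DAG? Once all jobs are submitted, the master thread would finish, which gets around the 48 hour limit. Is this possible with snakemake, and if so, how does it work? I found the --immediate-submit command line option, but I am not sure whether it has the behavior I am looking for or how to use it, because my cluster prints Submitted batch job [id] after a job is submitted to the queue rather than just the job ID.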
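One way to cope with the Submitted batch job [id] output is a small wrapper script that sits between snakemake and sbatch and prints only the bare job ID. The sketch below is an illustration under assumptions, not snakemake's own API: the function names parse_job_id and build_sbatch_command are hypothetical, and it assumes the IDs of the parent jobs reach the wrapper as a list of strings (for example through the {dependencies} placeholder of the --cluster command). It uses sbatch's --dependency=afterok option to chain the jobs.

```python
import re


def parse_job_id(sbatch_output: str) -> str:
    """Extract the numeric job ID from a 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job ID found in: {sbatch_output!r}")
    return match.group(1)


def build_sbatch_command(jobscript: str, dependencies: list) -> list:
    """Build an sbatch command that waits for all parent jobs to succeed.

    `dependencies` is assumed to hold the parent job IDs as strings.
    """
    cmd = ["sbatch"]
    if dependencies:
        # afterok: start only after every listed job finished with exit code 0
        cmd.append("--dependency=afterok:" + ":".join(dependencies))
    cmd.append(jobscript)
    return cmd


print(build_sbatch_command("job.sh", ["101", "102"]))
print(parse_job_id("Submitted batch job 12345"))
```

In a real wrapper, the command would be run with subprocess.run(cmd, capture_output=True, text=True) and only parse_job_id(result.stdout) would be written to standard output. As an alternative, sbatch --parsable makes SLURM print the job ID without the surrounding text.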

python pipeline bioinformatics slurm snakemake

6
votes

1
answer

1766
views

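Invalid workflow file

I get the error "a step cannot have both the uses and run keys", but I do not see any step that has both uses and run. Can someone help me figure out what is wrong here?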

on:
  pull_request:
    branches:
    - master

env:
  IMAGE_NAME: api

jobs:
  build:
    name: Application build
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repository (#1)
      uses: actions/checkout@v2
    - name: Setup .NET Core
      uses: actions/setup-dotnet@v1
      with:
        dotnet-version: 3.1.101
    - name: Build API
      run: dotnet build --configuration Release

  tests:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout repository (#2)
      uses: actions/checkout@v2
    - name: Setup .NET Core
      uses: actions/setup-dotnet@v1
      with:
        dotnet-version: 3.1.101
    - name: Run API Tests
      run: dotnet test …
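Because the workflow shown is cut off at dotnet test …, the step that triggers the error may sit in the part that is not visible. A quick way to locate it is to load the file into a dictionary and list every step that carries both keys. The sketch below assumes the YAML has already been parsed (for example with a YAML library) into plain Python dicts and lists; the function name steps_with_uses_and_run and the sample data are hypothetical and are not taken from the workflow above.

```python
def steps_with_uses_and_run(workflow: dict) -> list:
    """Return (job_id, step_name) for every step defining both 'uses' and 'run'."""
    offenders = []
    for job_id, job in workflow.get("jobs", {}).items():
        for index, step in enumerate(job.get("steps", [])):
            if "uses" in step and "run" in step:
                offenders.append((job_id, step.get("name", f"step {index}")))
    return offenders


# Hypothetical parsed workflow: the second step accidentally merges two steps,
# which is what happens when the leading "- " of a step is missing.
example = {
    "jobs": {
        "tests": {
            "steps": [
                {"name": "Checkout", "uses": "actions/checkout@v2"},
                {"name": "Broken step", "uses": "actions/setup-dotnet@v1", "run": "dotnet test"},
            ]
        }
    }
}

print(steps_with_uses_and_run(example))  # [('tests', 'Broken step')]
```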

pipeline github github-actions

6
votes

0
answers

6789
views

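MongoDB $lookup pipeline: does it use indexes?

I have a query that uses the pipeline feature of $lookup together with $expr. It works, but performance is poor. It searches a collection of about 4000 documents and joins 2 other collections (using $lookup blocks). Even though each collection holds only a few thousand documents, the query takes about 2000 ms to run.

The query looks like this: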

            {
                $match: {
                   language: 'str'
                }
            },
            {
                $lookup: {
                    from: 'somecollection',
                    let: { someId: '$someId' },
                    pipeline: [
                        {
                            $match: {
                                $expr: {
                                    $and: [
                                        {
                                            $eq: [
                                                '$_id',
                                                '$$someId'
                                            ]
                                        },
                                        {
                                            $gte: ['$field',value]
                                        },
                                        {
                                            $lte: ['$field2',value]
                                        }
                                       ....
                                       // some more conditions..

                                    ]
                                }
                            }
                        }
                    ]

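Running explain() on this only gives information about the first $match block. How can I tell whether the $expr in the pipeline uses an index?

I tried adding indexes on all fields used in the pipeline, and I also tried creating a compound index, but I could not make it any faster.

How can I improve the performance?

The structure of my query: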

match (filter by language),
lookup (col1 join)
lookup …
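One option worth testing is to move the join condition out of $expr. Depending on the server version, $expr inside a $lookup pipeline may use indexes for only some comparison operators, while ordinary query operators compared against constants can use indexes in the usual way. The sketch below builds such a stage as a plain Python dict. It assumes MongoDB 5.0 or later, where localField and foreignField can be combined with pipeline; the function name build_lookup_stage, the parameters min_value and max_value, and the output field joined are hypothetical, since the original snippet does not show them.

```python
def build_lookup_stage(min_value, max_value):
    """Build a $lookup stage that joins on _id without using $expr."""
    return {
        "$lookup": {
            "from": "somecollection",
            # Equality join replaces {$eq: ['$_id', '$$someId']} in $expr
            "localField": "someId",
            "foreignField": "_id",
            "pipeline": [
                {
                    # Constant range filters written as ordinary query operators
                    "$match": {
                        "field": {"$gte": min_value},
                        "field2": {"$lte": max_value},
                    }
                }
            ],
            "as": "joined",  # hypothetical output field name
        }
    }


stage = build_lookup_stage(1, 10)
print(stage["$lookup"]["pipeline"])
```

Whether this is actually faster has to be checked against the real data, for example by comparing explain() output before and after the change.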

pipeline join aggregate mongodb

6
votes

1
answer

4126
views

Node - properly closing streams after pipeline

Suppose I have the following code:

try {
    let size = 0;

    await pipeline(
        fs.createReadStream('lowercase.txt'),
        async function* (source) {
            for await (const chunk of source) {
                size += chunk.length;
           
                if (size >= 1000000) {
                    throw new Error('File is too big');
                }

                yield String(chunk).toUpperCase();
            }
        },
        fs.createWriteStream('uppercase.txt')
    );

    console.log('Pipeline succeeded.');
} catch (error) {
    console.log('got error:', error);
}

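How can I make sure the streams are properly closed in every case? The Node docs are not much help; they only tell me that I will be left with dangling event listeners:

stream.pipeline() will call stream.destroy(err) on all streams except:

Readable streams which have emitted 'end' or 'close'.

Writable streams which have emitted 'finish' or 'close'.

stream.pipeline() leaves dangling event listeners on the streams after the callback has been invoked. In the case of reuse of streams after failure, this can cause event listener leaks and swallowed errors.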

javascript pipeline pipe stream node.js

6
votes

2
answers

8939
views

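sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

I am trying to use pipelines to feed an ensemble voting classifier, because I want the ensemble learner to use models trained on different feature sets. To do this, I followed the tutorial available at [1].

Below is the code I have been able to develop so far.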

y = df1.index
x = preprocessing.scale(df1)

phy_features = ['A', 'B', 'C']
phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])

fa_features = ['D', 'E', 'F']
fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])


pipe_phy = Pipeline(steps=[('preprocessor', phy_processer ),('classifier', SVM)])
pipe_fa = Pipeline(steps=[('preprocessor', fa_processer ),('classifier', SVM)])

ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])

cv = KFold(n_splits=10, random_state=None, shuffle=True)
for train_index, test_index in cv.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    ens.fit(x_train,y_train)
    print(ens.score(x_test, …

python pipeline feature-selection scikit-learn ensemble-learning

6
votes

1
answer

4123
views

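TypeScript cannot find a class, only in CI

I have an Angular project. When I run "ng test" locally everything works, but in the Bitbucket pipeline I get this error.

ERROR in src/app/services/user-store/user.service.ts:7:22 - error TS2307: Cannot find module '../../models/user' or its corresponding type declarations. 7 import { User } from '../../models/user';

My tsconfig.base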

{
  "compileOnSave": false,
  "compilerOptions": {
    "baseUrl": "./",
    "outDir": "./dist/out-tsc",
    "sourceMap": true,
    "declaration": false,
    "downlevelIteration": true,
    "experimentalDecorators": true,
    "moduleResolution": "node",
    "importHelpers": true,
    "target": "es2015",
    "module": "es2020",
    "lib": [
      "es2018",
      "dom"
    ]
  }
}

My tsconfig.spec.json

{
  "extends": "./tsconfig.base.json",
  "compilerOptions": {
    "outDir": "./out-tsc/spec",
    "types": [
      "jasmine"
    ]
  },
  "files": [
    "src/test.ts",
    "src/polyfills.ts"
  ],
  "include": [
    "src/**/*.spec.ts",
    "src/**/*.d.ts"
  ]
}

I think it is related to the "target" in my tsconfig.base, but I am not sure; maybe my yaml …

pipeline bitbucket ecmascript-5 typescript bitbucket-pipelines

6
votes

0
answers

665
views

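How do I create a GitLab pipeline job that runs manually?

I would like to know how to manually trigger a specific job in my project's CI pipeline. Since there is only one gitlab-ci.yml file, I can define many jobs that execute one after another. But what if I want to start a manual CI pipeline that runs only one job? As I understand it, every pipeline run executes all jobs unless I use many only and similar parameters. For example, when I have this simple pipeline configuration: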

stages:
    - build

build:
    stage: build
    script:
        - npm i
        - npm run build
        - echo "successful build"

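What should I do if I only want to run a job named echo that runs a simple echo "hello" script, and only when I start it manually? As far as I know, there is no "trigger point" for a job like this. Is this possible at all?

Thanks for the clarification!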

pipeline gitlab gitlab-ci

6
votes

1
answer

5297
views

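GitLab CI: problem passing artifacts to a downstream pipeline with the trigger and needs keywords

I am working on a multi-pipeline project and use the trigger keyword to trigger a downstream pipeline, but I cannot pass along the artifacts created in the upstream project. I use needs to fetch the artifacts like this:

Downstream pipeline block that fetches the artifacts: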

needs:
    - project: workspace/build
      job: build
      ref: master
      artifacts: true

Upstream pipeline block that does the triggering:

build:
    stage: build
    artifacts:
        paths:
            - ./policies
        expire_in: 2h
    only:
        - master
    script:
        - echo 'Test'
    allow_failure: false

triggerUpstream:
    stage: deploy
    only:
        - master
    trigger:
        project: workspace/deploy

But I get the following error:

This job depends on other jobs with expired/erased artifacts:

I am not sure what is wrong.

pipeline gitlab gitlab-ci gitlab-pipelines

6
votes

1
answer

10k
views

"Hello" |> printfn generates an error in F#

https://tryfsharp.fsbolero.io/

printfn "Hello"

works as expected, without errors. However, using the pipe operator

"Hello" |> printfn

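produces the error: The type 'string' is not compatible with the type 'Printf.TextWriterFormat'

I understand how the pipe operator behaves:

f(a) is equivalent to a |> f

Why does the latter produce an error? Thanks.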

f# pipeline function

6
votes

1
answer

161
views

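Azure DevOps pipeline task `task: gitversion/execute@0` fails with unexpected error "##[error]SyntaxError: Unexpected end of JSON input"

I am running a pipeline that performs a simple job: install git and check the version using the following tasks.

Everything ran fine until I created another temporary pipeline from the same yaml file for some additional testing and development.

I did not change the yaml file, and the original pipeline still runs green with the same tasks.

But the execute task in the new pipeline fails with an "unexpected" error.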

  steps:
    - task: gitversion/setup@0
      displayName: Install GitVersion
      inputs:
        versionSpec: "5.10.x"
    
    - task: gitversion/execute@0
      displayName: Determine Version
      inputs:
        useConfigFile: true
        configFilePath: ./gitversion.yml

The output looks like this:


Command: dotnet-gitversion /agent/_work/26/s /output json /output buildserver /config /agent/_work/26/s/gitversion.yml
/opt/hostedtoolcache/GitVersion.Tool/5.10.3/x64/dotnet-gitversion /agent/_work/26/s /output json /output buildserver /config /agent/_work/26/s/gitversion.yml
  ERROR [09/20/22 12:54:22:24] An unexpected error occurred:
System.NullReferenceException: Object reference not set to an instance of an object.
   at LibGit2Sharp.Core.Handles.ObjectHandle.op_Implicit(ObjectHandle handle) in /_/LibGit2Sharp/Core/Handles/Objects.cs:line 509
   at LibGit2Sharp.Core.Proxy.git_commit_author(ObjectHandle obj) in /_/LibGit2Sharp/Core/Proxy.cs:line …

pipeline gitversion azure-devops azure-yaml-pipelines

6
votes

1
answer

8891
views

Tag statistics

azure-devops ×1

azure-yaml-pipelines ×1

bioinformatics ×1

bitbucket-pipelines ×1

ecmascript-5 ×1

ensemble-learning ×1

f# ×1

feature-selection ×1

github-actions ×1

gitlab-pipelines ×1

join ×1

pipe ×1

scikit-learn ×1
