Learning GATK Cromwell + WDL

2024-03-28 21:18

Tags: learning, gatk, cromwell, wdl

WDL (Workflow Description Language) together with Cromwell (an execution engine that can run WDL scripts) is currently a toolset that makes GATK easier to use. Here I work through the WDL quick-start tutorial.
I use Sublime Text 3, so I set up syntax highlighting for WDL: install Package Control first, then in Sublime Text go to Preferences -> Package Control -> Install Package and install wdl syntax.

WDL

Base structure

Top-level components: workflow, task and call
The workflow sits at the top level and uses calls to run tasks. Tasks are defined outside the workflow block.

workflow myWorkFlowName {
  call task_A
  call task_B
}
task task_A{...}
task task_B{...}

Core task-level components: command and output
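The core components of a task: command is what gets run, and output states explicitly which of the command's results make up the task's output.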

task task_A {
  command {...}
  output {...}
}

Add Variables

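There are two levels of variables: task-level variables, which exist inside a task, and workflow-level variables, which are available to the whole workflow. A variable can also be passed from one task to the next.

Adding task-level variables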

task task_A {
  File ref
  File in
  String id
  command {
    do_stuff R=${ref} I=${in} O=${id}.ext
  }
  output {
    File out = "${id}.ext"
  }
}
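The `${...}` placeholders in the command block are filled with the task's input values at run time. As a rough analogy only (this is not how Cromwell is implemented), Python's `string.Template` happens to use the same `${name}` syntax, so the interpolation can be sketched like this. The values `ref.fasta` and `sample.bam` are made up for illustration.

```python
from string import Template

# Analogy only: WDL draft-2 command blocks use ${name} placeholders,
# the same syntax that Python's string.Template understands.
COMMAND = Template("do_stuff R=${ref} I=${in} O=${id}.ext")
OUTPUT = Template("${id}.ext")  # mirrors: File out = "${id}.ext"


def render_command(values: dict) -> str:
    """Fill every ${...} in the command; raises KeyError if an input is missing."""
    return COMMAND.substitute(values)


def expected_output(values: dict) -> str:
    """Compute the output file name that the output section expects."""
    return OUTPUT.substitute(values)


if __name__ == "__main__":
    # Hypothetical input values for task_A.
    task_inputs = {"ref": "ref.fasta", "in": "sample.bam", "id": "NA12878"}
    print(render_command(task_inputs))   # do_stuff R=ref.fasta I=sample.bam O=NA12878.ext
    print(expected_output(task_inputs))  # NA12878.ext
```

Because `substitute` raises `KeyError` when a placeholder has no value, the sketch also mirrors why every variable used inside `command` has to be declared and given a value.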

Adding workflow-level variables

workflow myWorkflowName {
  File my_ref
  File my_input
  String name

  call task_A {
    input: ref=my_ref, in=my_input, id=name
  }
  call task_B {
    input: ref=my_ref, in=task_A.out
  }
}
task task_A{...}
task task_B{...}

Adding Plumbing

The plumbing falls into three groups:
- Simple connections: linear chaining, or simple branching and merging.
- Switching and iterating logic: switch between alternate pathways, and iterate over sets of data either in series or in parallel (scatter-gather parallelism).
- Efficiency through code re-use.

Linear Chaining
In each call's input section, set the input to the output of the previous call:

call stepB {input: in=stepA.out}
call stepC {input: in=stepB.out}

Multi-input/Multi-output

call stepC {input: in1=stepB.out1, in2=stepB.out2}

Branch & Merge

call stepB {input: in=stepA.out}
call stepC {input: in=stepA.out}
call stepD {input: in1=stepB.out, in2=stepC.out}
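The execution order never has to be spelled out: it can be derived from references such as `stepA.out` in each call's input section. The sketch below is a hypothetical model, not Cromwell's scheduler. It writes the Branch & Merge example as a dependency graph and groups the calls into waves with Python's `graphlib` (Python 3.9+); calls in the same wave do not depend on each other, so they could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each call maps to the calls whose outputs it consumes,
# read off the input sections (e.g. stepD uses stepB.out and stepC.out).
branch_and_merge = {
    "stepA": set(),
    "stepB": {"stepA"},
    "stepC": {"stepA"},
    "stepD": {"stepB", "stepC"},
}


def execution_waves(graph: dict) -> list:
    """Group calls into waves; calls in the same wave are independent of each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every call whose inputs are all available
        waves.append(ready)
        sorter.done(*ready)                 # mark them finished, unlocking dependents
    return waves


if __name__ == "__main__":
    print(execution_waves(branch_and_merge))
    # [['stepA'], ['stepB', 'stepC'], ['stepD']]
```

For the linear chain, `{"stepA": set(), "stepB": {"stepA"}, "stepC": {"stepB"}}` yields one call per wave, i.e. strictly sequential execution.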

Scatter-Gather

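Parallelism makes a workflow faster by running independent jobs at the same time instead of one after another. We use WDL's scatter construct: it takes an array of inputs, creates one parallelizable job per element, and returns the results as an array as well. The scatter step is explicit, while the gather step is implicit.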

Array[File] inputFiles

scatter (oneFile in inputFiles) {
  call stepA {input: in=oneFile}
}
call stepB {input: files=stepA.out}
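To make the explicit-scatter / implicit-gather distinction concrete, here is a Python analogy using `concurrent.futures`. `step_a` and `step_b` are hypothetical stand-ins for the WDL tasks. The code explicitly asks for one job per array element (the scatter), while collecting the results back into a list in input order needs no extra instruction (the gather).

```python
from concurrent.futures import ThreadPoolExecutor


def step_a(one_file: str) -> str:
    """Hypothetical stand-in for stepA: 'process' one file, return its output name."""
    return one_file + ".processed"


def step_b(files: list) -> str:
    """Hypothetical stand-in for stepB: consumes the whole gathered array."""
    return ",".join(files)


def scatter_gather(input_files: list) -> str:
    with ThreadPoolExecutor() as pool:
        # Explicit scatter: one step_a job per element of input_files.
        gathered = list(pool.map(step_a, input_files))
    # Implicit gather: map() already returned the results as one ordered list.
    return step_b(gathered)


if __name__ == "__main__":
    print(scatter_gather(["a.bam", "b.bam", "c.bam"]))
```

Threads are used only to keep the sketch self-contained; in Cromwell each scattered call becomes a separate job on whatever backend is configured.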

Task Aliasing

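If you would otherwise copy and paste the same task several times, task aliasing lets you call one task repeatedly under different names instead of duplicating its code.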

call stepA as firstSample {input: in=firstInput}
call stepA as secondSample {input: in=secondInput}
call stepB {input: in=firstSample.out}
call stepC {input: in=secondSample.out}
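Aliasing is needed because, in WDL, a call is referred to by its alias if it has one and otherwise by its task name, and call names must be unique within a workflow. The following hypothetical helper (not part of any WDL tool) computes the effective call names and reports duplicates:

```python
from collections import Counter


def call_names(calls: list) -> list:
    """Each call is (task_name, alias_or_None); the call name is the alias if present."""
    return [alias if alias else task for task, alias in calls]


def duplicate_call_names(calls: list) -> list:
    """Return the call names that occur more than once (these would clash)."""
    counts = Counter(call_names(calls))
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    print(duplicate_call_names([("stepA", None), ("stepA", None)]))       # ['stepA']
    print(duplicate_call_names([("stepA", "firstSample"),
                                ("stepA", "secondSample")]))              # []
```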

Validate Syntax

Run validate on the script:

$ java -jar wdltool.jar validate myWorkflow.wdl

Example: validate reports an error if a call references a task that doesn't exist:

$ java -jar wdltool.jar validate myWorkflow.wdl
ERROR: Call references a task (BADps) that doesn't exist (line 22, col 8)

  call BADps
       ^
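wdltool does full parsing and type checking. Purely to illustrate the idea behind this particular error, here is a rough regex-based sketch (a hypothetical helper, not wdltool's logic) that reports calls to tasks not defined in the same file. It ignores imports and would be fooled by the word `call` inside comments or strings.

```python
import re

# Rough patterns; a real WDL parser also handles imports, comments and strings.
TASK_RE = re.compile(r"^\s*task\s+(\w+)", re.MULTILINE)
CALL_RE = re.compile(r"\bcall\s+(\w+)")


def missing_tasks(wdl_source: str) -> list:
    """Return (task_name, line, col) for every call to a task not defined in this file.

    line and col are 1-based; col points at the first character of the task name.
    """
    defined = set(TASK_RE.findall(wdl_source))
    problems = []
    for lineno, line in enumerate(wdl_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            name = match.group(1)
            if name not in defined:
                problems.append((name, lineno, match.start(1) + 1))
    return problems


EXAMPLE = """workflow demo {
  call ps
  call BADps
}
task ps {
  command { ps }
}
"""

if __name__ == "__main__":
    for name, line, col in missing_tasks(EXAMPLE):
        print(f"Call references a task ({name}) that doesn't exist (line {line}, col {col})")
```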

Specify Inputs

Instead of entering inputs directly, you can use a JSON file to specify values for all of the input variables.

Generating the template JSON
Use the wdltool inputs command:

java -jar wdltool.jar inputs myWorkflow.wdl > myWorkflow_inputs.json

The result looks like:

{
  "<workflow name>.<task name>.<variable name>": "<variable type>"
}

Customizing the inputs file for a particular run
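Note that in the freshly generated template, <variable type> only reminds you of each variable's type; for an actual run you must replace it with the corresponding file path or value. It also helps to give variables easily recognizable names, so you don't have to go back to the script to work out which variable is which. For example, a template might contain: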

{
  "myWorkflowName.stepA.input_file": "File",
  "myWorkflowName.stepA.sample_name": "String"
}

So if the input file is called input.bam and the sample is named NA12878, we would edit it as follows:

{
  "myWorkflowName.stepA.input_file": "~/path/to/input.bam",
  "myWorkflowName.stepA.sample_name": "NA12878"
}
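Filling in a large template by hand is error-prone. A minimal Python sketch, assuming the template's placeholder values are plain type names such as `File` and `String` (the exact set below is an assumption), copies in the real values and refuses to produce an inputs file while any placeholder is left:

```python
import json

# Assumption: the generated template uses plain type names as placeholder values.
TYPE_PLACEHOLDERS = {"File", "String", "Int", "Float", "Boolean"}


def fill_inputs(template: dict, values: dict) -> dict:
    """Return a copy of template with real values filled in.

    Raises KeyError for a key not in the template (likely a typo) and
    ValueError if any input still holds a type placeholder.
    """
    filled = dict(template)
    for key, value in values.items():
        if key not in filled:
            raise KeyError(f"unknown input: {key}")
        filled[key] = value
    leftover = [k for k, v in filled.items()
                if isinstance(v, str) and v in TYPE_PLACEHOLDERS]
    if leftover:
        raise ValueError(f"inputs still unfilled: {leftover}")
    return filled


if __name__ == "__main__":
    template = {
        "myWorkflowName.stepA.input_file": "File",
        "myWorkflowName.stepA.sample_name": "String",
    }
    filled = fill_inputs(template, {
        "myWorkflowName.stepA.input_file": "~/path/to/input.bam",
        "myWorkflowName.stepA.sample_name": "NA12878",
    })
    print(json.dumps(filled, indent=2))
```

In practice `template` would be read with `json.load` from the file written by `wdltool inputs`, and the result saved with `json.dump` as the inputs file.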

Execute!

Cromwell
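Cromwell is an open-source execution engine, written in Java, that supports WDL. It can run WDL on three kinds of platform: a local machine, a local server cluster, or the cloud. It requires Java 8.
Basic command: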

java -jar cromwell.jar <action> <parameters>

Running WDL on Cromwell locally

java -jar cromwell.jar run myWorkflow.wdl --inputs myWorkflow_inputs.json
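To launch runs from a script, the command can be wrapped with Python's `subprocess`. This is a hypothetical convenience wrapper: it only assembles the same arguments as the command line above, and it assumes `java` is on the PATH and `cromwell.jar` is in the working directory.

```python
import shutil
import subprocess


def cromwell_run_cmd(wdl: str, inputs: str, jar: str = "cromwell.jar") -> list:
    """Assemble: java -jar <jar> run <wdl> --inputs <inputs>."""
    return ["java", "-jar", jar, "run", wdl, "--inputs", inputs]


def run_locally(wdl: str, inputs: str, jar: str = "cromwell.jar") -> subprocess.CompletedProcess:
    """Run one workflow with Cromwell's run mode on this machine."""
    if shutil.which("java") is None:
        raise RuntimeError("java not found on PATH; Cromwell needs a Java runtime")
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(cromwell_run_cmd(wdl, inputs, jar), check=True)


if __name__ == "__main__":
    print(" ".join(cromwell_run_cmd("myWorkflow.wdl", "myWorkflow_inputs.json")))
```

Passing an argument list rather than a single shell string avoids quoting problems when paths contain spaces.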
