Out-of-the-Box Edition: GATK Somatic SNV+Indel+CNV+SV for a Perfect-Score External Quality Assessment

2023-10-07 13:28

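I have been preparing to build a report designer for sliverworkspace, a graphical bioinformatics platform, and needed a reasonably complex pipeline as test data. That reminded me of the earlier article "满分室间质评之GATK Somatic SNV+Indel+CNV+SV(下)性能优化" (the performance-optimization part of the perfect-score external quality assessment series), so I dug it out. Running it again still turned up all sorts of problems, so I decided to rework the pipeline into a version that needs no deployment and initializes its environment on first run, so that it can be started directly whenever it is needed. Hence this article.

In one sentence: an out-of-the-box pipeline that automatically initializes its environment, installs the required software, and downloads the reference files and databases.

So that the pipeline can run directly without deployment, docker containers are used to keep the runtime environment consistent; see the article "基于docker的生信基础环境镜像构建" (building a docker-based base image for bioinformatics). The approach here is a docker image with an ssh service plus conda environments, and the whole pipeline runs inside one general-purpose container.

The code in this article is long and may be somewhat complex. If you just want to run it, download the workflow file and import it into the sliverworkspace graphical bioinformatics platform.

The code and the workflow file are available from the author's github project or gitee project.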

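Import operation

(Image not preserved: importing the workflow file.)

Overview of the analysis workflow

(Image not preserved: overview of the analysis workflow.)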

Pulling the docker image and deployment configuration

# Pull the docker image
docker pull doujiangbaozi/sliverworkspace:1.10

docker-compose.yml configuration file

version: "3"
services:
  GATK:
    image: doujiangbaozi/sliverworkspace:1.10
    container_name: GATK
    volumes:
      - /home/sliver/Data/data:/opt/data:rw                                # mount the input data directory
      - /home/sliver/Manufacture/gatk/envs:/root/mambaforge-pypy3/envs     # mount the envs directory
      - /home/sliver/Manufacture/sliver/ref:/opt/ref:rw                    # mount the reference directory
      - /home/sliver/Manufacture/gatk/result:/opt/result:rw                # mount the intermediate files and results directory
    environment:
      - TZ=Asia/Shanghai                                                   # set the time zone
      - PS=20191124                                                        # set the ssh password
      - PT=9024                                                            # set the ssh port

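Deploying and running the docker image

# Run in the directory that contains docker-compose.yml
docker-compose up -d
# Or pass the path of docker-compose.yml with -f
docker-compose -f somedir/docker-compose.yml up -d
# You can now connect to the container over ssh and run the pipeline commands; the port and password are the ones set in docker-compose.yml
ssh root@127.0.0.1 -p9024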

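Test data

The test data comes from the 2017 external quality assessment organized by the National Health and Family Planning Commission: the bed file (which the pipeline downloads automatically) and the sequencing data. Rename the files to match the pipeline's input names, or substitute your own data files. Because the external quality assessment still uses hg19 as its reference genome, this workflow uses hg19 (GRCh37) as well; replace it yourself if you need to. An hg38 (GRCh38) version will be provided later.

| File name (adjust as needed) | File size | MD5 |
| --- | --- | --- |
| B1701_R1.fq.gz | 4.85G | 07d3cdccee41dbb3adf5d2e04ab28e5b |
| B1701_R2.fq.gz | 4.77G | c2aa4a8ab784c77423e821b9f7fb00a7 |
| B1701NC_R1.fq.gz | 3.04G | 4fc21ad05f9ca8dc93d2749b8369891b |
| B1701NC_R2.fq.gz | 3.11G | bc64784f2591a27ceede1727136888b9 |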
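
Before a multi-hour run it is worth confirming that the downloads match the MD5 values in the table. The following is a minimal sketch, not part of the original pipeline: `verify_md5` is a hypothetical helper name, and the `/opt/data` paths in the example calls are an assumption about where the files were placed.

```shell
# Compare a file's MD5 digest with an expected value.
# verify_md5 is a hypothetical helper, not part of the original pipeline.
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK       $file"
    else
        echo "MISMATCH $file (got $actual)" >&2
        return 1
    fi
}

# Example calls (the paths are assumptions; adjust them to your download location):
# verify_md5 /opt/data/B1701_R1.fq.gz   07d3cdccee41dbb3adf5d2e04ab28e5b
# verify_md5 /opt/data/B1701NC_R1.fq.gz 4fc21ad05f9ca8dc93d2749b8369891b
```

The function returns a non-zero status on a mismatch, so it can gate the rest of a script with `verify_md5 ... || exit 1`.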

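Variable names

# Initialize variables
sn=1701                               # sample number
pn=GS03                               # pipeline code
version_openjdk=8.0.332               # java openjdk version
version_cnvkit=0.9.10                 # cnvkit version
version_manta=1.6.0                   # manta version
version_gatk=4.3.0.0                  # gatk version
version_sambamba=1.0.1                # sambamba version
version_samtools=1.17                 # samtools version
version_bwa=0.7.17                    # bwa version
version_fastp=0.23.2                  # fastp version
version_vep=108                       # vep version
envs=/root/mambaforge-pypy3/envs      # mamba envs directory
threads=32                            # maximum number of threads
memory=32G                            # memory limit
scatter=8                             # number of interval lists Mutect2 is split into for parallel runs
event=2                               # value for gatk FilterMutectCalls --max-events-in-region
snv_tlod=16.00                        # SNV filter: TLOD threshold
snv_vaf=0.01                          # SNV filter: variant allele frequency
snv_depth=500                         # SNV filter: supporting reads / sequencing depth
cnv_dep=1000                          # CNV filter: supporting reads / sequencing depth
cnv_min=-0.5                          # CNV filter: minimum log2
cnv_max=0.5                           # CNV filter: maximum log2
sv_score=200                          # SV filter: score
# ${data} and ${result} are referenced by the steps below; the values here are
# assumed from the docker-compose volume mounts
data=/opt/data
result=/opt/result
# All of the variables above can be adjusted as needed

As a table:

| Variable | Value | Notes |
| --- | --- | --- |
| sn | 1701 | Sample number |
| pn | GS03 | Pipeline code: GATK Somatic, version 03 |
| version_openjdk | 8.0.332 | java openjdk version |
| version_cnvkit | 0.9.10 | cnvkit version |
| version_manta | 1.6.0 | manta version |
| version_gatk | 4.3.0.0 | gatk version |
| version_sambamba | 1.0.1 | sambamba version |
| version_samtools | 1.17 | samtools version |
| version_bwa | 0.7.17 | bwa version |
| version_fastp | 0.23.2 | fastp version |
| version_vep | 108 | vep version |
| envs | /root/mambaforge-pypy3/envs | mamba envs directory |
| threads | 32 | Maximum number of threads |
| memory | 32G | Memory limit |
| scatter | 8 | Number of interval lists Mutect2 is split into for parallel runs |
| event | 2 | Value for gatk FilterMutectCalls --max-events-in-region |
| snv_tlod | 16.00 | SNV filter: TLOD threshold |
| snv_vaf | 0.01 | SNV filter: variant allele frequency |
| snv_depth | 500 | SNV filter: supporting reads / sequencing depth |
| cnv_dep | 1000 | CNV filter: supporting reads / sequencing depth |
| cnv_min | -0.5 | CNV filter: minimum log2 |
| cnv_max | 0.5 | CNV filter: maximum log2 |
| sv_score | 200 | SV filter: score |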

Pipeline/workflow steps:

  1. Filter with fastp using default parameters, to check the raw data quality and produce clean data

    # Check whether the conda environment exists; on first run, create it and install the software
    if [ ! -d "${envs}/${pn}.fastp" ]; then
      echo "Creating the environment ${pn}.fastp"
      mamba create -n ${pn}.fastp -y fastp=${version_fastp}
    fi
    mamba activate ${pn}.fastp
    mkdir -p ${result}/${sn}/trimmed
    # Tumor and normal samples are trimmed in parallel
    fastp -w 16 \
      -i ${data}/GS03/${sn}_tumor_R1.fq.gz \
      -I ${data}/GS03/${sn}_tumor_R2.fq.gz \
      -o ${result}/${sn}/trimmed/${sn}_tumor_R1_trimmed.fq.gz \
      -O ${result}/${sn}/trimmed/${sn}_tumor_R2_trimmed.fq.gz \
      -h ${result}/${sn}/trimmed/${sn}_tumor_fastp.html \
      -j ${result}/${sn}/trimmed/${sn}_tumor_fastp.json &
    fastp -w 16 \
      -i ${data}/GS03/${sn}_normal_R1.fq.gz \
      -I ${data}/GS03/${sn}_normal_R2.fq.gz \
      -o ${result}/${sn}/trimmed/${sn}_normal_R1_trimmed.fq.gz \
      -O ${result}/${sn}/trimmed/${sn}_normal_R2_trimmed.fq.gz \
      -h ${result}/${sn}/trimmed/${sn}_normal_fastp.html \
      -j ${result}/${sn}/trimmed/${sn}_normal_fastp.json &
    wait
    mamba deactivate
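
Nearly every step repeats the same "create the environment only if its directory is missing" check. One way to factor it out is sketched below; `ensure_env` is a hypothetical helper name that does not appear in the original scripts, and it assumes `${envs}` is set as in the variable list.

```shell
# Create a conda environment only when its directory is missing.
# ensure_env is a hypothetical helper; it assumes ${envs} points at the mamba envs directory.
ensure_env() {
    local name=$1
    shift
    if [ ! -d "${envs}/${name}" ]; then
        echo "Creating the environment ${name}"
        # Remaining arguments are package specs such as fastp=0.23.2
        mamba create -n "${name}" -y "$@"
    fi
}

# Example: ensure_env "${pn}.fastp" "fastp=${version_fastp}"
```

Because the check only looks at the directory, a half-created environment from an interrupted first run would still be treated as present; removing that directory forces a clean reinstall.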
    
  2. Align the normal FASTQ files to the reference genome, sort, and mark duplicates to produce marked.bam

    # Check whether the conda environment exists; on first run, create it and install the software
    if [ ! -d "${envs}/${pn}.align" ]; then
      mamba create -n ${pn}.align -y bwa=${version_bwa} samtools=${version_samtools}
    fi
    # The static sambamba binary from github ran more than twice as fast as the mamba-installed build; the reason is unclear
    if [ ! -f "${envs}/${pn}.align/bin/sambamba" ]; then
      aria2c https://github.com/biod/sambamba/releases/download/v${version_sambamba}/sambamba-${version_sambamba}-linux-amd64-static.gz -d ${envs}/${pn}.align/bin
      gzip -cdf ${envs}/${pn}.align/bin/sambamba-${version_sambamba}-linux-amd64-static.gz > ${envs}/${pn}.align/bin/sambamba
      chmod a+x ${envs}/${pn}.align/bin/sambamba
    fi
    mamba activate ${pn}.align
    mkdir -p /opt/ref/hg19
    # If the reference genome is missing, download it (-c resumes a partial download)
    if [ ! -f "/opt/ref/hg19/hg19.fasta" ]; then
      aria2c -c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -d /opt/ref/hg19 -o hg19.fasta.gz
      cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
    fi
    # Build the bwa index if any of its files is missing
    if [ ! -f /opt/ref/hg19/hg19.fasta.amb ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.ann ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.bwt ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.pac ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.sa ]; then
      bwa index /opt/ref/hg19/hg19.fasta
    fi
    # Build the samtools index of the reference genome if it is missing
    if [ ! -f "/opt/ref/hg19/hg19.fasta.fai" ]; then
      samtools faidx /opt/ref/hg19/hg19.fasta
    fi
    mkdir -p ${result}/${sn}/aligned
    # Align, pipe the output into a BAM stream, and pipe that into the sort
    bwa mem \
      -t ${threads} -M \
      -R "@RG\\tID:${sn}_normal\\tLB:${sn}_normal\\tPL:Illumina\\tPU:Miseq\\tSM:${sn}_normal" \
      /opt/ref/hg19/hg19.fasta ${result}/${sn}/trimmed/${sn}_normal_R1_trimmed.fq.gz ${result}/${sn}/trimmed/${sn}_normal_R2_trimmed.fq.gz \
      | sambamba view -S -f bam -l 0 /dev/stdin \
      | sambamba sort -t ${threads} -m 2G --tmpdir=${result}/${sn}/aligned -o ${result}/${sn}/aligned/${sn}_normal_sorted.bam /dev/stdin
    # Raise the open-file limit so that markdup does not abort on too many open file handles
    ulimit -n 10240
    # Mark duplicates in the sorted BAM with sambamba
    sambamba markdup \
      --tmpdir ${result}/${sn}/aligned \
      -t ${threads} ${result}/${sn}/aligned/${sn}_normal_sorted.bam ${result}/${sn}/aligned/${sn}_normal_marked.bam
    # Rename the index of the marked BAM: sambamba writes .bam.bai, while gatk expects .bai
    mv ${result}/${sn}/aligned/${sn}_normal_marked.bam.bai ${result}/${sn}/aligned/${sn}_normal_marked.bai
    # Delete the sorted BAM
    rm -f ${result}/${sn}/aligned/${sn}_normal_sorted.bam*
    mamba deactivate
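
The five-way test for bwa index files is easy to mistype when it is written out by hand. A minimal sketch of the same check as a loop follows; `bwa_index_missing` is a hypothetical helper name and is not part of the original scripts.

```shell
# Return success (0) when any of the five bwa index files for a FASTA is missing,
# and failure (1) when all of them are present.
# bwa_index_missing is a hypothetical helper name.
bwa_index_missing() {
    local fasta=$1 ext
    for ext in amb ann bwt pac sa; do
        [ -f "${fasta}.${ext}" ] || return 0
    done
    return 1
}

# Example: bwa_index_missing /opt/ref/hg19/hg19.fasta && bwa index /opt/ref/hg19/hg19.fasta
```

Deriving every path from one `fasta` argument means a directory component cannot be dropped from just one of the five tests.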
    
  3. Align the tumor FASTQ files to the reference genome, sort, and mark duplicates to produce marked.bam (same as step 2)

    if [ ! -d "${envs}/${pn}.align" ]; then
      mamba create -n ${pn}.align -y bwa=${version_bwa} samtools=${version_samtools}
    fi
    # The static sambamba binary from github ran more than twice as fast as the mamba-installed build
    if [ ! -f "${envs}/${pn}.align/bin/sambamba" ]; then
      aria2c https://github.com/biod/sambamba/releases/download/v${version_sambamba}/sambamba-${version_sambamba}-linux-amd64-static.gz -d ${envs}/${pn}.align/bin
      gzip -cdf ${envs}/${pn}.align/bin/sambamba-${version_sambamba}-linux-amd64-static.gz > ${envs}/${pn}.align/bin/sambamba
      chmod a+x ${envs}/${pn}.align/bin/sambamba
    fi
    mamba activate ${pn}.align
    mkdir -p /opt/ref/hg19
    # If the reference genome is missing, download it (-c resumes a partial download)
    if [ ! -f "/opt/ref/hg19/hg19.fasta" ]; then
      aria2c -c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/ucsc.hg19.fasta.gz -d /opt/ref/hg19 -o hg19.fasta.gz
      cd /opt/ref/hg19 && gzip -d /opt/ref/hg19/hg19.fasta.gz
    fi
    # Build the bwa index if any of its files is missing
    if [ ! -f /opt/ref/hg19/hg19.fasta.amb ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.ann ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.bwt ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.pac ] ||
       [ ! -f /opt/ref/hg19/hg19.fasta.sa ]; then
      bwa index /opt/ref/hg19/hg19.fasta
    fi
    if [ ! -f "/opt/ref/hg19/hg19.fasta.fai" ]; then
      samtools faidx /opt/ref/hg19/hg19.fasta
    fi
    mkdir -p ${result}/${sn}/aligned
    bwa mem \
      -t ${threads} -M \
      -R "@RG\\tID:${sn}_tumor\\tLB:${sn}_tumor\\tPL:Illumina\\tPU:Miseq\\tSM:${sn}_tumor" \
      /opt/ref/hg19/hg19.fasta ${result}/${sn}/trimmed/${sn}_tumor_R1_trimmed.fq.gz ${result}/${sn}/trimmed/${sn}_tumor_R2_trimmed.fq.gz \
      | sambamba view -S -f bam -l 0 /dev/stdin \
      | sambamba sort -t ${threads} -m 2G --tmpdir=${result}/${sn}/aligned -o ${result}/${sn}/aligned/${sn}_tumor_sorted.bam /dev/stdin
    ulimit -n 10240
    sambamba markdup \
      --tmpdir ${result}/${sn}/aligned \
      -t ${threads} ${result}/${sn}/aligned/${sn}_tumor_sorted.bam ${result}/${sn}/aligned/${sn}_tumor_marked.bam
    mv ${result}/${sn}/aligned/${sn}_tumor_marked.bam.bai ${result}/${sn}/aligned/${sn}_tumor_marked.bai
    rm -f ${result}/${sn}/aligned/${sn}_tumor_sorted.bam*
    mamba deactivate
    
  4. Generate recalibration tables from the BAM files above, for Base Quality Score Recalibration (BQSR) in the next step

    # Check whether gatk is present; on first run, download and unpack it
    if [ ! -f "${envs}/gatk/bin/gatk" ]; then
      mkdir -p ${envs}/gatk/bin
      # Alternative download URL:
      # https://github.com/broadinstitute/gatk/releases/download/${version_gatk}/gatk-${version_gatk}.zip
      if [ -f ${envs}/gatk/bin/gatk.zip.aria2 ]; then
        aria2c -x 16 -s 32 https://download.yzuu.cf/broadinstitute/gatk/releases/download/${version_gatk}/gatk-${version_gatk}.zip -c -d ${envs}/gatk/bin -o gatk.zip
      else
        aria2c -x 16 -s 32 https://download.yzuu.cf/broadinstitute/gatk/releases/download/${version_gatk}/gatk-${version_gatk}.zip -d ${envs}/gatk/bin -o gatk.zip
      fi
      apt-get install -y unzip
      cd ${envs}/gatk/bin
      unzip -o gatk.zip
      mv ${envs}/gatk/bin/gatk-${version_gatk}/* ${envs}/gatk/bin/
      rm -rf ${envs}/gatk/bin/gatk-${version_gatk}
      #chmod +x ${envs}/bin/gatk
      cd ${result}
    fi
    if [ ! -x "$(command -v python)" ]; then
      mamba env create -f ${envs}/gatk/bin/gatkcondaenv.yml
    fi
    if [ ! -x "$(command -v java)" ]; then
      mamba install -y openjdk=${version_openjdk}
    fi
    if [ ! -x "$(command -v tabix)" ]; then
      mamba install -y tabix
    fi
    mamba activate gatk
    # Pitfall: the hg19 known-sites files in the Broad Institute ftp bundle fail as downloaded. gatk reports a
    # missing index, and creating the index with gatk fails as well. The .gz files have to be recompressed
    # with bgzip and then indexed with tabix.
    for vcf in dbsnp_138.hg19.vcf.gz \
               Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
               1000G_phase1.snps.high_confidence.hg19.sites.vcf.gz \
               1000G_phase1.indels.hg19.sites.vcf.gz; do
      # Download if the file is missing, or resume if an aria2 control file is left over
      if [ ! -f "/opt/ref/hg19/${vcf}" ] || [ -f "/opt/ref/hg19/${vcf}.aria2" ]; then
        echo "downloading ${vcf} ..."
        aria2c -c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg19/${vcf} -d /opt/ref/hg19
        rm -f "/opt/ref/hg19/${vcf}.tbi"
      fi
      # Recompress with bgzip and index with tabix when the index is missing
      if [ ! -f "/opt/ref/hg19/${vcf}.tbi" ]; then
        gzip -k -f -d /opt/ref/hg19/${vcf}
        bgzip -f --threads ${threads} /opt/ref/hg19/${vcf%.gz}
        tabix -f /opt/ref/hg19/${vcf}
      fi
    done
    # Create the sequence dictionary for the hg19 reference
    if [ ! -f "/opt/ref/hg19/hg19.dict" ]; then
      gatk CreateSequenceDictionary -R /opt/ref/hg19/hg19.fasta -O /opt/ref/hg19/hg19.dict
    fi
    # Build the interval list from the downloaded Illumina_pt2.bed; the conversion shifts start coordinates from 0-based to 1-based
    if [ ! -f "/opt/ref/hg19/Illumina_pt2.interval_list" ]; then
      #sed 's/chr//; s/\t/ /g' /opt/ref/hg19/Illumina_pt2.bed > /opt/ref/hg19/Illumina_pt2.processed.bed
      mkdir -p /opt/ref/hg19
      rm -f /opt/ref/hg19/Illumina_pt2.bed
      aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/projects/Illumina_pt2.bed -d /opt/ref/hg19
      #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/projects/Illumina_pt2.bed -d /opt/ref/hg19
      gatk BedToIntervalList \
        -I /opt/ref/hg19/Illumina_pt2.bed \
        -SD /opt/ref/hg19/hg19.dict \
        -O /opt/ref/hg19/Illumina_pt2.interval_list
    fi
    mkdir -p ${result}/${sn}/recal
    # Tumor and normal recalibration tables are computed in parallel
    gatk BaseRecalibrator \
      --known-sites /opt/ref/hg19/dbsnp_138.hg19.vcf.gz \
      --known-sites /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
      --known-sites /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
      -L /opt/ref/hg19/Illumina_pt2.interval_list \
      -R /opt/ref/hg19/hg19.fasta \
      -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
      -O ${result}/${sn}/recal/${sn}_tumor_recal.table &
    gatk BaseRecalibrator \
      --known-sites /opt/ref/hg19/dbsnp_138.hg19.vcf.gz \
      --known-sites /opt/ref/hg19/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf.gz \
      --known-sites /opt/ref/hg19/1000G_phase1.indels.hg19.sites.vcf.gz \
      -L /opt/ref/hg19/Illumina_pt2.interval_list \
      -R /opt/ref/hg19/hg19.fasta \
      -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
      -O ${result}/${sn}/recal/${sn}_normal_recal.table &
    wait
    mamba deactivate
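
The coordinate shift mentioned for the interval list comes from the two formats themselves: BED intervals are 0-based and half-open, while interval lists are 1-based and closed, so only the start moves by one. As an illustration of that arithmetic only (this is not how `gatk BedToIntervalList` is implemented, and `bed_to_one_based` is a hypothetical name), the conversion can be sketched with awk:

```shell
# Convert BED lines (0-based, half-open) to 1-based closed "chrom:start-end" intervals.
# Header lines (#, track, browser) and lines with fewer than three columns are skipped.
bed_to_one_based() {
    awk 'BEGIN { FS = "\t" }
         !/^(#|track|browser)/ && NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }'
}

# Example: printf 'chr7\t55241613\t55241736\n' | bed_to_one_based
```

The end coordinate stays as it is because a half-open end at position N and a closed end at position N name the same last base.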
    
  5. Apply the recalibration tables to the BAM base qualities. Because gatk is very slow at this step, run the following at the same time: insert-size metrics, splitting the interval list (needed later to run Mutect2 in parallel), cnvkit batch, samtools depth for sequencing depth, and samtools flagstat for mapping rate and quality

    mkdir -p ${result}/${sn}/bqsr
    mkdir -p ${result}/${sn}/stat
    mkdir -p ${result}/${sn}/cnv
    mkdir -p ${result}/${sn}/interval
    mamba activate gatk
    gatk ApplyBQSR \
      --bqsr-recal-file ${result}/${sn}/recal/${sn}_tumor_recal.table \
      -L /opt/ref/hg19/Illumina_pt2.interval_list \
      -R /opt/ref/hg19/hg19.fasta \
      -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
      -O ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam &
    gatk ApplyBQSR \
      --bqsr-recal-file ${result}/${sn}/recal/${sn}_normal_recal.table \
      -L /opt/ref/hg19/Illumina_pt2.interval_list \
      -R /opt/ref/hg19/hg19.fasta \
      -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
      -O ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam &
    gatk CollectInsertSizeMetrics \
      -I ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
      -O ${result}/${sn}/stat/${sn}_tumor_insertsize_metrics.txt \
      -H ${result}/${sn}/stat/${sn}_tumor_insertsize_histogram.pdf &
    gatk CollectInsertSizeMetrics \
      -I ${result}/${sn}/aligned/${sn}_normal_marked.bam \
      -O ${result}/${sn}/stat/${sn}_normal_insertsize_metrics.txt \
      -H ${result}/${sn}/stat/${sn}_normal_insertsize_histogram.pdf &
    rm -f ${result}/${sn}/interval/*.interval_list
    gatk SplitIntervals \
      -L /opt/ref/hg19/Illumina_pt2.interval_list \
      -R /opt/ref/hg19/hg19.fasta \
      -O ${result}/${sn}/interval \
      --scatter-count ${scatter} &
    mamba deactivate
    if [ ! -d "${envs}/${pn}.cnvkit" ]; then
      mamba create -n ${pn}.cnvkit -y cnvkit=${version_cnvkit}
    fi
    if [ ! -f "/opt/ref/hg19/refFlat.txt" ]; then
      aria2c -x 16 -s 16 http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refFlat.txt.gz -d /opt/ref/hg19
      cd /opt/ref/hg19 && gzip -d refFlat.txt.gz
    fi
    mamba activate ${pn}.cnvkit
    cnvkit.py batch \
      ${result}/${sn}/aligned/${sn}_tumor_marked.bam \
      --normal ${result}/${sn}/aligned/${sn}_normal_marked.bam \
      --method hybrid \
      --targets /opt/ref/hg19/Illumina_pt2.bed \
      --annotate /opt/ref/hg19/refFlat.txt \
      --output-reference ${result}/${sn}/cnv/${sn}_reference.cnn \
      --output-dir ${result}/${sn}/cnv/ \
      --diagram \
      -p ${threads} &
    mamba deactivate
    mamba activate ${pn}.align
    samtools depth -a -b /opt/ref/hg19/Illumina_pt2.bed --threads ${threads} \
      ${result}/${sn}/aligned/${sn}_tumor_marked.bam > \
      ${result}/${sn}/stat/${sn}_tumor_marked.depth &
    samtools depth -a -b /opt/ref/hg19/Illumina_pt2.bed --threads ${threads} \
      ${result}/${sn}/aligned/${sn}_normal_marked.bam > \
      ${result}/${sn}/stat/${sn}_normal_marked.depth &
    samtools flagstat --threads ${threads} \
      ${result}/${sn}/aligned/${sn}_tumor_marked.bam > \
      ${result}/${sn}/stat/${sn}_tumor_marked.flagstat &
    samtools flagstat --threads ${threads} \
      ${result}/${sn}/aligned/${sn}_normal_marked.bam > \
      ${result}/${sn}/stat/${sn}_normal_marked.flagstat &
    mamba deactivate
    wait
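
A bare `wait` blocks until every background job has finished, but it discards their exit codes, so a failed job goes unnoticed until a later step cannot find its input. One possible refinement is sketched below; `wait_all` is a hypothetical helper that is not in the original scripts, and it relies only on the shell's `$!` and `wait PID` behaviour.

```shell
# Wait for each given PID and report failure if any of the jobs failed.
# wait_all is a hypothetical helper name.
wait_all() {
    local status=0 pid
    for pid in "$@"; do
        if ! wait "$pid"; then
            echo "background job ${pid} failed" >&2
            status=1
        fi
    done
    return $status
}

# Example usage:
#   samtools flagstat tumor.bam  > tumor.flagstat  & pids="$pids $!"
#   samtools flagstat normal.bam > normal.flagstat & pids="$pids $!"
#   wait_all $pids || exit 1
```

Collecting the PIDs right after each `&` is what makes this work, since `$!` only ever holds the most recent background job.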
    
  6. Compute pileup metrics for the later contamination estimate. This can also run in parallel over the split interval lists, with the results merged afterwards.

    mamba activate gatk
    # Pitfall: the official small_exac_common_3_b37.vcf.gz names chromosomes with bare numbers (1, 2, 3, ...)
    # instead of the chr prefix, while the reference FASTA from the same Broad Institute ftp bundle uses
    # chr1-style names. The VCF therefore has to be converted by a script first.
    if [ ! -f "/opt/ref/hg19/small_exac_common_3_b37.processed.vcf.gz" ]; then
      if [ ! -f "/opt/ref/hg19/small_exac_common_3_b37.vcf.gz" ]; then
        aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz -d /opt/ref/hg19
      elif [ -f "/opt/ref/hg19/small_exac_common_3_b37.vcf.gz.aria2" ]; then
        aria2c ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/GetPileupSummaries/small_exac_common_3_b37.vcf.gz -c -d /opt/ref/hg19
      fi
      if [ ! -f "${envs}/VcfProcessUtil.py" ]; then
        aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
        #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
        chmod a+x ${envs}/VcfProcessUtil.py
      fi
      ${envs}/VcfProcessUtil.py \
        -f /opt/ref/hg19/small_exac_common_3_b37.vcf.gz \
        -o /opt/ref/hg19/small_exac_common_3_b37.processed.vcf
      cd /opt/ref/hg19
      bgzip -f --threads ${threads} small_exac_common_3_b37.processed.vcf
      tabix -f small_exac_common_3_b37.processed.vcf.gz
    fi
    # One tumor job and one normal job per split interval list, all in parallel
    for i in ${result}/${sn}/interval/*.interval_list; do
      echo $i
      gatk GetPileupSummaries \
        -R /opt/ref/hg19/hg19.fasta \
        -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam \
        -O ${i%.*}-tumor-pileups.table \
        -V /opt/ref/hg19/small_exac_common_3_b37.processed.vcf.gz \
        -L $i \
        --interval-set-rule INTERSECTION &
      gatk GetPileupSummaries \
        -R /opt/ref/hg19/hg19.fasta \
        -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam \
        -O ${i%.*}-normal-pileups.table \
        -V /opt/ref/hg19/small_exac_common_3_b37.processed.vcf.gz \
        -L $i \
        --interval-set-rule INTERSECTION &
    done
    wait
    # Merge the per-interval tumor tables
    tables=
    for i in ${result}/${sn}/interval/*-tumor-pileups.table; do
      tables="$tables -I $i"
    done
    gatk GatherPileupSummaries \
      --sequence-dictionary /opt/ref/hg19/hg19.dict \
      $tables \
      -O ${result}/${sn}/stat/${sn}_tumor_pileups.table
    # Merge the per-interval normal tables
    nctables=
    for i in ${result}/${sn}/interval/*-normal-pileups.table; do
      nctables="$nctables -I $i"
    done
    gatk GatherPileupSummaries \
      --sequence-dictionary /opt/ref/hg19/hg19.dict \
      $nctables \
      -O ${result}/${sn}/stat/${sn}_normal_pileups.table
    mamba deactivate
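
The contents of `VcfProcessUtil.py` are not shown in this article. To make the naming mismatch concrete, here is a minimal sketch of what such a conversion could look like, assuming all it has to do is put `chr` in front of the contig names in the `##contig` header lines and in the first column of the data lines. `add_chr_prefix` is a hypothetical name, and the real script may well do more (the mitochondrial contig and unplaced contigs, for example, are not handled here).

```shell
# Prefix contig names in an uncompressed VCF stream with "chr".
# Assumes the input uses bare names (1, 2, ..., X, Y); running it twice would double the prefix.
add_chr_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^##contig=<ID=/ { sub(/<ID=/, "<ID=chr"); print; next }   # contig header lines
         /^#/             { print; next }                            # other header lines unchanged
                          { $1 = "chr" $1; print }                   # data lines: CHROM column
        '
}

# Example: gzip -dc input.b37.vcf.gz | add_chr_prefix | bgzip -c > output.chr.vcf.gz
```

Whatever tool does the renaming, the result still has to be compressed with bgzip and indexed with tabix before gatk will accept it, as in the step above.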
    
  7. Estimate cross-sample contamination from the GetPileupSummaries results; the estimate is used later when FilterMutectCalls filters the Mutect2 output

    mamba activate gatk
    gatk CalculateContamination \
      -matched ${result}/${sn}/stat/${sn}_normal_pileups.table \
      -I ${result}/${sn}/stat/${sn}_tumor_pileups.table \
      -O ${result}/${sn}/stat/${sn}_contamination.table
    mamba deactivate
    
  8. Call variants with Mutect2 over the split interval lists and merge the results afterwards; run manta in parallel to call SVs

    mkdir -p ${result}/${sn}/sv
    mkdir -p ${result}/${sn}/snv
    mamba activate gatk
    # Pitfall: the af-only-gnomad file from the Mutect2 directory of the Broad Institute ftp bundle does not
    # share a naming scheme with the reference genome downloaded from the same site. The reference uses
    # chr1-style names and af-only-gnomad uses 1, 2, 3, so it has to be converted by a script.
    # hg38 does not seem to have this problem; is the hg19 data no longer maintained?
    if [ ! -f "/opt/ref/hg19/af-only-gnomad.raw.sites.b37.processed.vcf.gz" ]; then
      if [ ! -f "/opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz" ]; then
        aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz -d /opt/ref/hg19
      elif [ -f "/opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz.aria2" ]; then
        aria2c -x 16 -s 32 ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/Mutect2/af-only-gnomad.raw.sites.b37.vcf.gz -c -d /opt/ref/hg19
      fi
      if [ ! -f "${envs}/VcfProcessUtil.py" ]; then
        aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
        #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/VcfProcessUtil.py -d ${envs}/
        chmod a+x ${envs}/VcfProcessUtil.py
      fi
      ${envs}/VcfProcessUtil.py \
        -f /opt/ref/hg19/af-only-gnomad.raw.sites.b37.vcf.gz \
        -o /opt/ref/hg19/af-only-gnomad.raw.sites.b37.processed.vcf
      cd /opt/ref/hg19
      bgzip -f --threads ${threads} af-only-gnomad.raw.sites.b37.processed.vcf
      tabix -f af-only-gnomad.raw.sites.b37.processed.vcf.gz
    fi
    # manta needs a bgzip-compressed, tabix-indexed bed file for --callRegions
    if [ ! -f "/opt/ref/hg19/Illumina_pt2.bed.gz" ]; then
      bgzip -f -c /opt/ref/hg19/Illumina_pt2.bed > /opt/ref/hg19/Illumina_pt2.bed.gz
      tabix -f -p bed /opt/ref/hg19/Illumina_pt2.bed.gz
    else
      if [ ! -f "/opt/ref/hg19/Illumina_pt2.bed.gz.tbi" ]; then
        tabix -f -p bed /opt/ref/hg19/Illumina_pt2.bed.gz
      fi
    fi
    mamba deactivate
    if [ ! -d "${envs}/${pn}.manta" ]; then
      mamba create -n ${pn}.manta -y manta=${version_manta}
    fi
    mamba activate ${pn}.manta
    rm -f ${result}/${sn}/sv/runWorkflow.py*
    configManta.py \
      --normalBam ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam \
      --tumorBam ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam \
      --referenceFasta /opt/ref/hg19/hg19.fasta \
      --exome \
      --callRegions /opt/ref/hg19/Illumina_pt2.bed.gz \
      --runDir ${result}/${sn}/sv
    rm -rf ${result}/${sn}/sv/workspace
    # manta runs in the background while Mutect2 runs below
    python ${result}/${sn}/sv/runWorkflow.py -m local -j ${threads} &
    mamba deactivate
    mamba activate gatk
    rm -f ${result}/${sn}/snv/vcf-file.list
    touch ${result}/${sn}/snv/vcf-file.list
    for i in ${result}/${sn}/interval/*.interval_list; do
      rm -f ${i%.*}_bqsr.vcf.gz
      gatk Mutect2 \
        -R /opt/ref/hg19/hg19.fasta \
        -I ${result}/${sn}/bqsr/${sn}_tumor_bqsr.bam -tumor ${sn}_tumor \
        -I ${result}/${sn}/bqsr/${sn}_normal_bqsr.bam -normal ${sn}_normal \
        -L $i \
        -O ${i%.*}_bqsr.vcf.gz \
        --max-mnp-distance 10 \
        --germline-resource /opt/ref/hg19/af-only-gnomad.raw.sites.b37.processed.vcf.gz \
        --native-pair-hmm-threads ${threads} &
      echo ${i%.*}_bqsr.vcf.gz >> ${result}/${sn}/snv/vcf-file.list
    done
    # Wait for all Mutect2 jobs and for manta
    wait
    rm -f ${result}/${sn}/snv/${sn}_bqsr.vcf.gz.stats
    stats=
    for z in ${result}/${sn}/interval/*_bqsr.vcf.gz.stats; do
      stats="$stats -stats $z"
    done
    gatk MergeMutectStats $stats \
      -O ${result}/${sn}/snv/${sn}_bqsr.vcf.gz.stats
    gatk MergeVcfs \
      -I ${result}/${sn}/snv/vcf-file.list \
      -O ${result}/${sn}/snv/${sn}_bqsr.vcf.gz
    mamba deactivate
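
Building the `-stats` arguments by string concatenation works as long as no path contains spaces, because the unquoted expansion is split on whitespace. A portable alternative, sketched here under the assumption that the stats files keep the `_bqsr.vcf.gz.stats` suffix, is to append to the positional parameters instead. `run_with_stats` is a hypothetical helper name.

```shell
# Run a command with "-stats FILE" appended once for every *_bqsr.vcf.gz.stats file in a directory.
# usage: run_with_stats DIR COMMAND [ARGS...]
# run_with_stats is a hypothetical helper name.
run_with_stats() {
    local dir=$1 f
    shift
    for f in "$dir"/*_bqsr.vcf.gz.stats; do
        [ -e "$f" ] || continue        # the glob stays literal when nothing matches
        set -- "$@" -stats "$f"        # each path remains a single argument
    done
    "$@"
}

# Example:
#   run_with_stats "${result}/${sn}/interval" \
#       gatk MergeMutectStats -O "${result}/${sn}/snv/${sn}_bqsr.vcf.gz.stats"
```

Inside a function, `set --` only changes that function's own argument list, so the caller's `$@` is left untouched.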
    
  9. Filter the Mutect2 calls with FilterMutectCalls

    mamba activate gatk
    gatk FilterMutectCalls \
      --max-events-in-region ${event} \
      --contamination-table ${result}/${sn}/stat/${sn}_contamination.table \
      -R /opt/ref/hg19/hg19.fasta \
      -V ${result}/${sn}/snv/${sn}_bqsr.vcf.gz \
      -O ${result}/${sn}/snv/${sn}_filtered.vcf.gz
    mamba deactivate
    
  10. Annotate the filtered results with VEP

    # Check whether the conda environment exists; on first run, create it and install the software
    if [ ! -d "${envs}/${pn}.vep" ]; then
      echo "Creating the environment ${pn}.vep"
      mamba create -n ${pn}.vep -y ensembl-vep=${version_vep}
    fi
    mkdir -p /opt/result/${sn}/vcf
    mkdir -p /opt/ref/vep-cache
    # Download the VEP annotation caches if they are missing
    if [ ! -d "/opt/ref/vep-cache/homo_sapiens/${version_vep}_GRCh37" ]; then
      aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -d /opt/ref/
      tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
    elif [ -f "/opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz.aria2" ]; then
      aria2c -x 16 -s 48 https://ftp.ensembl.org/pub/release-${version_vep}/variation/indexed_vep_cache/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -c -d /opt/ref/
      tar -zxvf /opt/ref/homo_sapiens_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
    fi
    if [ ! -d "/opt/ref/vep-cache/homo_sapiens_refseq/${version_vep}_GRCh37" ]; then
      aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -d /opt/ref/
      tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
    elif [ -f "/opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz.aria2" ]; then
      aria2c -x 16 -s 48 http://ftp.ensembl.org/pub/release-${version_vep}/variation/vep/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -c -d /opt/ref/
      tar -zxvf /opt/ref/homo_sapiens_refseq_vep_${version_vep}_GRCh37.tar.gz -C /opt/ref/vep-cache/
    fi
    mamba activate ${pn}.vep
    mkdir -p ${result}/${sn}/annotation
    vep \
      -i ${result}/${sn}/snv/${sn}_filtered.vcf.gz \
      -o ${result}/${sn}/annotation/${sn}_filtered_vep.tsv \
      --offline \
      --cache \
      --cache_version ${version_vep} \
      --everything \
      --dir_cache /opt/ref/vep-cache/ \
      --dir_plugins /opt/ref/vep-cache/Plugins \
      --species homo_sapiens \
      --assembly GRCh37 \
      --fasta /opt/ref/hg19/hg19.fasta \
      --refseq \
      --force_overwrite \
      --format vcf \
      --tab \
      --shift_3prime 1
    mamba deactivate
    
  11. Process the annotation and the filtered VCF with a script, and write a table in the format the external quality assessment requires

    mamba activate ${pn}.cnvkit
    if [ ! -f "${envs}/MatchedSnvVepAnnotationFilter.py" ]; then
      aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedSnvVepAnnotationFilter.py -d ${envs}/
      #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedSnvVepAnnotationFilter.py -d ${envs}/
      chmod a+x ${envs}/MatchedSnvVepAnnotationFilter.py
    fi
    ${envs}/MatchedSnvVepAnnotationFilter.py \
      -e normal_artifact \
      -e germline \
      -i strand_bias \
      -i clustered_events \
      --min-vaf=${snv_vaf} \
      --min-tlod=${snv_tlod} \
      --min-depth=${snv_depth} \
      -v ${result}/${sn}/snv/${sn}_filtered.vcf.gz \
      -a ${result}/${sn}/annotation/${sn}_filtered_vep.tsv \
      -o ${result}/${sn}/annotation/${sn}.result.SNV.tsv
    mamba deactivate
    
  12. Draw the scatter plot and the heatmap with the tools that cnvkit provides

    mamba activate ${pn}.cnvkit
    cnvkit.py scatter ${result}/${sn}/cnv/${sn}_tumor_marked.cnr \
      -s ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
      -i ' ' \
      -n ${sn}_normal \
      -o ${result}/${sn}/cnv/${sn}_cnv_scatter.png -t &&
    cnvkit.py heatmap ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
      -o ${result}/${sn}/cnv/${sn}_cnv_heatmap.png
    mamba deactivate
    
  13. Estimate copy numbers with cnvkit call from the cnvkit batch output

    mamba activate ${pn}.cnvkit
    cnvkit.py call ${result}/${sn}/cnv/${sn}_tumor_marked.cns \
      -o ${result}/${sn}/cnv/${sn}_tumor_marked.call.cns
    mamba deactivate
    
  14. Process the cnvkit output with a script to get the CNV genes, exon positions, gain/loss, and copy number

    mamba activate ${pn}.cnvkit
    if [ ! -f "${envs}/CnvAnnotationFilter.py" ]; then
      aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/CnvAnnotationFilter.py -d ${envs}/
      #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/CnvAnnotationFilter.py -d ${envs}/
      chmod a+x ${envs}/CnvAnnotationFilter.py
    fi
    if [ ! -f "/opt/ref/hg19/hg19_refGene.txt" ]; then
      aria2c -x 16 -s 16 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz -d /opt/ref/hg19 -o hg19_refGene.txt.gz
      cd /opt/ref/hg19 && gzip -d hg19_refGene.txt.gz
    fi
    python ${envs}/CnvAnnotationFilter.py \
      -r /opt/ref/hg19/hg19_refGene.txt \
      -i ${cnv_min} \
      -x ${cnv_max} \
      -D ${cnv_dep} \
      -f ${result}/${sn}/cnv/${sn}_tumor_marked.call.cns \
      -o ${result}/${sn}/cnv/${sn}.result.CNV.tsv
    mamba deactivate
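
`CnvAnnotationFilter.py` itself is not listed in this article, so the sketch below only illustrates one plausible reading of the three thresholds: segments whose depth is below `cnv_dep` are dropped, a log2 ratio at or above `cnv_max` counts as a gain, and one at or below `cnv_min` counts as a loss. That reading, the function name `classify_cnv`, and the assumption that the input is a tab-separated table with `gene`, `log2` and `depth` header columns are all assumptions made for illustration.

```shell
# Classify segments from a tab-separated table with a header line containing
# the columns "gene", "log2" and "depth" (looked up by name, so column order does not matter).
# usage: classify_cnv MIN_LOG2 MAX_LOG2 MIN_DEPTH < table.tsv
classify_cnv() {
    awk -v lo="$1" -v hi="$2" -v dep="$3" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header names to column numbers
        $(col["depth"]) + 0 < dep + 0 { next }                     # too shallow: skip
        $(col["log2"]) + 0 >= hi + 0  { print $(col["gene"]), "gain"; next }
        $(col["log2"]) + 0 <= lo + 0  { print $(col["gene"]), "loss" }
    '
}

# Example: classify_cnv "${cnv_min}" "${cnv_max}" "${cnv_dep}" < "${result}/${sn}/cnv/${sn}_tumor_marked.call.cns"
```

With the defaults from the variable list (-0.5, 0.5, 1000), a segment at log2 0.1 is reported as neither gain nor loss, whatever its depth.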
    
  15. Post-process the manta output with a script to produce the final SV results: start position, gene, frequency, and so on

    mamba activate ${pn}.cnvkit
    if [ ! -f "${envs}/SvAnnotationFilter.py" ]; then
        aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/SvAnnotationFilter.py -d ${envs}/
        #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/SvAnnotationFilter.py -d ${envs}/
        chmod a+x ${envs}/SvAnnotationFilter.py
    fi
    if [ ! -f "/opt/ref/hg19/hg19_refGene.txt" ]; then
        aria2c -x 16 -s 16 http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz -d /opt/ref/hg19 -o hg19_refGene.txt.gz
        cd /opt/ref/hg19 && gzip -d hg19_refGene.txt.gz
    fi
    ${envs}/SvAnnotationFilter.py \
        -r /opt/ref/hg19/hg19_refGene.txt \
        -s ${sv_score} \
        -f ${result}/${sn}/sv/results/variants/somaticSV.vcf.gz \
        -o ${result}/${sn}/sv/${sn}.result.SV.tsv
    mamba deactivate

  16. Combine the earlier outputs of fastp, samtools depth, samtools flagstat, gatk CollectInsertSizeMetrics, etc. into a consolidated QC table

    mamba activate ${pn}.cnvkit
    if [ ! -f "${envs}/MatchedQcProcessor.py" ]; then
        aria2c https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedQcProcessor.py -d ${envs}/
        #aria2c https://raw.githubusercontent.com/doujiangbaozi/sliverworkspace-util/main/somatic/MatchedQcProcessor.py -d ${envs}/
        chmod a+x ${envs}/MatchedQcProcessor.py
    fi
    ${envs}/MatchedQcProcessor.py --bed /opt/ref/hg19/Illumina_pt2.bed \
        --out ${result}/${sn}/stat/${sn}.result.QC.tsv \
        --sample-fastp=${result}/${sn}/trimmed/${sn}_tumor_fastp.json \
        --sample-depth=${result}/${sn}/stat/${sn}_tumor_marked.depth \
        --sample-flagstat=${result}/${sn}/stat/${sn}_tumor_marked.flagstat \
        --sample-insertsize=${result}/${sn}/stat/${sn}_tumor_insertsize_metrics.txt \
        --normal-fastp=${result}/${sn}/trimmed/${sn}_normal_fastp.json \
        --normal-depth=${result}/${sn}/stat/${sn}_normal_marked.depth \
        --normal-flagstat=${result}/${sn}/stat/${sn}_normal_marked.flagstat \
        --normal-insertsize=${result}/${sn}/stat/${sn}_normal_insertsize_metrics.txt
    mamba deactivate
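
The steps above all repeat the same check: download a helper script only if it is not there yet, then make it executable. As a rough sketch, that pattern could be factored into a single function. The function name `fetch_if_missing` and the `DOWNLOADER` variable are hypothetical and not part of the original pipeline; only `aria2c ... -d` and the URLs come from the commands above.

```shell
# Hypothetical helper (an illustration, not the author's code): fetch a file
# into a directory only when it is missing, then mark it executable.
# DOWNLOADER defaults to aria2c, as used in the steps above; it is a variable
# only so that the function can be tried out without network access.
DOWNLOADER="${DOWNLOADER:-aria2c}"

fetch_if_missing() {
    local url="$1" dir="$2"
    local target="${dir}/${url##*/}"   # file name = last path component of the URL
    if [ -f "${target}" ]; then
        echo "skip ${target}"          # already fetched on an earlier run
        return 0
    fi
    # Same call shape as in the pipeline: <downloader> <url> -d <dir>/
    "${DOWNLOADER}" "${url}" -d "${dir}/" || return 1
    chmod a+x "${target}"
    echo "fetched ${target}"
}
```

With this in place, a step could start with something like `fetch_if_missing https://raw.fgit.cf/doujiangbaozi/sliverworkspace-util/main/somatic/SvAnnotationFilter.py "${envs}"`. The first run downloads the script and later runs skip it, which is the same behaviour as the inline `if [ ! -f ... ]` blocks.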

Final output

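Paths are relative to the result directory and follow the `-o`/`--out` arguments of the commands above.

| File name | Notes |
| --- | --- |
| 1701/annotation/1701.result.SNV.tsv | Final SNV results |
| 1701/cnv/1701_cnv_heatmap.png | CNV heatmap |
| 1701/cnv/1701_cnv_scatter.png | CNV scatter plot |
| 1701/cnv/1701.result.CNV.tsv | Final CNV results |
| 1701/sv/1701.result.SV.tsv | Final SV results |
| 1701/stat/1701.result.QC.tsv | Final QC results |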
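
After a run, a quick check that each final file exists and is non-empty can save a trip through the logs. The sketch below assumes the directory layout written by the commands above (`annotation`, `cnv`, `sv` and `stat` under `${result}/${sn}`). The function name `check_outputs` is hypothetical and not part of the original pipeline.

```shell
# Hypothetical post-run check (an illustration, not the author's code):
# report every final result file that is missing or empty.
# The relative paths mirror the -o/--out arguments used in the steps above.
check_outputs() {
    local result="$1" sn="$2" missing=0 f
    for f in \
        "annotation/${sn}.result.SNV.tsv" \
        "cnv/${sn}_cnv_heatmap.png" \
        "cnv/${sn}_cnv_scatter.png" \
        "cnv/${sn}.result.CNV.tsv" \
        "sv/${sn}.result.SV.tsv" \
        "stat/${sn}.result.QC.tsv"
    do
        if [ ! -s "${result}/${sn}/${f}" ]; then   # -s: file exists and size > 0
            echo "MISSING ${sn}/${f}"
            missing=$((missing + 1))
        fi
    done
    return "${missing}"   # exit status 0 only when all six files are present
}
```

Called as `check_outputs "${result}" "${sn}" && echo "all outputs present"`, it prints one `MISSING` line per absent file and returns the number of missing files as its exit status.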

