通过RSeQC判断RNA-seq测序数据文库类型和链特异性,指导Stringtie参数使用

本文主要是介绍通过RSeQC判断RNA-seq测序数据文库类型和链特异性,指导Stringtie参数使用,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

 1、snakemake运行infer_experiment.py

rule check_ss:input:bed = config["REF"]["genome_bed12"],bam = rules.bam_index.output.bam,output:txt = "result/QC/check_ss/{sample}.txt",params:name = "{sample}"shell: """echo "sample name: {params.name}" > {output.txt}infer_experiment.py -r {input.bed} -i {input.bam} \-s 1000000 &> {output.txt}"""

2、运行结果解读

This is PairEnd Data
Fraction of reads failed to determine: 0.0072
Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487

结果解读看官方文档:

官网:RSeQC: An RNA-seq Quality Control Package — RSeQC documentationhttp://rseqc.sourceforge.net/#infer-experiment-py

3、判断建库类型

对于双端测序,1++,1--,2+-,2-+与1+-,1-+,2++,2--的比例可以体现出文库类型;对于单端测序,++,--和+-,-+的比例亦然:

二者比例接近1时,文库为unstranded
前者明显大于后者时,文库为FR/fr-secondstrand stranded
后者明显大于前者时,文库为RF/fr-firststrand stranded

4、文库类型对应的软件参数

表格来源于Strand Settings | Griffith LabStrand-related settings There are various strand-related settings for RNA-seq tools that must be adjusted to account for library construction strategy. The following table provides read orientation codes and software settings for commonly used RNA-seq analysis tools including: IGV, TopHat, HISAT2, HTSeq, Picard, Kallisto, StringTie, and others. Each of these explanations/settings is provided for several commonly used RNA-seq library construction kits that produce either stranded or unstranded data. NOTE: A useful tool to infer strandedness of your raw sequence data is the check_strandedness tool. We provide a tutorial for using this tool here. NOTE: In the table below, the list of methods/kits for specific strand settings assumes that these kits are used as specified by their manufacturer. It is very possible that a sequencing provider/core may make modifications to these kits. For example, in one case we obtained RNAseq data processed with NEBNext Ultra II Directional kit (dUTP method). However instead of using the NEB hairpin adapters, IDT xGen UDI-UMI adapters were substituted, and this results in the insert strandedness being flipped (from RF/fr-firststrand to FR/fr-secondstrand). Because this level of detail is not always provided it is highly recommended to confirm your data’s strandedness empirically. Tool RF/fr-firststrand stranded (dUTP) FR/fr-secondstrand stranded (Ligation) Unstranded   check_strandedness (output) RF/fr-firststrand FR/fr-secondstrand unstranded   IGV (5p to 3p read orientation code) F2R1 F1R2 F2R1 or F1R2   TopHat (–library-type parameter) fr-firststrand fr-secondstrand fr-unstranded   HISAT2 (–rna-strandness parameter) R/RF F/FR NONE   HTSeq (–stranded/-s parameter) reverse yes no   STAR n/a (STAR doesn’t use library strandedness info for mapping) NONE NONE   Picard CollectRnaSeqMetrics (STRAND_SPECIFICITY parameter) SECOND_READ_TRANSCRIPTION_STRAND FIRST_READ_TRANSCRIPTION_STRAND NONE   Kallisto quant (parameter) –rf-stranded –fr-stranded NONE   StringTie (parameter) –rf –fr NONE   FeatureCounts (-s parameter) 2 1 0   RSEM (–forward-prob parameter) 0 1 0.5   Salmon (–libType parameter) ISR (assuming paired-end with inward read orientation) ISF (assuming paired-end with inward read orientation) IU (assuming paired-end with inward read orientation)   Trinity (–SS_lib_type parameter) RF FR NONE   MGI CWL YAML (strand parameter) first second NONE   RegTools (strand parameter) -s 1 -s 2 -s 0     Example methods/kits: dUTP, NSR, NNSR, Illumina TruSeq Strand Specific Total RNA, NEBNext Ultra II Directional Example methods/kits: Ligation, Standard SOLiD, NuGEN Encore, 10X 5’ scRNA data Example kits/data: Standard Illumina, NuGEN OvationV2, SMARTer universal low input RNA kit (TaKara), GDC normalized TCGA data   Notes To identify which ‘–library-type’ setting to use with TopHat, Illumina specifically documents the types in the ‘RNA Sequencing Analysis with TopHat’ Booklet. For the TruSeq RNA Sample Prep Kit, the appropriate library type is ‘fr-unstranded’. For TruSeq stranded sample prep kits, the library type is specified as ‘fr-firststrand’. These posts are also very informative: How to tell which library type to use (fr-firststrand or fr-secondstrand)? and How to determine if a library Is strand-specific and Strandness in RNASeq by Hong Zheng. Another suggestion is to view aligned reads in IGV and determine the read orientation by one of two methods. First, you can have IGV color alignments according to strand using the ‘Color alignments’ by ‘First-of-pair strand’ setting. Second, to get more detailed information you can hover your cursor over a read aligned to an exon. ‘F2 R1’ means the second read in the pair aligns to the forward strand and the first read in the pair aligns to the reverse strand. For a positive DNA strand transcript (5’ to 3’) this would denote a fr-firststrand setting in TopHat, i.e. “the right-most end of the fragment (in transcript coordinates) is the first sequenced”. For a negative DNA strand transcript (3’ to 5’) this would denote a fr-secondstrand setting in TopHat. ‘F1 R2’ means the first read in the pair aligns to the forward strand and the second read in the pair aligns to the reverse strand. See above for the complete definitions, but its simply the inverse for ‘F1 R2’ mapping. Anything other than FR orientation is not covered here and discussion with the individual responsible for library creation would be required. Typically ‘RF’ orientation is reserved for large-insert mate-pair libraries. Other orientations like ‘FF’ and ‘RR’ seem impossible with Illumina sequence technology and suggest structural variation between the sample and reference. Additional details are provided in the TopHat manual. For HTSeq, the htseq-count manual indicates that for the ‘–stranded’ option, ‘stranded=no’ means that a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For ‘stranded=yes’ and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For ‘stranded=reverse’, these rules are reversed. For the ‘CollectRnaSeqMetrics’ sub-command of Picard, the Picard manual indicates that one should use ‘FIRST_READ_TRANSCRIPTION_STRAND’ if the reads are expected to be on the transcription strand.https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/

ToolRF/fr-firststrand strandedFR/fr-secondstrand strandedUnstranded
StringTie (parameter)–rf–frNONE

本次实验数据check_ss解读:

This is PairEnd Data
Fraction of reads failed to determine: 0.0322
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0267
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9410

判断文库为RF/fr-firststrand stranded

stringtie应该使用 --rf参数

家益师兄确实用的 --rf !!!!

这篇关于通过RSeQC判断RNA-seq测序数据文库类型和链特异性,指导Stringtie参数使用的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/322530

相关文章

C++ Sort函数使用场景分析

《C++Sort函数使用场景分析》sort函数是algorithm库下的一个函数,sort函数是不稳定的,即大小相同的元素在排序后相对顺序可能发生改变,如果某些场景需要保持相同元素间的相对顺序,可使... 目录C++ Sort函数详解一、sort函数调用的两种方式二、sort函数使用场景三、sort函数排序

Java String字符串的常用使用方法

《JavaString字符串的常用使用方法》String是JDK提供的一个类,是引用类型,并不是基本的数据类型,String用于字符串操作,在之前学习c语言的时候,对于一些字符串,会初始化字符数组表... 目录一、什么是String二、如何定义一个String1. 用双引号定义2. 通过构造函数定义三、St

SpringBoot利用@Validated注解优雅实现参数校验

《SpringBoot利用@Validated注解优雅实现参数校验》在开发Web应用时,用户输入的合法性校验是保障系统稳定性的基础,​SpringBoot的@Validated注解提供了一种更优雅的解... 目录​一、为什么需要参数校验二、Validated 的核心用法​1. 基础校验2. php分组校验3

Pydantic中Optional 和Union类型的使用

《Pydantic中Optional和Union类型的使用》本文主要介绍了Pydantic中Optional和Union类型的使用,这两者在处理可选字段和多类型字段时尤为重要,文中通过示例代码介绍的... 目录简介Optional 类型Union 类型Optional 和 Union 的组合总结简介Pyd

Vue3使用router,params传参为空问题

《Vue3使用router,params传参为空问题》:本文主要介绍Vue3使用router,params传参为空问题,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐... 目录vue3使用China编程router,params传参为空1.使用query方式传参2.使用 Histo

使用Python自建轻量级的HTTP调试工具

《使用Python自建轻量级的HTTP调试工具》这篇文章主要为大家详细介绍了如何使用Python自建一个轻量级的HTTP调试工具,文中的示例代码讲解详细,感兴趣的小伙伴可以参考一下... 目录一、为什么需要自建工具二、核心功能设计三、技术选型四、分步实现五、进阶优化技巧六、使用示例七、性能对比八、扩展方向建

使用Python实现一键隐藏屏幕并锁定输入

《使用Python实现一键隐藏屏幕并锁定输入》本文主要介绍了使用Python编写一个一键隐藏屏幕并锁定输入的黑科技程序,能够在指定热键触发后立即遮挡屏幕,并禁止一切键盘鼠标输入,这样就再也不用担心自己... 目录1. 概述2. 功能亮点3.代码实现4.使用方法5. 展示效果6. 代码优化与拓展7. 总结1.

使用Python开发一个简单的本地图片服务器

《使用Python开发一个简单的本地图片服务器》本文介绍了如何结合wxPython构建的图形用户界面GUI和Python内建的Web服务器功能,在本地网络中搭建一个私人的,即开即用的网页相册,文中的示... 目录项目目标核心技术栈代码深度解析完整代码工作流程主要功能与优势潜在改进与思考运行结果总结你是否曾经

Linux中的计划任务(crontab)使用方式

《Linux中的计划任务(crontab)使用方式》:本文主要介绍Linux中的计划任务(crontab)使用方式,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、前言1、linux的起源与发展2、什么是计划任务(crontab)二、crontab基础1、cro

kotlin中const 和val的区别及使用场景分析

《kotlin中const和val的区别及使用场景分析》在Kotlin中,const和val都是用来声明常量的,但它们的使用场景和功能有所不同,下面给大家介绍kotlin中const和val的区别,... 目录kotlin中const 和val的区别1. val:2. const:二 代码示例1 Java