next-generation sequencing analysis method: paper1

2024-03-16 10:59


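Coming to this field mid-career leaves me with a lot of open questions. To build a solid foundation and understand the whole second-generation sequencing workflow step by step, reading papers seems the proper route. So today I searched Web of Science for "next-generation sequencing analysis method", found several articles on the history, analysis methods and applications of second-generation sequencing, and am writing down my notes after reading them, which should help me improve.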

Paper 1:

Source: Omics Technologies and Bio-engineering: Towards Improving Quality of Life

The chapter covers sequencing platforms, the characteristics of the data produced by each of them, the main tools for de novo and reference genome assembly, and the effects of the assembly process on genome annotation. It also discusses the concepts related to RNA-Seq data analysis with the most used software, ChIP-Seq technology, and the protocols and pipelines used in metagenomic approaches. The sequencing chemistry of some of these NGS technologies is described below.

11.1 Main platforms
Illumina Platform, Ion Torrent Platform, PacBio Platform
STRUCTURAL GENOMICS
11.3.1 De Novo Assembly (direct assembly without relying on a reference genome)
Different computational approaches:
greedy algorithms, overlap-layout-consensus (OLC), and De Bruijn graphs
11.3.1.1 greedy algorithms
software: SSAKE, SHARCGS and VCAKE; computationally demanding.

This approach is now usually combined with the other two.
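The greedy idea can be sketched in a few lines: repeatedly merge the two sequences with the largest suffix-prefix overlap until no overlap remains. This is only an illustration of the principle, not how SSAKE, SHARCGS or VCAKE are implemented; the function names and the `min_len` threshold are assumptions made for the example, and sequencing errors and reverse complements are ignored.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair again and again; return the remaining contigs."""
    seqs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        # All-against-all comparison: this is why the approach is computationally costly.
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return seqs
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


contigs = greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATC"])
print(contigs)
```

Because each merge is chosen locally, repeats can lead the greedy strategy to a wrong join, which is one reason it is usually combined with the graph-based approaches.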
11.3.1.2 OLC
Three steps: identify the regions of the reads that may overlap; build a graph from the overlaps; generate the final sequence algorithmically.

software: Newbler, Mira, Edena
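The three OLC steps can be sketched as three small functions. This is a toy under stated assumptions: error-free reads, exact suffix-prefix overlaps, a single linear layout, and a "consensus" that simply concatenates the non-overlapping tails (real assemblers such as Newbler or Mira build the consensus from a multiple alignment). All function names here are hypothetical.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Longest suffix of a matching a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def build_overlap_graph(reads: list[str], min_len: int = 3) -> dict:
    """Step 1 and 2: find overlaps and store them as weighted directed edges."""
    graph = {r: {} for r in reads}
    for a in reads:
        for b in reads:
            if a != b:
                n = suffix_prefix_overlap(a, b, min_len)
                if n:
                    graph[a][b] = n
    return graph


def layout(graph: dict) -> list[str]:
    """Step 2 (layout): start at a read with no incoming edge, follow the largest overlap."""
    has_incoming = {b for edges in graph.values() for b in edges}
    starts = [r for r in graph if r not in has_incoming]
    node = starts[0] if starts else next(iter(graph))
    path, seen = [], set()
    while node is not None and node not in seen:
        path.append(node)
        seen.add(node)
        candidates = {b: n for b, n in graph[node].items() if b not in seen}
        node = max(candidates, key=candidates.get) if candidates else None
    return path


def consensus(path: list[str], graph: dict) -> str:
    """Step 3 (simplified): append the part of each read that extends the previous one."""
    seq = path[0]
    for prev, cur in zip(path, path[1:]):
        seq += cur[graph[prev][cur]:]
    return seq


reads = ["AGCTTAG", "TTAGGCA", "GGCATCC"]
g = build_overlap_graph(reads)
print(consensus(layout(g), g))
```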
11.3.1.3 De Bruijn
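Most commonly used software for assembling prokaryotic genomes (short reads): Velvet
SPAdes is used for the random-length reads produced by the Ion Torrent platform
Software commonly used for eukaryotic genome assembly: SOAPdenovo, ALLPATHS-LG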
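As a rough illustration of the De Bruijn approach, the sketch below breaks reads into k-mers, links each k-mer's (k-1)-mer prefix to its (k-1)-mer suffix, and spells out a contig by walking the graph. It assumes error-free reads and a graph that forms one unbranched path; Velvet, SPAdes and the other tools named above handle errors, branches and repeats in far more elaborate ways. The function names and the choice of `k` are assumptions for the example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    seen_kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen_kmers:  # one edge per distinct k-mer
                seen_kmers.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def spell_linear_path(graph):
    """Spell the contig of a graph that is a single unbranched path."""
    targets = {t for successors in graph.values() for t in successors}
    start = next((n for n in graph if n not in targets), None)
    if start is None:
        raise ValueError("graph has no start node (it may be a cycle)")
    contig, node, visited = start, start, {start}
    while len(graph.get(node, [])) == 1:
        nxt = graph[node][0]
        if nxt in visited:  # stop rather than loop forever on a repeat
            break
        visited.add(nxt)
        contig += nxt[-1]  # each step adds exactly one new base
        node = nxt
    return contig


graph = de_bruijn_graph(["ATGGCG", "GGCGTG", "CGTGCA"], k=4)
print(spell_linear_path(graph))
```

Unlike the overlap-based methods, no read-against-read comparison is needed: overlaps are implicit in shared k-mers, which is what makes the approach practical for large numbers of short reads.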

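Gaps produced during assembly are usually represented by "N". These regions usually arise from repetitive sequences in the genome and can be resolved by different means, such as paired genomic libraries (paired-end and mate-pair libraries).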

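Software used after genome assembly to build scaffolds and resolve gap regions:
The first kind analyses the overlap between the end regions of contigs to determine the flanking sequences, e.g. SSPACE
The second kind resolves gaps using paired libraries, e.g. GapFiller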
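Since gaps are written as runs of "N", they are easy to locate before and after gap filling. A minimal sketch, assuming the scaffold is available as a plain string; `find_gaps` is a hypothetical helper, not part of SSPACE or GapFiller.

```python
import re


def find_gaps(scaffold: str) -> list[tuple[int, int]]:
    """Return (0-based start, length) for every run of N in the scaffold."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"N+", scaffold.upper())]


gaps = find_gaps("ACGTNNNNACGNNT")
print(gaps)                              # list of (start, length) pairs
print(sum(length for _, length in gaps)) # total number of unresolved bases
```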

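Assembly evaluation:
(1) assembly process: metrics such as N50 (assessing the length of the sequences produced, the mean length and number of contigs, and the longest and shortest contigs)
(2) assembled contigs: evaluated by mapping paired reads.
Searching the assembled sequence for key eukaryotic genes: Genome Assembly Gold-standard Evaluation (GAGE) tool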
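For reference, N50 is the length L such that contigs of length L or longer contain at least half of all assembled bases. The sketch below computes it together with the other simple metrics listed in (1), from a list of contig lengths; the function names are assumptions for the example.

```python
def n50(lengths: list[int]) -> int:
    """Smallest contig length in the set of longest contigs covering >= 50% of bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def assembly_summary(lengths: list[int]) -> dict:
    """Basic contig statistics used to compare assemblies."""
    return {
        "contigs": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "longest": max(lengths),
        "shortest": min(lengths),
        "N50": n50(lengths),
    }


print(assembly_summary([100, 80, 60, 40, 20]))
```

With lengths 100, 80, 60, 40 and 20 (300 bases in total), the two longest contigs already cover 180 bases, so the N50 is 80.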

11.3.2 Reference Assembly
Software:
(1) Bowtie: efficient in terms of memory management, but has some issues with reads that do not match perfectly, though its parameters can be adjusted
(2) BWA: uses the Burrows-Wheeler transform to increase mapping speed
(3) SHRiMP: compatible with data in letter space and in the color space format produced by the SOLiD platform
(4) SOAP2: single nucleotide polymorphism
(5) TopHat2: Ion Torrent platform
(6) mrsFAST: examines all possibilities for mapping to the reference genome, making it useful for variance detection studies
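The Burrows-Wheeler transform mentioned for BWA can be illustrated with a naive implementation: sort all rotations of the text (with a `$` end marker) and keep the last column. This shows only the transform and its inversion on a short string; real aligners build a compressed index on top of it and never sort rotations explicitly. The function names are assumptions, and the input must not contain `$`.

```python
def bwt(text: str) -> str:
    """Last column of the sorted rotations of text + '$'."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Rebuild the original text by repeatedly prepending the last column and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    row = next(r for r in table if r.endswith("$"))
    return row[:-1]


transformed = bwt("GATTACA")
print(transformed)
print(inverse_bwt(transformed))
```

The transform groups identical characters that share the same following context, which is what allows a reference genome to be indexed compactly and searched quickly.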
11.3.3 Genome Annotation
Describing the function of the product of a predicted gene

Bioinformatics software:
(1) signal sensors (e.g., for TATA box, start and stop codon, or poly-A signal detection),
(2) content sensors (e.g., for G + C content, codon usage, or dicodon frequency detection), and
(3) similarity detection (e.g., between proteins from closely related organisms, mRNA from the same organism, or reference genomes)
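To make the first two categories concrete, here is a toy content sensor (G + C fraction) and a toy signal sensor (start and stop codon detection on the forward strand). It is a sketch only: real gene finders combine many such signals in statistical models, also scan the reverse strand, and handle introns. The names `gc_content`, `find_orfs` and the `min_codons` threshold are assumptions for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (content sensor)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Forward-strand open reading frames as (start, end) with end exclusive (signal sensor)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


dna = "CCATGAAATTTTAGGG"
print(gc_content(dna))
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```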

Three basic categories:
(1) nucleotide-level annotation, which seeks to identify the physical location of DNA sequences to determine where components such as genes, RNAs, and repetitive elements are located. Sequencing and/or assembly errors at this stage can result in false pseudogenes through indels.
(2) protein-level annotation, which seeks to determine the possible functions of genes, identifying which ones a given organism does or does not have.
(3) process-level annotation, which aims to identify the pathways and processes in which different genes interact, assembling an efficient functional annotation.

11.4.1 RNA-Seq: De Novo and Reference-Based Approaches
Identification, measurement, and comparison of gene expression in a target transcriptome
Applied to functional studies, such as expression profile analysis, annotation correction, characterization of differentially expressed genes, and gene prediction
(1) Annotated reference:
Option 1:
mapping reads software: Bioscope (quantifies the expression of each gene)
analyzing results software: DEGseq (statistical analysis of gene expression under the different conditions studied, to identify differentially expressed genes)

Option 2:
TopHat: maps the reads to the annotated reference genes, generating a mapping file in BAM format.
Cufflinks: calculates the expression of the genes and identifies the genes that are differentially expressed between the analyzed samples
(2) De novo:
most widely used software: SOAPdenovo-Trans, Trans-ABySS, Trinity
Trinity: produces a high-quality transcriptome assembly with a low error rate and identifies multiple isoforms.
Trans-ABySS: generates optimized assemblies and high-coverage transcripts, using assemblies with different k-mers.
SOAPdenovo-Trans: greatest transcript contiguity, least redundancy, fastest of the three.
(3) correction of genome annotations:
Mapping of the reads to the annotation (mapping coverage to be evaluated for all annotated genes and intergenic regions, allowing the identification of potential new transcripts)

software: Cufflinks and Scripture
(4) gene prediction:
identify new genes and incorporate into an existing annotation

software: GeneMark-ET tool
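The coverage-based idea in (3) can be sketched as follows: compute the per-base read depth from the alignments, then report covered stretches that fall outside every annotated gene as candidate new transcripts. This is a simplified illustration, not how Cufflinks or Scripture work; it assumes alignments and genes are given as half-open `(start, end)` intervals on a single sequence of known length, and the function names and `min_depth` threshold are hypothetical.

```python
def coverage(read_intervals, genome_len):
    """Per-base read depth from half-open (start, end) alignments."""
    diff = [0] * (genome_len + 1)
    for start, end in read_intervals:
        diff[start] += 1  # depth rises where a read begins
        diff[end] -= 1    # and falls just after it ends
    depth, running = [], 0
    for delta in diff[:genome_len]:
        running += delta
        depth.append(running)
    return depth


def covered_intergenic(depth, genes, min_depth=1):
    """Half-open runs with depth >= min_depth that lie outside all annotated genes."""
    annotated = [False] * len(depth)
    for start, end in genes:
        for i in range(start, end):
            annotated[i] = True
    runs, run_start = [], None
    for i, d in enumerate(depth):
        hit = d >= min_depth and not annotated[i]
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(depth)))
    return runs


reads = [(0, 5), (3, 8), (12, 16), (13, 17)]
depth = coverage(reads, genome_len=20)
print(covered_intergenic(depth, genes=[(0, 8)]))
```

In this example the reads at positions 12 to 17 map to a region with no annotated gene, so that interval is reported as a candidate for annotation correction.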

11.4.2 ChIP-Seq
MACS: empirically calculates the change in the coverage of ChIP-Seq reads and uses this measure to improve the resolution of the prediction of the binding sites
ChIPDiff, ODIN: identify the significant differences in two ChIP-Seq signals under different biological conditions using hidden Markov models