[In-Depth UCSC Genome Browser] Repeats: Self Chain
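The human genome contains roughly 3,000,000,000 base pairs, of which an estimated 50% to 69% is repetitive sequence. This includes transposable elements (SINEs, LINEs, long terminal repeats), low-complexity regions (such as homopolymers and CAG repeats), and pseudogenes (many of which arise from duplication of large segments). Self Chain is the UCSC track for viewing these large duplicated segments.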
Description
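The Self Chain track shows alignments of the genome against itself, both within and between chromosomes. Compared with the Segmental Dups track (segdup), it uses a gap scoring scheme that tolerates longer gaps than traditional affine gap scoring, so it can recover more homologous sequence. "Trivial" alignments, in which a location simply aligns to itself (for example chrN mapping to the same position on chrN), are filtered out first. The pseudoautosomal regions of chrX and chrY are a special case: in the reference assembly these regions of chrY were copied directly from chrX, so a large number of self chains align at these positions on both chromosomes.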
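The filtering step can be pictured with a small sketch. It assumes that "trivial" means a region aligned to exactly the same coordinates on the same chromosome and strand; the `Alignment` record and its field names are illustrative only and are not a UCSC schema.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """One self-alignment; field names are illustrative, not a UCSC schema."""
    t_name: str    # target chromosome
    t_start: int   # 0-based start on the target
    t_end: int     # end on the target (exclusive)
    q_name: str    # query chromosome
    q_strand: str  # "+" or "-"
    q_start: int
    q_end: int


def is_trivial(aln: Alignment) -> bool:
    """Assumed definition: a region aligned to itself on the same strand."""
    return (
        aln.t_name == aln.q_name
        and aln.q_strand == "+"
        and aln.t_start == aln.q_start
        and aln.t_end == aln.q_end
    )


def drop_trivial(alignments: List[Alignment]) -> List[Alignment]:
    """Keep only alignments that point at a different location."""
    return [a for a in alignments if not is_trivial(a)]


# Made-up records for illustration only.
examples = [
    Alignment("chr1", 1000, 2000, "chr1", "+", 1000, 2000),    # identity: trivial
    Alignment("chr1", 1000, 2000, "chr1", "+", 50000, 51000),  # same chromosome, elsewhere
    Alignment("chr1", 1000, 2000, "chr7", "-", 300, 1300),     # different chromosome
]
kept = drop_trivial(examples)
```

Only the first record is dropped; the intrachromosomal and interchromosomal matches are the ones that point to duplicated sequence.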
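In the browser, a self chain is drawn as boxes joined by lines. Boxes represent aligned regions. A single line marks a gap caused mainly by an insertion in the target sequence or a deletion in the query sequence. A double line marks a more complex gap that involves substantial unaligned sequence in both the query and the target; such gaps may result from inversions, overlapping deletions, an abundance of local mutation, or even a gap (N) in the reference genome. When several chains align over the same region of the genome, chains with single-lined gaps are usually due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. (I am not entirely sure of my reading here; the original text is: In cases where multiple chains align over a particular region of the human genome, the chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes.)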
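To make the single-line versus double-line distinction concrete, here is a minimal sketch that reads the block lines of one record in UCSC chain format (`size dt dq`, where `dt` and `dq` are the unaligned bases in target and query before the next block) and labels each gap. This is a simplified reading of the display convention; the browser's own drawing code may apply further rules, and the block values are made up.

```python
from typing import List, Tuple


def classify_gaps(block_lines: List[str]) -> List[Tuple[int, int, str]]:
    """Classify each gap between consecutive blocks of one chain.

    block_lines are the data lines of a chain record: "size dt dq" for every
    block except the last, which is just "size".
    """
    gaps = []
    for line in block_lines:
        fields = line.split()
        if len(fields) != 3:      # the last block has no trailing gap
            continue
        _size, dt, dq = (int(x) for x in fields)
        if dt > 0 and dq == 0:
            kind = "single"       # target has extra sequence, query has none
        elif dt > 0 and dq > 0:
            kind = "double"       # unaligned sequence on both sides
        else:
            kind = "query-only"   # dt == 0: nothing to draw along the target axis
        gaps.append((dt, dq, kind))
    return gaps


# Hypothetical chain blocks, made up for illustration.
blocks = ["120 35 0", "80 400 512", "60 0 9", "200"]
result = classify_gaps(blocks)
```

The third label is not part of the track description; it is only there so that gaps with no width on the target axis are not mislabeled.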
Example
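Let's take a quick look at the Repeats: Chained Self Alignments track in UCSC, using the mitochondrial genome in hg38 as an example (named chrM in UCSC's hg38; also written chrMT). The mitochondrial genome is 16,569 bp long and has many homologous sequences (large duplicated segments) elsewhere in the genome.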
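Outside the browser, the same question (where in the genome do the mitochondrial homologs sit?) can be asked of a chain file. The sketch below assumes the self chains are available as text in UCSC chain format and counts chains per query chromosome for a given target. The header lines in `demo` are invented for illustration: the scores, coordinates, and ids are not real hg38 data.

```python
from collections import Counter
from typing import Iterable


def query_chroms_for_target(chain_lines: Iterable[str], target: str) -> Counter:
    """Count chains per query chromosome for one target sequence.

    Header lines in UCSC chain format look like:
    chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    """
    counts = Counter()
    for line in chain_lines:
        fields = line.split()
        # Block lines have 1 or 3 fields, so only headers pass this check.
        if len(fields) == 13 and fields[0] == "chain" and fields[2] == target:
            counts[fields[7]] += 1
    return counts


# Made-up records; scores, coordinates and ids are NOT real hg38 data.
demo = """\
chain 5000 chrM 16569 + 100 900 chr1 248956422 + 629000 629800 1
800
chain 4200 chrM 16569 + 2000 2600 chr5 181538259 - 1000 1600 2
600
chain 3900 chrM 16569 + 3000 3500 chr1 248956422 + 700000 700500 3
500
chain 3000 chr2 242193529 + 10 60 chr3 198295559 + 10 60 4
50
""".splitlines()

counts = query_chroms_for_target(demo, "chrM")
for chrom, n in counts.most_common():
    print(chrom, n)
```

Counting by query chromosome is just one possible summary; summing aligned bases per chromosome would work the same way using the block lines.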
Methods
The genome was aligned to itself using blastz. Trivial alignments were filtered out, and the remaining alignments were converted into axt format using the lavToAxt program. The axt alignments were fed into axtChain, which organizes all alignments between a single target chromosome and a single query chromosome into a group and creates a kd-tree out of the gapless subsections (blocks) of the alignments. A dynamic program was then run over the kd-trees to find the maximally scoring chains of these blocks. Chains scoring below a threshold were discarded; the remaining chains are displayed in this track.
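The chaining step can be illustrated with a small dynamic program over gapless blocks. This is a simplified sketch, not axtChain itself: it uses a quadratic scan over predecessors where axtChain uses a kd-tree, and `gap_cost` is a hypothetical linear penalty standing in for axtChain's own gap scoring. The `Block` type and the example values are also made up.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    t_start: int
    t_end: int
    q_start: int
    q_end: int
    score: float


def gap_cost(dt: int, dq: int) -> float:
    """Hypothetical linear gap penalty; axtChain uses its own gap scoring."""
    return 1.0 + 0.01 * (dt + dq)


def best_chain(blocks: List[Block]) -> Tuple[float, List[Block]]:
    """Return the maximally scoring colinear chain using an O(n^2) DP."""
    if not blocks:
        return 0.0, []
    order = sorted(blocks, key=lambda b: (b.t_start, b.q_start))
    best = [b.score for b in order]   # best chain score ending at block i
    prev = [-1] * len(order)          # back-pointer to the previous block
    for i, cur in enumerate(order):
        for j in range(i):
            p = order[j]
            dt = cur.t_start - p.t_end
            dq = cur.q_start - p.q_end
            if dt < 0 or dq < 0:      # p must end before cur on both axes
                continue
            cand = best[j] + cur.score - gap_cost(dt, dq)
            if cand > best[i]:
                best[i], prev[i] = cand, j
    i = max(range(len(order)), key=best.__getitem__)
    total, chain = best[i], []
    while i != -1:                    # walk the back-pointers
        chain.append(order[i])
        i = prev[i]
    return total, chain[::-1]


# Made-up blocks: three roughly colinear ones plus one off-diagonal hit.
blocks = [
    Block(0, 100, 0, 100, 100.0),
    Block(150, 250, 140, 240, 100.0),
    Block(120, 180, 500, 560, 60.0),
    Block(300, 350, 260, 310, 50.0),
]
total, chain = best_chain(blocks)
```

With these values the off-diagonal block is left out of the best chain, because including it would block the two higher-scoring blocks that follow. A threshold filter, as described in the paragraph above, would then simply discard chains whose `total` is too low.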
Credits
Blastz was developed at Pennsylvania State University by Minmei Hou, Scott Schwartz, Zheng Zhang, and Webb Miller with advice from Ross Hardison.
Lineage-specific repeats were identified by Arian Smit and his RepeatMasker program.
The axtChain program was developed at the University of California at Santa Cruz by Jim Kent with advice from Webb Miller and David Haussler.
The browser display and database storage of the chains were generated by Robert Baertsch and Jim Kent.
References
Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput 2002, 115-26 (2002).
Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9.
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7.