GATK ReadsPathDataSource类介绍

2024-08-24 23:20

本文主要是介绍GATK ReadsPathDataSource类介绍,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

GATK(Genome Analysis Toolkit)是一个广泛使用的基因组分析工具包,它的核心库之一是htsjdk,用于处理高通量测序数据。在GATK中,ReadsPathDataSource类是负责管理和提供读取高通量测序数据文件(如BAM、SAM、CRAM)的类。

常见使用场景

  • 数据加载:在GATK的基因组分析工具链中,ReadsPathDataSource 经常被用来从指定路径加载测序数据。
  • 数据过滤:通过 ReadsPathDataSource,可以方便地在加载数据的同时进行预过滤,如按特定标准选择感兴趣的序列记录。
  • 多文件支持:支持同时从多个文件中加载数据,使得分析多个样本的数据更加便捷。

类关系

ReadsPathDataSource源码

package org.broadinstitute.hellbender.engine;import com.google.common.annotations.VisibleForTesting;
import htsjdk.samtools.MergingSamRecordIterator;
import htsjdk.samtools.SAMException;
import htsjdk.samtools.SAMFileHeader;
import htsjdk.samtools.SAMRecord;
import htsjdk.samtools.SAMSequenceDictionary;
import htsjdk.samtools.SamFileHeaderMerger;
import htsjdk.samtools.SamInputResource;
import htsjdk.samtools.SamReader;
import htsjdk.samtools.SamReaderFactory;
import htsjdk.samtools.util.CloseableIterator;
import htsjdk.samtools.util.IOUtil;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.broadinstitute.hellbender.exceptions.GATKException;
import org.broadinstitute.hellbender.exceptions.UserException;
import org.broadinstitute.hellbender.utils.IntervalUtils;
import org.broadinstitute.hellbender.utils.SimpleInterval;
import org.broadinstitute.hellbender.utils.Utils;
import org.broadinstitute.hellbender.utils.gcs.BucketUtils;
import org.broadinstitute.hellbender.utils.iterators.SAMRecordToReadIterator;
import org.broadinstitute.hellbender.utils.iterators.SamReaderQueryingIterator;
import org.broadinstitute.hellbender.utils.read.GATKRead;
import org.broadinstitute.hellbender.utils.read.ReadConstants;import java.io.IOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;/*** Manages traversals and queries over sources of reads which are accessible via {@link Path}s* (for now, SAM/BAM/CRAM files only).** Two basic operations are available:** -Iteration over all reads, optionally restricted to reads that overlap a set of intervals* -Targeted queries by one interval at a time*/
public final class ReadsPathDataSource implements ReadsDataSource {private static final Logger logger = LogManager.getLogger(ReadsPathDataSource.class);/*** Mapping from SamReaders to iterators over the reads from each reader. Only one* iterator can be open from a given reader at a time (this is a restriction* in htsjdk). Iterator is set to null for a reader if no iteration is currently* active on that reader.*/private final Map<SamReader, CloseableIterator<SAMRecord>> readers;/*** Hang onto the input files so that we can print useful errors about them*/private final Map<SamReader, Path> backingPaths;/*** Only reads that overlap these intervals (and unmapped reads, if {@link #traverseUnmapped} is set) will be returned* during a full iteration. Null if iteration is unbounded.** Individual queries are unaffected by these intervals -- only traversals initiated via {@link #iterator} are affected.*/private List<SimpleInterval> intervalsForTraversal;/*** If true, restrict traversals to unmapped reads (and reads overlapping any {@link #intervalsForTraversal}, if set).* False if iteration is unbounded or bounded only by our {@link #intervalsForTraversal}.** Note that this setting covers only unmapped reads that have no position -- unmapped reads that are assigned the* position of their mates will be returned by queries overlapping that position.** Individual queries are unaffected by this setting  -- only traversals initiated via {@link #iterator} are affected.*/private boolean traverseUnmapped;/*** Used to create a merged Sam header when we're dealing with multiple readers. Null if we only have a single reader.*/private final SamFileHeaderMerger headerMerger;/*** Are indices available for all files?*/private boolean indicesAvailable;/*** Has it been closed already.*/private boolean isClosed;/*** Initialize this data source with a single SAM/BAM file and validation stringency SILENT.** @param samFile SAM/BAM file, not null.*/public ReadsPathDataSource( final Path samFile ) {this(samFile != null ? Arrays.asList(samFile) : null, (SamReaderFactory)null);}/*** Initialize this data source with multiple SAM/BAM files and validation stringency SILENT.** @param samFiles SAM/BAM files, not null.*/public ReadsPathDataSource( final List<Path> samFiles ) {this(samFiles, (SamReaderFactory)null);}/*** Initialize this data source with a single SAM/BAM file and a custom SamReaderFactory** @param samPath path to SAM/BAM file, not null.* @param customSamReaderFactory SamReaderFactory to use, if null a default factory with no reference and validation*                               stringency SILENT is used.*/public ReadsPathDataSource( final Path samPath, SamReaderFactory customSamReaderFactory ) {this(samPath != null ? Arrays.asList(samPath) : null, customSamReaderFactory);}/*** Initialize this data source with multiple SAM/BAM files and a custom SamReaderFac

这篇关于GATK ReadsPathDataSource类介绍的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1103893

相关文章

zookeeper端口说明及介绍

《zookeeper端口说明及介绍》:本文主要介绍zookeeper端口说明,具有很好的参考价值,希望对大家有所帮助,如有错误或未考虑完全的地方,望不吝赐教... 目录一、zookeeper有三个端口(可以修改)aVNMqvZ二、3个端口的作用三、部署时注意总China编程结一、zookeeper有三个端口(可以

Python中win32包的安装及常见用途介绍

《Python中win32包的安装及常见用途介绍》在Windows环境下,PythonWin32模块通常随Python安装包一起安装,:本文主要介绍Python中win32包的安装及常见用途的相关... 目录前言主要组件安装方法常见用途1. 操作Windows注册表2. 操作Windows服务3. 窗口操作

c++中的set容器介绍及操作大全

《c++中的set容器介绍及操作大全》:本文主要介绍c++中的set容器介绍及操作大全,本文通过实例代码给大家介绍的非常详细,对大家的学习或工作具有一定的参考借鉴价值,需要的朋友参考下吧... 目录​​一、核心特性​​️ ​​二、基本操作​​​​1. 初始化与赋值​​​​2. 增删查操作​​​​3. 遍历方

HTML img标签和超链接标签详细介绍

《HTMLimg标签和超链接标签详细介绍》:本文主要介绍了HTML中img标签的使用,包括src属性(指定图片路径)、相对/绝对路径区别、alt替代文本、title提示、宽高控制及边框设置等,详细内容请阅读本文,希望能对你有所帮助... 目录img 标签src 属性alt 属性title 属性width/h

MybatisPlus service接口功能介绍

《MybatisPlusservice接口功能介绍》:本文主要介绍MybatisPlusservice接口功能介绍,本文给大家介绍的非常详细,对大家的学习或工作具有一定的参考借鉴价值,需要的朋友... 目录Service接口基本用法进阶用法总结:Lambda方法Service接口基本用法MyBATisP

MySQL复杂SQL之多表联查/子查询详细介绍(最新整理)

《MySQL复杂SQL之多表联查/子查询详细介绍(最新整理)》掌握多表联查(INNERJOIN,LEFTJOIN,RIGHTJOIN,FULLJOIN)和子查询(标量、列、行、表子查询、相关/非相关、... 目录第一部分:多表联查 (JOIN Operations)1. 连接的类型 (JOIN Types)

java中BigDecimal里面的subtract函数介绍及实现方法

《java中BigDecimal里面的subtract函数介绍及实现方法》在Java中实现减法操作需要根据数据类型选择不同方法,主要分为数值型减法和字符串减法两种场景,本文给大家介绍java中BigD... 目录Java中BigDecimal里面的subtract函数的意思?一、数值型减法(高精度计算)1.

Pytorch介绍与安装过程

《Pytorch介绍与安装过程》PyTorch因其直观的设计、卓越的灵活性以及强大的动态计算图功能,迅速在学术界和工业界获得了广泛认可,成为当前深度学习研究和开发的主流工具之一,本文给大家介绍Pyto... 目录1、Pytorch介绍1.1、核心理念1.2、核心组件与功能1.3、适用场景与优势总结1.4、优

Java实现本地缓存的常用方案介绍

《Java实现本地缓存的常用方案介绍》本地缓存的代表技术主要有HashMap,GuavaCache,Caffeine和Encahche,这篇文章主要来和大家聊聊java利用这些技术分别实现本地缓存的方... 目录本地缓存实现方式HashMapConcurrentHashMapGuava CacheCaffe

Spring Security介绍及配置实现代码

《SpringSecurity介绍及配置实现代码》SpringSecurity是一个功能强大的Java安全框架,它提供了全面的安全认证(Authentication)和授权(Authorizatio... 目录简介Spring Security配置配置实现代码简介Spring Security是一个功能强