UVA1368 DNA Consensus String

2023-12-02 17:20
Tags: dna string consensus uva1368

DNA Consensus String

The Hamming distance is the number of different characters at each position from two strings of equal length. For example, assume we are given the two strings “AGCAT” and “GGAAT.” The Hamming distance of these two strings is 2 because the 1st and the 3rd characters of the two strings are different.

Using the Hamming distance, we can define a representative string for a set of multiple strings of equal length. Given a set of strings S = {s1, …, sm} of length n, the consensus error between a string y of length n and the set S is the sum of the Hamming distances between y and each si in S. If the consensus error between y and S is the minimum among all possible strings y of length n, y is called a consensus string of S.

For example, given the three strings “AGCAT”, “AGACT” and “GGAAT”, the consensus string of the given strings is “AGAAT” because the sum of the Hamming distances between “AGAAT” and the three strings is 3, which is minimal. (In this case, the consensus string is unique, but in general, there can be more than one consensus string.) We use the consensus string as a representative of the DNA sequence. For the example of Figure 1 above, a consensus string of gene X is “GCAAATGGCTGTGCA” and the consensus error is 7.
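The two definitions above can be sketched directly in C++. This is only an illustration of the definitions, not part of the submitted solution; the function names hammingDistance and consensusError are hypothetical and do not appear in the original post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: number of positions at which two
// equal-length strings differ.
int hammingDistance(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());
    int d = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i]) {
            ++d;
        }
    }
    return d;
}

// Hypothetical helper: consensus error of y with respect to S, i.e. the
// sum of the Hamming distances between y and every string in S.
int consensusError(const std::string& y, const std::vector<std::string>& S) {
    int total = 0;
    for (const std::string& s : S) {
        total += hammingDistance(y, s);
    }
    return total;
}
```

With the strings from the statement, hammingDistance("AGCAT", "GGAAT") gives 2, and the consensus error of “AGAAT” against the three example strings is 3.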

Input: Your program is to read from standard input. The input consists of T test cases. The number of test cases T is given in the first line of the input. Each test case starts with a line containing two integers m and n which are separated by a single space. The integer m (4 ≤ m ≤ 50) represents the number of DNA sequences and n (4 ≤ n ≤ 1000) represents the length of the DNA sequences, respectively. In each of the next m lines, each DNA sequence is given.

Output: Your program is to write to standard output. Print the consensus string in the first line of each case and the consensus error in the second line of each case. If there exists more than one consensus string, print the lexicographically smallest consensus string.

DNA Sequences (DNA Consensus String)

Problem summary (translated)

Given m DNA sequences, each of length n, find a DNA sequence whose total Hamming distance to all of them is as small as possible. The Hamming distance of two strings of equal length is the number of positions at which their characters differ; for example, the Hamming distance between ACGT and GCGA is 2 (the 1st and 4th characters from the left differ).

The input consists of the integers m and n (4 ≤ m ≤ 50, 4 ≤ n ≤ 1000) followed by m DNA sequences of length n (containing only the letters A, C, G and T). Output the DNA sequence whose total Hamming distance to the m sequences is minimal, together with that distance. If there are several solutions, output the lexicographically smallest one.

Solution

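Use a greedy strategy.

Let str[i][j] be the j-th character of the i-th string. Because all the strings have the same length, we can compare the characters at the same position across the strings and count how many times A, C, G and T occur there. Once position i has been scanned in all m strings, the letter with the largest count becomes std_s[i]; if several letters share the largest count, the lexicographically smallest one is used.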
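A short derivation of why choosing each position independently is enough. The notation c_j(x), the number of input strings whose j-th character is x, is introduced here for illustration and is not used in the original post.

```latex
\begin{align*}
E(y) &= \sum_{i=1}^{m} d_H(y, s_i)
      = \sum_{i=1}^{m} \sum_{j=1}^{n} [\, y_j \neq s_{i,j} \,] \\
     &= \sum_{j=1}^{n} \bigl( m - c_j(y_j) \bigr),
\qquad c_j(x) = \bigl|\{\, i : s_{i,j} = x \,\}\bigr| .
\end{align*}
% Each term depends only on y_j, so the sum is minimised by
% maximising c_j(y_j) separately for every position j:
\[
\min_{y} E(y) = \sum_{j=1}^{n} \Bigl( m - \max_{x \in \{A, C, G, T\}} c_j(x) \Bigr).
\]
```

Since the positions do not interact, taking the smallest letter among the tied maxima at every position also yields the lexicographically smallest consensus string.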

//
// Created by Gowi on 2023/12/2.
//
#include <algorithm>
#include <iostream>
#include <string>

#define N 55

using namespace std;

int main() {
    string str[N];
    int T;
    cin >> T;
    while (T--) {
        int m, n;
        string std_s;  // consensus string, built one position at a time
        cin >> m >> n;
        for (int i = 0; i < m; ++i) {
            cin >> str[i];
        }
        int sum = 0;  // consensus error
        for (int i = 0; i < n; ++i) {
            // Count the occurrences of each letter at position i.
            int a = 0, c = 0, g = 0, t = 0;
            for (int j = 0; j < m; ++j) {
                if (str[j][i] == 'A') {
                    a++;
                } else if (str[j][i] == 'C') {
                    c++;
                } else if (str[j][i] == 'G') {
                    g++;
                } else {
                    t++;
                }
            }
            int index_n = max(a, max(c, max(g, t)));
            // Among the letters that reach the maximum count,
            // keep the lexicographically smallest one.
            char index_c = 'Z';
            if (t == index_n && 'T' < index_c) {
                index_c = 'T';
            }
            if (g == index_n && 'G' < index_c) {
                index_c = 'G';
            }
            if (c == index_n && 'C' < index_c) {
                index_c = 'C';
            }
            if (a == index_n && 'A' < index_c) {
                index_c = 'A';
            }
            std_s += index_c;
            // Every string with a different letter at this position adds 1.
            if (index_c != 'A') {
                sum += a;
            }
            if (index_c != 'G') {
                sum += g;
            }
            if (index_c != 'C') {
                sum += c;
            }
            if (index_c != 'T') {
                sum += t;
            }
        }
        cout << std_s << endl;
        cout << sum << endl;
    }
    return 0;
}
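For checking the greedy step without going through standard input, the same idea can be wrapped in a function. The following is a minimal sketch under the assumption that every sequence has the same length and contains only A, C, G and T; the name consensus and its signature are hypothetical and are not taken from the original post.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the lexicographically smallest consensus
// string of seqs together with its consensus error.
// Assumes all sequences have equal length and use only A, C, G, T.
std::pair<std::string, int> consensus(const std::vector<std::string>& seqs) {
    const std::string letters = "ACGT";  // already in lexicographic order
    const std::size_t n = seqs.empty() ? 0 : seqs[0].size();
    std::string best;
    int error = 0;
    for (std::size_t col = 0; col < n; ++col) {
        std::array<int, 4> cnt{};  // zero-initialised counts for A, C, G, T
        for (const std::string& s : seqs) {
            const std::size_t k = letters.find(s[col]);
            if (k != std::string::npos) {
                ++cnt[k];
            }
        }
        // Strict '>' keeps the earliest (smallest) letter on ties.
        std::size_t pick = 0;
        for (std::size_t k = 1; k < cnt.size(); ++k) {
            if (cnt[k] > cnt[pick]) {
                pick = k;
            }
        }
        best += letters[pick];
        // Strings that disagree with the chosen letter each contribute 1.
        error += static_cast<int>(seqs.size()) - cnt[pick];
    }
    return {best, error};
}
```

Called on the three strings from the statement (“AGCAT”, “AGACT”, “GGAAT”), this returns “AGAAT” with error 3. Scanning the letters in the order ACGT and updating only on a strictly larger count replaces the four tie-breaking comparisons of the program above.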

http://www.chinasem.cn/article/446174
