LeetCode--393. UTF-8 Validation

本文主要是介绍LeetCode--393. UTF-8 Validation，希望对大家解决编程问题提供一定的参考价值，需要的开发者们随着小编来一起学习吧！

问题链接：https://leetcode.com/problems/utf-8-validation/

UTF-8的编码规则比较复杂，要仔细阅读确定编码规则，自己想的算法也比较暴力朴素，易于理解：

1.将十进制数转为长度为8的定长字符串 2.使用双指针法检查是否合法，代码如下

class Solution {public static boolean validUtf8(int[] data) {// Integer.toBinaryString()String[] strs=new String[data.length];for(int i=0;i<data.length;i++)strs[i]=strs[i]=Integer.toBinaryString(1<<8 | data[i]).substring(1);;int i=0;while(i<data.length){int count=afterBytes(strs[i]);if(count>3 || count<0)return false;for(int k=1;k<=count;k++){if(i+k>=data.length || !isValid(strs[i+k]))return false;}i=i+count+1;}          return true;}public static int afterBytes(String str){int i=0;while(i< str.length()&& str.charAt(i)=='1')i++;if(i==0)return 0;if(i==1)return -1;return i-1;}public static boolean isValid(String str){return str.charAt(0)=='1' && str.charAt(1)=='0';}
}

Solutions里面提供了两种基于Bit Manipulation的精彩算法，代码如下：

class Solution {public boolean validUtf8(int[] data) {// Number of bytes in the current UTF-8 characterint numberOfBytesToProcess = 0;// For each integer in the data array.for (int i = 0; i < data.length; i++) {// Get the binary representation. We only need the least significant 8 bits// for any given number.String binRep = Integer.toBinaryString(data[i]);binRep =binRep.length() >= 8? binRep.substring(binRep.length() - 8): "00000000".substring(binRep.length() % 8) + binRep;// If this is the case then we are to start processing a new UTF-8 character.if (numberOfBytesToProcess == 0) {// Get the number of 1s in the beginning of the string.for (int j = 0; j < binRep.length(); j++) {if (binRep.charAt(j) == '0') {break;}numberOfBytesToProcess += 1;}// 1 byte charactersif (numberOfBytesToProcess == 0) {continue;}// Invalid scenarios according to the rules of the problem.if (numberOfBytesToProcess > 4 || numberOfBytesToProcess == 1) {return false;}} else {// Else, we are processing integers which represent bytes which are a part of// a UTF-8 character. So, they must adhere to the pattern `10xxxxxx`.if (!(binRep.charAt(0) == '1' && binRep.charAt(1) == '0')) {return false;}}// We reduce the number of bytes to process by 1 after each integer.numberOfBytesToProcess -= 1;}// This is for the case where we might not have the complete data for// a particular UTF-8 character.return numberOfBytesToProcess == 0;}
}

class Solution {public boolean validUtf8(int[] data) {// Number of bytes in the current UTF-8 characterint numberOfBytesToProcess = 0;// Masks to check two most significant bits in a byte.int mask1 = 1 << 7;int mask2 = 1 << 6;// For each integer in the data array.for(int i = 0; i < data.length; i++) {// If this is the case then we are to start processing a new UTF-8 character.if (numberOfBytesToProcess == 0) {int mask = 1 << 7;while ((mask & data[i]) != 0) {numberOfBytesToProcess += 1;mask = mask >> 1;}// 1 byte charactersif (numberOfBytesToProcess == 0) {continue;}// Invalid scenarios according to the rules of the problem.if (numberOfBytesToProcess > 4 || numberOfBytesToProcess == 1) {return false;}} else {// data[i] should have most significant bit set and// second most significant bit unset. So, we use the two masks// to make sure this is the case.if (!((data[i] & mask1) != 0 && (mask2 & data[i]) == 0)) {return false;}}// We reduce the number of bytes to process by 1 after each integer.numberOfBytesToProcess -= 1;}// This is for the case where we might not have the complete data for// a particular UTF-8 character.return numberOfBytesToProcess == 0;}
}

这篇关于LeetCode--393. UTF-8 Validation的文章就介绍到这儿，希望我们推荐的文章对编程师们有所帮助！