本文主要是介绍google speech command dataset的生成,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!
GitHub - hyperconnect/TC-ResNet: Code for Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
模型要分出几类:
def prepare_words_list(wanted_words):
"""Prepends common tokens to the custom word list.
前置: silence and unknown, 不管wanted words是什么都会有这个两个class
Args:
wanted_words: List of strings containing the custom words.
Returns:
List with the standard silence and unknown tokens added.
"""
return [SILENCE_LABEL, UNKNOWN_WORD_LABEL] + wanted_words
SILENCE_LABEL = '_silence_'
UNKNOWN_WORD_LABEL = '_unknown_'
silence/unknown class又有什么特殊的吗?相对custom word list是有比例的,毕竟他们不是模型的最终目的。根据silence/unknown的比例形成数据集,下一步再把他们分成testing/validation and training 训练用的数据集。
文件属于哪个数据集?
根据文件名得到所在的数据集,这里的方法有点特殊,这么做的目的是某个文件始终在某个数据集不会在training和testing等 set间变换。
文件名不变对应的hash值就不变,且val/test percentage不变则最终得到的set 值不变
percentage_hash = ((int(hash_name_hashed, 16) % (MAX_NUM_WAVS_PER_CLASS + 1)) * (100.0 / MAX_NUM_WAVS_PER_CLASS))
percentage_hash = 80.49486488472569
def which_set(filename, validation_percentage, testing_percentage):
"""Determines which data partition the file should belong to.
Args:
filename: File path of the data sample.
validation_percentage: How much of the data set to use for validation.
testing_percentage: How much of the data set to use for testing.
Returns:
String, one of 'training', 'validation', or 'testing'.
"""
base_name = os.path.basename(filename)
hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
percentage_hash = ((int(hash_name_hashed, 16) %
(MAX_NUM_WAVS_PER_CLASS + 1)) *
(100.0 / MAX_NUM_WAVS_PER_CLASS))
if percentage_hash < validation_percentage:
result = 'validation'
elif percentage_hash < (testing_percentage + validation_percentage):
result = 'testing'
else:
result = 'training'
return result
怎样生成google speech command数据集
AudioProcessor.py 对数据文件进行预处理,__init__实现数据集的创建和建立进行预处理的graph, 当调用get_data时生成预处理后的数据集
model_settings的描述
样本的描述(resample rate, clip length),
怎样进行数据处理(window size, window stride, feature bins, preprocess-对频谱的后处理),
生成什么(label_count)
def prepare_model_settings(label_count, sample_rate, clip_duration_ms,
window_size_ms, window_stride_ms, feature_bin_count,
preprocess):
"""Calculates common settings needed for all models.
Args:
label_count: How many classes are to be recognized. (包括: silence/unknown and wanted words)
sample_rate: Number of audio samples per second.
clip_duration_ms: Length of each audio clip to be analyzed. //样本的长度
window_size_ms: Duration of frequency analysis window. //分帧: 每帧的长度和步长
window_stride_ms: How far to move in time between frequency windows.
feature_bin_count: Number of frequency bins to use for analysis.//每帧取的特征数
preprocess: How the spectrogram is processed to produce features.
Returns:
Dictionary containing common settings.
Raises:
ValueError: If the preprocessing mode isn't recognized.
"""
desired_samples = int(sample_rate * clip_duration_ms / 1000)
window_size_samples = int(sample_rate * window_size_ms / 1000)
window_stride_samples = int(sample_rate * window_stride_ms / 1000)
length_minus_window = (desired_samples - window_size_samples)
if length_minus_window < 0:
spectrogram_length = 0
else: # window stride samples not window size for overlap
spectrogram_length = 1 + int(length_minus_window / window_stride_samples)
if preprocess == 'mfcc':
average_window_width = -1
fingerprint_width = feature_bin_count
elif preprocess == 'micro':
average_window_width = -1
fingerprint_width = feature_bin_count
else:
raise ValueError('Unknown preprocess mode "%s" (should be "mfcc",'
' "average", or "micro")' % (preprocess))
fingerprint_size = fingerprint_width * spectrogram_length
return {
'desired_samples': desired_samples,
'window_size_samples': window_size_samples,
'window_stride_samples': window_stride_samples,
'spectrogram_length': spectrogram_length, // 描述有多少帧
'fingerprint_width': fingerprint_width, // 每帧的特征数
'fingerprint_size': fingerprint_size, // 每个样本生成的特征数
'label_count': label_count,
'sample_rate': sample_rate,
'preprocess': preprocess,
'average_window_width': average_window_width,
}
prepare_data_index
prepare data and word 对应的index, 不是生成了(data, index)这样的样本
赋值了成员变量:data_index, word_to_index
def prepare_data_index(self, silence_percentage, unknown_percentage,
wanted_words, validation_percentage,
testing_percentage):
"""Prepares a list of the samples organized by set and label.
The training loop needs a list of all the available data, organized by
which partition it should belong to, and with ground truth labels attached.
This function analyzes the folders below the `data_dir`, figures out the
right labels for each file based on the name of the subdirectory it belongs to,
and uses a stable hash to assign it to a data set partition.
Args: silence/unknown percentage相对wanted word而言的
silence_percentage: How much of the resulting data should be background.
unknown_percentage: How much should be audio outside the wanted classes.
wanted_words: Labels of the classes we want to be able to recognize.
validation_percentage: How much of the data set to use for validation.
testing_percentage: How much of the data set to use for testing.
Returns:
Dictionary containing a list of file information for each set partition,
and a lookup map for each class to determine its numeric index.
Raises:
Exception: If expected files are not found.
"""
# Make sure the shuffling and picking of unknowns is deterministic.
random.seed(RANDOM_SEED) #next used for shuffle(用于随机得到unknown样本)
# wanted_words_index: directory, key: string word, value: index of list
wanted_words_index = {}
for index, wanted_word in enumerate(wanted_words):
wanted_words_index[wanted_word] = index + 2
# data member: data_index
self.
这篇关于google speech command dataset的生成的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!