Looking to Listen at the Cocktail Party: A Code Walkthrough

2023-10-24 22:58


This post walks through a reproduction of the paper "Looking to Listen at the Cocktail Party" by a developer from Tsinghua. Code link

The network structure is shown in the figure below:

(figure: AV model architecture; image not available)
The AVSpeech dataset consists of short video clips, while the network takes the face region of the video as input. So the first step is to detect the face in each frame and crop it out.
This code does that directly with a pretrained MTCNN detector from a Python package.
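For reference, a minimal sketch of how the pretrained mtcnn package is typically used. The detector construction in the comments is standard mtcnn usage; the sample detection values below are fabricated for illustration, and only the 'box' field matters to the cropping code that follows:

```python
# Typical usage of the pretrained mtcnn package (pip install mtcnn):
#   from mtcnn import MTCNN
#   detector = MTCNN()
#   faces = detector.detect_faces(img)  # img: HxWx3 numpy array
#
# Each element of `faces` is a dict; the cropping code below only relies on
# its 'box' field, which is [x, y, width, height]. Fabricated sample:
face = {'box': [120, 80, 64, 64], 'confidence': 0.99}
x, y, w, h = face['box']
print(x, y, w, h)  # 120 80 64 64
```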

def face_detect(file, detector, frame_path, cat_train, output_dir):
    # file names look like '<clip_index>-<frame_index>.jpg'
    name = file.replace('.jpg', '').split('-')
    log = cat_train.iloc[int(name[0])]
    # expected face center, given as fractions of the frame size in the csv
    x = log[3]
    y = log[4]
    img = cv2.imread('%s%s' % (frame_path, file))
    x = img.shape[1] * x
    y = img.shape[0] * y
    faces = detector.detect_faces(img)
    # check if any faces were detected
    if len(faces) == 0:
        print('no face detect: ' + file)
        return  # no face
    bounding_box = bounding_box_check(faces, x, y)
    if bounding_box is None:
        print('face is not related to given coord: ' + file)
        return
    print(file, " ", bounding_box)
    print(file, " ", x, y)
    # box is [x, y, width, height]; numpy indexes rows (y) first
    crop_img = img[bounding_box[1]:bounding_box[1] + bounding_box[3],
                   bounding_box[0]:bounding_box[0] + bounding_box[2]]
    crop_img = cv2.resize(crop_img, (160, 160))
    cv2.imwrite('%s/frame_' % output_dir + name[0] + '_' + name[1] + '.jpg', crop_img)
    # crop_img = cv2.cvtColor(crop_img, cv2.COLOR_BGR2RGB)
    # plt.imshow(crop_img)
    # plt.show()
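face_detect calls a bounding_box_check helper that is not shown here. A minimal sketch of one plausible implementation, assuming mtcnn-style detections with a 'box' field of [x, y, width, height]; the repo's actual helper may differ (for instance by also ranking detections by confidence):

```python
def bounding_box_check(faces, x, y):
    """Return the box of the first detection whose box contains the
    expected face center (x, y), or None when no detection matches."""
    for face in faces:
        bx, by, bw, bh = face['box']
        if bx <= x <= bx + bw and by <= y <= by + bh:
            return face['box']
    return None

# Example with a fabricated mtcnn-style detection list:
faces = [{'box': [100, 50, 60, 60]}, {'box': [300, 40, 64, 64]}]
print(bounding_box_check(faces, 130, 80))  # [100, 50, 60, 60]
print(bounding_box_check(faces, 10, 10))   # None
```

The check guards against cropping a bystander's face: the dataset gives the target speaker's face center, so any detection that does not contain that point is rejected.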

The AV model code is as follows:

from keras.models import Sequential
from keras.layers import Input, Dense, Convolution2D,Bidirectional, concatenate
from keras.layers import Flatten, BatchNormalization, ReLU, Reshape, Lambda, TimeDistributed
from keras.models import Model
from keras.layers.recurrent import LSTM
from keras.initializers import he_normal, glorot_uniform
import tensorflow as tf


def AV_model(people_num=2):
    def UpSampling2DBilinear(size):
        return Lambda(lambda x: tf.image.resize(x, size, method=tf.image.ResizeMethod.BILINEAR))

    def sliced(x, index):
        return x[:, :, :, index]

    # --------------------------- AS start ---------------------------
    audio_input = Input(shape=(298, 257, 2))
    print('as_0:', audio_input.shape)

    as_conv1 = Convolution2D(96, kernel_size=(1, 7), strides=(1, 1), padding='same',
                             dilation_rate=(1, 1), name='as_conv1')(audio_input)
    as_conv1 = BatchNormalization()(as_conv1)
    as_conv1 = ReLU()(as_conv1)
    print('as_1:', as_conv1.shape)

    as_conv2 = Convolution2D(96, kernel_size=(7, 1), strides=(1, 1), padding='same',
                             dilation_rate=(1, 1), name='as_conv2')(as_conv1)
    as_conv2 = BatchNormalization()(as_conv2)
    as_conv2 = ReLU()(as_conv2)
    print('as_2:', as_conv2.shape)

    # as_conv3 .. as_conv14: 5x5 convs whose dilation grows first along the
    # time axis, then along both axes (written out as twelve separate blocks
    # in the original repo; the loop below builds the identical layers)
    x = as_conv2
    as_dilations = [(1, 1), (2, 1), (4, 1), (8, 1), (16, 1), (32, 1),
                    (1, 1), (2, 2), (4, 4), (8, 8), (16, 16), (32, 32)]
    for i, d in enumerate(as_dilations, start=3):
        x = Convolution2D(96, kernel_size=(5, 5), strides=(1, 1), padding='same',
                          dilation_rate=d, name='as_conv%d' % i)(x)
        x = BatchNormalization()(x)
        x = ReLU()(x)
        print('as_%d:' % i, x.shape)

    as_conv15 = Convolution2D(8, kernel_size=(1, 1), strides=(1, 1), padding='same',
                              dilation_rate=(1, 1), name='as_conv15')(x)
    as_conv15 = BatchNormalization()(as_conv15)
    as_conv15 = ReLU()(as_conv15)
    print('as_15:', as_conv15.shape)

    AS_out = Reshape((298, 8 * 257))(as_conv15)
    print('AS_out:', AS_out.shape)
    # --------------------------- AS end ---------------------------

    # --------------------------- VS_model start ---------------------------
    VS_model = Sequential()
    VS_model.add(Convolution2D(256, kernel_size=(7, 1), strides=(1, 1), padding='same',
                               dilation_rate=(1, 1), name='vs_conv1'))
    VS_model.add(BatchNormalization())
    VS_model.add(ReLU())
    # vs_conv2 .. vs_conv6: 5x1 convs with dilation 1, 2, 4, 8, 16 along time
    for i, d in enumerate([(1, 1), (2, 1), (4, 1), (8, 1), (16, 1)], start=2):
        VS_model.add(Convolution2D(256, kernel_size=(5, 1), strides=(1, 1), padding='same',
                                   dilation_rate=d, name='vs_conv%d' % i))
        VS_model.add(BatchNormalization())
        VS_model.add(ReLU())
    VS_model.add(Reshape((75, 256, 1)))
    # align 75 video frames with the 298 audio time steps
    VS_model.add(UpSampling2DBilinear((298, 256)))
    VS_model.add(Reshape((298, 256)))
    # --------------------------- VS_model end ---------------------------

    video_input = Input(shape=(75, 1, 1792, people_num))

    # ------------------------- AV fusion + BLSTM -------------------------
    AVfusion_list = [AS_out]
    for i in range(people_num):
        single_input = Lambda(sliced, arguments={'index': i})(video_input)
        VS_out = VS_model(single_input)
        AVfusion_list.append(VS_out)
    AVfusion = concatenate(AVfusion_list, axis=2)
    AVfusion = TimeDistributed(Flatten())(AVfusion)
    print('AVfusion:', AVfusion.shape)

    lstm = Bidirectional(LSTM(400, input_shape=(298, 8 * 257), return_sequences=True),
                         merge_mode='sum')(AVfusion)
    print('lstm:', lstm.shape)

    fc1 = Dense(600, name="fc1", activation='relu', kernel_initializer=he_normal(seed=27))(lstm)
    print('fc1:', fc1.shape)
    fc2 = Dense(600, name="fc2", activation='relu', kernel_initializer=he_normal(seed=42))(fc1)
    print('fc2:', fc2.shape)
    fc3 = Dense(600, name="fc3", activation='relu', kernel_initializer=he_normal(seed=65))(fc2)
    print('fc3:', fc3.shape)

    complex_mask = Dense(257 * 2 * people_num, name="complex_mask",
                         kernel_initializer=glorot_uniform(seed=87))(fc3)
    print('complex_mask:', complex_mask.shape)
    complex_mask_out = Reshape((298, 257, 2, people_num))(complex_mask)
    print('complex_mask_out:', complex_mask_out.shape)

    AV_model = Model(inputs=[audio_input, video_input], outputs=complex_mask_out)
    # # compile AV_model
    # AV_model.compile(optimizer='adam', loss='mse')
    return AV_model
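The model's output is a complex ratio mask of shape (298, 257, 2, people_num): for each speaker, real and imaginary mask components over 298 STFT time frames and 257 frequency bins. A minimal numpy sketch of how such a mask would be applied to the mixture spectrogram at inference time; the random arrays stand in for real data, and the repo's actual inference code may differ in details:

```python
import numpy as np

people_num = 2
T, F = 298, 257  # STFT frames x frequency bins, as in the model's shapes

# Mixture spectrogram in the model's input format: real and imaginary
# parts stacked on the last axis, matching audio_input (298, 257, 2).
mix = np.random.randn(T, F, 2).astype(np.float32)

# Stand-in for the model output: one complex mask per speaker,
# shape (298, 257, 2, people_num).
masks = np.random.randn(T, F, 2, people_num).astype(np.float32)

mix_c = mix[..., 0] + 1j * mix[..., 1]  # to complex, shape (298, 257)

separated = []
for p in range(people_num):
    m_c = masks[..., 0, p] + 1j * masks[..., 1, p]
    # complex multiplication applies the mask to magnitude AND phase
    separated.append(m_c * mix_c)

print(separated[0].shape)  # (298, 257): one spectrogram per speaker
```

Each separated spectrogram would then go through an inverse STFT to recover that speaker's waveform; masking the complex STFT rather than the magnitude is what lets the model correct phase as well.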

The author of this reproduction is seriously impressive; I can only admire the work.



