图像验证码识别,字母数字汉子均可cnn+lstm+ctc

2024-05-26 15:18

本文主要是介绍图像验证码识别,字母数字汉子均可cnn+lstm+ctc,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

图形验证码如下:

 

训练两轮时的准确率:上边显示的是未识别的 

 

 config_demo.yaml

System:GpuMemoryFraction: 0.7TrainSetPath: 'train/'TestSetPath: 'test/'ValSetPath: 'dev/'LabelRegex: '([\u4E00-\u9FA5]{4,8}).jpg'MaxTextLenth: 8IMG_W: 200IMG_H: 100ModelName: 'captcha2.h5'Alphabet: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'NeuralNet:RNNSize: 256Dropout: 0.25TrainParam:EarlyStoping:monitor: 'val_acc'patience: 10mode: 'auto'baseline: 0.02Epochs: 10BatchSize: 100TestBatchSize: 10

  train.py

# coding=utf-8
"""
将三通道的图片转为灰度图进行训练
"""
import itertools
import os
import re
import random
import string
from collections import Counter
from os.path import join
import yaml
import cv2
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping, Callback
from keras.layers import Input, Dense, Activation, Dropout, BatchNormalization, Reshape, Lambda
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers.merge import add, concatenate
from keras.layers.recurrent import GRU
from keras.models import Model, load_modelf = open('./config/config_demo.yaml', 'r', encoding='utf-8')
cfg = f.read()
cfg_dict = yaml.load(cfg)config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)TRAIN_SET_PTAH = cfg_dict['System']['TrainSetPath']
VALID_SET_PATH = cfg_dict['System']['TrainSetPath']
TEST_SET_PATH = cfg_dict['System']['TestSetPath']
IMG_W = cfg_dict['System']['IMG_W']
IMG_H = cfg_dict['System']['IMG_H']
MODEL_NAME = cfg_dict['System']['ModelName']
LABEL_REGEX = cfg_dict['System']['LabelRegex']RNN_SIZE = cfg_dict['NeuralNet']['RNNSize']
DROPOUT = cfg_dict['NeuralNet']['Dropout']MONITOR = cfg_dict['TrainParam']['EarlyStoping']['monitor']
PATIENCE = cfg_dict['TrainParam']['EarlyStoping']['patience']
MODE = cfg_dict['TrainParam']['EarlyStoping']['mode']
BASELINE = cfg_dict['TrainParam']['EarlyStoping']['baseline']
EPOCHS = cfg_dict['TrainParam']['Epochs']
BATCH_SIZE = cfg_dict['TrainParam']['BatchSize']
TEST_BATCH_SIZE = cfg_dict['TrainParam']['TestBatchSize']letters_dict = {}
MAX_LEN = 0def get_maxlen():global MAX_LENmaxlen = 0lines = open("train.csv", "r", encoding="utf-8").readlines()for line in lines:name,label = line.strip().split(",")if len(label)>maxlen:maxlen = len(label)MAX_LEN = maxlenreturn maxlendef get_letters():global letters_dictletters = ""lines = open("train.csv","r",encoding="utf-8").readlines()maxlen = get_maxlen()for line in lines:name,label = line.strip().split(",")letters = letters+labelif len(label) < maxlen:label = label + '_' * (maxlen - len(label))letters_dict[name] = labelif os.path.exists("letters.txt"):letters = open("letters.txt","r",encoding="utf-8").read()return lettersreturn "".join(set(letters))letters = get_letters()
f_W = open("letters.txt","w",encoding="utf-8")
f_W.write("".join(letters))
class_num = len(letters) + 1   # plus 1 for blank
print('Letters:', ''.join(letters))
print("letters_num:",class_num)def labels_to_text(labels):return ''.join([letters[int(x)] if int(x) != len(letters) else '' for x in labels])def text_to_labels(text):return [letters.find(x) if letters.find(x) > -1 else len(letters) for x in text]def is_valid_str(s):for ch in s:if not ch in letters:return Falsereturn Trueclass TextImageGenerator:def __init__(self,dirpath,tag,img_w, img_h,batch_size,downsample_factor,):global letters_dictself.img_h = img_hself.img_w = img_wself.batch_size = batch_sizeself.downsample_factor = downsample_factorself.letters_dict = letters_dictself.n = len(self.letters_dict)self.indexes = list(range(self.n))self.cur_index = 0self.imgs = np.zeros((self.n, self.img_h, self.img_w))self.texts = []for i, (img_filepath, text) in enumerate(self.letters_dict.items()):img_filepath = dirpath+img_filepathif i == 0:img_filepath = "train/0.jpg"img = cv2.imread(img_filepath)img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # cv2默认是BGR模式img = cv2.resize(img, (self.img_w, self.img_h))img = img.astype(np.float32)img /= 255self.imgs[i, :, :] = imgself.texts.append(text)print(len(self.texts),len(self.imgs),self.n)@staticmethoddef get_output_size():return len(letters) + 1def next_sample(self):   #每次返回一个数据和对应标签self.cur_index += 1if self.cur_index >= self.n:self.cur_index = 0random.shuffle(self.indexes)return self.imgs[self.indexes[self.cur_index]], self.texts[self.indexes[self.cur_index]]def next_batch(self):   #while True:# width and height are backwards from typical Keras convention# because width is the time dimension when it gets fed into the RNNif K.image_data_format() == 'channels_first':X_data = np.ones([self.batch_size, 1, self.img_w, self.img_h])else:X_data = np.ones([self.batch_size, self.img_w, self.img_h, 1])Y_data = np.ones([self.batch_size, MAX_LEN])input_length = np.ones((self.batch_size, 1)) * (self.img_w // self.downsample_factor - 2)label_length = np.zeros((self.batch_size, 1))source_str = []for i in range(self.batch_size):img, text = self.next_sample()img = img.Tif K.image_data_format() == 'channels_first':img = np.expand_dims(img, 0)     #增加一个维度else:img = np.expand_dims(img, -1)X_data[i] = imgY_data[i] = text_to_labels(text)source_str.append(text)text = text.replace("_", "")  # important steplabel_length[i] = len(text)inputs = {'the_input': X_data,'the_labels': Y_data,'input_length': input_length,'label_length': label_length,# 'source_str': source_str}outputs = {'ctc': np.zeros([self.batch_size])}yield (inputs, outputs)# # Loss and train functions, network architecture
def ctc_lambda_func(args):    #ctc损失是时间序列损失函数y_pred, labels, input_length, label_length = args# the 2 is critical here since the first couple outputs of the RNN# tend to be garbage:y_pred = y_pred[:, 2:, :]return K.ctc_batch_cost(labels, y_pred, input_length, label_length)downsample_factor = 4def train(img_w=IMG_W, img_h=IMG_H, dropout=DROPOUT, batch_size=BATCH_SIZE, rnn_size=RNN_SIZE):# Input Parameters# Network parametersconv_filters = 16kernel_size = (3, 3)pool_size = 2time_dense_size = 32if K.image_data_format() == 'channels_first':input_shape = (1, img_w, img_h)else:input_shape = (img_w, img_h, 1)global downsample_factordownsample_factor = pool_size ** 2tiger_train = TextImageGenerator(TRAIN_SET_PTAH, 'train', img_w, img_h, batch_size, downsample_factor)tiger_val = TextImageGenerator(VALID_SET_PATH, 'val', img_w, img_h, batch_size, downsample_factor)act = 'relu'input_data = Input(name='the_input', shape=input_shape, dtype='float32')inner = Conv2D(conv_filters, kernel_size, padding='same',activation=None, kernel_initializer='he_normal',name='conv1')(input_data)inner = BatchNormalization()(inner)  # add BNinner = Activation(act)(inner)inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)inner = Conv2D(conv_filters, kernel_size, padding='same',activation=None, kernel_initializer='he_normal',name='conv2')(inner)inner = BatchNormalization()(inner)  # add BNinner = Activation(act)(inner)inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)# cuts down input size going into RNN:inner = Dense(time_dense_size, activation=None, name='dense1')(inner)inner = BatchNormalization()(inner)  # add BNinner = Activation(act)(inner)if dropout:inner = Dropout(dropout)(inner)  # 防止过拟合# Two layers of bidirecitonal GRUs# GRU seems to work as well, if not better than LSTM:gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)gru1_merged = add([gru_1, gru_1b])gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)inner = concatenate([gru_2, gru_2b])if dropout:inner = Dropout(dropout)(inner)  # 防止过拟合# transforms RNN output to character activations:inner = Dense(tiger_train.get_output_size(), kernel_initializer='he_normal',name='dense2')(inner)y_pred = Activation('softmax', name='softmax')(inner)base_model = Model(inputs=input_data, outputs=y_pred)base_model.summary()labels = Input(name='the_labels', shape=[MAX_LEN], dtype='float32')input_length = Input(name='input_length', shape=[1], dtype='int64')label_length = Input(name='label_length', shape=[1], dtype='int64')# Keras doesn't currently support loss funcs with extra parameters# so CTC loss is implemented in a lambda layerloss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)# the loss calc occurs elsewhere, so use a dummy lambda func for the lossmodel.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')earlystoping = EarlyStopping(monitor=MONITOR, patience=PATIENCE, verbose=1, mode=MODE, baseline=BASELINE)train_model_path = './tmp/train_' + MODEL_NAMEcheckpointer = ModelCheckpoint(filepath=train_model_path,verbose=1,save_best_only=True)if os.path.exists(train_model_path):model.load_weights(train_model_path)print('load model weights:%s' % train_model_path)evaluator = Evaluate(model)model.fit_generator(generator=tiger_train.next_batch(),steps_per_epoch=tiger_train.n,epochs=EPOCHS,initial_epoch=1,validation_data=tiger_val.next_batch(),validation_steps=tiger_val.n,callbacks=[checkpointer, earlystoping, evaluator])print('----train end----')# For a real OCR application, this should be beam search with a dictionary
# and language model.  For this example, best path is sufficient.
def decode_batch(out):ret = []for j in range(out.shape[0]):out_best = list(np.argmax(out[j, 2:], 1))out_best = [k for k, g in itertools.groupby(out_best)]outstr = ''for c in out_best:if c < len(letters):outstr += letters[c]ret.append(outstr)return retclass Evaluate(Callback):def __init__(self, model):self.accs = []self.model = modeldef on_epoch_end(self, epoch, logs=None):acc = evaluate(self.model)self.accs.append(acc)# Test on validation images
def evaluate(model):global downsample_factortiger_test = TextImageGenerator(VALID_SET_PATH, 'test', IMG_W, IMG_H, TEST_BATCH_SIZE, downsample_factor)net_inp = model.get_layer(name='the_input').inputnet_out = model.get_layer(name='softmax').outputpredict_model = Model(inputs=net_inp, outputs=net_out)equalsIgnoreCaseNum = 0.00equalsNum = 0.00totalNum = 0.00for inp_value, _ in tiger_test.next_batch():batch_size = inp_value['the_input'].shape[0]X_data = inp_value['the_input']net_out_value = predict_model.predict(X_data)pred_texts = decode_batch(net_out_value)labels = inp_value['the_labels']texts = []for label in labels:text = labels_to_text(label)texts.append(text)for i in range(batch_size):totalNum += 1if pred_texts[i] == texts[i]:equalsNum += 1if pred_texts[i].lower() == texts[i].lower():equalsIgnoreCaseNum += 1else:print('Predict: %s ---> Label: %s' % (pred_texts[i], texts[i]))if totalNum >= 10000:breakprint('---Result---')print('Test num: %d, accuracy: %.5f, ignoreCase accuracy: %.5f' % (totalNum, equalsNum / totalNum, equalsIgnoreCaseNum / totalNum))return equalsIgnoreCaseNum / totalNumif __name__ == '__main__':train()test = Trueif test:model_path = './tmp/train_' + MODEL_NAMEmodel = load_model(model_path, compile=False)evaluate(model)print('----End----')

  interface_testset.py

import itertools
import string
import yaml
from tqdm import tqdm
import cv2
import numpy as np
import os
import tensorflow as tf
from keras import backend as K
from keras.models import Model, load_modelf = open('./config/config_demo.yaml', 'r', encoding='utf-8')
cfg = f.read()
cfg_dict = yaml.load(cfg)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)MODEL_NAME = cfg_dict['System']['ModelName']letters = string.ascii_uppercase + string.ascii_lowercase+string.digitsdef decode_batch(out):ret = []for j in range(out.shape[0]):out_best = list(np.argmax(out[j, 2:], 1))out_best = [k for k, g in itertools.groupby(out_best)]outstr = ''for c in out_best:if c < len(letters):outstr += letters[c]ret.append(outstr)return retdef get_x_data(img_data, img_w, img_h):img = cv2.cvtColor(img_data, cv2.COLOR_RGB2GRAY)img = cv2.resize(img, (img_w, img_h))img = img.astype(np.float32)img /= 255batch_size = 1if K.image_data_format() == 'channels_first':X_data = np.ones([batch_size, 1, img_w, img_h])else:X_data = np.ones([batch_size, img_w, img_h, 1])img = img.Tif K.image_data_format() == 'channels_first':img = np.expand_dims(img, 0)else:img = np.expand_dims(img, -1)X_data[0] = imgreturn X_data# Test on validation images
def interface(datapath ="./testset" ,img_w = 200,img_h = 100):save_file = open("answer.csv","a",encoding="utf-8")save_file.truncate()model_path = './tmp/train_' + MODEL_NAMEmodel = load_model(model_path, compile=False)net_inp = model.get_layer(name='the_input').inputnet_out = model.get_layer(name='softmax').outputpredict_model = Model(inputs=net_inp, outputs=net_out)print("开始预测,预测结果:")listdir = os.listdir(datapath)bar = tqdm(range(len(listdir)),total=len(listdir))for idx in bar:img_data = cv2.imread(datapath+"/" + str(idx) + ".jpg")X_data = get_x_data(img_data, img_w, img_h)net_out_value = predict_model.predict(X_data)pred_texts = decode_batch(net_out_value)#print(str(idx) + ".jpg" + "\t", pred_texts[0])save_file.write(str(idx)+","+pred_texts[0]+"\r\n")if __name__ == '__main__':interface(datapath="./testset")

  

这篇关于图像验证码识别,字母数字汉子均可cnn+lstm+ctc的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/1004832

相关文章

基于人工智能的图像分类系统

目录 引言项目背景环境准备 硬件要求软件安装与配置系统设计 系统架构关键技术代码示例 数据预处理模型训练模型预测应用场景结论 1. 引言 图像分类是计算机视觉中的一个重要任务,目标是自动识别图像中的对象类别。通过卷积神经网络(CNN)等深度学习技术,我们可以构建高效的图像分类系统,广泛应用于自动驾驶、医疗影像诊断、监控分析等领域。本文将介绍如何构建一个基于人工智能的图像分类系统,包括环境

从去中心化到智能化:Web3如何与AI共同塑造数字生态

在数字时代的演进中,Web3和人工智能(AI)正成为塑造未来互联网的两大核心力量。Web3的去中心化理念与AI的智能化技术,正相互交织,共同推动数字生态的变革。本文将探讨Web3与AI的融合如何改变数字世界,并展望这一新兴组合如何重塑我们的在线体验。 Web3的去中心化愿景 Web3代表了互联网的第三代发展,它基于去中心化的区块链技术,旨在创建一个开放、透明且用户主导的数字生态。不同于传统

阿里开源语音识别SenseVoiceWindows环境部署

SenseVoice介绍 SenseVoice 专注于高精度多语言语音识别、情感辨识和音频事件检测多语言识别: 采用超过 40 万小时数据训练,支持超过 50 种语言,识别效果上优于 Whisper 模型。富文本识别:具备优秀的情感识别,能够在测试数据上达到和超过目前最佳情感识别模型的效果。支持声音事件检测能力,支持音乐、掌声、笑声、哭声、咳嗽、喷嚏等多种常见人机交互事件进行检测。高效推

usaco 1.2 Name That Number(数字字母转化)

巧妙的利用code[b[0]-'A'] 将字符ABC...Z转换为数字 需要注意的是重新开一个数组 c [ ] 存储字符串 应人为的在末尾附上 ‘ \ 0 ’ 详见代码: /*ID: who jayLANG: C++TASK: namenum*/#include<stdio.h>#include<string.h>int main(){FILE *fin = fopen (

Spring 验证码(kaptcha)

首先引入需要的jar包: <dependency><groupId>com.github.axet</groupId><artifactId>kaptcha</artifactId><version>0.0.9</version></dependency> 配置验证码相关设置: <bean id="captchaProducer" class="com.

Verybot之OpenCV应用一:安装与图像采集测试

在Verybot上安装OpenCV是很简单的,只需要执行:         sudo apt-get update         sudo apt-get install libopencv-dev         sudo apt-get install python-opencv         下面就对安装好的OpenCV进行一下测试,编写一个通过USB摄像头采

AIGC6: 走进腾讯数字盛会

图中是一个程序员,去参加一个技术盛会。AI大潮下,五颜六色,各种不确定。 背景 AI对各行各业的冲击越来越大,身处职场的我也能清晰的感受到。 我所在的行业为全球客服外包行业。 业务模式为: 为国际跨境公司提供不同地区不同语言的客服外包解决方案,除了人力,还有软件系统。 软件系统主要是提供了客服跟客人的渠道沟通和工单管理,内部管理跟甲方的合同对接,绩效评估,BI数据透视。 客服跟客人

深度学习实战:如何利用CNN实现人脸识别考勤系统

1. 何为CNN及其在人脸识别中的应用 卷积神经网络(CNN)是深度学习中的核心技术之一,擅长处理图像数据。CNN通过卷积层提取图像的局部特征,在人脸识别领域尤其适用。CNN的多个层次可以逐步提取面部的特征,最终实现精确的身份识别。对于考勤系统而言,CNN可以自动从摄像头捕捉的视频流中检测并识别出员工的面部。 我们在该项目中采用了 RetinaFace 模型,它基于CNN的结构实现高效、精准的

Clion不识别C代码或者无法跳转C语言项目怎么办?

如果是中文会显示: 此时只需要右击项目,或者你的源代码目录,将这个项目或者源码目录标记为项目源和头文件即可。 英文如下:

【python计算机视觉编程——7.图像搜索】

python计算机视觉编程——7.图像搜索 7.图像搜索7.1 基于内容的图像检索(CBIR)从文本挖掘中获取灵感——矢量空间模型(BOW表示模型)7.2 视觉单词**思想****特征提取**: 创建词汇7.3 图像索引7.3.1 建立数据库7.3.2 添加图像 7.4 在数据库中搜索图像7.4.1 利用索引获取获选图像7.4.2 用一幅图像进行查询7.4.3 确定对比基准并绘制结果 7.