Image CAPTCHA recognition for letters, digits, and Chinese characters with CNN + LSTM + CTC

2024-05-26 15:18

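This article presents a Keras implementation of image CAPTCHA recognition that handles letters, digits, and Chinese characters. The network is a small CNN followed by recurrent layers and is trained with CTC loss. Although the title says LSTM, the code below uses GRU layers.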

The CAPTCHA images look like this:

 

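Accuracy after two epochs of training. The lines shown above the summary are the samples that were not recognized correctly: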

 

 config_demo.yaml

System:
  GpuMemoryFraction: 0.7
  TrainSetPath: 'train/'
  TestSetPath: 'test/'
  ValSetPath: 'dev/'
  LabelRegex: '([\u4E00-\u9FA5]{4,8}).jpg'
  MaxTextLenth: 8
  IMG_W: 200
  IMG_H: 100
  ModelName: 'captcha2.h5'
  Alphabet: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
NeuralNet:
  RNNSize: 256
  Dropout: 0.25
TrainParam:
  EarlyStoping:
    monitor: 'val_acc'
    patience: 10
    mode: 'auto'
    baseline: 0.02
  Epochs: 10
  BatchSize: 100
  TestBatchSize: 10
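
To make the LabelRegex entry concrete, here is a minimal sketch of what that pattern accepts: a filename whose stem is 4 to 8 Chinese characters. The helper name label_from_filename and the sample filenames are made up for illustration. Note that train.py reads LabelRegex but takes its labels from train.csv instead, and that the unescaped "." in the pattern matches any character, not only a literal dot.

```python
import re

# Same pattern as LabelRegex in config_demo.yaml.
LABEL_REGEX = r'([\u4E00-\u9FA5]{4,8}).jpg'


def label_from_filename(filename):
    """Return the Chinese label encoded in a filename, or None if it does not match."""
    match = re.match(LABEL_REGEX, filename)
    return match.group(1) if match else None


print(label_from_filename('图像验证识别.jpg'))  # six Chinese characters: matches
print(label_from_filename('abcd.jpg'))          # Latin letters: no match
print(label_from_filename('验证码.jpg'))         # only three characters: no match
```

If labels should come from filenames rather than from train.csv, a helper like this could be applied to each entry returned by os.listdir.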

  train.py

# coding=utf-8
"""
Convert three-channel images to grayscale for training.
"""
import itertools
import os
import re
import random
import string
from collections import Counter
from os.path import join
import yaml
import cv2
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.callbacks import ModelCheckpoint, EarlyStopping, Callback
from keras.layers import Input, Dense, Activation, Dropout, BatchNormalization, Reshape, Lambda
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.layers.merge import add, concatenate
from keras.layers.recurrent import GRU
from keras.models import Model, load_model

# Load the YAML configuration. safe_load only builds plain Python types.
with open('./config/config_demo.yaml', 'r', encoding='utf-8') as f:
    cfg_dict = yaml.safe_load(f)

# TensorFlow 1.x session: allocate GPU memory on demand.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

TRAIN_SET_PTAH = cfg_dict['System']['TrainSetPath']
# Validation also reads from the training directory; ValSetPath is not used.
VALID_SET_PATH = cfg_dict['System']['TrainSetPath']
TEST_SET_PATH = cfg_dict['System']['TestSetPath']
IMG_W = cfg_dict['System']['IMG_W']
IMG_H = cfg_dict['System']['IMG_H']
MODEL_NAME = cfg_dict['System']['ModelName']
LABEL_REGEX = cfg_dict['System']['LabelRegex']

RNN_SIZE = cfg_dict['NeuralNet']['RNNSize']
DROPOUT = cfg_dict['NeuralNet']['Dropout']

MONITOR = cfg_dict['TrainParam']['EarlyStoping']['monitor']
PATIENCE = cfg_dict['TrainParam']['EarlyStoping']['patience']
MODE = cfg_dict['TrainParam']['EarlyStoping']['mode']
BASELINE = cfg_dict['TrainParam']['EarlyStoping']['baseline']
EPOCHS = cfg_dict['TrainParam']['Epochs']
BATCH_SIZE = cfg_dict['TrainParam']['BatchSize']
TEST_BATCH_SIZE = cfg_dict['TrainParam']['TestBatchSize']

# Maps image filename -> label padded with '_' to MAX_LEN.
letters_dict = {}
MAX_LEN = 0


def get_maxlen():
    """Return the longest label length in train.csv and store it in MAX_LEN."""
    global MAX_LEN
    maxlen = 0
    lines = open("train.csv", "r", encoding="utf-8").readlines()
    for line in lines:
        name, label = line.strip().split(",")
        if len(label) > maxlen:
            maxlen = len(label)
    MAX_LEN = maxlen
    return maxlen


def get_letters():
    """Fill letters_dict from train.csv and return the alphabet string."""
    global letters_dict
    letters = ""
    lines = open("train.csv", "r", encoding="utf-8").readlines()
    maxlen = get_maxlen()
    for line in lines:
        name, label = line.strip().split(",")
        letters = letters + label
        # Pad short labels with '_' so every label has the same length.
        if len(label) < maxlen:
            label = label + '_' * (maxlen - len(label))
        letters_dict[name] = label
    # Reuse a saved alphabet so class indices stay stable between runs.
    if os.path.exists("letters.txt"):
        letters = open("letters.txt", "r", encoding="utf-8").read()
        return letters
    return "".join(set(letters))


letters = get_letters()
with open("letters.txt", "w", encoding="utf-8") as f_W:
    f_W.write("".join(letters))
class_num = len(letters) + 1   # plus 1 for blank
print('Letters:', ''.join(letters))
print("letters_num:", class_num)


def labels_to_text(labels):
    return ''.join([letters[int(x)] if int(x) != len(letters) else '' for x in labels])


def text_to_labels(text):
    # Characters outside the alphabet (including the '_' padding) map to the blank index.
    return [letters.find(x) if letters.find(x) > -1 else len(letters) for x in text]


def is_valid_str(s):
    for ch in s:
        if not ch in letters:
            return False
    return True


class TextImageGenerator:
    def __init__(self, dirpath, tag, img_w, img_h, batch_size, downsample_factor):
        global letters_dict
        self.img_h = img_h
        self.img_w = img_w
        self.batch_size = batch_size
        self.downsample_factor = downsample_factor
        self.letters_dict = letters_dict
        self.n = len(self.letters_dict)
        self.indexes = list(range(self.n))
        self.cur_index = 0
        self.imgs = np.zeros((self.n, self.img_h, self.img_w))
        self.texts = []
        for i, (img_filepath, text) in enumerate(self.letters_dict.items()):
            img_filepath = dirpath + img_filepath
            if i == 0:
                # The first sample is always read from this fixed path.
                img_filepath = "train/0.jpg"
            img = cv2.imread(img_filepath)
            img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)     # cv2 loads images as BGR by default
            img = cv2.resize(img, (self.img_w, self.img_h))
            img = img.astype(np.float32)
            img /= 255
            self.imgs[i, :, :] = img
            self.texts.append(text)
        print(len(self.texts), len(self.imgs), self.n)

    @staticmethod
    def get_output_size():
        return len(letters) + 1

    def next_sample(self):   # return one image and its label per call
        self.cur_index += 1
        if self.cur_index >= self.n:
            self.cur_index = 0
            random.shuffle(self.indexes)
        return self.imgs[self.indexes[self.cur_index]], self.texts[self.indexes[self.cur_index]]

    def next_batch(self):
        while True:
            # width and height are backwards from typical Keras convention
            # because width is the time dimension when it gets fed into the RNN
            if K.image_data_format() == 'channels_first':
                X_data = np.ones([self.batch_size, 1, self.img_w, self.img_h])
            else:
                X_data = np.ones([self.batch_size, self.img_w, self.img_h, 1])
            Y_data = np.ones([self.batch_size, MAX_LEN])
            input_length = np.ones((self.batch_size, 1)) * (self.img_w // self.downsample_factor - 2)
            label_length = np.zeros((self.batch_size, 1))
            source_str = []
            for i in range(self.batch_size):
                img, text = self.next_sample()
                img = img.T
                if K.image_data_format() == 'channels_first':
                    img = np.expand_dims(img, 0)     # add the channel dimension
                else:
                    img = np.expand_dims(img, -1)
                X_data[i] = img
                Y_data[i] = text_to_labels(text)
                source_str.append(text)
                text = text.replace("_", "")  # important step: the true length excludes padding
                label_length[i] = len(text)
            inputs = {
                'the_input': X_data,
                'the_labels': Y_data,
                'input_length': input_length,
                'label_length': label_length,
                # 'source_str': source_str
            }
            outputs = {'ctc': np.zeros([self.batch_size])}
            yield (inputs, outputs)


# Loss and train functions, network architecture
def ctc_lambda_func(args):    # CTC is a loss function for sequence (time-series) outputs
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    y_pred = y_pred[:, 2:, :]
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)


downsample_factor = 4


def train(img_w=IMG_W, img_h=IMG_H, dropout=DROPOUT, batch_size=BATCH_SIZE, rnn_size=RNN_SIZE):
    # Input Parameters
    # Network parameters
    conv_filters = 16
    kernel_size = (3, 3)
    pool_size = 2
    time_dense_size = 32
    if K.image_data_format() == 'channels_first':
        input_shape = (1, img_w, img_h)
    else:
        input_shape = (img_w, img_h, 1)
    global downsample_factor
    downsample_factor = pool_size ** 2
    tiger_train = TextImageGenerator(TRAIN_SET_PTAH, 'train', img_w, img_h, batch_size, downsample_factor)
    tiger_val = TextImageGenerator(VALID_SET_PATH, 'val', img_w, img_h, batch_size, downsample_factor)

    act = 'relu'
    input_data = Input(name='the_input', shape=input_shape, dtype='float32')
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=None, kernel_initializer='he_normal',
                   name='conv1')(input_data)
    inner = BatchNormalization()(inner)  # add BN
    inner = Activation(act)(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max1')(inner)
    inner = Conv2D(conv_filters, kernel_size, padding='same',
                   activation=None, kernel_initializer='he_normal',
                   name='conv2')(inner)
    inner = BatchNormalization()(inner)  # add BN
    inner = Activation(act)(inner)
    inner = MaxPooling2D(pool_size=(pool_size, pool_size), name='max2')(inner)

    conv_to_rnn_dims = (img_w // (pool_size ** 2), (img_h // (pool_size ** 2)) * conv_filters)
    inner = Reshape(target_shape=conv_to_rnn_dims, name='reshape')(inner)

    # cuts down input size going into RNN:
    inner = Dense(time_dense_size, activation=None, name='dense1')(inner)
    inner = BatchNormalization()(inner)  # add BN
    inner = Activation(act)(inner)
    if dropout:
        inner = Dropout(dropout)(inner)  # guard against overfitting

    # Two layers of bidirectional GRUs
    # GRU seems to work as well, if not better than LSTM:
    gru_1 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru1')(inner)
    gru_1b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru1_b')(inner)
    gru1_merged = add([gru_1, gru_1b])
    gru_2 = GRU(rnn_size, return_sequences=True, kernel_initializer='he_normal', name='gru2')(gru1_merged)
    gru_2b = GRU(rnn_size, return_sequences=True, go_backwards=True, kernel_initializer='he_normal', name='gru2_b')(gru1_merged)
    inner = concatenate([gru_2, gru_2b])
    if dropout:
        inner = Dropout(dropout)(inner)  # guard against overfitting

    # transforms RNN output to character activations:
    inner = Dense(tiger_train.get_output_size(), kernel_initializer='he_normal',
                  name='dense2')(inner)
    y_pred = Activation('softmax', name='softmax')(inner)
    base_model = Model(inputs=input_data, outputs=y_pred)
    base_model.summary()

    labels = Input(name='the_labels', shape=[MAX_LEN], dtype='float32')
    input_length = Input(name='input_length', shape=[1], dtype='int64')
    label_length = Input(name='label_length', shape=[1], dtype='int64')
    # Keras doesn't currently support loss funcs with extra parameters
    # so CTC loss is implemented in a lambda layer
    loss_out = Lambda(ctc_lambda_func, output_shape=(1,), name='ctc')([y_pred, labels, input_length, label_length])
    model = Model(inputs=[input_data, labels, input_length, label_length], outputs=loss_out)
    # the loss calc occurs elsewhere, so use a dummy lambda func for the loss
    model.compile(loss={'ctc': lambda y_true, y_pred: y_pred}, optimizer='adadelta')

    earlystoping = EarlyStopping(monitor=MONITOR, patience=PATIENCE, verbose=1, mode=MODE, baseline=BASELINE)
    train_model_path = './tmp/train_' + MODEL_NAME
    checkpointer = ModelCheckpoint(filepath=train_model_path,
                                   verbose=1,
                                   save_best_only=True)
    # Resume from an earlier checkpoint if one exists.
    if os.path.exists(train_model_path):
        model.load_weights(train_model_path)
        print('load model weights:%s' % train_model_path)

    evaluator = Evaluate(model)
    model.fit_generator(generator=tiger_train.next_batch(),
                        steps_per_epoch=tiger_train.n,
                        epochs=EPOCHS,
                        initial_epoch=1,
                        validation_data=tiger_val.next_batch(),
                        validation_steps=tiger_val.n,
                        callbacks=[checkpointer, earlystoping, evaluator])
    print('----train end----')


# For a real OCR application, this should be beam search with a dictionary
# and language model.  For this example, best path is sufficient.
def decode_batch(out):
    ret = []
    for j in range(out.shape[0]):
        # Skip the first two time steps, as the loss does, then take the best class per step.
        out_best = list(np.argmax(out[j, 2:], 1))
        # Merge consecutive repeats.
        out_best = [k for k, g in itertools.groupby(out_best)]
        outstr = ''
        for c in out_best:
            if c < len(letters):   # index len(letters) is the CTC blank
                outstr += letters[c]
        ret.append(outstr)
    return ret


class Evaluate(Callback):
    def __init__(self, model):
        self.accs = []
        self.model = model

    def on_epoch_end(self, epoch, logs=None):
        acc = evaluate(self.model)
        self.accs.append(acc)


# Test on validation images
def evaluate(model):
    global downsample_factor
    tiger_test = TextImageGenerator(VALID_SET_PATH, 'test', IMG_W, IMG_H, TEST_BATCH_SIZE, downsample_factor)

    # Cut the training graph down to image in, softmax out.
    net_inp = model.get_layer(name='the_input').input
    net_out = model.get_layer(name='softmax').output
    predict_model = Model(inputs=net_inp, outputs=net_out)

    equalsIgnoreCaseNum = 0.00
    equalsNum = 0.00
    totalNum = 0.00
    for inp_value, _ in tiger_test.next_batch():
        batch_size = inp_value['the_input'].shape[0]
        X_data = inp_value['the_input']
        net_out_value = predict_model.predict(X_data)
        pred_texts = decode_batch(net_out_value)
        labels = inp_value['the_labels']
        texts = []
        for label in labels:
            text = labels_to_text(label)
            texts.append(text)

        for i in range(batch_size):
            totalNum += 1
            if pred_texts[i] == texts[i]:
                equalsNum += 1
            if pred_texts[i].lower() == texts[i].lower():
                equalsIgnoreCaseNum += 1
            else:
                print('Predict: %s ---> Label: %s' % (pred_texts[i], texts[i]))
        # The generator never ends, so stop after 10000 samples.
        if totalNum >= 10000:
            break
    print('---Result---')
    print('Test num: %d, accuracy: %.5f, ignoreCase accuracy: %.5f' % (
        totalNum, equalsNum / totalNum, equalsIgnoreCaseNum / totalNum))
    return equalsIgnoreCaseNum / totalNum


if __name__ == '__main__':
    train()
    test = True
    if test:
        model_path = './tmp/train_' + MODEL_NAME
        model = load_model(model_path, compile=False)
        evaluate(model)
    print('----End----')
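
The best-path decoding done by decode_batch can be tried in isolation. The following is a minimal pure-Python sketch of the same idea: take the highest-scoring class at each time step, merge consecutive repeats, and drop the blank class, which sits at the last index as in train.py. The function name greedy_ctc_decode, the two-letter alphabet, and the scores are invented for illustration, and the sketch does not skip the first two time steps the way decode_batch does.

```python
import itertools


def greedy_ctc_decode(step_scores, alphabet):
    """Best-path decode: argmax per time step, merge repeats, drop blanks."""
    blank = len(alphabet)  # the blank class is the last index, as in train.py
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in step_scores]
    collapsed = [k for k, _ in itertools.groupby(best)]
    return ''.join(alphabet[k] for k in collapsed if k != blank)


# Toy alphabet and per-step scores (made up for illustration).
alphabet = 'ab'
steps = [
    [0.9, 0.05, 0.05],  # a
    [0.8, 0.1, 0.1],    # a again: merged with the previous step
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # a: kept, because a blank separates it from the first a
    [0.1, 0.8, 0.1],    # b
]
print(greedy_ctc_decode(steps, alphabet))  # prints: aab
```

The blank between the two runs of "a" is what lets CTC emit a doubled character. Without it the repeats would collapse into a single "a".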

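The shape arithmetic in train.py decides how many time steps the CTC loss sees. As a rough sketch, with the values from config_demo.yaml (IMG_W 200, IMG_H 100), two 2x2 pooling layers, and 16 convolution filters, the numbers work out as below. The helper name ctc_time_steps is hypothetical; the formulas mirror conv_to_rnn_dims and input_length in the script.

```python
def ctc_time_steps(img_w, img_h, pool_size=2, n_pools=2, conv_filters=16, skipped=2):
    """Return (rnn_steps, features_per_step, ctc_input_length) for the CNN front end."""
    downsample = pool_size ** n_pools             # 4 for two 2x2 pooling layers
    rnn_steps = img_w // downsample               # width becomes the time axis
    features = (img_h // downsample) * conv_filters
    return rnn_steps, features, rnn_steps - skipped  # first two steps are dropped before the loss


steps, features, input_length = ctc_time_steps(img_w=200, img_h=100)
print(steps, features, input_length)  # prints: 50 400 48
```

With labels of at most 8 characters, 48 usable time steps leave plenty of room for the blanks CTC needs between repeated characters.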
  interface_testset.py

import itertools
import string
import yaml
from tqdm import tqdm
import cv2
import numpy as np
import os
import tensorflow as tf
from keras import backend as K
from keras.models import Model, load_model

# Load the YAML configuration. safe_load only builds plain Python types.
with open('./config/config_demo.yaml', 'r', encoding='utf-8') as f:
    cfg_dict = yaml.safe_load(f)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)
K.set_session(session)

MODEL_NAME = cfg_dict['System']['ModelName']
# This must be the same alphabet, in the same order, that train.py saved to letters.txt.
letters = string.ascii_uppercase + string.ascii_lowercase + string.digits


def decode_batch(out):
    ret = []
    for j in range(out.shape[0]):
        # Best class per time step, skipping the first two steps as in training.
        out_best = list(np.argmax(out[j, 2:], 1))
        out_best = [k for k, g in itertools.groupby(out_best)]
        outstr = ''
        for c in out_best:
            if c < len(letters):   # index len(letters) is the CTC blank
                outstr += letters[c]
        ret.append(outstr)
    return ret


def get_x_data(img_data, img_w, img_h):
    # cv2.imread returns BGR, so convert the same way train.py does.
    img = cv2.cvtColor(img_data, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (img_w, img_h))
    img = img.astype(np.float32)
    img /= 255
    batch_size = 1
    if K.image_data_format() == 'channels_first':
        X_data = np.ones([batch_size, 1, img_w, img_h])
    else:
        X_data = np.ones([batch_size, img_w, img_h, 1])
    img = img.T   # width is the time dimension
    if K.image_data_format() == 'channels_first':
        img = np.expand_dims(img, 0)
    else:
        img = np.expand_dims(img, -1)
    X_data[0] = img
    return X_data


# Test on validation images
def interface(datapath="./testset", img_w=200, img_h=100):
    # Mode "w" clears any previous answer.csv.
    save_file = open("answer.csv", "w", encoding="utf-8")
    model_path = './tmp/train_' + MODEL_NAME
    model = load_model(model_path, compile=False)
    net_inp = model.get_layer(name='the_input').input
    net_out = model.get_layer(name='softmax').output
    predict_model = Model(inputs=net_inp, outputs=net_out)

    print("Starting prediction, results:")
    listdir = os.listdir(datapath)
    bar = tqdm(range(len(listdir)), total=len(listdir))
    # Images are expected to be named 0.jpg, 1.jpg, ... in datapath.
    for idx in bar:
        img_data = cv2.imread(datapath + "/" + str(idx) + ".jpg")
        X_data = get_x_data(img_data, img_w, img_h)
        net_out_value = predict_model.predict(X_data)
        pred_texts = decode_batch(net_out_value)
        # print(str(idx) + ".jpg" + "\t", pred_texts[0])
        save_file.write(str(idx) + "," + pred_texts[0] + "\r\n")
    save_file.close()


if __name__ == '__main__':
    interface(datapath="./testset")
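
One thing to watch at inference time: train.py builds its alphabet from train.csv and saves it to letters.txt, while interface_testset.py uses a fixed Latin-plus-digits string. For Chinese labels, or whenever letters.txt differs from that string, the decoded indices will not line up with the model's classes. A possible fix, sketched here under the assumption that letters.txt sits in the working directory, is to read the saved alphabet and only fall back to the fixed one. The helper name load_alphabet is hypothetical.

```python
import os
import string
import tempfile

DEFAULT_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def load_alphabet(path='letters.txt', fallback=DEFAULT_ALPHABET):
    """Read the alphabet written at training time; fall back to a fixed one."""
    if os.path.exists(path):
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    return fallback


# Demo with a temporary file standing in for letters.txt.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'letters.txt')
    print(len(load_alphabet(demo_path)))   # file missing: 62 fallback characters
    with open(demo_path, 'w', encoding='utf-8') as f:
        f.write('验证码识别')
    print(load_alphabet(demo_path))        # file present: the saved alphabet
```

In interface_testset.py this would replace the hard-coded assignment with letters = load_alphabet().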

  




http://www.chinasem.cn/article/1004832
