Using OCR Text Recognition in UI Automation

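After using Airtest's image recognition for a while, I found that it is not very accurate on some text. My guess is that text has relatively few distinctive features, so it is easy to match the wrong region.

I had seen someone on the forum use OCR for this. I can't remember which post it was, but I think it used Tencent Cloud's API.

I tried that approach. Tencent Cloud's API limits the number of calls and charges beyond that, so I looked around and found iFlytek's (讯飞) API, which is completely free and works just as well.

The principle is simple: upload an image to the API, and the backend returns the recognized text together with its position coordinates.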

So all you need to do is save a screenshot of the device, read it back in, Base64-encode it, and send it to the iFlytek cloud API.
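The encoding step can be tried on its own, without a device or network. The following is a minimal sketch, assuming the API expects a form-encoded field named image (as in the code further down); the helper name encode_image_body is made up for illustration.

```python
import base64
import urllib.parse


def encode_image_body(image_bytes: bytes) -> bytes:
    """Build a form-encoded request body carrying the image as Base64.

    The field name 'image' follows the script in this post; the function
    name is hypothetical.
    """
    b64_text = base64.b64encode(image_bytes).decode("ascii")
    # urlencode percent-escapes '+', '/' and '=' so the Base64 survives transport.
    return urllib.parse.urlencode({"image": b64_text}).encode("utf-8")


if __name__ == "__main__":
    # Stand-in bytes; in practice this would be the content of the screenshot file.
    body = encode_image_body(b"\x89PNG fake image bytes")
    print(body[:40])
```

Keeping this as a pure function of bytes means the screenshot source (Airtest, a file on disk, anything else) can change without touching the encoding.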

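Wait for the JSON response, parse it to find the word that contains the text you are looking for, take its position coordinates, and compute the center point.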
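The parsing and center-point step can be sketched separately from the network call. This assumes the response layout used by the script in this post (data, block, line, word, with top_left and right_bottom corners); the sample response is made up for illustration, and find_text_center is a hypothetical helper name. Unlike the full script, this sketch returns the first match it finds.

```python
from typing import Optional, Tuple


def find_text_center(response: dict, target: str) -> Optional[Tuple[float, float]]:
    """Return the center (x, y) of the first word containing target, or None."""
    blocks = (response.get("data") or {}).get("block") or []
    for block in blocks:
        if block.get("type") != "text":
            continue
        for line in block.get("line") or []:
            for word in line.get("word") or []:
                content = word.get("content")
                if content and target in content:
                    loc = word["location"]
                    x1 = int(loc["top_left"]["x"])
                    y1 = int(loc["top_left"]["y"])
                    x2 = int(loc["right_bottom"]["x"])
                    y2 = int(loc["right_bottom"]["y"])
                    # The center is the midpoint of the two corners.
                    return ((x1 + x2) / 2, (y1 + y2) / 2)
    return None


if __name__ == "__main__":
    # Made-up sample response, shaped like the one the script parses.
    sample = {"data": {"block": [{"type": "text", "line": [{"word": [
        {"content": "姓名", "location": {
            "top_left": {"x": 100, "y": 200},
            "right_bottom": {"x": 180, "y": 240}}}]}]}]}}
    print(find_text_center(sample, "姓名"))  # (140.0, 220.0)
```

Returning None when nothing matches lets the caller decide whether to retry, scroll, or fail the test.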

Tap it.

Done.

Here is the code. The demo code on the iFlytek site did not work when copied as-is, so I modified it a bit.

import base64
import hashlib
import json
import time
import urllib.parse
import urllib.request

from airtest.core.api import snapshot, touch


def OCR_getPos(target):
    # Take a screenshot and read it back as raw bytes.
    # This script treats the return value of snapshot() as a file path;
    # check that against your Airtest version.
    filePath = snapshot()
    with open(filePath, 'rb') as f:
        file_content = f.read()
    base64_image = base64.b64encode(file_content)
    body = urllib.parse.urlencode({'image': base64_image}).encode(encoding='utf-8')

    url = 'http://webapi.xfyun.cn/v1/service/v1/ocr/general'
    api_key = '1e90ca2d09d7213bf6770f34e6d2e70b'  # replace with your own api_key
    x_appid = "c23538b5"  # replace with your own appid; this one is just random typing
    param = {"language": "cn|en", "location": "true"}

    # X-Param is the Base64 of the JSON parameters;
    # X-CheckSum is md5(api_key + X-CurTime + X-Param).
    x_param = base64.b64encode(json.dumps(param).replace(' ', '').encode(encoding="utf-8"))
    x_param_b64_str = x_param.decode('utf-8')
    x_time = str(int(time.time()))
    string = (api_key + x_time + x_param_b64_str).encode('utf-8')
    x_checksum = hashlib.md5(string).hexdigest()
    x_header = {'X-Appid': x_appid,
                'X-CurTime': x_time,
                'X-Param': x_param_b64_str,
                'X-CheckSum': x_checksum}

    req = urllib.request.Request(url, body, x_header)
    result = urllib.request.urlopen(req).read().decode()
    jsonObject = json.loads(result)

    # Pick the block of type 'text' out of the response.
    text_block = None
    try:
        for block in jsonObject.get('data').get('block'):
            if block.get('type') == 'text':
                text_block = block
    except (AttributeError, TypeError):
        pass
    if text_block is None:
        print('no words')
        return

    # Look for a word containing the target text; if several match, the last one wins.
    location = None
    for line in text_block.get('line'):
        for word in line.get('word'):
            content = word.get('content')
            if content is not None and target in content:
                location = word.get('location')
                print(location)

    if location:
        x1 = int(location.get('top_left').get('x'))
        y1 = int(location.get('top_left').get('y'))
        x2 = int(location.get('right_bottom').get('x'))
        y2 = int(location.get('right_bottom').get('y'))
        width = x2 - x1
        height = y2 - y1
        center_x = x1 + width / 2
        center_y = y1 + height / 2
        pos = [center_x, center_y]
        touch(pos)

    print(result + '\n')
    print(text_block)


if __name__ == '__main__':
    OCR_getPos('姓名')
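The request-signing part of OCR_getPos is the piece that is easiest to get wrong, so it can help to pull it out into a pure function that can be checked without calling the service. The sketch below follows the header scheme used in the code in this post (X-CheckSum is the MD5 of api_key, timestamp, and Base64 parameters concatenated); build_headers is a hypothetical helper name, and the credentials in the example are placeholders.

```python
import base64
import hashlib
import json
import time
from typing import Dict, Optional


def build_headers(app_id: str, api_key: str, param: dict,
                  cur_time: Optional[str] = None) -> Dict[str, str]:
    """Build the X-* request headers the way the script in this post does."""
    if cur_time is None:
        cur_time = str(int(time.time()))
    # Compact JSON (no spaces), then Base64, as in the original script.
    param_json = json.dumps(param, separators=(",", ":"))
    param_b64 = base64.b64encode(param_json.encode("utf-8")).decode("utf-8")
    checksum = hashlib.md5((api_key + cur_time + param_b64).encode("utf-8")).hexdigest()
    return {
        "X-Appid": app_id,
        "X-CurTime": cur_time,
        "X-Param": param_b64,
        "X-CheckSum": checksum,
    }


if __name__ == "__main__":
    # Placeholder credentials and a fixed timestamp so the output is reproducible.
    headers = build_headers("my-appid", "my-api-key",
                            {"language": "cn|en", "location": "true"},
                            cur_time="1700000000")
    for name, value in headers.items():
        print(name, value)
```

Passing cur_time explicitly makes the function deterministic, which is convenient when comparing its output against a value computed by hand.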
