This article introduces UForm, a small multimodal vision-language model (VLM) for image recognition.
References: https://github.com/unum-cloud/uform
https://huggingface.co/unum-cloud/uform-gen2-qwen-500m
https://baijiahao.baidu.com/s?id=1787054120353641459&wfr=spider&for=pc
Demo: https://huggingface.co/spaces/unum-cloud/uform-gen2-qwen-500m-demo
UForm is much smaller than most other multimodal models, with fewer than 5 billion parameters.
UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model consists of two parts:
1. CLIP-like ViT-H/14
2. Qwen1
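To make the two tasks above concrete, here is a minimal sketch of captioning or question answering with the checkpoint linked above. It loads the model through Hugging Face `transformers`, following the general pattern shown on the model card. Treat the details as assumptions rather than the definitive API: the helper name `describe_image`, the default prompt, and the `EOS_TOKEN_ID` value are illustrative, and the exact processor arguments may differ between releases of the checkpoint.

```python
MODEL_ID = "unum-cloud/uform-gen2-qwen-500m"  # checkpoint from the links above
EOS_TOKEN_ID = 151645  # assumption: end-of-turn token id used in the model card example


def describe_image(image_path: str,
                   prompt: str = "Describe the image.",
                   max_new_tokens: int = 256) -> str:
    """Caption an image or answer a question about it (hypothetical helper)."""
    # Heavy dependencies are imported lazily so defining the helper stays cheap.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    # trust_remote_code=True is required because the model and processor
    # classes are shipped with the checkpoint, not with transformers itself.
    model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")

    with torch.inference_mode():
        output = model.generate(
            **inputs,
            do_sample=False,  # greedy decoding for reproducible captions
            use_cache=True,
            max_new_tokens=max_new_tokens,
            eos_token_id=EOS_TOKEN_ID,
            pad_token_id=processor.tokenizer.pad_token_id,
        )

    # generate() returns prompt tokens followed by the continuation;
    # keep only the newly generated part.
    prompt_len = inputs["input_ids"].shape[1]
    return processor.batch_decode(output[:, prompt_len:])[0]
```

Calling `describe_image("photo.jpg")` (the file name is a placeholder) would return a caption, while passing a question as `prompt` turns the same call into visual question answering. The first call downloads the weights, so it needs network access and a few gigabytes of disk space.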