Computer Vision for Predicting Facial Attractiveness

2024-03-16 08:30

本文主要是介绍Computer Vision for Predicting Facial Attractiveness,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

转载:http://www.learnopencv.com/computer-vision-for-predicting-facial-attractiveness/

Computer Vision for Predicting Facial Attractiveness

Tiifany Hwang, Korean pop star

Most of us have looked in the mirror and wondered how good we look. But, it is often difficult to be objective while judging our own attractiveness, and we are often too embarrassed to ask for others’ opinion. What if there was a computer program that could answer this question for you, without a human to look at your image? Pretty nifty, huh?

In this post, I will show you how we can use computer vision and machine learning to predict facial attractiveness of an individual from a single photograph. I will use OpenCV, Numpy and scikit-learn to develop a completely automated pipeline that takes a photograph of a person’s face, and rates the photo on a scale of 1 to 5. The code is available at the bottom of this page.

This is a guest post by  Avi Singh — an undergraduate student at Indian Institute of Technology, Kanpur. Visit his  page to see all the cool projects he has done.

While applying machine learning algorithms to a computer vision problems, the high dimensionality of visual data presents a huge challenge. Even a relatively low-res 200×200 image translates to a feature vector of length 40,000. Deep learning models like Convolutional Neural Nets can work directly with raw images, but they require huge amounts of data to be successful. Since we only have a access to a small-ish dataset in this tutorial (500 images), we cannot work with raw images, and we therefore need to extract some informative features from these images.

Feature Extraction for Facial Attractiveness

One feature that determines facial attractiveness (as cited in psychology literature), is the ratio of the distances between the various facial landmarks. Facial landmarks are features like the corner of the eyes, tip of the nose, lowest point on the chin, etc. You can see these points marked in red on the image at the top of this page. In order to calculate these ratios, we would need to have an algorithm with can localize the landmarks accurately in an image. In this post, we would be making use of an open-source software called the CLM framework to extract these landmarks. It is written in C++, and utilizes OpenCV 3.0. It has instructions for both Windows and Ubuntu (which should also work with OS X without much change), and works out of the box. If you do not want to bother with setting up this piece of software, I will be providing a file with facial landmark coordinates of all the images that I use in this tutorial, and you can then tinker with the Machine Learning part of the pipeline.

Facial Attractiveness Dataset

Now that we have a way to extract features, what we need now is the data to extract these features from. While  there are several datasets available to benchmark facial attractiveness, most of them require signing up, and won’t be available to people outside the academia. However, I was able to find one dataset that does not have these requirements. It is called the SCUT-FBP dataset, and contains images of 500 Asian females. Each image is rated by 10 users, and we would be attempting to predict the average rating for each image.

A peak into the SCUT-FBP dataset

Generating Features

We first run the CLM-framework on these images and write the facial landmarks to a file. Click here to get the file. Next we use a python script to calculate ratios between all possible pairs of points, and save the results to a text file. Now that we have our features, we can finally delve into Machine Learning!

Machine Learning for Rating a Face

While there are numerous packages available for Machine Learning, I chose to go with scikit-learn in this post because it’s easy to install and use. Installing scikit-learn is as simple as:

Ubuntu

1
sudo apt-get install python-sklearn

Mac OSX

1
pip install -U numpy scipy scikit-learn

Windows

1
pip install -U scikit-learn

For more details, you can visit the official scikit-learn installation page.

Once you are past the installation, you will notice that working with scikit-learn is a walk in the park. Follow these instructions to load the data (both features and ratings):

1
2
3
4
import numpy as np
features = np.loadtxt( 'features_ALL.txt' , delimiter = ',' )
ratings = np.loadtxt( 'labels.txt' , delimiter = ',' )

Now, if you open the features file, you will see that the number of features per image are extremely large (for the size of our dataset), and if we directly use them, we will face a problem called the Curse of Dimensionality. What we need is to project this large feature space into a smaller feature space, and the algorithm to do so is Principal Component Analysis (PCA). Using PCA in scikit-learn is pretty simple, and the following code is all you need:

1
2
3
4
5
6
from sklearn import decomposition
pca = decomposition.PCA(n_components = 20 )
pca.fit(features_train)
features_train = pca.transform(features_train)
features_test = pca.transform(features_test)

In the above code, n_components is a hyper-parameter, and it’s value can be chosen by testing the performance of the model on a validation set (or cross-validation).

Now we need to choose what model we should use for our problem. Our problem is a regression problem. There are several models/algorithms available to deal with these problem like Linear Regression, Support Vector Regressors (SVR), Random Forest Regression, Gaussian Process Regression, et all.

Let’s have a look at the code that we need to train a linear model:

1
2
3
regr = linear_model.LinearRegression()
regr.fit(features_train, ratings_train)
ratings_predict = regr.predict(features_test)

The above code will fit a linear regressor on the traininng set, and then store the predictions. The next step in our pipeline is to do performance evaluation. One way to do so is to calculate the Pearson correlation between the predicted ratings and the actual ratings. This can be calculated using Numpy as shown below

1
corr = np.corrcoef(ratings_predict, ratings_test)[ 0 , 1 ]

Results

One thing that I have not talked about so far is how I split the dataset into training and testing. To get the initial results, I used a split of 450 training images and 50 testing images. But, since we have very limited data, we should try to make use of as much data as possible to train the model. One way to do so is leave-one-out cross-validation. In this technique, we take 499 images for training, and then test the model on the one image that we left out. We do this for all 500 images (train 500 models), and then calculate the Pearson correlation between the predictions and the actual ratings. The graphic below highlights the performance obtained when I evaluated several models using this method.

Performance of the various models

Linear Regression and Random Forest perform the best, with a Pearson correlation of 0.64 each. K-Nearest Neighbors is comparable with 0.6, while Gaussian Process regression lags a bit at 0.52. SVM perform really poorly, which was somewhat of a surprise for me.

Testing on Celebrities

Yoona, Predicted Rating: 3.6

Yuri, Predicted Rating: 3.4

Yuri, Predicted Rating: 3.4

Tiffany, Predicted Rating: 3.8

Tiffany, Predicted Rating: 3.8

Now, let us have some fun, and use the model that we just trained on some celebrity photographs! I am personally a fan of the K-pop group Girls’ Generation, and since the training images in the dataset are all asian females, the singers of Girls’ Generations are ideal for our experiment!

As one might expect, the predicted ratings are quite high (the average was 2.7 in the training set). This shows that the algorithms generalizes well, at least for faces of the same gender and ethnicity.

There are a lot of things that we could do to improve the performance. The present algorithm takes into account only the shape of the face. Some other important features might be skin gradients and hair style, which I have not modeled in this blog post. Perhaps a future approach can use pre-trained Convolutional Neural Nets to extract high level features, and then something like an Random Forest or an SVM at the top can do the regression.

Subscribe & Download Code

You can download the code for this article on GitHub. To get access to all code and images used in all articles ( including this one ), please subscribeto our newsletter. You will also receive a free Computer Vision Resource guide. In our newsletter we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.

这篇关于Computer Vision for Predicting Facial Attractiveness的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/814891

相关文章

Computer Exercise

每日一练 单选题 在计算机机箱前面板接口插针上(     C   )表示复位开关。 A.SPK    B.PWRLED    C.RESET    D.HDDLED每台PC机最多可接(     B   )块IDE硬盘。 A.2    B.4    C.6    D.8(     B   )拓扑结构由连接成封闭回路的网络结点组成的,每一结点与它左右相邻的结点连接。 A.总线型    B

复盘高质量Vision Pro沉浸式视频的制作流程与工具

在探索虚拟现实(VR)和增强现实(AR)技术的过程中,高质量的沉浸式体验是至关重要的。最近,国外开发者Dreamwieber在其作品中展示了如何使用一系列工具和技术,创造出令人震撼的Vision Pro沉浸式视频。本文将详细复盘Dreamwieber的工作流,希望能为从事相关领域的开发者们提供有价值的参考。 一、步骤和工作流 构建基础原型 目的:快速搭建起一个基本的模型,以便在设备

一键部署Phi 3.5 mini+vision!多模态阅读基准数据集MRR-Benchmark上线,含550个问答对

小模型又又又卷起来了!微软开源三连发!一口气发布了 Phi 3.5 针对不同任务的 3 个模型,并在多个基准上超越了其他同类模型。 其中 Phi-3.5-mini-instruct 专为内存或算力受限的设备推出,小参数也能展现出强大的推理能力,代码生成、多语言理解等任务信手拈来。而 Phi-3.5-vision-instruct 则是多模态领域的翘楚,能同时处理文本和视觉信息,图像理解、视频摘要

HOW DO VISION TRANSFORMERS WORK

HOW DO VISION TRANSFORMERS WORK Namuk Park1,2, Songkuk Kim1 1Yonsei University, 2NAVER AI Lab{namuk.park,songkuk}@yonsei.ac.kr 总结 MSA 改善模型泛化能力: MSA 不仅提高了模型的准确性,还通过平滑损失景观来提高泛化能力。损失景观的平坦化使得模型更容易优化,表现

在Vision Pro上实现360度全景视频播放:HLS360VideoMaterial框架介绍

随着Apple Vision Pro的推出,空间计算技术正在变得越来越普及,而360度全景视频则是其中一种令人兴奋的应用形式。对于希望在visionOS平台上集成360度视频流的开发者而言,找到合适的工具和框架至关重要。今天,我们要介绍的正是这样一个框架——HLS360VideoMaterial,它可以帮助你在Vision Pro上轻松实现360度全景视频的播放,并支持二次开发,让你的应用更上一层

Vision Transformer (ViT) + 代码【详解】

文章目录 1、Vision Transformer (ViT) 介绍2、patch embedding3、代码3.1 class embedding + Positional Embedding3.2 Transformer Encoder3.3 classifier3.4 ViT总代码 1、Vision Transformer (ViT) 介绍 VIT论文的摘要如下,谷歌

【课程笔记】谭平计算机视觉(Computer Vision)[5]:反射和光照 - Reflectance Lighting

课程链接(5-1): 课程链接(5-2): radiance的影响因素(辐射强度) 光源 材质、反射 局部形状 反射 计算机视觉中主要考虑反射 BRDF(Bi-directional reflectance distribution function) BRDF假设(local assumption):反射只和此点接收到的光有关,忽略了半透明、荧光等 这个假设导致依靠BRDF模型建立的人皮

【课程笔记】谭平计算机视觉(Computer Vision)[4]:辐射校准高动态范围图像 - Radiometric Calibration HDR

视频地址链接 预备知识 radiance:单位面积单位时间单位方向角发出去的能量 irradiance:单位:功率/平方米;单位面积单位时间接收的能量 ISP: image signal processor 白平衡:人眼会自动滤过白炽灯、日光灯、节能灯下对物体的附加颜色,然而相机没有此功能,因此相机具有矫正功能。 vignetting:对于白墙拍照,一般是中间亮周边暗。边缘上光线散开的效果,

《Computer Organization and Design》Chap.6 笔记

原本昨天应该看完的Chap.6,没想到大晚上居然停电了。Chap.6主要是介绍parallel processors,内容不深。 提要: SISD, MIMD, SIMD, SPMD和vector的原理。硬件多线程技术。多核&多处理器,多处理器网络拓扑。 (待再看) 6.10&6.11待看。 内容: SISD, MIMD,SIMD的中文解释(引用百度百科)—— SISD(Single

《Computer Organization and Design》Chap.5 笔记

Memory Hierarchy 这章读起来较困难,需要多次学习!!! 提要: cache的基本原理,如何读写?如何处理miss?virtual memory,其原理与cache有类似的地方。 5.3&5.4重看,5.5-5.10&5.13-5.17待看! 内容: Two different types of locality: temporal locality & spatial loc