Applied Computer Vision
In my previous post, I outlined some manually obtained features such as the number of intersections and end-points, line thickness, standard deviation, etc. In this post, we will apply those features to classify two sets of drawings, waves and spirals, as either healthy or Parkinson’s. In addition, we will look at two other classification methods that use more advanced machine learning approaches: feature extraction combined with logistic regression, and a custom convolutional model trained from scratch.
Before we begin, a disclaimer: this is not meant to be any kind of medical study or test. Please refer to the original paper for details on the actual experiment, which I was not a part of.
Zham P, Kumar DK, Dabnichki P, Poosapadi Arjunan S, Raghav S. Distinguishing Different Stages of Parkinson’s Disease Using Composite Index of Speed and Pen-Pressure of Sketching a Spiral. Front Neurol. 2017;8:435. Published 2017 Sep 6. doi:10.3389/fneur.2017.00435.
One important thing to decide when doing a project like this is: what are the goals? Here, there are two:
1. Obtain an understanding of the observable differences between sketch types (healthy or Parkinson’s) for wave and spiral images.
2. Create a classifier with high accuracy to determine if a patient is likely to have Parkinson’s or not.
As I mentioned in my previous post, one limitation of going straight to a neural network model is that it is, for the most part, a black-box classifier. We lose the understanding of what exactly the fundamental differences in the drawings caused by symptoms of Parkinson’s are. This is where our first classifier comes in, since it targets both goals. We will start there.
Random Forest Classifier from hand-built features
The title of this section says it all. I use a Random Forest classifier on the features obtained in my last post. Explicitly, the features we will be using, which are calculated for each image and stored in a pandas dataframe, are:
- ‘mean_thickness’
- ‘std_thickness’
- ‘number_pixels’
- ‘number_edgepoints’
- ‘number_intersections’
A trick I have found useful is to incorporate some non-linearity by creating interaction features, where each feature is multiplied by each other feature. For example: mean_thickness * number_intersections. This adds 10 pairwise products to the 5 base features, bringing the number of features to 15.
We also have to remember to standardize each feature: for each feature column we subtract the mean and divide by the standard deviation, which places the mean at 0 and the standard deviation at 1. Finally, we have to create a one-hot encoding for our target class. This is only a binary classification, so a single column suffices: the ‘healthy’ or ‘parkinson’ label becomes a 0 or a 1. We also make sure to separate our training data from our test data.
That’s it! We can pass our feature columns into a Random Forest classifier. It is also important to note that we need separate models for the wave and spiral datasets. Here are the results for wave:
Wave accuracy on test: 80%
+--------------+-----------+----------+-------------+---------+
| Wave         | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.74      | 0.93     | 0.82        | 15      |
| parkinson    | 0.91      | 0.67     | 0.77        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.80        | 30      |
| macro avg    | 0.82      | 0.80     | 0.80        | 30      |
| weighted avg | 0.82      | 0.80     | 0.80        | 30      |
+--------------+-----------+----------+-------------+---------+
https://ozh.github.io/ascii-tables/
The Random Forest classifier provides a number of nice advantages. The main one I am using here is the ability to list the most impactful features. Here they are in order of importance:
Feature impact (in order of importance):
['std_thickness_num_ep' 'std_thickness' 'num_ep_num_inters'
'num_pixels_num_ep' 'num_pixels_num_inters' 'mean_thickness_num_inters'
'mean_thickness_num_ep' 'std_thickness_num_inters'
'std_thickness_num_pixels' 'mean_thickness' 'num_ep' 'num_inters'
'num_pixels' 'mean_thickness_std_thickness' 'mean_thickness_num_pixels']
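The names in this list follow the pattern described earlier: five base features plus ten pairwise products. Here is a minimal sketch of how such a feature matrix could be built and standardized. The values are made up, the helper names (`add_interactions`, `standardize`) are hypothetical rather than taken from the project code, and scaling with training-set statistics only is an assumption about the workflow.

```python
from itertools import combinations

import numpy as np

base_names = ["mean_thickness", "std_thickness", "num_pixels", "num_ep", "num_inters"]


def add_interactions(X, names):
    """Append the product of every pair of distinct columns (hypothetical helper)."""
    cols = [X[:, i] for i in range(X.shape[1])]
    out_names = list(names)
    for i, j in combinations(range(X.shape[1]), 2):
        cols.append(X[:, i] * X[:, j])
        out_names.append(f"{names[i]}_{names[j]}")
    return np.column_stack(cols), out_names


def standardize(X_train, X_test):
    """Scale to zero mean and unit variance using training-set statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std


# Made-up feature values standing in for the real measurements.
rng = np.random.default_rng(0)
X_train = rng.uniform(1.0, 10.0, size=(72, 5))
X_test = rng.uniform(1.0, 10.0, size=(30, 5))

X_train_full, feature_names = add_interactions(X_train, base_names)
X_test_full, _ = add_interactions(X_test, base_names)
X_train_std, X_test_std = standardize(X_train_full, X_test_full)

print(len(feature_names))
print(feature_names[5:])
```

Reusing the training mean and standard deviation for the test set keeps information about the test images out of the fitted model.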
And for spiral:
Spiral accuracy on test: 67%
+--------------+-----------+----------+-------------+---------+
| Spiral       | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.63      | 0.80     | 0.71        | 15      |
| parkinson    | 0.73      | 0.53     | 0.62        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.67        | 30      |
| macro avg    | 0.68      | 0.67     | 0.66        | 30      |
| weighted avg | 0.68      | 0.67     | 0.66        | 30      |
+--------------+-----------+----------+-------------+---------+
Feature impact (in order of importance):
['num_pixels_num_ep' 'num_pixels' 'std_thickness_num_ep'
'num_pixels_num_inters' 'std_thickness_num_pixels'
'mean_thickness_num_ep' 'mean_thickness' 'std_thickness_num_inters'
'std_thickness' 'mean_thickness_std_thickness'
'mean_thickness_num_pixels' 'mean_thickness_num_inters'
'num_ep_num_inters' 'num_ep' 'num_inters']
In my last post, I asked: ‘based on the results we had seen, which would we guess would be the most important?’. One advantage of machine learning is that the model can pick up on non-linear interactions whose effect would be difficult for us to predict. We can also see that the impact is different for each type of drawing. The accuracy is not bad on the wave set, at 80%, but quite low on the spiral set, at 67%, although still better than chance (50%). Clearly, some fine tuning of the features could achieve a better accuracy on the spiral set. However, using Random Forest we can now observe how important certain properties of the drawing are. Certainly the waviness identified through end-points and number of pixels is prominent in both the wave and spiral models.
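To make the feature ranking step concrete, here is a minimal sketch using scikit-learn on synthetic data. It is not the project code: the data is random, one column is deliberately made informative, and the hyperparameters (`n_estimators=200`, the 30% test split) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feature_names = ["mean_thickness", "std_thickness", "num_pixels", "num_ep", "num_inters"]

# Synthetic stand-in data: 0 = healthy, 1 = parkinson.
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 5))
X[:, 3] += 2.0 * y  # make the "num_ep" column informative on purpose

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Same kind of report as the tables in this section.
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["healthy", "parkinson"]))

# Rank the features from most to least important.
order = np.argsort(clf.feature_importances_)[::-1]
ranked = [feature_names[i] for i in order]
print(ranked)
```

The importances sum to 1, so they are best read as relative weights within one model. This is also why the wave and spiral rankings are not directly comparable in magnitude.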
Feature extraction with ResNet50 and classification with Logistic Regression
In this next section we will first pass our images through the ResNet50 model trained on ImageNet, but with the top (classification) layers left off. We can do this easily using tf.keras:
from tensorflow.keras.applications import ResNet50

model = ResNet50(weights="imagenet", include_top=False)
When an image is passed through this, it results in 7 * 7 * 2048 = 100,352 ‘features’: numbers that characterize the image. This is the size of the output of the final convolutional block of ResNet50. We can then pass this large vector through Logistic Regression to obtain a classification. This is a very quick and efficient way to obtain pretty decent accuracy without ever really training a network. One important note is that you first have to pass the images through some pre-processing to get them into a suitable format. The image size used for this model is 224 x 224, and the tensor shape should be (N, 224, 224, 3), where N is the number of samples. There is a predefined preprocessing function, imagenet_utils.preprocess_input, that subtracts the ImageNet mean. To give credit, I learned this technique from Adrian Rosebrock’s book Deep Learning for Computer Vision with Python (Practitioner Bundle).
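For intuition about what the mean subtraction does, here is a NumPy-only sketch. It assumes the default "caffe" mode of `preprocess_input`, which flips the channels from RGB to BGR and subtracts the ImageNet per-channel means without any further scaling. The function name `caffe_style_preprocess` is hypothetical, and the real Keras function should be used in practice.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the "caffe" preprocessing mode.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(batch_rgb):
    """Hypothetical stand-in: flip RGB to BGR, then subtract the channel means."""
    batch = np.asarray(batch_rgb, dtype=np.float32)
    batch_bgr = batch[..., ::-1]
    return batch_bgr - IMAGENET_BGR_MEAN


# A dummy batch of two all-white 224 x 224 RGB images.
batch = np.full((2, 224, 224, 3), 255.0, dtype=np.float32)
out = caffe_style_preprocess(batch)

print(out.shape)
print(out[0, 0, 0])
```

Because the output is centered around zero and contains negative values, it is no longer a displayable 0 to 255 image.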
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.applications import imagenet_utils
from tensorflow.keras.preprocessing.image import img_to_array, load_img

# imagePaths, labels and model are assumed to be defined earlier
bs = 16
# loop over the image paths in batches
for i in np.arange(0, len(imagePaths), bs):
    batchPaths = imagePaths[i:i + bs]
    batchLabels = labels[i:i + bs]
    batchImages = []
    # preprocess each image
    for j, imagePath in enumerate(batchPaths):
        image = load_img(imagePath, target_size=(224, 224), interpolation='bilinear')
        image = img_to_array(image)
        # expand dims and subtract mean RGB
        image = np.expand_dims(image, axis=0)
        image = imagenet_utils.preprocess_input(image)
        # display the first preprocessed image as a sanity check
        if j == 0 and i == 0:
            fig, axs = plt.subplots()
            plt.imshow(image.reshape((224, 224, 3)).astype(np.uint8), clim=(0, 255), interpolation=None)
            plt.show()
        batchImages.append(image)
    batchImages = np.vstack(batchImages)
    # extract features and flatten each 7 x 7 x 2048 output into one vector
    features = model.predict(batchImages, batch_size=bs)
    features = features.reshape((features.shape[0], 100352))
Once we have our features and associated labels stored in a dataframe, we can pass them to Logistic Regression. Essentially, the idea is that in this very high-dimensional representation of the images there should be a clear division between the two classes. Given the accuracy, I think that is correct:
Wave accuracy on test: 87%
+--------------+-----------+----------+-------------+---------+
| Wave         | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.92      | 0.80     | 0.86        | 15      |
| parkinson    | 0.82      | 0.93     | 0.87        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.87        | 30      |
| macro avg    | 0.87      | 0.87     | 0.87        | 30      |
| weighted avg | 0.87      | 0.87     | 0.87        | 30      |
+--------------+-----------+----------+-------------+---------+
Spiral accuracy on test: 83%
+--------------+-----------+----------+-------------+---------+
| Spiral       | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.81      | 0.87     | 0.84        | 15      |
| parkinson    | 0.86      | 0.80     | 0.83        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.83        | 30      |
| macro avg    | 0.83      | 0.83     | 0.83        | 30      |
| weighted avg | 0.83      | 0.83     | 0.83        | 30      |
+--------------+-----------+----------+-------------+---------+
This is a definite improvement over our manually created model with just 15 features. It also required the least amount of effort, but we don’t obtain much information other than the classification.
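The classification step on top of the extracted features can be sketched as follows. This is an illustration under stated assumptions rather than the project code: the features are random numbers with a small class-dependent shift instead of real ResNet50 outputs, the dimensionality is reduced from 100,352 to 2,048 to keep it fast, and the regularization strength `C=0.1` is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_train, n_test, n_features = 72, 30, 2048  # far fewer than 100352, for speed


def fake_features(n):
    """Random stand-in for flattened CNN features: 0 = healthy, 1 = parkinson."""
    labels = np.arange(n) % 2
    feats = rng.normal(size=(n, n_features))
    feats[:, :20] += labels[:, None] * 1.5  # a few dimensions carry the class signal
    return feats, labels


X_train, y_train = fake_features(n_train)
X_test, y_test = fake_features(n_test)

# With far more features than samples, regularization does most of the work.
clf = LogisticRegression(C=0.1, max_iter=1000)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
acc = accuracy_score(y_test, pred)
print(f"accuracy on the synthetic test set: {acc:.2f}")
```

With so many more dimensions than training images, a separating hyperplane almost always exists on the training set, so the held-out test accuracy is the number to trust.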
Custom Convolutional Neural Network
In our last method we build a custom model and train it from scratch. We have very few training examples, just 72, so we use augmentation. During training I created a validation set from 20% of the training set, and predicted on the same untouched 30 test images I have been comparing against so far. I downsample the images by a factor of 2 and convert them to greyscale. I also apply a preprocessing function for contrast stretching and normalization. For details on the augmentation, callbacks, etc., please refer to my code, as this is beyond the scope of this post. Optimization was done using Adam. Below is a summary of the Spiral model; the Wave model is exactly the same, but the image size is different.
Model: "SpiralNet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d (Conv2D)              (None, 128, 128, 32)      320
_________________________________________________________________
activation (Activation)      (None, 128, 128, 32)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 128, 128, 32)      9248
_________________________________________________________________
activation_1 (Activation)    (None, 128, 128, 32)      0
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 64, 64, 32)        0
_________________________________________________________________
dropout (Dropout)            (None, 64, 64, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 64, 64, 64)        18496
_________________________________________________________________
activation_2 (Activation)    (None, 64, 64, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 64, 64, 64)        36928
_________________________________________________________________
activation_3 (Activation)    (None, 64, 64, 64)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 64)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 32, 32, 128)       73856
_________________________________________________________________
activation_4 (Activation)    (None, 32, 32, 128)       0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 32, 32, 128)       147584
_________________________________________________________________
activation_5 (Activation)    (None, 32, 32, 128)       0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 128)       0
_________________________________________________________________
dropout_2 (Dropout)          (None, 16, 16, 128)       0
_________________________________________________________________
flatten (Flatten)            (None, 32768)             0
_________________________________________________________________
dense (Dense)                (None, 128)               4194432
_________________________________________________________________
activation_6 (Activation)    (None, 128)               0
_________________________________________________________________
dropout_3 (Dropout)          (None, 128)               0
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 129
_________________________________________________________________
activation_7 (Activation)    (None, 1)                 0
=================================================================
Total params: 4,480,993
Trainable params: 4,480,993
Non-trainable params: 0
_________________________________________________________________
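The parameter counts in the summary can be checked by hand. The short sketch below assumes 3 x 3 kernels and a single greyscale input channel, which is what the first layer's 320 parameters imply ((3 * 3 * 1 + 1) * 32 = 320). The helper names are hypothetical.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a Dense layer."""
    return in_units * out_units + out_units


# (in_channels, filters) for the six convolutions, assuming 3 x 3 kernels.
conv_layers = [(1, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
conv_counts = [conv2d_params(3, c_in, f) for c_in, f in conv_layers]

# The flatten layer turns the final 16 x 16 x 128 volume into 32768 units.
flat_units = 16 * 16 * 128
dense_counts = [dense_params(flat_units, 128), dense_params(128, 1)]

total = sum(conv_counts) + sum(dense_counts)
print(conv_counts)   # [320, 9248, 18496, 36928, 73856, 147584]
print(dense_counts)  # [4194432, 129]
print(total)         # 4480993
```

Almost all of the roughly 4.5 million parameters sit in the first dense layer, which is worth keeping in mind with only 72 training images.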
And here is the training board:
Finally, the metrics:
Wave accuracy on test: 87%
+--------------+-----------+----------+-------------+---------+
| WaveNet      | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.92      | 0.80     | 0.86        | 15      |
| parkinson    | 0.82      | 0.93     | 0.87        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.87        | 30      |
| macro avg    | 0.87      | 0.87     | 0.87        | 30      |
| weighted avg | 0.87      | 0.87     | 0.87        | 30      |
+--------------+-----------+----------+-------------+---------+
Spiral accuracy on test: 90%
+--------------+-----------+----------+-------------+---------+
| SpiralNet    | precision | recall   | f1-score    | support |
+--------------+-----------+----------+-------------+---------+
| healthy      | 0.93      | 0.87     | 0.90        | 15      |
| parkinson    | 0.88      | 0.93     | 0.90        | 15      |
|              |           |          |             |         |
| accuracy     |           |          | 0.90        | 30      |
| macro avg    | 0.90      | 0.90     | 0.90        | 30      |
| weighted avg | 0.90      | 0.90     | 0.90        | 30      |
+--------------+-----------+----------+-------------+---------+
Although we achieve the same accuracy on the wave set as with the ResNet50 feature extraction, we get a clear improvement on the spiral set (90% versus 83%). We also only have 30 images in our test set, which is pretty small. With a larger number we might see test accuracy closer to the validation accuracy, which was around 95% for the best model in each case.
Summary
Here is a final summary of the results on the same test set for the different approaches:
+------------------------+------+--------+
| Test Accuracy %        | Wave | Spiral |
+------------------------+------+--------+
| Random Forest Manual   | 80   | 67     |
| ResNet50 feat. Log Reg | 87   | 83     |
| Custom Nets            | 87   | 90     |
+------------------------+------+--------+
Looking at the images, I am quite happy with the performance; for some images I would not be able to tell the difference myself. Some more feature engineering could bump up the accuracy of the manual Random Forest model, but I will leave that to you.
If you felt that any part of this post provided some useful information or just a bit of inspiration please follow me for more.
You can find the source code on my Github.
Link to my other post:
Classifying Parkinson’s disease through image analysis: Part 1 — Pre-processing and exploratory data analysis
Original article: https://towardsdatascience.com/classifying-parkinsons-disease-through-image-analysis-part-2-ddbbf05aac21