[Paper Close Reading] BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neuro-

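Paper: BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment - ScienceDirect

Full title: BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment

Paper code: www.brainnetcnn.cs.sfu.ca (given in the paper, but the site is down and does not load even through a proxy)

Code I found myself: GitHub - FurtherAdu/BrainNetCNN_Personality: personality prediction from Human Connectome Project fMRI data using the BrainNetCNN deep learning model (Kawahara et al., 2016). A project in collaboration with Dr. Johann Kruschwitz of the Mind and Brain department at Charité (unverified)

The English is typed entirely by hand! It is a summary and paraphrase of the original paper. Spelling and grammar mistakes are hard to avoid; if you find any, corrections in the comments are welcome! This post leans toward personal notes, so read with caution!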

1. TL;DR

1.1. Takeaways

1.2. Paper framework diagram

2. Paragraph-by-paragraph reading

2.1. Abstract

2.2. Introduction

2.2.1. Related works

2.3. Method

2.3.1. CNN layers for network data

2.3.2. Preterm data

2.3.3. BrainNetCNN architecture

2.3.4. Implementation

2.3.5. Evaluation metrics

2.4. Experiments

2.4.1. Simulating injury connectomes for phantom experiments

2.4.2. Infant age and neurodevelopmental outcome prediction

2.5. Discussion

2.6. Conclusions

3. Supplementary knowledge

3.1. Leaky rectified linear units

3.2. Distance

3.3. Synthetic minority over-sampling technique (SMOTE)

4. Reference List

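1. TL;DR

1.1. Takeaways

（1）The math in this paper is rather peculiar: a lot is left unwritten. The authors themselves note that "for simplicity, we omit the activation function and the standard bias term"

（2）It uses DTI, not the fMRI I want to analyze

（3）The comparison is not in accuracy and AUC but in r, p, MAE and SDAE (because the task is not classification but regression, i.e. measuring the gap between predicted and true values)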

1.2. Paper framework diagram

2. Paragraph-by-paragraph reading

2.1. Abstract

（1）They propose BrainNetCNN, a convolutional neural network (CNN) framework with edge-to-edge, edge-to-node and node-to-graph convolutional filters that exploit the topological locality of brain networks.

（2）The model takes brain networks derived from diffusion tensor images (DTI) of preterm newborns and predicts their neurodevelopment

preterm n./adj. (an infant) born prematurely; premature birth. gestational adj. relating to pregnancy. menstrual adj. relating to menstruation

2.2. Introduction

（1）There is an upward trend in premature birth, which may cause brain injuries, abnormalities, or even death

（2）This model can predict the age of premature infants from diffusion tensor imaging with an error of only about two weeks. Furthermore, with early intervention, deficits in premature infants are more likely to be treated successfully.

（3）⭐BrainNetCNN is the first deep learning framework designed specifically for brain network data. (This should be quite a milestone, 2017? I will check later whether anything came earlier.)

mortality n. death rate; death; number of deaths; the condition of being mortal

2.2.1. Related works

（1）Compared with Ziv et al. (2013), who applied an SVM to DTI of 6-month-old babies, they focus on premature infants at 18 months of age.

（2）Artificial neural networks (ANNs) have been used for segmenting multiple sclerosis lesions, segmenting brain tumors in multimodal MRI volumes, classifying cerebellar ataxia, and predicting AD. However, all of these were trained on standard grid-like MR images rather than on brain network data.

sclerosis n. hardening of tissue. ataxia n. loss of coordinated movement (unsteady, uncoordinated motion)

cerebellar adj. of the cerebellum

2.3. Method

2.3.1. CNN layers for network data

①DTI captures the white-matter connections of a subject's brain, which yields a graph $G=\left(A,\Omega\right)$. $A\in\mathbb{R}^{\left|\Omega\right|\times\left|\Omega\right|}$ is the weighted adjacency matrix of edges and $\Omega$ is the set of nodes. Each brain region in this paper is defined as a node.

②Previous work put all the edge features into a single vector, which ignores the topological structure of the brain graph.

③Another option is to treat the 2D adjacency matrix as an image

④However, the authors point out that for an entry $A_{i,j}$ only the elements in row $i$ and column $j$ are topologically local, so a standard grid-shaped image filter has a hard time capturing the topological relationships.

⑤For simplicity, they omit the activation function and the standard bias term in the following equations

（1）Edge-to-edge Layers

①Further define $G^{\ell,m}=\left(A^{\ell,m};\Omega\right)$, where $m$ indexes the $m$-th feature map and $\ell$ the $\ell$-th layer.

②There are $M^{\ell}$ feature maps in the $\ell$-th layer.

③The filter is:

$A_{i,j}^{\ell+1,n}=\sum_{m=1}^{M^{\ell}}\sum_{k=1}^{|\Omega|}\left(r_k^{\ell,m,n}A_{i,k}^{\ell,m}+c_k^{\ell,m,n}A_{k,j}^{\ell,m}\right)$

where $\mathbf{r}^{\ell,m,n},\mathbf{c}^{\ell,m,n}\in\mathbb{R}^{1\times\left|\Omega\right|}$ are the row and column weight vectors, which are concatenated as:

$[\mathbf{c}^{\ell,m,n},\mathbf{r}^{\ell,m,n}]=[w_1^{\ell,m,n},\ldots,w_{2|\Omega|}^{\ell,m,n}]=\mathbf{w}^{\ell,m,n}\in\mathbb{R}^{2|\Omega|}$

④The set of all weight vectors is:

$\{\mathbf{w}^{\ell,m,n}|m\in\{1,2,\ldots,M^{\ell}\}\}\sim [\mathbf{w}^{\ell,1,n},\ldots,\mathbf{w}^{\ell,M^\ell,n}]\in\mathbb{R}^{2|\Omega|\times M^\ell}$

which collects the learnable weights at layer $\ell$ that produce output feature map $n$.

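⑤Each output edge feature is a weighted sum of all edges adjacent to it, i.e. the edges in row $i$ and column $j$ that share a node with edge $(i,j)$. The filter is visualized as:

⑥The adjacency matrix of a directed graph may be asymmetric, whereas that of an undirected graph is symmetric and is not affected by this. Moreover, for an undirected graph the upper (or lower) triangle alone is sufficient.

⑦No padding is needed for the brain adjacency matrix because "the brain network has no topological boundaries" (I get that a convolution over a graph always keeps the overall structure, unlike a 2D image, but what does "no topological boundaries" mean? That in an undirected graph any node could be the last one?)

⑧The authors mention that a line graph can represent the characteristics of the graph. I am only guessing at their meaning here: in the line graph (the graph-theoretic construction, not a line chart) each vertex stands for an edge of the original graph together with its relations to the edges that share a node with it, and each edge response is a weighted sum of its adjacent edges. I did not really understand the rest; they seem to have ruled this idea out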
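To make the E2E formula concrete, here is a minimal NumPy sketch that evaluates it literally with loops. The function name `e2e_layer` and the array layouts are my own assumptions for illustration (activation and bias are omitted, as in the paper's equations); this is not the authors' implementation.

```python
import numpy as np

def e2e_layer(A, r, c):
    """Literal evaluation of the E2E formula (no activation, no bias).

    A: input feature maps, shape (M_in, N, N)
    r, c: row / column weights, shape (M_out, M_in, N)
    returns: output feature maps, shape (M_out, N, N)
    """
    M_in, N, _ = A.shape
    M_out = r.shape[0]
    out = np.zeros((M_out, N, N))
    for n in range(M_out):
        for m in range(M_in):
            for i in range(N):
                for j in range(N):
                    # sum_k r_k * A[i, k]  +  sum_k c_k * A[k, j]
                    out[n, i, j] += r[n, m] @ A[m, i, :] + c[n, m] @ A[m, :, j]
    return out

# Toy check: with all-ones inputs and weights on 3 nodes,
# every output entry is 3 (row part) + 3 (column part) = 6.
A = np.ones((1, 3, 3))
r = np.ones((2, 1, 3))
c = np.ones((2, 1, 3))
print(e2e_layer(A, r, c).shape)  # (2, 3, 3)
```

Note that the output keeps the $|\Omega|\times|\Omega|$ shape of the input, which is why no padding is involved.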

（2）Edge-to-node layer

①The E2N layer is:

$a_i^{\ell+1,n}=\sum_{m=1}^{M^\ell}\sum_{k=1}^{|\Omega|}r_k^{\ell,m,n}A_{i,k}^{\ell,m}+c_k^{\ell,m,n}A_{k,i}^{\ell,m}$

②The output $a$ has dimension $|\Omega|\times 1$, i.e. it is a vector, whereas the input $A$ has dimension $|\Omega|\times|\Omega|$. The difference comes from the filter: for each node $i$, $a_i$ is a weighted sum over the $i$-th row and the $i$-th column of $A$.

③In particular, when the input is a symmetric matrix, ⭐there is no need for two sets of weights. Dropping one of $r$ or $c$ is reasonable.
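A small NumPy sketch of the E2N formula may help, including a check of the remark about symmetric inputs. The name `e2n_layer` and the array layouts are assumptions made for this illustration only.

```python
import numpy as np

def e2n_layer(A, r, c=None):
    """E2N formula without activation or bias.

    A: (M_in, N, N); r, c: (M_out, M_in, N); returns (M_out, N).
    """
    # sum over m and k of r[n, m, k] * A[m, i, k]  (row i of A)
    out = np.einsum("nmk,mik->ni", r, A)
    if c is not None:
        # sum over m and k of c[n, m, k] * A[m, k, i]  (column i of A)
        out = out + np.einsum("nmk,mki->ni", c, A)
    return out

rng = np.random.default_rng(0)
B = rng.random((2, 5, 5))
A_sym = B + B.transpose(0, 2, 1)   # symmetric input feature maps
r = rng.random((3, 2, 5))
c = rng.random((3, 2, 5))

# For symmetric A, row i equals column i, so (r, c) acts like the single weight r + c.
both = e2n_layer(A_sym, r, c)
single = e2n_layer(A_sym, r + c)
print(np.allclose(both, single))
```

This is why one weight vector is enough in the symmetric case: the second one adds no expressive power.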

（3）Node-to-graph layer

①The final output $a$ , which represents a whole graph, is a single scalar/number:

$a^{\ell+1,n}=\sum_{m=1}^{M^{\ell}}\sum_{i=1}^{|\Omega|}w_i^{\ell,m,n}a_i^{\ell,m}$
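The N2G formula is a weighted sum over all nodes and all input feature maps. A minimal sketch, assuming a hypothetical function name `n2g_layer` and the array layouts given in the docstring:

```python
import numpy as np

def n2g_layer(a, w):
    """N2G formula without activation or bias.

    a: node features, shape (M_in, N)
    w: weights, shape (M_out, M_in, N)
    returns: one scalar per output feature map, shape (M_out,)
    """
    return np.einsum("nmi,mi->n", w, a)

a = np.ones((2, 4))        # 2 input feature maps, 4 nodes
w = np.ones((3, 2, 4))     # 3 output feature maps
print(n2g_layer(a, w))     # each entry sums 2 * 4 ones
```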

（4）Thus E2N and N2G each reduce the dimension (matrix to vector, vector to scalar), while E2E keeps it. For a single feature map, the model is shown below:

2.3.2. Preterm data

（1）Subjects: infants born preterm at 24-32 weeks postmenstrual age (PMA)

（2）Collected by: BC Children's Hospital in Vancouver, Canada.

（3）Screening: images with severe artifacts or directional deviations in DTI were excluded

（4）Data: 115 infants, about half of whom were scanned twice (168 scans in total, which matches the 112/56 train/test split used later)

（5）Atlas: 90 regions, defined by the University of North Carolina (UNC) School of Medicine at Chapel Hill

（6）Adjacency matrix: each element is scaled to $\left [ 0,1 \right ]$

（7）Evaluation: at 18 months of age (corrected for prematurity), cognitive and neuromotor function were assessed with the Bayley Scales of Infant and Toddler Development (Bayley-III)

（8）Dataset limitations: small and imbalanced

（9）Remedy: synthetic minority over-sampling technique (SMOTE), which expands the original data to 256 times its size (worth studying later). Besides, the authors think LSI is not suitable here.

tractography n. fiber tracking; imaging of white-matter tracts; tracing of neural pathways

artefact n. man-made object (especially one of historical or cultural value); imaging artifact (well, it is the same as "artifact")

2.3.3. BrainNetCNN architecture

（1）Final output: 2 nodes (one for motor and one for cognitive)

（2）An E2E layer keeps the size of its input. Therefore E2E layers can be stacked, at the cost of more learnable parameters

（3）Ablation study:

①Standard: E2Enet

②Remove E2E layer: E2Nnet

③Remove 2 fully connected layers from E2Enet: E2Enet-sml

④Remove 2 fully connected layers from E2Nnet: E2Nnet-sml

⑤Add an additional E2E layer to E2Enet-sml: 2E2Enet-sml

⑥Fully connected layers only: FC30net and FC90net (flatten the upper/lower triangular matrix into one vector)

（4）They increase the number of feature maps to compensate for the reduction along the $i$ and $j$ dimensions

（5）Activation function: leaky ReLU

$\mathrm{LeakyReLU}(x)=\begin{cases} x, & \text{if } x>0\\ \frac{x}{3}, & \text{if } x\leq 0 \end{cases}$

where a negative slope of $\frac{1}{3}$ is quite large for a leaky ReLU. This is why the authors call it "very leaky ReLU"

（6）Dropout rate: 0.5 after the N2G and FC layers (but in the figure it looks as if every convolutional layer has dropout; what is going on?)

（7）Momentum: 0.9

（8）Mini-batch size: 14

（9）Weight decay: 0.0005

（10）Learning rate: 0.01

（11）Loss: "Euclidean distance between the predicted and real outcomes plus a weighted L2 regularization term over the network parameters" (my guess is that, going by the above, there are two outputs; is it something like (x, y)? Or are motor and cognitive computed separately?)

（12）Iterations: 10K-100K (in steps of 10K)

（13）Model selection: the one that overfits the least
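One plausible reading of the loss in (11) is sketched below: the two outputs (motor, cognitive) are treated jointly as a 2-vector, and the squared Euclidean distance is averaged over the mini-batch. The 1/2 scaling, the batch averaging and the function name `euclidean_l2_loss` are assumptions on my part, not something these notes confirm from the paper.

```python
import numpy as np

def euclidean_l2_loss(pred, target, weights, weight_decay=0.0005):
    """Squared Euclidean data term plus a weighted L2 penalty on the parameters.

    pred, target: shape (batch, 2), columns = (motor, cognitive)  [assumed layout]
    weights: list of parameter arrays
    """
    diff = np.asarray(pred, float) - np.asarray(target, float)
    # Assumed convention: 1/(2 * batch) * sum of squared distances
    data_term = 0.5 * np.mean(np.sum(diff ** 2, axis=1))
    # Assumed convention: weight_decay / 2 * sum of squared parameters
    reg_term = 0.5 * weight_decay * sum(np.sum(np.asarray(w) ** 2) for w in weights)
    return data_term + reg_term

pred = np.array([[1.0, 2.0]])
target = np.array([[0.0, 0.0]])
print(euclidean_l2_loss(pred, target, weights=[]))  # 0.5 * (1 + 4) = 2.5
```

Under this reading the motor and cognitive errors simply add inside the data term, so training them jointly or with two separate squared-error losses of equal weight gives the same gradient for the shared layers.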

2.3.4. Implementation

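（1）Filters

①E2E: two 1D filters (they are indeed two 1D filters, but it does not feel like a spatially separable convolution, because, as in the formula above and as the authors say here, the two are fused into a single step)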

②E2N and N2G: one 1D filter
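The remark about E2E being two 1D filters can be sketched as follows: one 1D filter produces a value per row, another a value per column, and the two are broadcast and added rather than applied one after the other as in a separable convolution. The function name `e2e_two_1d` and the array layouts are hypothetical, chosen only for this illustration.

```python
import numpy as np

def e2e_two_1d(A, r, c):
    """E2E response built from a row filter and a column filter.

    A: (M_in, N, N); r, c: (M_out, M_in, N); returns (M_out, N, N).
    """
    row = np.einsum("nmk,mik->ni", r, A)   # one response per row i
    col = np.einsum("nmk,mkj->nj", c, A)   # one response per column j
    # Broadcast: out[n, i, j] = row[n, i] + col[n, j]
    return row[:, :, None] + col[:, None, :]

rng = np.random.default_rng(0)
A = rng.random((2, 4, 4))
r = rng.random((3, 2, 4))
c = rng.random((3, 2, 4))
print(e2e_two_1d(A, r, c).shape)  # (3, 4, 4)
```

The sum (instead of a composition) is what distinguishes this from a spatially separable filter, which would apply the second 1D filter to the output of the first.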

2.3.5. Evaluation metrics

（1）They calculate the Pearson correlation coefficient, p-value, mean absolute error (MAE) and the standard deviation of absolute error (SDAE) between predicted value and the real value. These metrics are complementary.
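A short sketch of three of these metrics shows why they are complementary. The function name `regression_metrics` is made up for this example, the population standard deviation (`ddof=0`) is an assumption, and the p-value is left out to keep the sketch NumPy-only.

```python
import numpy as np

def regression_metrics(pred, true):
    """Pearson r, mean absolute error, and standard deviation of absolute error."""
    pred = np.asarray(pred, dtype=float)
    true = np.asarray(true, dtype=float)
    r = np.corrcoef(pred, true)[0, 1]
    abs_err = np.abs(pred - true)
    # ddof=0 (population SD) is an assumption; the paper's convention is not stated here
    return {"r": r, "MAE": abs_err.mean(), "SDAE": abs_err.std()}

# Predictions that are perfectly correlated with the truth but off by a factor of 2:
m = regression_metrics([1, 2, 3, 4], [2, 4, 6, 8])
print(m)  # r is 1 while MAE is 2.5, so r alone would hide the error in scale
```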

2.4. Experiments

2.4.1. Simulating injury connectomes for phantom experiments

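①This part seems rather too medical and I did not fully get it, so I did not write up the mathematical analysis of this paragraph

②Simulated images

where the left one is the averaged connectome, the middle one is a focal injury phantom, and the right two images are after introducing noise and the two signatures

③The $i$ -th synthetic connectome, $X_{i}$ :

$X_i=\frac{X_\mu}{\left(1+\alpha_iS^1\right)\left(1+\beta_iS^2\right)}+\gamma N_i$

where $X_\mu$ is the averaged connectome, $S^1$ and $S^2$ are the two injury signatures, $\alpha_i$ and $\beta_i$ are the injury parameters, and $N_i$ is noise scaled by $\gamma$.

perturb vt. to disturb; to unsettle. focal adj. of or at a focus; central; localized

emanate vt. to originate from; to give off; to show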
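A minimal sketch of generating one synthetic connectome, reading the formula as an elementwise division of the mean connectome by $(1+\alpha_i S^1)(1+\beta_i S^2)$ plus scaled noise. The function name `synth_connectome`, the Gaussian noise, and its symmetrization are assumptions for illustration only.

```python
import numpy as np

def synth_connectome(X_mu, S1, S2, alpha, beta, gamma, rng):
    """X_i = X_mu / ((1 + alpha*S1) * (1 + beta*S2)) + gamma * N_i, elementwise."""
    noise = rng.standard_normal(X_mu.shape)   # assumed noise model
    noise = (noise + noise.T) / 2             # keep the result symmetric (assumption)
    return X_mu / ((1 + alpha * S1) * (1 + beta * S2)) + gamma * noise

rng = np.random.default_rng(0)
n = 4
X_mu = np.ones((n, n))
S1 = np.zeros((n, n)); S1[0, :] = S1[:, 0] = 1.0   # toy signature around node 0
S2 = np.zeros((n, n)); S2[3, :] = S2[:, 3] = 1.0   # toy signature around node 3

X = synth_connectome(X_mu, S1, S2, alpha=1.0, beta=0.0, gamma=0.0, rng=rng)
print(X[0, 1], X[1, 2])   # edges touching node 0 are halved, others unchanged
```

Larger $\alpha_i$ or $\beta_i$ weakens the edges covered by the corresponding signature, which is what the network is later asked to recover.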

（1）Predicting injury parameters over varying noise

①Training set: 1000

②Test set: 1000

③Prediction results under different noise levels:

As the noise decreases, the Pearson correlation $r$ increases while MAE and SDAE decrease.

（2）Predicting focal injury parameters with different models

①Models participating in prediction: FC90net, E2Nnet-sml, E2Enet-sml

②Training set: 112

③Test set: 56

④Setting: fixed PSNR of 8 or 18 dB (see: "Image quality metrics: PSNR and SSIM" - Zhihu (zhihu.com))

⑤Prediction chart of different models:

（3）Predicting diffuse injury parameters with different models

①The focal injury pattern matrices in 2.4.1 are replaced by diffuse injury pattern matrices here:

where the left two are diffuse whole-brain injury patterns and the right one combines the left two with noise

②Symmetric diffuse injury pattern function:

$D_{i,j}^{k}=\frac{1}{2}\left(d_i^k+d_j^k\right)$

③Prediction chart of different models:

2.4.2. Infant age and neurodevelopmental outcome prediction

①Validation: 3-fold cross-validation

②5 different random initializations

（1）Model sensitivity to initialization and number of iterations

①The best number of iterations differs from model to model. Thus, the authors only consider the number of iterations with the best performance

②The correlation value is fairly insensitive to these settings

③Performance nearly reaches its ceiling by 100K iterations and tends to stabilize afterwards

④Iteration charts for FC90net (left) and E2Enet-sml (right); the PCC is between predicted and real scores:

where the vertical lines denote the range over 5 different random initializations

（2）Age prediction

①Correlation of E2Enet-sml: 0.864

②Correlation of FC90net: 0.858

③Correlation of E2Nnet-sml: 0.843

④Conclusion: age prediction is more accurate for young infants

（3）Neurodevelopmental outcome prediction

①Compare their model with others:

②Hypothesis testing: Kolmogorov–Smirnov test (see: "Nonparametric statistics, distribution tests [1]: the Kolmogorov-Smirnov test" - Zhihu (zhihu.com))

（4）Maps of predictive edges

①Provide interpretability by computing the mean partial derivatives of the ANN outputs with respect to the input edges:

Red represents positive and blue represents negative partial derivatives. Derivatives with magnitude < 0.001 are removed

②Draw a conclusion
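The idea of a map of predictive edges can be illustrated with a toy sketch. Here the partial derivatives are approximated by central finite differences on a hypothetical scalar-output `model` callable; a real network would normally obtain them by backpropagation, so this is a stand-in technique, not the paper's procedure. The names `mean_partial_derivatives` and `threshold_small` are made up for this example.

```python
import numpy as np

def mean_partial_derivatives(model, X_batch, eps=1e-5):
    """Average d(model output)/d(edge) over a batch of connectomes (finite differences)."""
    X_batch = np.asarray(X_batch, dtype=float)
    grads = np.zeros(X_batch.shape[1:])
    for X in X_batch:
        for idx in np.ndindex(X.shape):
            X_plus, X_minus = X.copy(), X.copy()
            X_plus[idx] += eps
            X_minus[idx] -= eps
            grads[idx] += (model(X_plus) - model(X_minus)) / (2 * eps)
    return grads / len(X_batch)

def threshold_small(grads, min_mag=0.001):
    """Zero out derivatives whose magnitude is below min_mag, as in the figure."""
    return np.where(np.abs(grads) < min_mag, 0.0, grads)

# Toy linear "model": its partial derivatives are exactly the weights W.
W = np.array([[0.5, -2.0], [0.0005, 1.0]])
model = lambda X: float(np.sum(W * X))
X_batch = np.random.default_rng(0).random((3, 2, 2))
saliency = threshold_small(mean_partial_derivatives(model, X_batch))
print(np.sign(saliency))  # +1 would be drawn red, -1 blue, 0 removed
```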

2.5. Discussion

（1）E2Enet-sml and 2E2Enet-sml show that the models can perform well without large fully connected layers.

（2）Stacking E2E layers helps: it learns more complex structure while optimizing fewer parameters

（3）Babies might be influenced by their environment (because the brain has already been developing for a while by the time it is scanned)

（4）Motor features are easier to analyse and identify, whereas cognitive features are more complex and comprehensive

（5）Intelligently combining connectome and clinical features might be a better way

（6）The main challenge for this model is the lack of data

（7）They discovered an asymmetry in brain function

postnatal adj. after childbirth; after birth. aetiology n. the study of the causes of disease

disparity n. gap; difference or inequality (especially one caused by unfair treatment)

maternal adj. of or relating to a mother; on the mother's side; motherly

2.6. Conclusions

A conclusion.

3. Supplementary knowledge

3.1. Leaky rectified linear units

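（1）Er, the name looks fancy, but it is really just Leaky ReLU...

（2）A summary figure (from: "Deep learning notes: activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU, PReLU, RReLU, ELU, SELU, Maxout, Softmax, Swish, Softplus)" - Zhihu (zhihu.com))

（3）Advantage: Leaky ReLU keeps some of the information carried by negative inputs

（4）Disadvantage: it is close to a linear function, so classification performance is not very good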
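For reference, a minimal NumPy sketch of the leaky ReLU with the negative slope of 1/3 used in the paper. The function name `very_leaky_relu` is my own.

```python
import numpy as np

def very_leaky_relu(x, negative_slope=1 / 3):
    """x for x > 0, negative_slope * x otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, negative_slope * x)

print(very_leaky_relu([-3.0, 0.0, 3.0]))  # negative inputs are shrunk to a third, not zeroed
```

A common default slope for leaky ReLU in deep learning libraries is 0.01, which is why 1/3 counts as "very leaky".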

3.2. Distance

（1）Euclidean distance: the very first one we ever learn

（2）Manhattan distance: equivalent to the city block distance

Manhattan distance in the 2D plane: $d\left ( i,j \right )=\left | x_{i}-x_{j} \right |+\left | y_{i}-y_{j} \right |$

Manhattan distance in n-dimensional space: $d_{i,j}=\sum_{k=1}^n\lvert x_{ik}-x_{jk}\rvert$

（3）For the rest, see: "Nine common distance formulas for machine learning algorithms" - Zhihu (zhihu.com)
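The two distances above can be compared on a small example; the function names are chosen just for this sketch.

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance: square root of the summed squared differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def manhattan(x, y):
    """City block distance: sum of absolute coordinate differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sum(np.abs(x - y)))

p, q = (0, 0), (3, 4)
print(euclidean(p, q), manhattan(p, q))  # 5.0 7.0
```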

3.3. Synthetic minority over-sampling technique (SMOTE)

"SMOTE over-sampling for imbalanced data" - CSDN blog
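The core SMOTE idea is to create a synthetic sample on the line segment between a real sample and one of its k nearest neighbours. Below is a minimal NumPy sketch of that interpolation step only; the function name `smote_like` is hypothetical, and how the paper applies SMOTE to its outcome scores is not covered here.

```python
import numpy as np

def smote_like(X, n_new, k=5, rng=None):
    """Generate n_new synthetic rows by interpolating each picked row toward a near neighbour.

    X: (n_samples, n_features) with n_samples >= 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    k = min(k, len(X) - 1)
    # Pairwise Euclidean distances; a sample is never its own neighbour
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest rows
    base = rng.integers(0, len(X), size=n_new)         # which real sample to start from
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))                       # interpolation factor in [0, 1)
    return X[base] + lam * (X[picked] - X[base])

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X_new = smote_like(X, n_new=10, k=2, rng=np.random.default_rng(0))
print(X_new.shape)  # (10, 2)
```

For classification one would apply this only to the minority class; for a connectome, each row of `X` could be the flattened upper triangle of an adjacency matrix.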

4. Reference List

Kawahara, J. et al. (2017) 'BrainNetCNN: Convolutional neural networks for brain networks; towards predicting neurodevelopment', NeuroImage, vol. 146, pp. 1038-1049.
