These notes summarize the paper "Selection bias in gene extraction on the basis of microarray gene-expression data" and the .632+ bootstrap error estimate it recommends.
#Citation
#BibTeX
@article {Ambroise6562,
author = {Ambroise, Christophe and McLachlan, Geoffrey J.},
title = {Selection bias in gene extraction on the basis of microarray gene-expression data},
volume = {99},
number = {10},
pages = {6562--6566},
year = {2002},
doi = {10.1073/pnas.102102699},
publisher = {National Academy of Sciences},
abstract = {In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes. Abbreviations: AE, apparent error rate; CV, cross-validated; RFE, recursive feature elimination; SVM, support vector machine.},
issn = {0027-8424},
URL = {http://www.pnas.org/content/99/10/6562},
eprint = {http://www.pnas.org/content/99/10/6562.full.pdf},
journal = {Proceedings of the National Academy of Sciences}
}
#Plain text
Ambroise, Christophe,
and Geoffrey J. McLachlan.
“Selection bias in gene extraction on the basis of microarray gene-expression data.”
Proceedings of the National Academy of Sciences
99.10 (2002): 6562-6566.
Web. 15 May 2018.
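#Main content
Microarray gene-expression data sets contain few tissue samples but a very large number of genes (possibly thousands).
The paper proposes a more accurate way to estimate the error rate of a prediction rule built on selected genes.
Selection bias: the rule is tested on samples that were already used to select the genes, or the cross-validation is not external to the selection process (gene selection is not repeated when training the rule at each stage).
The resulting error estimates are overly optimistic.
$M$-fold cross-validation (CV) external to the selection process corrects for this; the paper recommends $M = 10$ rather than leave-one-out.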
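To make the difference between internal and external cross-validation concrete, here is a minimal sketch in plain Python. It is not the paper's procedure: the gene ranking (absolute difference of class means) and the nearest-centroid rule are simple stand-ins chosen for illustration, and the names `select_genes`, `fit_centroids`, `predict`, and `cv_error` are hypothetical. It assumes two classes labelled 0 and 1 and that every training fold contains both.

```python
import random

def select_genes(X, y, n_keep):
    """Rank genes by |mean(class 1) - mean(class 0)| and keep the top n_keep.
    Stand-in selection criterion, not the one used in the paper."""
    idx0 = [i for i, lab in enumerate(y) if lab == 0]
    idx1 = [i for i, lab in enumerate(y) if lab == 1]
    scores = []
    for g in range(len(X[0])):
        m0 = sum(X[i][g] for i in idx0) / len(idx0)
        m1 = sum(X[i][g] for i in idx1) / len(idx1)
        scores.append(abs(m1 - m0))
    return sorted(range(len(X[0])), key=lambda g: -scores[g])[:n_keep]

def fit_centroids(X, y, genes):
    """Nearest-centroid rule restricted to the selected genes."""
    cents = {}
    for lab in set(y):
        rows = [X[i] for i in range(len(y)) if y[i] == lab]
        cents[lab] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    return cents

def predict(cents, x, genes):
    def dist(c):
        return sum((x[g] - cg) ** 2 for g, cg in zip(genes, c))
    return min(sorted(cents), key=lambda lab: dist(cents[lab]))

def cv_error(X, y, n_keep, M=10, external=True, seed=0):
    """M-fold CV error. With external=True the genes are re-selected on each
    training fold; with external=False they are selected once on all samples,
    which is the biased procedure."""
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[m::M] for m in range(M)]
    genes_all = select_genes(X, y, n_keep)  # only used when external=False
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, n_keep) if external else genes_all
        cents = fit_centroids(X_tr, y_tr, genes)
        errors += sum(predict(cents, X[i], genes) != y[i] for i in test)
    return errors / n

if __name__ == "__main__":
    # Pure-noise data: 40 samples, 500 genes, labels unrelated to expression.
    rng = random.Random(42)
    X = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(40)]
    y = [i % 2 for i in range(40)]
    print("internal selection:", cv_error(X, y, 5, external=False))
    print("external selection:", cv_error(X, y, 5, external=True))
```

On data with no real signal, the internal estimate would be expected to come out lower than the external one, which is the optimism described above; the exact numbers depend on the random seed.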
##Bootstrap
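$R$: the prediction rule
$R_k^*$: the bootstrap version of $R$, formed from the $k$-th bootstrap sample
$K$: the number of bootstrap samples, each of size $n$ and drawn with replacement
$n$: the original sample size
$B1$: the leave-one-out bootstrap error, a bootstrap-smoothed version of leave-one-out cross-validation in which each rule $R_k^*$ predicts only the points that are not in its bootstrap sample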
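The Monte Carlo estimate of $B1$ based on $K$ bootstrap samples is:

$$B1 = \frac{1}{n}\sum_{j=1}^{n} E_j, \qquad E_j = \frac{\sum_{k=1}^{K} I_{jk} Q_{jk}}{\sum_{k=1}^{K} I_{jk}}$$

$I_{jk}$:
- 1 if $x_j$ is not in the $k$-th bootstrap sample
- 0 otherwise
$k$: the index of the current bootstrap sample, $k = 1, \dots, K$
$Q_{jk}$:
- 1 if $R_k^*$ misclassifies $x_j$
- 0 otherwise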
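The indicators $I_{jk}$ and $Q_{jk}$ translate directly into two counters per sample. The sketch below is one possible reading, not a reference implementation: `b1_estimate` and its `fit` callback are hypothetical names, and `fit(X_train, y_train)` is assumed to return a function that maps one sample to a predicted label. Any gene selection has to happen inside `fit` so that it is repeated on every bootstrap sample. Points that happen to appear in every bootstrap sample have no defined per-point error and are skipped, which is an assumption of this sketch.

```python
import random

def b1_estimate(X, y, fit, K=50, seed=0):
    """Monte Carlo estimate of the leave-one-out bootstrap error B1."""
    rng = random.Random(seed)
    n = len(y)
    out_count = [0] * n  # sum over k of I_jk
    err_count = [0] * n  # sum over k of I_jk * Q_jk
    for _ in range(K):
        # Bootstrap sample of size n, drawn with replacement.
        boot = [rng.randrange(n) for _ in range(n)]
        in_boot = set(boot)
        # R_k^*: the rule rebuilt on the k-th bootstrap sample.
        rule_k = fit([X[i] for i in boot], [y[i] for i in boot])
        for j in range(n):
            if j not in in_boot:  # I_jk = 1
                out_count[j] += 1
                err_count[j] += int(rule_k(X[j]) != y[j])  # Q_jk
    # Per-point error rate over the samples that left x_j out.
    # Assumption: points never left out are skipped rather than counted.
    per_point = [err_count[j] / out_count[j] for j in range(n) if out_count[j] > 0]
    return sum(per_point) / len(per_point)
```

A rule is only ever scored on points it did not see, so `fit` can overfit its bootstrap sample without that overfitting leaking into the estimate.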
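In general, $B.632$ is computed as follows, where $AE$ is the apparent error rate (the error of $R$ on the samples used to build it):

$$B.632 = 0.368\,AE + 0.632\,B1$$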
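The 0.632 in the name $B.632$ comes from the bootstrap sampling itself. The short derivation below is a standard fact added for context rather than something stated in these notes. Each of the $n$ draws misses a given point $x_j$ with probability $1 - 1/n$, so:

$$P(x_j \text{ not in a bootstrap sample}) = \left(1 - \frac{1}{n}\right)^{n} \approx e^{-1} \approx 0.368$$

A bootstrap sample therefore contains, on average, about 63.2% of the distinct original points, and the remaining 36.8% are available as test points for $B1$.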
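$B.632+$ is computed as:

$$B.632+ = (1 - w)\,AE + w\,B1, \qquad w = \frac{0.632}{1 - 0.368\,r}, \qquad r = \frac{B1 - AE}{\gamma - AE}, \qquad \gamma = \sum_{i} p_i (1 - q_i)$$

Here $AE$ is the apparent error rate, the sum runs over the classes, $p_i$ is the proportion of samples from class $i$, and $q_i$ is the proportion of them assigned to class $i$ by $R$.
$r$ must be truncated so that it lies in the range $[0,1]$.
The weight $w$ ranges from 0.632 when $r = 0$, which gives $B.632$, to 1 when $r = 1$, which gives $B1$.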
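The *B.632+* estimate puts more weight on the leave-one-out bootstrap error B1 when the amount of overfitting, as measured by B1 - AE, is relatively large. It therefore remains usable when the prediction rule $R$ is overfitted because of feature selection.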
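As a small numerical illustration, the two estimates can be computed from AE, B1, and the class proportions. The sketch uses the standard definitions from the .632+ literature: B.632 = 0.368 AE + 0.632 B1, and B.632+ = (1 - w) AE + w B1 with w = 0.632 / (1 - 0.368 r), r = (B1 - AE) / (γ - AE), and γ = Σ p_i (1 - q_i). The function names are hypothetical, and setting r to 0 when γ ≤ AE or B1 ≤ AE is an assumption made here to handle the degenerate cases.

```python
def b632(ae, b1):
    """Plain .632 estimate: fixed weights on AE and B1."""
    return 0.368 * ae + 0.632 * b1

def b632_plus(ae, b1, p, q):
    """.632+ estimate; p[i] and q[i] are the class proportions p_i and q_i."""
    # No-information error rate gamma.
    gamma = sum(p_i * (1.0 - q_i) for p_i, q_i in zip(p, q))
    # Relative overfitting rate r, truncated to [0, 1].
    # Assumption: r = 0 when the ratio is undefined or not positive.
    if gamma <= ae or b1 <= ae:
        r = 0.0
    else:
        r = min(1.0, (b1 - ae) / (gamma - ae))
    w = 0.632 / (1.0 - 0.368 * r)  # 0.632 at r = 0, 1 at r = 1
    return (1.0 - w) * ae + w * b1

if __name__ == "__main__":
    # Overfitted rule: zero apparent error, 20% leave-one-out bootstrap error.
    print(b632(0.0, 0.2))
    print(b632_plus(0.0, 0.2, p=[0.5, 0.5], q=[0.5, 0.5]))
```

With AE = 0, B1 = 0.2, and two balanced classes, γ = 0.5 and r = 0.4, so the .632+ value lies between the .632 value and B1.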