Selection bias in gene extraction on the basis of microarray gene-expression data: the .632+ bootstrap


#Citation

#BibTeX

@article {Ambroise6562,
author = {Ambroise, Christophe and McLachlan, Geoffrey J.},
title = {Selection bias in gene extraction on the basis of microarray gene-expression data},
volume = {99},
number = {10},
pages = {6562--6566},
year = {2002},
doi = {10.1073/pnas.102102699},
publisher = {National Academy of Sciences},
abstract = {In the context of cancer diagnosis and treatment, we consider the problem of constructing an accurate prediction rule on the basis of a relatively small number of tumor tissue samples of known type containing the expression data on very many (possibly thousands) genes. Recently, results have been presented in the literature suggesting that it is possible to construct a prediction rule from only a few genes such that it has a negligible prediction error rate. However, in these results the test error or the leave-one-out cross-validated error is calculated without allowance for the selection bias. There is no allowance because the rule is either tested on tissue samples that were used in the first instance to select the genes being used in the rule or because the cross-validation of the rule is not external to the selection process; that is, gene selection is not performed in training the rule at each stage of the cross-validation process. We describe how in practice the selection bias can be assessed and corrected for by either performing a cross-validation or applying the bootstrap external to the selection process. We recommend using 10-fold rather than leave-one-out cross-validation, and concerning the bootstrap, we suggest using the so-called .632+ bootstrap error estimate designed to handle overfitted prediction rules. Using two published data sets, we demonstrate that when correction is made for the selection bias, the cross-validated error is no longer zero for a subset of only a few genes. Abbreviations: AE, apparent error rate; CV, cross-validated; RFE, recursive feature elimination; SVM, support vector machine.},
issn = {0027-8424},
URL = {http://www.pnas.org/content/99/10/6562},
eprint = {http://www.pnas.org/content/99/10/6562.full.pdf},
journal = {Proceedings of the National Academy of Sciences}
}

#Normal

Ambroise, Christophe, and Geoffrey J. McLachlan. “Selection bias in gene extraction on the basis of microarray gene-expression data.” Proceedings of the National Academy of Sciences 99.10 (2002): 6562-6566. Web. 15 May 2018.

#Main points

Microarray gene-expression data sets contain relatively few tissue samples but very many genes (possibly thousands).

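The paper describes a more accurate way to estimate the prediction error of a rule built from a small subset of selected genes.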

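Selection bias: the rule is tested on samples that were already used to select the genes, or the cross-validation is not external to the gene-selection step (genes are not re-selected when the rule is trained at each stage).

As a result, the reported error rates are too optimistic.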

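$M$-fold cross-validation (CV): the paper recommends 10-fold rather than leave-one-out CV, with gene selection performed inside every fold.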
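To make the difference between internal and external cross-validation concrete, here is a minimal sketch on pure-noise data. It is not the paper's setup: the gene ranking (absolute difference of class means), the nearest-centroid rule, and all function names are assumptions chosen to keep the example self-contained. With `external=False` the genes are selected once on all samples, which is where the bias comes from; with `external=True` they are re-selected on the training part of every fold.

```python
import random

def select_genes(X, y, idx, n_genes):
    """Rank genes by |mean(class 1) - mean(class 0)| using only samples in idx."""
    pos = [i for i in idx if y[i] == 1]
    neg = [i for i in idx if y[i] == 0]
    scores = []
    for g in range(len(X[0])):
        m1 = sum(X[i][g] for i in pos) / len(pos)
        m0 = sum(X[i][g] for i in neg) / len(neg)
        scores.append((abs(m1 - m0), g))
    return [g for _, g in sorted(scores, reverse=True)[:n_genes]]

def fit_centroids(X, y, idx, genes):
    """Nearest-centroid rule (an assumed stand-in classifier): one centroid per class."""
    cents = {}
    for c in (0, 1):
        members = [i for i in idx if y[i] == c]
        cents[c] = [sum(X[i][g] for i in members) / len(members) for g in genes]
    return cents

def predict(cents, x, genes):
    """Assign x to the class whose centroid is closest on the selected genes."""
    def dist(c):
        return sum((x[g] - m) ** 2 for g, m in zip(genes, cents[c]))
    return min((0, 1), key=dist)

def cv_error(X, y, n_folds, n_genes, external, rng):
    """M-fold CV error; gene selection is inside the loop only if external=True."""
    n = len(X)
    order = list(range(n))
    rng.shuffle(order)
    folds = [order[f::n_folds] for f in range(n_folds)]
    # Internal (biased) variant: genes chosen once, using every sample.
    all_genes = None if external else select_genes(X, y, order, n_genes)
    errors = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in order if i not in held_out]
        genes = select_genes(X, y, train, n_genes) if external else all_genes
        cents = fit_centroids(X, y, train, genes)
        errors += sum(predict(cents, X[i], genes) != y[i] for i in test)
    return errors / n

# Pure-noise data: no gene carries class information, so an honest
# error estimate should be close to 0.5.
data_rng = random.Random(0)
n, p = 40, 500
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]

internal = cv_error(X, y, 10, 10, external=False, rng=random.Random(1))
external = cv_error(X, y, 10, 10, external=True, rng=random.Random(1))
print("internal CV error:", internal)
print("external CV error:", external)
```

On data like this, the internal estimate is expected to come out well below the external one, even though no rule can truly beat chance.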

##Bootstrap

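$R$: the prediction rule
$R_k^*$: the bootstrap version of $R$, built from the $k$-th bootstrap sample
$K$: the number of bootstrap samples, each of size $n$ and drawn with replacement
$n$: the size of the original sample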
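With this notation, the .632+ estimate can be sketched as follows. This follows the commonly used definition of the estimator rather than anything spelled out in these notes, so treat the details as assumptions: $AE$ is the apparent error of $R$ on the original sample, $B1$ is the leave-one-out bootstrap error (each sample is scored only by the rules $R_k^*$ whose bootstrap sample does not contain it), $\gamma$ is the no-information error rate, $r = (B1 - AE)/(\gamma - AE)$ truncated to $[0, 1]$, $w = .632/(1 - .368r)$, and the estimate is $(1 - w)AE + wB1$. The one-dimensional toy rule and all function names are hypothetical; in the gene-selection setting, `fit_rule` would also have to redo the gene selection on every bootstrap sample.

```python
import random

def no_information_rate(y_true, y_pred):
    """gamma = sum over classes of p_c * (1 - q_c), with p_c the observed and
    q_c the predicted class proportions."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    return sum((y_true.count(c) / n) * (1 - y_pred.count(c) / n) for c in classes)

def b632_plus(ae, b1, gamma):
    """Combine apparent error and leave-one-out bootstrap error into .632+."""
    b1 = min(b1, gamma)                  # B1 is not allowed to exceed gamma
    r = (b1 - ae) / (gamma - ae) if b1 > ae else 0.0  # relative overfitting rate
    w = 0.632 / (1 - 0.368 * r)          # weight ranges from .632 (r=0) to 1 (r=1)
    return (1 - w) * ae + w * b1

def leave_one_out_bootstrap(X, y, fit, predict, K, rng):
    """B1: average, over samples, of the error of rules R_k^* not trained on them."""
    n = len(X)
    err_sum, count = [0] * n, [0] * n
    for _ in range(K):
        sample = [rng.randrange(n) for _ in range(n)]   # size n, with replacement
        in_sample = set(sample)
        rule = fit([X[i] for i in sample], [y[i] for i in sample])
        for j in range(n):
            if j not in in_sample:
                err_sum[j] += predict(rule, X[j]) != y[j]
                count[j] += 1
    rates = [e / c for e, c in zip(err_sum, count) if c > 0]
    return sum(rates) / len(rates)

# Hypothetical toy rule on a single feature: one mean per class present.
def fit_rule(xs, ys):
    return {c: sum(x for x, t in zip(xs, ys) if t == c) / ys.count(c) for c in set(ys)}

def predict_rule(rule, x):
    return min(rule, key=lambda c: abs(x - rule[c]))

rng = random.Random(0)
y = [i % 2 for i in range(30)]
X = [rng.gauss(label, 1.0) for label in y]   # two overlapping classes

rule = fit_rule(X, y)
preds = [predict_rule(rule, x) for x in X]
ae = sum(p != t for p, t in zip(preds, y)) / len(y)
b1 = leave_one_out_bootstrap(X, y, fit_rule, predict_rule, K=50, rng=rng)
gamma = no_information_rate(y, preds)
print("AE:", ae, "B1:", b1, ".632+:", b632_plus(ae, b1, gamma))
```

When the rule does not overfit ($r = 0$) the weight reduces to the plain .632 bootstrap; when it overfits completely ($r = 1$) all the weight moves to $B1$.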