Type I error: “false positive”
the error of rejecting a null hypothesis when it is actually true.
(there is no real effect, but the test says there is a significant one)
Type II error: “false negative”
the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature.
(there is a real effect, but the test says it is not significant)
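A small simulation can make both error types concrete. This is a sketch under stated assumptions, not a definitive method. It draws samples of size 20 from a normal population with standard deviation 1 and runs a two-sided one-sample t-test at the 5% level. For the alternative case it uses a hypothetical true mean of 0.5. The constant 2.093 is the tabulated 97.5th percentile of the t-distribution with 19 degrees of freedom. The helper names are illustrative only.

```python
import math
import random
import statistics

N = 20            # assumed sample size
T_CRIT = 2.093    # two-sided 5% critical value of t with N - 1 = 19 degrees of freedom (t table)

def one_sample_t(sample, mu0):
    """t statistic: (sample mean - mu0) divided by the standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

def rejection_rate(true_mean, mu0=0.0, trials=5000, seed=1):
    """Fraction of simulated samples for which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(N)]
        if abs(one_sample_t(sample, mu0)) > T_CRIT:
            rejections += 1
    return rejections / trials

# H0 is true (the mean really is 0): every rejection is a Type I error (false positive).
type_i_rate = rejection_rate(true_mean=0.0)
# H1 is true (the mean is 0.5): every failure to reject is a Type II error (false negative).
type_ii_rate = 1 - rejection_rate(true_mean=0.5)

print(f"Type I error rate:  {type_i_rate:.3f}")   # should land near the 5% significance level
print(f"Type II error rate: {type_ii_rate:.3f}")
```

A larger true effect or a larger sample lowers the Type II rate, while the Type I rate stays near the chosen significance level. If you change N, you also need the matching critical value for N − 1 degrees of freedom.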
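False Discovery Rate (FDR):
the expected proportion of false positives among all rejected null hypotheses. Controlling the FDR limits Type I errors when many hypotheses are tested at once.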
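The best-known FDR-controlling method is the Benjamini–Hochberg (BH) step-up procedure. The function below is a plain-Python sketch of it; the function name and the p-values are made up for illustration. It sorts the m p-values and finds the largest rank k with p_(k) ≤ (k/m)·q. It then rejects the k hypotheses with the smallest p-values.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the (sorted) indices of hypotheses rejected by BH at FDR level q."""
    m = len(p_values)
    # Indices ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Remember the largest rank whose p-value is under its BH threshold.
        if p_values[idx] <= rank / m * q:
            cutoff = rank
    # Step-up rule: reject everything up to and including that rank.
    return sorted(order[:cutoff])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
```

With these ten p-values, five fall below 0.05 when each is tested on its own. BH at q = 0.05 rejects only the first two. A stricter Bonferroni cutoff of 0.05/10 = 0.005 would keep only the first.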
One-sample T-test:
- trying to find evidence of a significant difference between the population mean and a hypothesized value
Two-sample T-test:
- trying to find evidence of a significant difference between two population means
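To make the two cases concrete, here is a sketch of both test statistics using only the standard library. For the two-sample case it uses Welch's formula, which does not assume equal variances. This is one choice among several; the pooled-variance Student version is the other common one. The function names are illustrative.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """One-sample t: (sample mean - hypothesized mean) / standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

def welch_t(a, b):
    """Two-sample t (Welch's version): difference of means / combined standard error."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

print(round(one_sample_t([5, 7, 9], mu0=5), 3))     # 1.732
print(round(welch_t([5, 7, 9], [1, 3, 5]), 3))      # 2.449
```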
The t-value measures the size of the difference relative to the variation in your sample data [1].
T is simply the calculated difference represented in units of standard error.
The greater the magnitude of T (it can be either positive or negative), the greater the evidence against the null hypothesis that there is no significant difference.
The closer T is to 0, the weaker the evidence of a significant difference (a T near 0 does not prove that there is no difference).
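For the one-sample case, the statistic is the observed difference divided by the standard error s/√n. The numbers below are hypothetical (n = 16, sample mean 52, hypothesized mean 50). They show that the same 2-unit difference gives different t-values depending on how spread out the sample is:

```latex
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad
n = 16,\ \bar{x} = 52,\ \mu_0 = 50:\qquad
s = 4 \;\Rightarrow\; t = \frac{52 - 50}{4/\sqrt{16}} = \frac{2}{1} = 2, \qquad
s = 8 \;\Rightarrow\; t = \frac{52 - 50}{8/\sqrt{16}} = \frac{2}{2} = 1
```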
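However, a t-test uses only the values from a single sample. In reality you do not know the whole population. If you drew another random sample from it and ran the same t-test, you would very likely get a different t-value. So how reliable is the one t-value you computed? This is where the T-distribution comes in. It describes the t-values you would get from many repeated random samples of the same size when the null hypothesis is true, which lets you judge how unusual your observed t-value is.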
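The sampling variability described above is easy to see by simulation. The sketch below assumes a hypothetical normal population with mean 50 and standard deviation 10, so that H0: μ = 50 is true. It draws 10,000 samples of size 5 and computes a t-value for each. The t-values differ from sample to sample. Even so, roughly 5% of them exceed 2.776 in absolute value, which is the tabulated two-sided 5% critical value for 4 degrees of freedom. The t-distribution describes this kind of regularity.

```python
import math
import random
import statistics

def one_sample_t(sample, mu0):
    """One-sample t statistic: difference from mu0 in units of standard error."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se

rng = random.Random(42)
MU0, SIGMA, N, TRIALS = 50, 10, 5, 10_000  # hypothetical population and sample size

# Repeat the same experiment many times while the null hypothesis is true.
t_values = [
    one_sample_t([rng.gauss(MU0, SIGMA) for _ in range(N)], MU0)
    for _ in range(TRIALS)
]

print("first five t-values:", [round(t, 2) for t in t_values[:5]])

# 2.776 = tabulated two-sided 5% critical value of t with N - 1 = 4 degrees of freedom.
share_extreme = sum(abs(t) > 2.776 for t in t_values) / TRIALS
print(f"share with |t| > 2.776: {share_extreme:.3f}")
```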
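So what is the T-distribution? In my understanding, this ties back to the earlier discussion of parametric tests and the normal distribution. The t-test is a parametric test that assumes the data come from an approximately normal population.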
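So, roughly speaking, the T-distribution looks like a normal distribution: symmetric, bell-shaped, and centered at 0. It is not the same curve, though. It has heavier tails, and its exact shape depends on the degrees of freedom (n − 1 for a one-sample test). As the sample size grows, it approaches the standard normal distribution.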
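The difference in shape can be checked numerically. The sketch below evaluates the standard density formula of Student's t-distribution next to the standard normal density. It uses `math.lgamma` rather than `math.gamma` so that large degrees of freedom do not overflow. At 0 the t density is lower than the normal density, and at 3 it is higher (heavier tails). The gap shrinks as the degrees of freedom grow.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Gamma ratio computed in log space to avoid overflow for large df.
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(f"{'df':>8} {'f(0)':>8} {'f(3)':>8}")
for df in (2, 5, 30, 1000):
    print(f"{df:>8} {t_pdf(0, df):8.4f} {t_pdf(3, df):8.4f}")
print(f"{'normal':>8} {normal_pdf(0):8.4f} {normal_pdf(3):8.4f}")
```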
Reference:
[1] http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics