Statistical Errors in Data Analysis

Type I error: “false positive”
The error of rejecting a null hypothesis when it is actually true.
(There is no real effect, but the test reports a significant one.)

Type II error: “false negative”
The error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature.
(There is a real effect, but the test reports nothing significant.)
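
The two error types defined above can be estimated by simulation. The sketch below is only an illustration under stated assumptions: it uses a two-sided z-test with a known standard deviation (a simplification chosen so that only the Python standard library is needed), and the effect size, sample size, and the helper names `z_test_rejects` and `error_rates` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_test_rejects(sample, mu0=0.0, sigma=1.0, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def error_rates(effect=0.5, n=30, trials=2000, seed=0):
    rng = random.Random(seed)
    # Type I: H0 is true (mean 0); count how often it is wrongly rejected.
    false_pos = sum(
        z_test_rejects([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    # Type II: H0 is false (mean = effect); count how often it is not rejected.
    false_neg = sum(
        not z_test_rejects([rng.gauss(effect, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return false_pos / trials, false_neg / trials


type_1_rate, type_2_rate = error_rates()
print(f"type I rate  ~ {type_1_rate:.3f}")
print(f"type II rate ~ {type_2_rate:.3f}")
```

Under these assumptions the type I rate should land near the chosen alpha, while the type II rate depends on the effect size and the sample size.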

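False Discovery Rate:
The expected proportion of false positives (type I errors) among all rejected null hypotheses. Controlling it is a way to keep type I errors in check when many hypotheses are tested at once.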
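
One common way to control the false discovery rate is the Benjamini-Hochberg step-up procedure. The text above does not name a specific method, so this choice is an assumption, and the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Sort the p-values, find the largest rank k with p_(k) <= (k / m) * q,
    and reject every hypothesis whose rank is at most k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from four separate tests.
decisions = benjamini_hochberg([0.02, 0.02, 0.03, 0.5], q=0.05)
print(decisions)
```

Note the step-up behaviour: the smallest p-value here misses its own threshold, but it is still rejected because a larger rank passes.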

One-sample T-test:

trying to find evidence of a significant difference between the population mean and a hypothesized value

Two-sample T-test:

trying to find evidence of a significant difference between population means

The t-value measures the size of the difference relative to the variation in your sample data.

T is simply the calculated difference represented in units of standard error.
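
As a minimal sketch of "difference in units of standard error", the t-value can be computed by hand for both tests. The function names and the data are hypothetical, and the two-sample version uses Welch's form (no equal-variance assumption), which is one choice among several.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se


def two_sample_t(a, b):
    """Welch's t: difference of sample means / combined standard error."""
    se = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / se


group_a = [2, 4, 6, 8, 10]  # made-up measurements
group_b = [1, 2, 3, 4, 5]

t_one = one_sample_t(group_a, mu0=3)
t_two = two_sample_t(group_a, group_b)
print(f"one-sample t = {t_one:.3f}")
print(f"two-sample t = {t_two:.3f}")
```

The raw difference is 3 in both cases, but the t-values differ because the standard errors differ.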

The greater the magnitude of T (it can be either positive or negative), the greater the evidence against the null hypothesis that there is no significant difference.

The closer T is to 0, the more likely there isn’t a significant difference.

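However, a t-test uses the values from only one sample, whereas in principle you would need to know the whole population and repeatedly draw samples from it, running the same t-test each time. So the result from a single sample is not necessarily reliable: the next sample drawn from the same population might give a completely different t-value. How can we judge that reliability? This is where the T-distribution comes in.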
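
The repeated-sampling idea can be sketched by simulation. The population here is hypothetical (normal with mean 0 and standard deviation 1, so the null hypothesis is true), and the sample size of 10 and the 1000 repetitions are arbitrary choices.

```python
import math
import random
import statistics


def t_value(sample, mu0):
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - mu0) / se


rng = random.Random(42)
# Draw many samples from the same population and run the same test on each.
t_values = [
    t_value([rng.gauss(0.0, 1.0) for _ in range(10)], mu0=0.0)
    for _ in range(1000)
]
share_large = sum(abs(t) > 2 for t in t_values) / len(t_values)

print(f"smallest t: {min(t_values):.2f}, largest t: {max(t_values):.2f}")
print(f"share of samples with |t| > 2: {share_large:.3f}")
```

Even though every sample comes from the same population, the t-values scatter around 0, and a small share of them are large purely by chance.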

So what is the T-distribution? As I understand it, it ties back to the parametric tests and the normal distribution mentioned earlier.

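So, generally speaking, the T-distribution looks much like a normal distribution: a symmetric, bell-shaped curve centered at 0. It is not identical, though. It has heavier tails, and it approaches the normal distribution as the degrees of freedom (roughly, the sample size) grow. It describes how the t-values from repeated random samples would be distributed if the null hypothesis were true.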
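
To see how close the T-distribution is to the normal one, the two densities can be compared directly. This is a rough sketch using the standard Student's t density formula, written with `math.lgamma` to avoid overflow; the helper name `t_pdf` and the chosen degrees of freedom are illustrative.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


normal_pdf = NormalDist().pdf  # standard normal density

x = 3.0  # a point out in the tail
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: t density at {x} = {t_pdf(x, df):.5f}")
print(f"normal : density at {x} = {normal_pdf(x):.5f}")
```

With few degrees of freedom the tail density is noticeably larger than the normal one, and the gap shrinks as the degrees of freedom increase.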

Reference:
[1] http://blog.minitab.com/blog/statistics-and-quality-data-analysis/what-are-t-values-and-p-values-in-statistics
