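An Active Learning (AL) algorithm finds the smallest set of data that needs to be labeled for training, and the trained classifier is then used to label the remaining unlabeled data. To serve as the default optimization strategy in a crowd-sourced database, AL must satisfy the following requirements: Generality: the algorithm must be applicable to arbitrary classification and labeling tasks, because crowd-sourced systems are used in a wide range of applications. Black-box treatment of the classife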
The paper Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning gives an introduction to the bootstrap. The original book is B. Efron and R. J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, 1993.
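To make the idea of the bootstrap concrete, here is a minimal sketch using only the Python standard library. It estimates the standard error of a statistic by resampling the observed data with replacement. The function name `bootstrap_std_error`, its parameters, and the sample values are assumptions made for illustration; they do not come from the paper or the book.

```python
import random
import statistics


def bootstrap_std_error(sample, statistic=statistics.mean, n_resamples=1000, seed=0):
    """Estimate the standard error of `statistic` via the bootstrap.

    Hypothetical helper for illustration: each resample draws len(sample)
    items from `sample` with replacement, and the spread of the statistic
    across resamples approximates its standard error.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # sampling with replacement
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# Made-up measurements, used only to exercise the function.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 3.0, 2.2]
se = bootstrap_std_error(data)
print("bootstrap standard error of the mean:", se)
```

The same resampling loop works for any statistic (median, a classifier's prediction, and so on), which is presumably why it is attractive when the underlying model is treated as a black box.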
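The paper Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning proposes two AL algorithms. Both first measure how uncertain the classifier θ is about the unlabeled data, then ask the crowd to label those items. The two uncertainty measures are introduced below. In what follows, u denotes a single unlabeled item, not the unlabeled set as a whole. 1. Uncertainty Algorithm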
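As a rough illustration of scoring each unlabeled item u by classifier uncertainty, the sketch below assumes a binary task and measures uncertainty as the variance p(1 - p) of the labels predicted for u by classifiers trained on bootstrap resamples of the labeled data. This is one plausible reading rather than a reproduction of the paper's exact definition. The nearest-centroid classifier, all function names, and the toy data are hypothetical stand-ins.

```python
import random


def train_centroid_classifier(labeled):
    """Hypothetical stand-in for the classifier: 1-D nearest centroid.

    `labeled` is a list of (x, label) pairs with label in {0, 1}.
    """
    by_label = {0: [], 1: []}
    for x, y in labeled:
        by_label[y].append(x)
    if not by_label[0] or not by_label[1]:
        # A resample can miss one class; then always predict the class seen.
        only = 0 if by_label[0] else 1
        return lambda x: only
    c0 = sum(by_label[0]) / len(by_label[0])
    c1 = sum(by_label[1]) / len(by_label[1])
    return lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1


def bootstrap_uncertainty(labeled, unlabeled, n_models=50, seed=0):
    """Score each unlabeled item u by p * (1 - p), where p is the fraction
    of bootstrap-trained classifiers that predict label 1 for u."""
    rng = random.Random(seed)
    models = [
        train_centroid_classifier(rng.choices(labeled, k=len(labeled)))
        for _ in range(n_models)
    ]
    scores = []
    for u in unlabeled:
        p = sum(model(u) for model in models) / n_models
        scores.append(p * (1 - p))
    return scores


def rank_by_uncertainty(labeled, unlabeled, **kwargs):
    """Return (u, score) pairs sorted from most to least uncertain."""
    scores = bootstrap_uncertainty(labeled, unlabeled, **kwargs)
    return sorted(zip(unlabeled, scores), key=lambda pair: pair[1], reverse=True)


# Toy data: class 0 clusters near 0..1.5, class 1 near 3.5..5.
labeled = [(0.0, 0), (0.5, 0), (1.0, 0), (1.5, 0),
           (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]
unlabeled = [0.2, 2.5, 4.8]
ranked = rank_by_uncertainty(labeled, unlabeled)
print(ranked)
```

Items at the top of `ranked` are the ones that would be sent to the crowd first. With this toy data one would expect the point between the two clusters to score highest, since the resampled classifiers disagree about it most often.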
Active Learning Notation This post introduces the AL algorithms from the paper Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning. An active learning algorithm consists mainly of: 1. a ranker R; 2. a selection strategy S; 3. budget allo