In this paper, we compare active learning strategies for indexing concepts in video shots. Active learning is simulated using subsets of a fully annotated dataset instead of actually calling for user intervention. Training is done using the collaborative annotation of 39 concepts from the TRECVID 2005 campaign. Performance is measured on the 20 concepts selected for the TRECVID 2006 concept detection task. The simulation makes it possible to explore the effect of several parameters: the strategy, the annotated fraction of the dataset, the size of the dataset, the number of iterations, and the relative difficulty of the concepts.
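The simulation described above can be sketched as a loop that, instead of querying a user, reveals labels from the ground truth of the fully annotated dataset. This is a minimal illustration under our own assumptions: `train` and `select` are placeholder callables standing in for the paper's classifier and selection strategy, and all names are ours, not the authors'.

```python
import random

def simulate_active_learning(ground_truth, train, select,
                             seed_size, step, rounds):
    """Simulate active learning on a fully annotated dataset.

    ground_truth: dict mapping sample index -> true label.
    train:        callable taking the labeled dict, returning a model.
    select:       callable (model, unlabeled set, step) -> indices to label.
    seed_size:    size of the initial random labeled seed.
    step:         number of samples "annotated" per iteration.
    rounds:       number of active-learning iterations.
    """
    pool = list(ground_truth)
    random.shuffle(pool)
    # Start from a small random seed of labeled samples.
    labeled = {i: ground_truth[i] for i in pool[:seed_size]}
    unlabeled = set(pool[seed_size:])
    history = []
    for _ in range(rounds):
        model = train(labeled)
        batch = select(model, unlabeled, step)
        for i in batch:
            # Simulated annotation: look up the ground truth
            # instead of asking a human annotator.
            labeled[i] = ground_truth[i]
            unlabeled.discard(i)
        history.append(len(labeled))
    return labeled, history
```

With a toy ground truth and trivial `train`/`select` placeholders, the labeled set grows by `step` samples per round, which is the quantity varied in the experiments.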
Three strategies were compared. The first two select the most probable and the most uncertain samples, respectively; the third is a random choice. For easy concepts, the "most probable" strategy performs best when less than 15% of the dataset is annotated, and the "most uncertain" strategy performs best when 15% or more is annotated. The two strategies are roughly equivalent for moderately difficult and difficult concepts. In all cases, the maximum performance is reached once 12 to 15% of the whole dataset is annotated, although this result depends on the step size and on the training set size. A step size of 1/40th of the training set size is a good choice. The size of the annotated subset needed to reach the maximum achievable performance grows with the square root of the training set size. Finally, the "most probable" strategy is more "recall oriented", while the "most uncertain" strategy is more "precision oriented".
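The selection step of the three strategies can be sketched as follows. This is a minimal illustration, assuming the classifier outputs a probability score per shot; the function name, the strategy labels, and the scoring interface are our assumptions, not the paper's code.

```python
import random

def select_batch(scores, strategy, step):
    """Return `step` sample indices chosen by an active-learning strategy.

    scores:   dict mapping sample index -> classifier probability that
              the concept is present in the shot.
    strategy: "most_probable", "most_uncertain", or "random".
    """
    indices = list(scores)
    if strategy == "most_probable":
        # Highest predicted probability first (recall-oriented behavior).
        indices.sort(key=lambda i: scores[i], reverse=True)
    elif strategy == "most_uncertain":
        # Probability closest to 0.5 first (precision-oriented behavior).
        indices.sort(key=lambda i: abs(scores[i] - 0.5))
    elif strategy == "random":
        random.shuffle(indices)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return indices[:step]

# Toy probability scores for five shots.
scores = {0: 0.9, 1: 0.55, 2: 0.2, 3: 0.48, 4: 0.7}
print(select_batch(scores, "most_probable", 2))   # -> [0, 4]
print(select_batch(scores, "most_uncertain", 2))  # -> [3, 1]
```

The contrast is visible on the toy scores: "most probable" picks the shots the classifier is already confident about, while "most uncertain" picks the shots near the decision boundary.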