In this paper, we compare active learning strategies for indexing concepts
in video shots. Active learning is simulated using subsets of a fully
annotated dataset rather than by soliciting actual user intervention.
Training uses the collaborative annotation of 39 concepts from
the TRECVID 2005 campaign. Performance is measured on the 20 concepts
selected for the TRECVID 2006 concept detection task. The simulation
makes it possible to explore the effect of several parameters: the
strategy, the annotated fraction of the dataset, the number of
iterations, and the relative difficulty of concepts.
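
As an illustration, the simulation principle can be sketched as follows. This is a minimal sketch under stated assumptions, not the system evaluated in this paper: the function name `simulate_active_learning`, the `select` callback, and the `batch_size` parameter are hypothetical. The ground-truth labels play the role of the user: a label is "revealed" only when the strategy selects the corresponding sample. In a full system, a classifier would be retrained on the annotated set at each iteration before the next selection.

```python
def simulate_active_learning(labels, select, n_iterations, batch_size):
    """Simulate active learning on a fully annotated dataset.

    labels: dict mapping sample id -> ground-truth label (hidden oracle).
    select: hypothetical callback (unlabeled_ids, annotated, k) -> list of
            ids to annotate next; it stands in for "retrain, then rank".
    Returns the annotated subset and the annotated fraction per iteration.
    """
    unlabeled = sorted(labels)   # samples whose labels are still hidden
    annotated = {}               # samples whose labels have been revealed
    history = []                 # annotated fraction after each iteration

    for _ in range(n_iterations):
        if not unlabeled:
            break
        chosen = select(unlabeled, annotated, batch_size)
        for sample_id in chosen:
            # Reveal the stored label instead of asking a human annotator.
            annotated[sample_id] = labels[sample_id]
        chosen_set = set(chosen)
        unlabeled = [s for s in unlabeled if s not in chosen_set]
        history.append(len(annotated) / len(labels))

    return annotated, history


# Toy usage: 100 samples, a placeholder strategy that takes the first k ids.
toy_labels = {i: (i % 7 == 0) for i in range(100)}
first_k = lambda pool, annotated, k: pool[:k]
annotated, history = simulate_active_learning(toy_labels, first_k, 5, 3)
```

With such a loop, the strategy, the annotated fraction, and the number of iterations can each be varied independently while the annotations themselves stay fixed.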
Three strategies were compared. The first two select the most probable
and the most uncertain samples, respectively. The third selects samples
at random. For easy concepts, the "most probable" strategy is the best one
when less than 15% of the dataset is annotated and the "most uncertain"
strategy is the best one when 15% or more of the dataset is annotated.
The "most probable" and "most uncertain" strategies are roughly
equivalent for moderately difficult and difficult concepts. In all
cases, the maximum performance is reached when 12 to 15% of the whole
dataset is annotated.
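
The three selection strategies can be sketched as follows. This is an illustrative sketch only: it assumes that the classifier outputs a score in [0, 1] interpreted as the probability that a shot contains the concept, and that "most uncertain" means closest to 0.5. Neither assumption is stated above, and the function names and shot identifiers are hypothetical.

```python
import random


def most_probable(scores, k):
    """Select the k samples with the highest predicted probability."""
    return sorted(scores, key=lambda s: scores[s], reverse=True)[:k]


def most_uncertain(scores, k):
    """Select the k samples whose score is closest to 0.5 (assumed
    definition of uncertainty for a probabilistic binary classifier)."""
    return sorted(scores, key=lambda s: abs(scores[s] - 0.5))[:k]


def random_choice(scores, k, seed=0):
    """Select k samples uniformly at random (baseline strategy)."""
    rng = random.Random(seed)
    return rng.sample(sorted(scores), k)


# Toy scores for five unlabeled shots (made-up values for illustration).
scores = {
    "shot_a": 0.95,
    "shot_b": 0.52,
    "shot_c": 0.10,
    "shot_d": 0.45,
    "shot_e": 0.80,
}

probable = most_probable(scores, 2)    # ['shot_a', 'shot_e']
uncertain = most_uncertain(scores, 2)  # ['shot_b', 'shot_d']
baseline = random_choice(scores, 2)
```

On these toy scores, the "most probable" strategy picks the shots most likely to be positive, whereas the "most uncertain" strategy picks the shots nearest the decision boundary.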