Shot boundary detection was performed using a system based on image difference with motion compensation and direct dissolve detection. The system allows the silence-to-noise ratio (equivalently, the trade-off between recall and precision) to be controlled over a wide range of values. At the operating point where noise and silence are equal (that is, where recall equals precision), the F1 value is 0.805 for all types of transitions, 0.833 for cuts, and 0.727 for gradual transitions.
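The relation between these quantities can be illustrated with a minimal sketch. It assumes the usual reading of the terms, which the text does not spell out: silence is the fraction of true transitions that are missed (1 - recall) and noise is the fraction of reported transitions that are false alarms (1 - precision). The function names are illustrative only and are not part of the described system.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def silence_and_noise(precision: float, recall: float) -> Tuple[float, float]:
    """Return (silence, noise) under the assumed definitions:
    silence = 1 - recall, noise = 1 - precision."""
    return 1.0 - recall, 1.0 - precision


# At a balanced operating point precision equals recall,
# so the F1 value coincides with both of them.
balanced_f1 = f1_score(0.805, 0.805)
```

Under these assumptions, a balanced F1 value of 0.805 corresponds to precision and recall both equal to 0.805, that is, silence and noise both equal to 0.195.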
High-level feature detection was performed using networks of SVM classifiers arranged in a variety of architectures. These networks take into account a variety of low-level descriptors that combine text, local and global information, as well as conceptual context. The inferred average precision of our first run is 0.088.
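For readers unfamiliar with the metric, the following sketch computes plain (non-interpolated) average precision over a fully judged ranked list. It is not the inferred variant reported above, which, as far as we understand it, estimates the same quantity from a sample of the relevance judgments. The function name and the input format are assumptions made for illustration.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Plain average precision of one ranked list.

    ranked_relevance[i] is True when the item at rank i + 1 is relevant.
    total_relevant defaults to the number of relevant items in the list.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    return precision_sum / total_relevant
```

Averaging this value over all evaluated features or topics gives the mean average precision.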
The search system uses a user-controlled combination of five mechanisms: keywords, similarity to example images, semantic categories, similarity to already identified positive images, and temporal closeness to already identified positive images. The mean average precision of the system (with the most experienced user) is 0.184.
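One simple way to picture a user-controlled combination of the five mechanisms is a weighted average of per-mechanism scores, with the weights set by the user. The sketch below assumes that form. The text does not state how the system actually combines the mechanisms, and the mechanism keys, score ranges, and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical keys for the five mechanisms listed in the text.
MECHANISMS = (
    "keywords",
    "example_similarity",
    "semantic_categories",
    "positive_similarity",
    "temporal_closeness",
)


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-mechanism scores (assumed combination rule)."""
    total_weight = sum(weights.get(m, 0.0) for m in MECHANISMS)
    if total_weight == 0.0:
        return 0.0
    weighted = sum(weights.get(m, 0.0) * scores.get(m, 0.0) for m in MECHANISMS)
    return weighted / total_weight


def rank_shots(shot_scores: Dict[str, Dict[str, float]],
               weights: Dict[str, float]) -> List[str]:
    """Return shot identifiers ordered by decreasing combined score."""
    return sorted(
        shot_scores,
        key=lambda shot: combine_scores(shot_scores[shot], weights),
        reverse=True,
    )
```

In this sketch, setting a weight to zero switches the corresponding mechanism off, so a user could start from keywords alone and raise the weights of the feedback-based mechanisms once some positive images have been identified.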