TRECVID 2012 Collaborative annotation

Authors: Georges Quénot (LIG), Franck Thollard (LIG), Bahjat Safadi (LIG) and Stéphane Ayache (LIF).
Last revision: 17-April-2012.

We (LIG, Laboratoire d'Informatique de Grenoble) and LIF (Laboratoire d'Informatique Fondamentale de Marseille) are organizing the collaborative annotation for the TRECVID [1] SIN 2012 task as it was organized in 2003 [2], 2005 [3], 2007, 2008, 2009, 2010 and 2011 [4].

We have done some work on active learning and its relation to the corpus annotation problem. Part of this work is described in a paper published in Signal Processing: Image Communication [5]. It indicates that annotating only a small fraction of carefully chosen samples of a training collection (typically between 15 and 20%) lets the system achieve the same performance as if the whole collection had been annotated, or even better. This was confirmed on the TRECVID 2007 collaborative annotation [4], though the optimal annotation fraction was found to be between 35 and 50% (this may be due to the small size of the TRECVID 2007 development collection).

1. Active learning

How does it work? A set of samples is available for training but it is not annotated yet (this is currently the case for the development set of TRECVID 2012 where the samples are keyframes/subshots). A system for concept detection is also available and can be trained using samples annotated as positive and negative for a concept to be detected. The annotation of the training set is partial and incremental.

The principle of using active learning for annotation is to use the system to select the samples that are potentially the most informative for its training. Several strategies can be considered; the most popular ones select either the most probable or the most uncertain samples. If several systems are available, it is also possible to select the samples on which the different systems disagree.
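The three selection strategies mentioned above can be sketched as follows. This is only a minimal illustration, not the code of the actual annotation system: the function names, the keyframe identifiers and the scores are all hypothetical, and "disagreement" is assumed here to mean the spread between the highest and lowest score given by the systems.

```python
from typing import Dict, List


def select_most_probable(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k unlabeled samples with the highest predicted probability."""
    return sorted(scores, key=lambda s: scores[s], reverse=True)[:k]


def select_most_uncertain(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k unlabeled samples whose probability is closest to 0.5."""
    return sorted(scores, key=lambda s: abs(scores[s] - 0.5))[:k]


def select_by_disagreement(per_system: List[Dict[str, float]], k: int) -> List[str]:
    """Return the k samples on which the systems' scores differ the most."""
    def spread(sample: str) -> float:
        values = [scores[sample] for scores in per_system]
        return max(values) - min(values)

    return sorted(per_system[0], key=spread, reverse=True)[:k]


# Hypothetical scores from two systems for four keyframes (illustrative only).
scores_a = {"kf1": 0.95, "kf2": 0.52, "kf3": 0.10, "kf4": 0.70}
scores_b = {"kf1": 0.90, "kf2": 0.40, "kf3": 0.80, "kf4": 0.65}

most_probable = select_most_probable(scores_a, 2)               # ['kf1', 'kf4']
most_uncertain = select_most_uncertain(scores_a, 2)             # ['kf2', 'kf4']
most_disputed = select_by_disagreement([scores_a, scores_b], 1)  # ['kf3']
```

Selecting the most probable samples tends to surface positives quickly, which matters for rare concepts, while the most uncertain samples are those closest to the decision boundary.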

Since the system is used to produce the set of annotations that it will then use to train itself, this can only work iteratively, and there is a "cold start" problem. In the present case, the cold start will be done using previous years' annotations (on the TRECVID 2007 development collection) and judgements (on the TRECVID 2007 test collection) when available, or using LSCOM annotations (on the TRECVID 2005 development collection), depending upon the target concept (feature). Once the process is started, the system becomes better at selecting good samples for annotation at each iteration [4][5].
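The iterative process and its cold start can be illustrated with a toy loop. This is a sketch under strong assumptions, not the real system: samples are plain numbers, the "concept detector" is a hypothetical nearest-centroid scorer, the annotator is simulated by an oracle function, and the selection strategy is uncertainty sampling. The seed annotations play the role of the previous years' annotations and must contain at least one positive and one negative sample.

```python
from typing import Callable, Dict, Iterable


def train(labeled: Dict[float, bool]) -> Callable[[float], float]:
    """Toy detector: score a sample by its relative closeness to the positives."""
    pos = [x for x, is_pos in labeled.items() if is_pos]
    neg = [x for x, is_pos in labeled.items() if not is_pos]
    c_pos = sum(pos) / len(pos)  # fails without a positive seed (cold start)
    c_neg = sum(neg) / len(neg)  # fails without a negative seed (cold start)

    def score(x: float) -> float:
        d_pos, d_neg = abs(x - c_pos), abs(x - c_neg)
        return 0.5 if d_pos + d_neg == 0 else d_neg / (d_pos + d_neg)

    return score


def active_annotation(
    pool: Iterable[float],
    seed: Dict[float, bool],
    oracle: Callable[[float], bool],
    batch_size: int,
    iterations: int,
) -> Dict[float, bool]:
    """Alternate between re-training and annotating the most uncertain samples."""
    labeled = dict(seed)
    unlabeled = [x for x in pool if x not in labeled]
    for _ in range(iterations):
        if not unlabeled:
            break
        score = train(labeled)                             # re-train
        unlabeled.sort(key=lambda x: abs(score(x) - 0.5))  # re-sort
        batch, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        for x in batch:
            labeled[x] = oracle(x)                         # human annotation
    return labeled


def is_positive(x: float) -> bool:
    """Simulated annotator: hypothetical ground truth for the toy concept."""
    return x >= 12


result = active_annotation(range(21), {0: False, 20: True}, is_positive, 2, 2)
```

In this toy run, the first sample sent to the annotator is 10, the point halfway between the two seed samples.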

Compared to the TRECVID 2007-2009 collaborative annotations using active learning, the innovation for TRECVID 2010-2012 is the use of relations between annotated concepts. The principle is that if a shot is labeled as positive for Adult, it will automatically be labeled as positive for Person, and if a shot is labeled as negative for Person, it will automatically be labeled as negative for Adult, Male_Person, Female_Person, Teenagers, etc. Special care will be taken when choosing the annotations to be done and their order so that this effect makes each annotation as efficient as possible. 500 concepts were selected for TRECVID SIN 2011 so that they cover as many previous TRECVID HLFs as possible, comply as much as possible with the LSCOM ontology, and are linked by a number of generic-specific relations. Only 346 of them were sufficiently annotated and were actually used for the submissions. The same set will be used again in the TRECVID SIN 2012 task.
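The propagation rule described above (a positive judgment goes up to the more generic concepts, a negative judgment goes down to the more specific ones) can be sketched as follows. The relation table is a small hypothetical fragment built from the concept names mentioned in this page, and the function names are illustrative; this is not the actual relation set or implementation.

```python
from typing import Dict, List, Set

# Hypothetical fragment of the generic-specific relations: specific -> generic.
IMPLIES: Dict[str, List[str]] = {
    "Adult": ["Person"],
    "Male_Person": ["Person"],
    "Female_Person": ["Person"],
    "Teenagers": ["Person"],
    "Cat": ["Animal"],
}


def ancestors(concept: str, implies: Dict[str, List[str]]) -> Set[str]:
    """All more generic concepts reachable from `concept`."""
    found: Set[str] = set()
    stack = list(implies.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(implies.get(parent, []))
    return found


def descendants(concept: str, implies: Dict[str, List[str]]) -> Set[str]:
    """All more specific concepts that imply `concept`."""
    return {c for c in implies if concept in ancestors(c, implies)}


def propagate(concept: str, judgment: str,
              implies: Dict[str, List[str]] = IMPLIES) -> Dict[str, str]:
    """Return the direct judgment plus the judgments derived from it."""
    labels = {concept: judgment}
    if judgment == "Positive":
        for generic in ancestors(concept, implies):
            labels[generic] = "Positive"
    elif judgment == "Negative":
        for specific in descendants(concept, implies):
            labels[specific] = "Negative"
    return labels  # a "Skip" judgment is not propagated


adult_positive = propagate("Adult", "Positive")
person_negative = propagate("Person", "Negative")
```

Here a single negative judgment for Person yields five labels, which illustrates why the order in which concepts are annotated affects the efficiency of each annotation.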

2. Application to TRECVID 2012 collaborative annotation

We propose to use this approach to perform a partial annotation of the TRECVID 2012 development set in the context of a collaborative annotation effort. That effort would be done using a system similar to the one used for TRECVID 2007 to 2010, with a light effort since only a fraction of the training set will be annotated. As in the previous collaborative annotations, the annotations will be available before the TRECVID 2012 workshop to all teams that participated in the annotation, and to everybody afterwards.

We provide a web interface for the collaborative annotation. The active learning process will be transparent for the annotators: they will simply encounter more positive samples than in a full or random annotation (at least in the beginning). In practice, the concept detection system will be re-trained continuously with the latest available annotations and the next samples to be annotated will be re-evaluated and re-sorted each time the system has been re-trained.

We plan to set up the annotation on March 31st. We will take advantage of ASR/MT. The annotation will run for 2 to 4 weeks, and annotations on sets of increasing size may be delivered periodically in the meantime.

We produced the master shot segmentation using the LIG shot boundary detection system [6] and extracted one keyframe per shot. There are 400,289 shots/keyframes in the TRECVID 2012 development set. Each participating team is invited to produce 30,000 annotations or more. At an average of 2 seconds per annotation (the exact value varies: in some cases, one may see almost at once that there is no positive in 25 images; in other cases, one may have to play the subshot to check its content quite often), this corresponds to about 17 hours of full time work.

We recommend spreading your annotation effort over sessions of 30 to 60 minutes, once or twice a day. Annotations can be done until the 20th of May, though we advise doing them as early as possible.
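The effort estimate and the recommended session length can be checked with a short computation. The figures are those given above (30,000 annotations, 2 seconds each, sessions of 30 to 60 minutes); the variable names are illustrative.

```python
import math

ANNOTATIONS_PER_TEAM = 30_000
SECONDS_PER_ANNOTATION = 2  # average value; actual time per annotation varies

total_seconds = ANNOTATIONS_PER_TEAM * SECONDS_PER_ANNOTATION  # 60,000 s
total_minutes = total_seconds / 60                              # 1,000 min
total_hours = total_seconds / 3600                              # about 16.7 h

# Number of sessions needed at the two recommended session lengths.
sessions_at_60_min = math.ceil(total_minutes / 60)
sessions_at_30_min = math.ceil(total_minutes / 30)

print(f"{total_hours:.1f} hours, "
      f"{sessions_at_60_min} to {sessions_at_30_min} sessions")
# prints "16.7 hours, 17 to 34 sessions"
```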

3. Participants

Teams that participated in the TRECVID 2011 collaborative annotation and that are registered for the 2012 SIN task have been registered by default for the TRECVID 2012 collaborative annotation with the same group name and contact person. This is to simplify the registration process; no actual participation in the collaborative annotation is required, though it would of course be much appreciated. If you are a new team interested in participating in this collaborative annotation, please send an email to Franck.Thollard@imag.fr with a copy to Georges.Quenot@imag.fr indicating your team and contact person.

Please note that participating in the annotation and/or obtaining the annotations is only open to registered TRECVID participants who have signed the Sound and Vision licence agreement (i.e. those listed in the "tv12.who.what" file). Access to the annotation system is password protected. Registered participants can access the annotation system at http://mrim.imag.fr/tvca2012/al.html.

40 teams are currently registered for the collaborative annotation:

Team Contact person
Laboratoire d'informatique de Grenoble (LIG) Bahjat Safadi
Laboratoire d'informatique fondamentale de Marseille (LIF) Stéphane Ayache
JOANNEUM RESEARCH Forschungsgesellschaft mbH & Vienna University of Technology Werner Bailer
France Telecom Orange Labs, Beijing Kun Tao
GDR ISIS IRIM group Franck Thollard
Tokyo Institute of Technology, Canon Corporation Koichi Shinoda
Intelligent Multimedia Group in Tsinghua University, Fujitsu Research & Development Center Co., Ltd. and Fujitsu Laboratories Ltd Jianmin Li
Quaero consortium Georges Quénot
National Institute of Informatics, Japan Duy-Dinh Le
The University of Electro-Communications, Japan Keiji Yanai
Beijing Jiaotong University, China Jie Geng
East China Normal University Feng Wang
NHK (Japan Broadcasting Corp.) Science and Technical Research Laboratories Yoshihiko Kawai
University of Kaiserslautern Wan-Lei Zhao
Informatics and Telematics Institute, Greece Vasileios Mezaris
Universitaet Klagenfurt Klaus Schöffmann
Carnegie Mellon University Lei Bao
Florida International University Fausto Fleites
VIREO at City University of Hong Kong Shiai Zhu
Dublin City University Frank Hopfgartner
Brno University of Technology Michal Hradis
Beijing University of Posts and Telecommunications - MCPRL Zhicheng Zhao
University of Amsterdam Cees Snoek
NTT Cyber Space Laboratories and Dalian University of Technology Haojie Li
École Nationale d'Ingénieurs de Sfax (ENIS) Anis Benammar
Nikon Corporation Takeshi Matsuo
Shanghai Jiaotong University-IS Jiang Chengming
Sichuan University of China Xiao-Yong Wei
Aalto University School of Science and Technology Mats Sjöberg
Northwest Polytechnical University Jun Wu
Columbia University Dong Liu
Liris-Imagine Charles-Edmond Bichot
Marburg Markus Muhling
NIST Paul Over
Fudan University, China Hong Lu
IBM Watson Research Center Lexing Xie
Information and Communication Engineering, Xi'an Jiaotong University Zhe Wang
INRIA Willow Rachid Benmokhtar
Institute of Image Comm. and Inf. Proc., Shanghai Jiao Tong University Xiaokang Yang
Kobe University Kimiaki Shirahama
Peking University Yuan Feng
TÜBİTAK Space Technologies Research Institute Ahmet Saracoglu
University of Marburg Markus Mühling
National Cheng Kung University Chien-Li Chou
Fuzhou University Jianjun Huang
Stanford University André Filgueiras de Araujo

4. Annotations download

The final version of the 2007 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2007/ann.tgz.
The final version of the 2008 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2008/ann.tgz.
The final version of the 2009 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2009/ann.tgz.
The final version of the 2010 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2010/ann-orig.tgz (original version).
The final version of the 2010 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2010/ann.tgz (updated version).
The final version of the 2011 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2011/ann-orig.tgz (original version).
The final version of the 2011 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2011/ann.tgz (updated version).

The latest version of the 2012 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2012/ann.tgz (currently containing only the 2012 new annotations).

The difference between the original and updated versions is that the annotations corresponding to dropped (withdrawn) videos have been removed.

In 2012, no minimum amount of annotation is required to access the annotations. During the annotation phase, you can get the latest version of the full annotation at any time and update it as often as you wish as the collaborative annotation progresses.

Annotations are given in the order in which they were selected by the active learning process (those predicted as most useful first).

The annotations are delivered as a gzip compressed unix tar archive (".tgz"). The format is the same as in 2005, 2007, 2008, 2009, 2010 and 2011:

 Each line represents a judgment. For a given feature and shot there will likely be
 more than one judgment. It is up to you how you use these multiple annotations.

 Each line contains the following information:

  toolName annotationSite featureName movieName keyFrameName judgment(Skip/Positive/Negative)

The shotname can be derived from the keyframename by removing the "_RKF" or "_NRKF_#" part. There may be several keyframes/subshots per shot for 2007, 2008 and 2009.

There is currently only one judgment per keyframe and possibly several judgments per shot.
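Reading the annotation lines and grouping the judgments per feature and shot can be sketched as follows. Several points are assumptions rather than facts stated in this page: the six fields are taken to be separated by whitespace, the "#" in "_NRKF_#" is taken to be a decimal number, and the sample lines (tool, site, movie and keyframe names, judgment spelling) are entirely hypothetical. The judgment field is kept as an opaque string.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Judgment(NamedTuple):
    tool: str
    site: str
    feature: str
    movie: str
    keyframe: str
    judgment: str


# Assumed keyframe suffixes: "_RKF" or "_NRKF_" followed by a number.
_KEYFRAME_SUFFIX = re.compile(r"_(?:RKF|NRKF_\d+)$")


def shot_name(keyframe: str) -> str:
    """Derive the shot name by removing the "_RKF" or "_NRKF_#" part."""
    return _KEYFRAME_SUFFIX.sub("", keyframe)


def parse_line(line: str) -> Judgment:
    """Split one annotation line into its six fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return Judgment(*fields)


def group_by_shot(lines: Iterable[str]) -> Dict[Tuple[str, str], Counter]:
    """Count the judgments given for each (feature, shot) pair."""
    grouped: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        j = parse_line(line)
        grouped[(j.feature, shot_name(j.keyframe))][j.judgment] += 1
    return grouped


# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    "tool site1 Person movie_0001 shot1_12_RKF Positive",
    "tool site2 Person movie_0001 shot1_12_NRKF_1 Negative",
    "tool site1 Animal movie_0001 shot1_13_RKF Negative",
]

counts = group_by_shot(SAMPLE_LINES)
```

How conflicting judgments for the same shot are resolved (majority vote, any positive wins, etc.) is left to the user, as noted in the format description.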

5. Milestones (2012)

17 April: Annotation system available.
20 May: Annotation system closed.
21 May: Final version of the annotation available.

6. Status on April 17, 2012

About 4.1M direct annotations from 2010 and 2011. About 18M annotations after propagation using relations (e.g. Cat implies Animal).

7. Acknowledgments

This work is supported by the Quaero programme.

8. Contacts

Georges.Quenot@imag.fr
Bahjat.Safadi@imag.fr
Franck.Thollard@imag.fr
Stephane.Ayache@univmed.fr

References

[1] Smeaton, A. F., Over, P., and Kraaij, W. 2004. TRECVID: evaluating
    the effectiveness of information retrieval tasks on digital video.
    In Proceedings of the 12th Annual ACM international Conference on
    Multimedia (New York, NY, USA, October 10 - 16, 2004). MULTIMEDIA '04.
    ACM Press, New York, NY, 652-655.
    DOI= http://doi.acm.org/10.1145/1027527.1027678
[2] C.-Y. Lin, B. L. Tseng and J. R. Smith, "Video Collaborative Annotation
    Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets,"
    NIST TREC-2003 Video Retrieval Evaluation Conference, Gaithersburg, MD,
    November 2003.
    URL: http://www-nlpir.nist.gov/projects/tvpubs/papers/ibm.final.paper.pdf
[3] Timo Volkmer, John R. Smith, Apostol (Paul) Natsev, Murray Campbell, 
    Milind Naphade, "A web-based system for collaborative annotation of large 
    image and video collections", In Proceedings of the 13th ACM international 
    Conference on Multimedia, Singapore, 6-11 November, 2005 
[4] Stéphane Ayache and Georges Quénot, "Video Corpus Annotation using
    Active Learning", 30th European Conference on Information Retrieval
    (ECIR'08), Glasgow, Scotland, 30th March - 3rd April, 2008
    URL: http://mrim.imag.fr/georges.quenot/articles/ecir08.pdf
[5] Stéphane Ayache and Georges Quénot, "Evaluation of Active Learning
    Strategies for Video Indexing", Signal Processing: Image Communication,
    Vol 22/7-8 pp 692-704, August-September 2007.
    DOI: http://dx.doi.org/10.1016/j.image.2007.05.010
[6] Stéphane Ayache, Georges Quénot, and Jérôme Gensel,
    "CLIPS-LSR Experiments at TRECVID 2006",
    TREC Video Retrieval Evaluation Online Proceedings, TRECVID, 2006
    URL: http://mrim.imag.fr/georges.quenot/articles/trec06.pdf
[7] C. Petersohn. "Fraunhofer HHI at TRECVID 2004:  Shot Boundary Detection
    System", TREC Video Retrieval Evaluation Online Proceedings, TRECVID, 2004
    URL: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/fraunhofer.pdf
[8] Marijn Huijbregts, Roeland Ordelman and Franciska de Jong, Annotation
    of Heterogeneous Multimedia Content Using Automatic Speech
    Recognition. In Proceedings of SAMT, December 5-7, 2007, Genova, Italy
[9] Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse,
    Lori Lamel, and Abdel Messaoudi. Modeling Northern and Southern
    Varieties of Dutch for STT. In Interspeech'09, pages 96-99, Brighton,
    UK, September, 2009.