We (LIG, Laboratoire d'Informatique de Grenoble) and LIF (Laboratoire d'Informatique Fondamentale de Marseille) are organizing the collaborative annotation for the TRECVID [1] SIN 2012 task as it was organized in 2003 [2], 2005 [3], 2007, 2008, 2009, 2010 and 2011 [4].
We have done some work on active learning and its relation to the corpus annotation problem. Part of this work has been described in a paper published in Signal Processing: Image Communication [5]. This work indicates that it is possible to annotate only a small fraction of carefully chosen samples of a training collection (typically between 15 and 20%) and still achieve the same performance as (or even better performance than) when the whole collection is annotated. This was confirmed on the TRECVID 2007 collaborative annotation [4], though the optimal annotation fraction was found to be between 35 and 50% (this may be due to the small size of the TRECVID 2007 development collection).
How does it work? A set of samples is available for training but it is not annotated yet (this is currently the case for the development set of TRECVID 2012 where the samples are keyframes/subshots). A system for concept detection is also available and can be trained using samples annotated as positive and negative for a concept to be detected. The annotation of the training set is partial and incremental.
The principle of using active learning for annotation is to use the system to select the samples that are potentially the most informative for training the system. Several strategies can be considered; the most popular ones select the most probable or the most uncertain samples. If several systems are available, it is possible to select the samples on which the different systems disagree.
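The selection strategies mentioned above can be sketched as follows. This is a minimal illustration, not the code of the actual annotation system: it assumes each sample has a predicted probability in [0, 1] of containing the concept, that "most uncertain" means closest to 0.5, and that disagreement between systems is measured by the spread between their highest and lowest scores. All function names and sample ids are hypothetical.

```python
def select_most_probable(scores, k):
    """Return the k sample ids with the highest predicted probability."""
    return sorted(scores, key=lambda s: scores[s], reverse=True)[:k]


def select_most_uncertain(scores, k):
    """Return the k sample ids whose score is closest to 0.5 (assumed
    here to be the point of maximal uncertainty)."""
    return sorted(scores, key=lambda s: abs(scores[s] - 0.5))[:k]


def select_most_disputed(scores_by_system, k):
    """Return the k sample ids on which several systems disagree most,
    using max - min of their scores as an assumed disagreement measure."""
    spread = {s: max(v) - min(v) for s, v in scores_by_system.items()}
    return sorted(spread, key=lambda s: spread[s], reverse=True)[:k]


# Hypothetical scores from one system for five keyframes.
scores = {"kf1": 0.95, "kf2": 0.52, "kf3": 0.10, "kf4": 0.45, "kf5": 0.80}
print(select_most_probable(scores, 2))
print(select_most_uncertain(scores, 2))

# Hypothetical scores from two systems for three keyframes.
multi = {"kf1": [0.9, 0.8], "kf2": [0.2, 0.9], "kf3": [0.5, 0.4]}
print(select_most_disputed(multi, 1))
```

Selecting the most probable samples tends to surface positives early, which matters when positives are rare; selecting the most uncertain ones targets the decision boundary instead.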
Since the system is used to produce the set of annotations that it will then use to train itself, this can only work iteratively, and there is a "cold start" problem. In the present case, the cold start will be done using annotations from previous years (on the TRECVID 2007 development collection) and judgements (on the TRECVID 2007 test collection) when available, or using LSCOM annotations (on the TRECVID 2005 development collection), depending upon the target concept (feature). Once the process is started, at each iteration, the system becomes better and better at selecting good samples for annotation [4][5].
Compared to the TRECVID 2007-2009 collaborative annotations using active learning, the innovation for TRECVID 2010-2012 is the use of relations between annotated concepts. The principle is that if a shot is labeled as positive for Adult, it will automatically be labeled as positive for Person, and if a shot is labeled as negative for Person, it will automatically be labeled as negative for Adult, Male_Person, Female_Person, Teenagers, etc. Special care will be taken when choosing the annotations to be done and their order, so that this effect makes each annotation as efficient as possible. 500 concepts were selected for TRECVID SIN 2011 so that they cover as many previous TRECVID HLFs as possible, comply as much as possible with the LSCOM ontology, and are linked by a number of generic-specific relations. Only 346 of them were sufficiently annotated and were actually used for the submissions. The same set will be used again in the TRECVID SIN 2012 task.
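The propagation rule described above can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the `PARENT` table only contains the relations named in the text, each concept is assumed to have at most one generic parent, and the function names are hypothetical.

```python
# Specific concept -> generic concept, limited to the relations named above.
PARENT = {
    "Adult": "Person",
    "Male_Person": "Person",
    "Female_Person": "Person",
    "Teenagers": "Person",
}


def ancestors(concept, parent=PARENT):
    """All more generic concepts above `concept`, nearest first."""
    out = []
    while concept in parent:
        concept = parent[concept]
        out.append(concept)
    return out


def descendants(concept, parent=PARENT):
    """All more specific concepts below `concept`."""
    out = []
    for child, par in parent.items():
        if par == concept:
            out.append(child)
            out.extend(descendants(child, parent))
    return out


def propagate(concept, judgment, parent=PARENT):
    """Labels implied for one shot by a single judgment: a positive goes
    up to the generic concepts, a negative goes down to the specific ones."""
    labels = {concept: judgment}
    if judgment == "Positive":
        for c in ancestors(concept, parent):
            labels[c] = "Positive"
    elif judgment == "Negative":
        for c in descendants(concept, parent):
            labels[c] = "Negative"
    return labels


print(propagate("Adult", "Positive"))
print(propagate("Person", "Negative"))
```

Note that nothing is implied in the other directions: a negative for Adult says nothing about Person, and a positive for Person says nothing about Adult.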
We propose to use this approach to perform a partial annotation of the TRECVID 2012 development set in the context of a collaborative annotation effort. That effort would be done using a system similar to the one used for TRECVID 2007 to 2010, with light effort since only a fraction of the training set will be annotated. As in the previous collaborative annotations, the annotations will be available to all teams that participated in the annotation before the TRECVID 2012 workshop, and to everybody afterwards.
We provide a web interface for the collaborative annotation. The active learning process will be transparent for the annotators: they will simply encounter more positive samples than in a full or random annotation (at least in the beginning). In practice, the concept detection system will be re-trained continuously with the latest available annotations, and the next samples to be annotated will be re-evaluated and re-sorted each time the system has been re-trained.
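The re-train and re-sort cycle can be sketched as a small loop. Everything specific here is an assumption made for illustration: the "detector" is a toy that scores a single numeric feature by its distance to the mean of each class, the annotator is simulated by a function, and samples are served most probable first. The real system uses actual concept detectors on keyframes.

```python
def train(features, labels):
    """Toy 'detector': the mean feature value of each class.
    Requires at least one positive and one negative label (cold start)."""
    pos = [features[s] for s, y in labels.items() if y]
    neg = [features[s] for s, y in labels.items() if not y]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def score(model, x):
    """Score in [0, 1]; higher means closer to the positive class mean."""
    pos_mean, neg_mean = model
    d_pos, d_neg = abs(x - pos_mean), abs(x - neg_mean)
    total = d_pos + d_neg
    return d_neg / total if total else 0.5


def annotate_actively(features, ask_annotator, seed_labels, batch_size, n_iter):
    """Alternate between re-training on all labels so far and asking the
    annotator about the top-ranked samples that are still unlabeled."""
    labels = dict(seed_labels)  # cold start from existing annotations
    for _ in range(n_iter):
        model = train(features, labels)
        pool = [s for s in features if s not in labels]
        if not pool:
            break
        # Re-sort the remaining samples with the freshly trained model.
        pool.sort(key=lambda s: score(model, features[s]), reverse=True)
        for s in pool[:batch_size]:
            labels[s] = ask_annotator(s)
    return labels


# Hypothetical data: one feature per sample, simulated ground truth.
features = {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1, "e": 0.85, "f": 0.15}
truth = lambda s: features[s] > 0.5
seed = {"a": True, "d": False}
print(annotate_actively(features, truth, seed, batch_size=2, n_iter=1))
```

With this toy data, the first batch contains only positives, which mirrors the remark that annotators initially encounter more positive samples than in a random annotation.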
We plan to set up the annotation on March 31st. We will take advantage of ASR/MT. The annotation would be done over 2 to 4 weeks, and annotations on sets of increasing sizes could be delivered periodically in the meantime.
We produced the master shot segmentation using the LIG shot boundary detection system [6] and extracted one keyframe per shot. There are 400,289 shots/keyframes in the TRECVID 2012 development set. Each participating team is invited to produce 30,000 annotations or more. At an average of 2 seconds per annotation (the exact value varies: in some cases, one may see almost at once that there is no positive in 25 images; in other cases, one may have to play the subshot to check its content quite often), this corresponds to about 17 hours of full time work.
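The workload estimate above is simple arithmetic and can be checked as follows. The function name is hypothetical; the figures are the ones quoted in the text.

```python
def annotation_hours(n_annotations, seconds_per_annotation):
    """Total annotation time in hours."""
    return n_annotations * seconds_per_annotation / 3600


hours = annotation_hours(30_000, 2)
print(f"{hours:.1f} hours")  # 16.7 hours, i.e. about 17 hours
```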
We recommend spreading your annotation effort over sessions of 30 to 60 minutes, once or twice a day. Annotations can be done until the 20th of May, though we advise doing them as early as possible.
Teams that already participated in the TRECVID 2011 collaborative annotation and that are registered for the 2012 SIN task have by default been registered for the TRECVID 2012 collaborative annotation with the same group name and contact person. This is to simplify the registration process; no actual participation in the collaborative annotation is required, though it would of course be much appreciated. If you are a new team and are interested in participating in this collaborative annotation, please send an email to Franck.Thollard@imag.fr with a copy to Georges.Quenot@imag.fr indicating your team and the contact person.
Please note that participating in the annotations and/or getting them is only open to the registered TRECVID participants who have signed the Sound and Vision licence agreement (i.e. listed in the "tv12.who.what" file). Access to the annotation system is password protected. Registered participants can access the annotation system at http://mrim.imag.fr/tvca2012/al.html.
40 teams are currently registered for the collaborative annotation:
| Team | Contact person |
|---|---|
| Laboratoire d'informatique de Grenoble (LIG) | Bahjat Safadi |
| Laboratoire d'informatique fondamentale de Marseille (LIF) | Stéphane Ayache |
| JOANNEUM RESEARCH Forschungsgesellschaft mbH & Vienna University of Technology | Werner Bailer |
| France Telecom Orange Labs, Beijing | Kun Tao |
| GDR ISIS IRIM group | Franck Thollard |
| Tokyo Institute of Technology, Canon Corporation | Koichi Shinoda |
| Intelligent Multimedia Group in Tsinghua University, Fujitsu Research & Development Center Co., Ltd. and Fujitsu Laboratories Ltd | Jianmin Li |
| Quaero consortium | Georges Quénot |
| National Institute of Informatics, Japan | Duy-Dinh Le |
| The University of Electro-Communications, Japan | Keiji Yanai |
| Beijing Jiaotong University, China | Jie Geng |
| East China Normal University | Feng Wang |
| NHK (Japan Broadcasting Corp.) Science and Technical Research Laboratories | Yoshihiko Kawai |
| University of Kaiserslautern | Wan-Lei Zhao |
| Informatics and Telematics Institute, Greece | Vasileios Mezaris |
| Universitaet Klagenfurt | Klaus Schöffmann |
| Carnegie Mellon University | Lei Bao |
| Florida International University | Fausto Fleites |
| VIREO at City University of Hong Kong | Shiai Zhu |
| Dublin City University | Frank Hopfgartner |
| Brno University of Technology | Michal Hradis |
| Beijing University of Posts and Telecommunications - MCPRL | Zhicheng Zhao |
| University of Amsterdam | Cees Snoek |
| NTT Cyber Space Laboratories and Dalian University of Technology | Haojie Li |
| École Nationale d'Ingénieurs de Sfax (ENIS) | Anis Benammar |
| Nikon Corporation | Takeshi Matsuo |
| Shanghai Jiaotong University-IS | Jiang Chengming |
| Sichuan University of China | Xiao-Yong Wei |
| Aalto University School of Science and Technology | Mats Sjöberg |
| Northwest Polytechnical University | Jun Wu |
| Columbia University | Dong Liu |
| Liris-Imagine | Charles-Edmond Bichot |
| Marburg | Markus Muhling |
| NIST | Paul Over |
| Fudan University, China | Hong Lu |
| IBM Watson Research Center | Lexing Xie |
| Information and Communication Engineering, Xi'an Jiaotong University | Zhe Wang |
| INRIA Willow | Rachid Benmokhtar |
| Institute of Image Comm. and Inf. Proc., Shanghai Jiao Tong University | Xiaokang Yang |
| Kobe University | Kimiaki Shirahama |
| Peking University | Yuan Feng |
| TÜBİTAK Space Technologies Research Institute | Ahmet Saracoglu |
| University of Marburg | Markus Mühling |
| National Cheng Kung University | Chien-Li Chou |
| Fuzhou University | Jianjun Huang |
| Stanford University | André Filgueiras de Araujo |
The final version of the 2007 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2007/ann.tgz.
The final version of the 2008 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2008/ann.tgz.
The final version of the 2009 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2009/ann.tgz.
The final version of the 2010 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2010/ann-orig.tgz
(original version).
The final version of the 2010 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2010/ann.tgz
(updated version).
The final version of the 2011 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2011/ann-orig.tgz
(original version).
The final version of the 2011 collaborative annotation can be downloaded from
http://mrim.imag.fr/tvca2011/ann.tgz
(updated version).
The latest version of the 2012 collaborative annotation can be downloaded from http://mrim.imag.fr/tvca2012/ann.tgz (currently containing only the 2012 new annotations).
The difference between the original and updated versions is that the annotations corresponding to dropped (withdrawn) videos have been removed.
In 2012, no minimum amount of annotation is required to access the annotations. During the annotation phase, you can get the latest version of the full annotation at any time. You can update it as frequently as you wish as the collaborative annotation progresses.
Annotations are given in the order in which they were selected by the active learning process (those predicted as most useful first).
The annotations are delivered as a gzip compressed unix tar archive (".tgz"). The format is the same as in 2005, 2007, 2008, 2009, 2010 and 2011:
Each line represents a judgment. For a given feature and shot there will likely be more than one judgment. It is up to you how you use these multiple annotations. Each line contains the following information: toolName annotationSite featureName movieName keyFrameName judgment(Skip/Positive/Negative)
The shotname can be derived from the keyframename by removing the "_RKF" or "_NRKF_#" part. There may be several keyframes/subshots per shot for 2007, 2008 and 2009.
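Reading one judgment line and deriving the shot name can be sketched as follows. This assumes that the six fields are separated by whitespace and that the "#" in "_NRKF_#" stands for a number; the sample line and all names in it are invented for illustration and do not come from a real annotation file.

```python
import re
from typing import NamedTuple


class Judgment(NamedTuple):
    tool: str
    site: str
    feature: str
    movie: str
    keyframe: str
    judgment: str


def parse_line(line):
    """Split one annotation line into its six fields
    (assumed to be whitespace-separated)."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    return Judgment(*fields)


# "_RKF" or "_NRKF_<number>" at the end of a keyframe name (assumed form).
_KEYFRAME_SUFFIX = re.compile(r"_(?:RKF|NRKF_\d+)$")


def shot_name(keyframe):
    """Derive the shot name by removing the keyframe suffix, if any."""
    return _KEYFRAME_SUFFIX.sub("", keyframe)


# Invented example line, for illustration only.
record = parse_line("sometool somesite Person movie_0001 shot1_12_RKF Positive")
print(record.feature, shot_name(record.keyframe), record.judgment)
```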
There is currently only one judgment per keyframe and possibly several judgments per shot.
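Since a shot may receive several judgments for the same feature, users have to decide how to combine them. The sketch below shows one possible policy, chosen purely as an assumption for illustration: a shot is positive if any of its keyframes is positive, otherwise negative if any is negative, otherwise skipped. Other policies (for example, majority voting) are equally legitimate.

```python
from collections import defaultdict


def aggregate(judgments):
    """Combine (feature, shot, judgment) triples into one label per
    (feature, shot) pair: Positive wins over Negative, which wins over Skip."""
    seen = defaultdict(set)
    for feature, shot, judgment in judgments:
        seen[(feature, shot)].add(judgment)

    result = {}
    for key, values in seen.items():
        if "Positive" in values:
            result[key] = "Positive"
        elif "Negative" in values:
            result[key] = "Negative"
        else:
            result[key] = "Skip"
    return result


# Invented example triples, for illustration only.
triples = [
    ("Person", "shot1_12", "Negative"),
    ("Person", "shot1_12", "Positive"),
    ("Person", "shot1_13", "Skip"),
    ("Person", "shot1_13", "Negative"),
    ("Adult", "shot1_14", "Skip"),
]
print(aggregate(triples))
```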
This work is supported by the Quaero programme.
[1] Smeaton, A. F., Over, P., and Kraaij, W. 2004. TRECVID: evaluating
the effectiveness of information retrieval tasks on digital video.
In Proceedings of the 12th Annual ACM international Conference on
Multimedia (New York, NY, USA, October 10 - 16, 2004). MULTIMEDIA '04.
ACM Press, New York, NY, 652-655.
DOI= http://doi.acm.org/10.1145/1027527.1027678
[2] C.-Y. Lin, B. L. Tseng and J. R. Smith, "Video Collaborative Annotation
Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets,"
NIST TREC-2003 Video Retrieval Evaluation Conference, Gaithersburg, MD,
November 2003.
URL: http://www-nlpir.nist.gov/projects/tvpubs/papers/ibm.final.paper.pdf
[3] Timo Volkmer, John R. Smith, Apostol (Paul) Natsev, Murray Campbell,
Milind Naphade, "A web-based system for collaborative annotation of large
image and video collections", In Proceedings of the 13th ACM international
Conference on Multimedia, Singapore, 6-11 November, 2005
[4] Stéphane Ayache and Georges Quénot, "Video Corpus Annotation using
Active Learning", 30th European Conference on Information Retrieval
(ECIR'08), Glasgow, Scotland, 30th March - 3rd April, 2008
URL: http://mrim.imag.fr/georges.quenot/articles/ecir08.pdf
[5] Stéphane Ayache and Georges Quénot, "Evaluation of Active Learning
Strategies for Video Indexing", Signal Processing: Image Communication,
Vol 22/7-8 pp 692-704, August-September 2007.
DOI: http://dx.doi.org/10.1016/j.image.2007.05.010
[6] Stéphane Ayache, Georges Quénot, and Jérôme Gensel,
"CLIPS-LSR Experiments at TRECVID 2006",
TREC Video Retrieval Evaluation Online Proceedings, TRECVID, 2006
URL: http://mrim.imag.fr/georges.quenot/articles/trec06.pdf
[7] C. Petersohn. "Fraunhofer HHI at TRECVID 2004: Shot Boundary Detection
System", TREC Video Retrieval Evaluation Online Proceedings, TRECVID, 2004
URL: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/fraunhofer.pdf
[8] Marijn Huijbregts, Roeland Ordelman and Franciska de Jong, Annotation
of Heterogeneous Multimedia Content Using Automatic Speech
Recognition. in Proceedings of SAMT, December 5-7 2007, Genova, Italy
[9] Julien Despres, Petr Fousek, Jean-Luc Gauvain, Sandrine Gay, Yvan Josse,
Lori Lamel, and Abdel Messaoudi. Modeling Northern and Southern
Varieties of Dutch for STT. In Interspeech'09, pages 96-99, Brighton,
UK, September, 2009.