This paper presents the systems used by CLIPS-IMAG to perform the Shot Boundary Detection (SBD) and Feature Extraction (FE) tasks of the TRECvid workshop, together with the results obtained in the 2003 evaluation. The CLIPS SBD system, based on image difference with motion compensation and direct dissolve detection, ranked second among 14 systems. It allows the silence-to-noise ratio to be controlled over a wide range of values; at the operating point where noise and silence are equal (equivalently, where recall and precision are equal), both are 12% for all types of transitions. Detection of person X from speaker recognition alone was disappointing because of the small number of shots containing person X in the overall test collection (about 1/700) and the even smaller number in which person X was actually speaking (about 1/6000). Detection of person X from speech transcription performed much better but still scored lower than other systems that also used the image track for the detection.
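To make the silence and noise figures concrete, the following minimal sketch relates them to recall and precision. It assumes the usual information retrieval definitions (silence = 1 - recall, noise = 1 - precision); the function name and the counts are hypothetical illustrations, not values taken from the evaluation.

```python
def silence_and_noise(n_reference, n_detected, n_correct):
    """Return (silence, noise) for a set of detected transitions.

    Assumed definitions (not stated in the text above):
      recall    = n_correct / n_reference
      precision = n_correct / n_detected
      silence   = 1 - recall     (fraction of reference transitions missed)
      noise     = 1 - precision  (fraction of detections that are false)
    """
    recall = n_correct / n_reference
    precision = n_correct / n_detected
    return 1.0 - recall, 1.0 - precision


# Hypothetical counts chosen so that silence and noise are equal.
silence, noise = silence_and_noise(n_reference=1000, n_detected=1000, n_correct=880)
print(f"silence = {silence:.2f}, noise = {noise:.2f}")
```

Under these assumed definitions, an equal silence and noise of 12% corresponds to a recall and a precision of 88% each.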