This paper presents the systems used by CLIPS-IMAG and its partners, the LSR-IMAG, LIS and LABRI laboratories, to perform the tasks proposed in the TRECVID 2004 workshop. Shot boundary detection (SBD) was performed using a system based on image difference with motion compensation and direct dissolve detection. This system allows the silence-to-noise ratio to be controlled over a wide range of values; at the operating point where noise and silence are equal (equivalently, where recall and precision are equal), the F1 value is 0.83 for all types of transitions. Story segmentation was performed using a combination of multi-modal detectors, and the F1 value for the optimal system configuration was 0.48. Feature extraction was achieved using a combination of lexical-context-based classification, color- and texture-based classification, and face recognition. The search system uses a user-controlled combination of five mechanisms: keywords, similarity to example images, semantic categories, similarity to already identified positive images, and temporal closeness to already identified positive images. The mean average precision of the system (with the most experienced user) is 0.24.
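For readers less familiar with the evaluation measures quoted above, the following is a minimal Python sketch of how F1 and mean average precision are commonly computed. It assumes the standard definitions (F1 as the harmonic mean of precision and recall, and non-interpolated average precision normalized by the total number of relevant items); the function names `f1_score`, `average_precision` and `mean_average_precision` are illustrative and are not taken from the systems described here or from the official evaluation tools.

```python
from typing import Sequence, Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Non-interpolated average precision for one ranked result list.

    ranked_relevance[i] is True if the item at rank i+1 is relevant.
    total_relevant is the number of relevant items in the collection
    (assumed definition; relevant items never retrieved count as zero).
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(runs: Sequence[Tuple[Sequence[bool], int]]) -> float:
    """Mean of average precision over several topics (queries)."""
    if not runs:
        return 0.0
    return sum(average_precision(r, n) for r, n in runs) / len(runs)
```

Under these definitions, when precision and recall are equal, F1 takes that same value, which is why a single figure can summarize the balanced operating point: `f1_score(0.83, 0.83)` evaluates to 0.83 (up to floating-point rounding).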