Tutorial on Semantic Search on Medical Texts

Description of the Tutorial

Semantic search is known as 'search with meaning' [1]. It involves understanding the query, understanding the data, and representing knowledge in a way that enables meaningful retrieval. While this topic is addressed across a wide range of research disciplines, each with its own formulation of the problem, this tutorial focuses on keyword-based search over medical texts from the perspective of the information retrieval (IR) community. The central, practical goal of medical IR is to design text matching and ranking models that support systems in 1) assisting patients and their next of kin who seek understanding and guidance through web search, and 2) helping physicians and clinicians who seek to improve their decision-making in diagnosis and treatment. However, numerous studies have shown that these tasks are complex and often lead to system failures, mainly because of the gap between low-level document representations and the high-level meaning of their content [2, 3, 4]. A review of the state of the art in medical IR clearly shows that the use of semantics is an important means of understanding and progress in the area [5]. IR researchers have intensively studied medical and health-related queries [6, 7, 8] and have developed a large body of contributions that use semantics throughout the search life cycle: 1) enhancing query and document representations through expansion and/or rewriting guided by knowledge resources [9, 10, 11] or, more recently, by learning neural concept and document representations constrained by domain knowledge [12, 13, 14]; 2) making inferences between texts by combining word distributions in corpora with symbolic knowledge provided by external resources [15, 16]; 3) revisiting relevance under the joint constraint of word statistics, as in traditional IR, and the semantics held in knowledge-base resources [17, 18, 19].
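To make the first family of approaches (expansion guided by knowledge resources) more concrete, the sketch below expands a keyword query with related concept labels and ranks documents by weighted term overlap. It is a minimal illustration under stated assumptions, not the method of any cited work: the thesaurus `TOY_THESAURUS` is a hypothetical hand-written stand-in for a resource such as MeSH or UMLS, and the function names `expand_query` and `score`, the naive phrase matching, and the expansion weight of 0.5 are all illustrative choices.

```python
from collections import Counter

# HYPOTHETICAL toy thesaurus standing in for a knowledge resource such as
# MeSH or UMLS. Entries are illustrative only, not taken from a real resource.
TOY_THESAURUS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "cancer": ["neoplasm"],
}


def expand_query(query, thesaurus, weight=0.5):
    """Return weighted query terms: original tokens get weight 1.0 each,
    tokens from matching thesaurus entries get the (assumed) lower weight."""
    q = query.lower()
    weights = Counter()
    for token in q.split():
        weights[token] += 1.0
    for phrase, related in thesaurus.items():
        # Naive substring match so that multi-word entries can fire.
        if phrase in q:
            for label in related:
                for token in label.split():
                    weights[token] += weight
    return dict(weights)


def score(doc, query_weights):
    """Weighted term-overlap score: sum over query terms of weight * term count."""
    counts = Counter(doc.lower().split())
    return sum(w * counts[t] for t, w in query_weights.items())


if __name__ == "__main__":
    docs = [
        "patient admitted with acute myocardial infarction",
        "treatment options for hypertension",
    ]
    expanded = expand_query("heart attack symptoms", TOY_THESAURUS)
    for d in docs:
        print(score(d, expanded), d)
```

With the query "heart attack symptoms", the first document shares no word with the original query, so it only receives a non-zero score through the expansion terms "myocardial" and "infarction". This illustrates, in miniature, how knowledge resources can bridge the vocabulary gap between lay queries and clinical text; real systems would map text to concept identifiers and use a proper retrieval model instead of raw overlap.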
This tutorial will first explore the peculiarities of medical and health-related queries with respect to various facets (e.g., vocabulary, user expertise, task) in an attempt to better understand the underlying user intent. Second, in line with the vision of semantic search, we will focus on techniques and theoretical models that go beyond lexical matching to drive the search. We will cover both symbolic semantics, through the use of external resources (e.g., UMLS, MeSH, Gene Ontology), and distributional semantics, which relies on word collocations in the corpus, including recent approaches to learning representations of concepts and documents. Third, we will provide a roadmap of the main evaluation frameworks used in medical IR and then examine and compare the effectiveness of semantic IR approaches in particular. Finally, we will summarize the research findings in the area and outline the key open research questions. To sum up, the goals of the tutorial are the following:

Summarize the lessons that can be drawn from studies investigating the peculiarities of medical information needs;

Present state-of-the-art semantic search models supporting medical IR processes;

Describe the major medical search evaluation benchmarks used in the IR community and report the key trends in the results achieved by applying semantic IR models.

[1] H. Bast, B. Buchhold, and E. Haussmann. Semantic search on text and knowledge bases. Foundations and Trends in Information Retrieval, 10(2-3):119–271, June 2016.
[2] K. Natarajan, D. Stein, S. Jain, and N. Elhadad. An analysis of clinical queries in an electronic health record search utility. International Journal of Medical Informatics, 79(7):515–522, 2010.
[3] T. Edinger, A. M. Cohen, S. Bedrick, K. Ambert, and W. Hersh. Barriers to retrieving patient information from electronic health record data: Failure analysis from the TREC Medical Records Track. In AMIA Annual Symposium Proceedings, page 180. American Medical Informatics Association, 2012.
[4] L. Goeuriot, G. J. F. Jones, L. Kelly, H. Mueller, and J. Zobel. Medical information retrieval: introduction to the special issue. Information Retrieval Journal, 19(1):1–5, Apr 2016.
[5] C. Marton and C. W. Choo. A review of theoretical models of health information seeking on the web. Journal of Documentation, 68(3):330–352, 2012.
[6] J. Ely, M. H. Osheroff, G. Bergus, B. T. Levy, M. Chambliss, and E. Evans. Analysis of questions asked by family doctors regarding patient care. British Medical Journal, 319:358–361, 1999.
[7] W. Hersh, M. Crabtree, D. Hickam, L. Sacherek, L. Rose, and C. Friedman. Factors associated with successful answering of clinical questions using an information retrieval system. Bulletin of the Medical Library Association, 88, 2000.
[8] Y. Zhang. Searching for specific health-related information in MedlinePlus: Behavioral patterns and user experience. Journal of the American Society for Information Science and Technology (JASIST), 65(1):53–68, 2013.
[9] D. Dinh and L. Tamine. Combining global and local semantic contexts for improving biomedical information retrieval. In Advances in Information Retrieval - 33rd European Conference on IR Research, ECIR 2011, Dublin, Ireland, April 18-21, 2011. Proceedings, pages 375–386, 2011.
[10] P. Sondhi, J. Sun, R. Sorrentino, and M. S. Kohn. Leveraging medical thesauri and physician feedback for improving medical literature retrieval for case queries. Journal of the American Medical Informatics Association, 19(5):851–858, 2014.
[11] D. Zhu, S. Wu, B. Carterette, and H. Liu. Using large clinical corpora for query expansion in text-based cohort identification. Journal of Biomedical Informatics, 49(C):275–281, June 2014.
[12] X. Liu, J.-Y. Nie, and A. Sordoni. Constraining word embeddings by prior knowledge – application to medical information retrieval. In AIRS, pages 155–167, 2016.
[13] E. Choi, M. T. Bahadori, E. Searles, C. Coffey, and J. Sun. Multilayer representation learning for medical concepts. In KDD, pages 635–644. ACM, 2016.
[14] G.-H. Nguyen, L. Tamine, L. Soulier, and N. Souf. Learning Concept-Driven Document Embeddings for Medical Information Search. In Conference on Artificial Intelligence in Medicine (AIME 2017), pages 160–170, Vienna, Austria, 2017.
[15] B. Koopman, G. Zuccon, P. Bruza, L. Sitbon, and M. Lawley. Information retrieval as semantic inference: a graph inference model applied to medical search. Information Retrieval Journal, 19(1):6–37, Apr 2016.
[16] N. Limsopatham, C. Macdonald, and I. Ounis. Inferring conceptual relationships to improve medical records search. In Proceedings of the 10th Conference on Open Research Areas in Information Retrieval, OAIR ’13, pages 1–8, 2013.
[17] W. Zhou, C. Yu, N. Smalheiser, V. Torvik, and J. Hong. Knowledge-intensive conceptual retrieval and passage extraction of biomedical literature. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’07, pages 655–662, New York, NY, USA, 2007. ACM.
[18] C. Wang and R. Akella. Concept-based relevance models for medical and semantic information retrieval. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, pages 173–182, New York, NY, USA, 2015. ACM.
[19] Z. Xie, Y. Xia, and Q. Zhou. Incorporating semantic knowledge with MRF term dependency model in medical document retrieval. In J. Li, H. Ji, D. Zhao, and Y. Feng, editors, Natural Language Processing and Chinese Computing: 4th CCF Conference, NLPCC 2015, Nanchang, China, October 9-13, 2015, Proceedings, pages 219–228, Cham, 2015. Springer International Publishing.

Presenters

Lynda Tamine-Lechani
Department of Computing Science, University Paul Sabatier, Toulouse (France)

Lynda Tamine is a Professor of Computer Science at Université Paul Sabatier in Toulouse and a member of the Institut de Recherche en Informatique de Toulouse (IRIT). Her research interests include the modelling and evaluation of medical, contextual, collaborative, and social information retrieval. She has previously presented tutorials on collaborative and social IR models at ECIR 2016 and ICTIR 2016. Together with a team of PhD students, she works on characterizing medical queries according to diverse facets such as user expertise, task, and difficulty, and on semantic search models in medical settings.

Lorraine Goeuriot
LIG - Université Grenoble Alpes

Lorraine Goeuriot is an associate professor at Université Grenoble Alpes. She obtained her Master's degree in computer science and her PhD in computational linguistics on medical data at the University of Nantes, France. She worked as a post-doctoral researcher at Nanyang Technological University, Singapore, on medical opinion mining, and at Dublin City University on medical information processing and retrieval. Since 2013, she has been heavily involved as an organizer of the CLEF eHealth evaluation lab, which organizes several information extraction and information retrieval tasks in the medical domain every year. She has been involved in organizing the information retrieval task since its first edition in 2013.
