Seminar "Modeling Term Associations for Probabilistic Information Retrieval"


Room “Sala Seminari” - Abacus Building (U14)

 


 

Speaker: Prof. Jimmy Huang

 

Abstract

Many probabilistic retrieval models traditionally assume that query terms are independent. Although such models can achieve reasonably good performance, associations among terms do exist from a human point of view. Some recent studies investigate how to model term associations and dependencies with proximity measures. However, the theoretical modeling of term associations within the probabilistic retrieval framework is still largely unexplored. In this talk, I will introduce a new concept, the Cross Term, which models term proximity with the aim of boosting retrieval performance. With Cross Terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text, and the degree of this impact gradually weakens with increasing distance from the place of occurrence. We use shape functions to characterize such impacts. Based on this assumption, we first propose a bigram CRoss TErm Retrieval (CRTER2) model as the basis model, and then recursively propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model for n query terms, where n > 2. Specifically, a bigram Cross Term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. For n-gram Cross Terms, we develop several distance metrics with different properties and employ them in the proposed models for ranking. We also show how to extend the language model using the newly proposed Cross Terms. Extensive experiments on a number of TREC collections demonstrate the effectiveness of our proposed models. Finally, conclusions and future work will be presented.
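
As a rough illustration of the bigram idea described in the abstract, the sketch below treats each query term occurrence as the center of a shape function and takes the height at which two shape functions intersect as the impact of one Cross Term occurrence. This is only one possible reading, not the speaker's implementation: the Gaussian kernel, the `sigma` parameter, the summation over all occurrence pairs, and every function name are assumptions made for this example.

```python
import math


def gaussian_shape(distance, sigma=2.0):
    """Impact of a term occurrence at a given token distance.

    Assumption: a Gaussian kernel; the abstract only says "shape functions".
    """
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def cross_term_impact(pos_a, pos_b, sigma=2.0):
    """Height at which two identical, symmetric shape functions intersect.

    With the same kernel centered at pos_a and pos_b, the curves cross at
    the midpoint, so the impact is the kernel value at half the gap.
    """
    half_gap = abs(pos_a - pos_b) / 2
    return gaussian_shape(half_gap, sigma)


def cross_term_frequency(tokens, term_a, term_b, sigma=2.0):
    """Hypothetical pseudo-frequency of the bigram Cross Term in a document.

    Assumption: impacts of all occurrence pairs are simply summed.
    """
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    return sum(
        cross_term_impact(a, b, sigma)
        for a in positions_a
        for b in positions_b
    )


if __name__ == "__main__":
    near = "probabilistic retrieval models rank documents".split()
    far = "probabilistic models rank documents for ad hoc retrieval".split()
    print(cross_term_frequency(near, "probabilistic", "retrieval"))
    print(cross_term_frequency(far, "probabilistic", "retrieval"))
```

Under these assumptions, the document in which the two query terms are adjacent receives a larger Cross Term frequency than the one in which they are far apart. A value like this could then be fed into a unigram-style weighting scheme, which is the sense in which a Cross Term is handled "in the same way as a simple unigram term".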

 

Short Bio

Professor Jimmy Huang is a Tier 1 York Research Chair in Big Data Analytics at York University. His research focuses on information retrieval, artificial intelligence, natural language processing, and big data analytics, particularly for web and healthcare applications. He has published more than 360 refereed papers in leading journals and conferences, including ACM SIGIR, CIKM, ACL, EMNLP, ICML, IJCAI, and AAAI, with several receiving best paper awards. His research on task-oriented and context-sensitive information retrieval has contributed to advances in areas such as conversational search and BERT/ChatGPT-related technologies. His SIGIR 2022 paper on hypergraph contrastive collaborative filtering was recognized as the most influential SIGIR paper in 2023.

 

Professor Huang has held major leadership roles in the research community, including General Chair of ACM SIGIR 2020 and ACM CIKM 2010, and currently serves as Chair of the IEEE Technical Community on Intelligent Informatics. Since joining York University in 2003, he has led numerous large-scale funded research projects supported by NSERC and other agencies. He is an ACM Distinguished Scientist, IEEE Fellow, Fellow of the Canadian Academy of Engineering, and Fellow of several international professional societies. His honors include the Premier’s Research Excellence Award, multiple best paper awards, and the 2026 President’s Research Excellence Award.

 

Contact person for this seminar: gabriella.pasi@unimib.it
