Seminars

The DIG team holds a seminar roughly every two weeks, with speakers either from the team or invited from outside.

You can add the seminars to your calendar with this ICS file, and receive emails about future seminars by subscribing to our mailing list.

If you would like to present your work at our seminar, please contact Nils.

Upcoming Seminars

Neuro-symbolic representation learning

Tuesday, March 10, 2026 11:45, 4A301

Boshko Koloski (Jožef Stefan Institute)

Abstract TBA

Read full seminar details


Title TBA

Tuesday, April 14, 2026 11:45, 1A312

Antoine Gourru (Télécom Saint-Etienne)

Abstract TBA

Read full seminar details


Data Integration: Remaining Challenges and Research Paths

Tuesday, May 19, 2026 11:45, 4A301

Robert Wrembel (Poznań University of Technology)

Data integration (DI) has been a cornerstone of computer science research for decades, resulting in a few established reference architectures. They generally fall into three categories: virtual (federated and mediated), physical (data warehouse), and hybrid (data lake, data lakehouse, and data mesh). Regardless of the paradigm, these architectures depend on an integration layer, implemented by means of sophisticated software designed to orchestrate and execute DI processes. The integration layer is responsible for ingesting data from various sources (typically heterogeneous and distributed) and for homogenizing data into formats suitable for future processing and analysis. On the one hand, in all business domains, large volumes of highly heterogeneous data are produced, e.g., medical systems, smart cities, smart agriculture, which require further advancements in the data integration technologies. On the other hand, the widespread adoption of artificial intelligence (AI) solutions is now extending towards DI, offering alternative solutions, opening new research paths, and generating new open problems. Emerging paradigms, such as Data Spaces and the Model Context Protocol, further advance DI.

Read full seminar details


Past Seminars

Computational analysis of news: methods and applications in keyword extraction, fake-news classification, sentiment and migrations discourse analysis

Tuesday, February 24, 2026 11:45, 4A301

Senja Pollak & Boshko Koloski (Jožef Stefan Institute)

With the growing volume and influence of digital news media, computational methods for analysing news content have become essential. This talk addresses interconnected challenges spanning keyword extraction, misinformation detection, sentiment classification, and the study of migration discourse. For keyword extraction, we propose SEKE, a mixture-of-experts architecture that achieves state-of-the-art performance while offering interpretability through expert specialisation in distinct linguistic components. To combat misinformation, we demonstrate that ensembling heterogeneous representations from bag-of-words to knowledge graph-enriched neural embeddings substantially improves fake news classification. Extending beyond English, we develop zero-shot cross-lingual methods for both offensive language detection and news sentiment analysis, introducing novel training strategies that significantly outperform prior approaches for less-resourced languages. We apply these computational tools to the socially critical domain of migration discourse, analysing dehumanisation patterns and news framing in Slovene media coverage of Syrian and Ukrainian migrants — uncovering that while discourse has grown more negative over time, it is notably less dehumanising toward Ukrainian migrants. These contributions advance NLP methodology for news analysis while demonstrating its power to illuminate media narratives around pressing societal issues.

Read full seminar details


Automatic Evaluation of Human-Written and Machine-Generated Text

Tuesday, January 13, 2026 11:45, 4A301

Yanzhu Guo

With the rapid expansion of digital content, automatic evaluation of textual information has become increasingly crucial. My research addresses the challenge of evaluating and enhancing the quality of both human-written and machine-generated texts. Beginning with human-written texts, we develop an argument extraction system to evaluate the substantiation of scientific peer reviews, providing insights into the quality of peer review in NLP conferences in recent years. Additionally, we assess the factuality of abstractive summarization datasets and propose a data refinement approach that enhances model performance while reducing computational demands. For machine-generated texts, we focus on the underexplored aspects of diversity and naturalness. We introduce a suite of metrics for measuring linguistic diversity and conduct a systematic evaluation across state-of-the-art LLMs, exploring how development and deployment choices influence their output diversity. We also investigate the impact of training LLMs on synthetic data produced by earlier models, demonstrating that recursive training loops lead to diminishing diversity. Finally, we explore the naturalness of multilingual LLMs, uncovering an English-centric bias and proposing an alignment method to mitigate it. These contributions advance evaluation methodologies for natural language generation and provide insights into the interaction between evaluation metrics, dataset quality and language model performance.

Read full seminar details


Contextual knowledge representation for neurosymbolic Artificial Intelligence reasoning

Thursday, December 11, 2025 12:15, 4A301

Simon Coumes

The field of Knowledge Representation and Reasoning is concerned with representing information about reality in a form that is both human-readable and machine-processable. It has been a part of artificial intelligence since its inception and has produced many important formalisms and systems. One key aspect of knowledge is the context in which it is expressed. This was identified early on in the field and matches our common experience: understanding a statement or judging its validity often requires knowing the context in which it was meant. Historically, there has been some work aiming to produce logics that implement a general notion of context. None of these saw wide adoption, in part because they lacked sufficient expressive power or were not sufficiently usable.

Read full seminar details


SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation

Tuesday, December 09, 2025 11:45, 1D19

Le-Minh Nguyen (Japan Advanced Institute of Science and Technology)

Inference with modern Large Language Models (LLMs) is both computationally intensive and time-consuming. While speculative decoding has emerged as a promising solution, existing approaches face key limitations. Training-based methods require the development of a draft model, which is often difficult to obtain and lacks generalizability. On the other hand, training-free methods provide only modest speedup improvements. In this work, we introduce SPECTRA — a novel framework designed to accelerate LLM inference without requiring any additional training or modifications to the original LLM. SPECTRA incorporates two new techniques that efficiently leverage both internal and external speculation, each independently outperforming corresponding state-of-the-art (SOTA) methods. When combined, these techniques deliver up to a 4.08× speedup across a variety of benchmarks and LLM architectures, significantly surpassing existing training-free approaches. The implementation of SPECTRA is publicly available.

Biography: Le-Minh Nguyen is a Professor at the School of Information Science and the director of the Interpretable AI Center at JAIST, where he leads the Machine Learning and Natural Language Understanding Laboratory. He is currently on sabbatical at Imperial College London, UK (until April 2026). His research interests include machine learning and deep learning, natural language processing, legal text processing, and explainable AI. He serves as an action editor of TACL (a leading journal in NLP), a board member of VLSP (Vietnamese Language and Speech Processing), and an editorial board member of AI & Law and the Journal of Natural Language Processing (Cambridge). He is a steering committee member of Juris-informatics (Jurisin) in Japan, a research area that studies legal issues from an informatics perspective.

Read full seminar details


Mining Expressive Cross-Table Dependencies in Relational Databases

Tuesday, December 02, 2025 11:45, 4A125

François Amat

This thesis addresses the gap between what relational database schemas declare and the richer set of cross-table rules that actually govern real-world data. It introduces MATILDA, the first deterministic system capable of mining expressive first-order tuple-generating dependencies (FO-TGDs) with multi-atom heads, existential witnesses, and recursion directly from arbitrary relational databases, using principled, database-native definitions of support and confidence. MATILDA uncovers hidden business rules, workflow constraints, and multi-relation regularities that schemas alone cannot capture, while ensuring reproducible results through canonicalized search and tractable pruning guided by a constraint graph. To understand when simpler formalisms suffice, the thesis also presents MAHILDA, a relational Horn-rule baseline equipped with disjoint semantics to prevent self-justifying recursion. Overall, the work shows that expressive rule mining on realistic databases is both feasible and insightful, enabling more systematic, explainable, and schema-grounded analyses of complex relational data.

Read full seminar details


Entity Linking and Relation Extraction for Historical Italian Texts: Challenges and Potential Solutions

Tuesday, October 28, 2025 11:45, 4A125

Cristian Santini (University of Macerata)

Entity Linking and Relation Extraction enable the automatic identification of named entities mentioned in texts, along with their relationships, by connecting them to external knowledge graphs such as Wikidata. While these techniques work well on modern documents, applying them to historical texts presents significant challenges due to the diachronic evolution of language and limited resources for training computational models. This seminar presents recent work on developing methods and datasets for processing historical Italian texts. It will discuss the creation of a new benchmark dataset extracted from digital scholarly editions covering two centuries of Italian literary and political writing. The talk will then present approaches that enhance entity disambiguation by incorporating temporal and contextual information from Wikidata. Finally, it will detail a method for automatically constructing knowledge graphs from historical correspondence that combines multiple language models in sequence, demonstrating how these technologies can facilitate the exploration and understanding of historical archives without requiring extensive manual annotation or model training.

Read full seminar details


FLORA: Unsupervised Knowledge Graph Alignment by Fuzzy Logic

Tuesday, October 21, 2025 11:45, 4A301

Yiwen Peng & Fabian Suchanek

Knowledge graph alignment is the task of matching equivalent entities (that is, instances and classes) and relations across two knowledge graphs. Most existing methods focus on pure entity-level alignment, computing the similarity of entities in some embedding space. They lack interpretable reasoning and need training data to work. To solve these issues, we introduce FLORA, a simple yet effective method that (1) is unsupervised, i.e., does not require training data, (2) provides a holistic alignment for entities and relations iteratively, (3) is based on fuzzy logic and thus delivers interpretable results, (4) provably converges, (5) allows dangling entities, i.e., entities without a counterpart in the other KG, and (6) achieves state-of-the-art results on major benchmarks.

Read full seminar details


Data- and knowledge-driven approaches for step-by-step guidance to differential diagnosis

Tuesday, October 07, 2025 11:45, 4A301

Adrien Coulet (INRIA)

Diagnosis guidelines provide recommendations based on expert consensus that cover the majority of the population, but often overlook patients with uncommon conditions or multiple morbidities. We will present and compare two alternative approaches that provide step-by-step guidance to the differential diagnosis of anemia and lupus. The first approach relies on reinforcement learning and observational data; the second on large language models and domain knowledge.

Read full seminar details


Meaning Representations and Reasoning in the Age of Large Language Models

Tuesday, September 30, 2025 11:45, 3A301

Zacchary Sadeddine

This thesis explores how to make large language models (LLMs) more reliable and transparent in their reasoning. It first examines around fifteen societal issues related to these models, such as disinformation or user overreliance, and then investigates symbolic structures from linguistics and how they can be used to improve the performance and transparency of LLMs. It presents VANESSA, a neuro-symbolic reasoning system that combines the power of neural models with the rigor of symbolic reasoning, achieving performance comparable to LLMs while remaining transparent. Finally, it addresses the problem of verifying LLM outputs by introducing a step-by-step verification benchmark, paving the way for more interpretable, controllable and trustworthy artificial intelligence systems.

Read full seminar details


Robust Knowledge Graph Cleaning

Tuesday, May 27, 2025 11:45, 4A301

Maximilian Egger

High data quality is needed to use the information represented in a dataset properly and reliably. The increasing volume of data makes data preparation and cleaning ever more difficult. Additionally, more diverse data structures for databases, such as graphs, are being used and need to be handled differently. This leads to the necessity of robust methods to increase data integrity, scalable approaches for finding and fixing errors, and locally oriented algorithms that can pinpoint attention where it is needed.

Read full seminar details


Synthesis & Augmentation of Tabular Data In the Age of Foundation Models

Tuesday, May 13, 2025 11:45, 4A301

Nikola Simidjievski

Foundation models - large, pre-trained, performant models - have shown remarkable success in applications that predominantly focus on vision, language, and sound data. On the other hand, tabular data - one of the most prevalent data modalities in many critical domains of business, science, and healthcare - has seen limited benefits from these advances. Tabular data poses unique challenges related to heterogeneity, dimensionality, and scarcity, as well as a lack of explicit symmetries, implicit structures, and incomplete prior knowledge, all of which limit how we construct, train, and apply or transfer large models for tabular data.

Read full seminar details


GPTKB: Comprehensively Materializing Factual LLM Knowledge

Tuesday, April 29, 2025 11:45, 4A301

Simon Razniewski (TU Dresden)

LLMs have greatly advanced NLP and AI, and alongside their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an “availability bias” (Tversky and Kahneman, 1973) that prevents the discovery of knowledge (or beliefs) of LLMs beyond the experimenter’s predisposition. To address this challenge, we propose a novel methodology to comprehensively materialize an LLM’s factual knowledge through recursive querying and result consolidation. As a prototype, we employ GPT-4o-mini to construct GPTKB, a large-scale knowledge base (KB) comprising 101 million triples for over 2.9 million entities. This work marks a milestone in two areas: For LLM research, for the first time, it provides constructive insights into the scope and structure of LLMs’ knowledge (or beliefs), and its strengths and weaknesses. For KB construction, it pioneers new pathways for the long-standing challenge of general-domain KB construction. GPTKB is accessible at https://gptkb.org.

Read full seminar details


ProvSQL: Provenance and Probabilistic Querying in Uncertain Databases

Tuesday, April 08, 2025 11:45, 4A125

Pratik Karmakar

Probabilistic databases provide a powerful framework for managing and querying uncertain data, enabling principled reasoning under uncertainty. ProvSQL extends PostgreSQL to support provenance tracking and probability computation in probabilistic databases, leveraging provenance circuits to efficiently compute probabilities and Shapley-based data valuations. In this talk, we introduce ProvSQL, demonstrate its capabilities, and explore a key use case: content-based image retrieval from the COCO dataset. We show how probabilistic query evaluation and data valuation techniques enhance explainability and trust in AI-driven decision-making.

Read full seminar details


Tabular foundation models: priors for numbers and strings

Tuesday, March 25, 2025 11:45, 4A301

Gaël Varoquaux (INRIA)

Deep learning typically does not outperform tree-based models on tabular data, which may often be explained by the small size of such datasets. For images, sound, and text, the solution has been pretrained models, leading to foundation models that are adapted and reused for many tasks. I will discuss the challenges of bringing these ideas to tabular learning, and the progress we have made in building priors for tables, i.e., columns of different natures, with numbers and strings.

Read full seminar details


Neuro-symbolic approaches for the knowledge graph lifecycle

Tuesday, March 18, 2025 11:45, 4A301

Pierre Monnin (INRIA)

In the Web of Data, an increasing number of knowledge graphs (KGs) are concurrently published, edited, and accessed by human and software agents. Their wide adoption makes essential the tasks of their lifecycle: construction, refinement (e.g., matching, link prediction), mining, and usage to support applications (e.g., explainable AI, recommender systems). However, all these tasks require facing the inherent heterogeneity of KGs, e.g., in terms of granularities, vocabularies, and completeness. Besides, scalability issues arise due to their increasing size and combinatorial nature. In my talk, I will present my research on neuro-symbolic approaches for the KG lifecycle, intertwining domain knowledge from ontologies, deductive reasoning, analogical reasoning, and machine learning models. Throughout my presentation, I will show that such approaches enhance models by improving their semantic awareness, frugality, and the semantic interpretability of their latent representation space.

Read full seminar details


Tuesday, March 04, 2025 11:45, 4A301

Ken Satoh

Read full seminar details


Tuesday, February 04, 2025 11:45, 4A125

Fabian

Read full seminar details


Tuesday, January 21, 2025 11:45, 4A301

Simon Delarue

Read full seminar details


Tuesday, December 10, 2024 11:45, 4A125

Lanfang Kong

Read full seminar details


Tuesday, December 03, 2024 11:45, 4A125

Gabriel Damay

Read full seminar details


Tuesday, November 12, 2024 11:45, 4A125

Cyril Chhun

Read full seminar details


Tuesday, October 29, 2024 11:45, 4A125

Simon Coumes

Read full seminar details


Tuesday, October 15, 2024 11:45, 4A301

Yael Amsterdamer & Daniel Deutch

Read full seminar details


Tuesday, October 08, 2024 11:45, 4A125

Rajaa & Yiwen

Read full seminar details


Tuesday, September 24, 2024 11:45, 4A125

Ambroise Odonnat

Read full seminar details


Tuesday, September 10, 2024 11:45, 4A125

Samuel & Jean-Louis

Read full seminar details


Tuesday, July 09, 2024 11:45, 4A125

Peter Fratric

Read full seminar details


Tuesday, July 02, 2024 11:45, 4A301

Chadi

Read full seminar details


Tuesday, June 18, 2024 11:45, 4A125

Shady Elbassuoni

Read full seminar details


Tuesday, June 11, 2024 11:45, 4A301

Agneszka

Read full seminar details


Tuesday, May 28, 2024 11:45, 4A125

Concept AI

Read full seminar details


Tuesday, March 26, 2024 11:45, 4A125

Mehwish

Read full seminar details


Tuesday, February 13, 2024 11:45, 4A125

Fabian

Read full seminar details


Tuesday, January 30, 2024 11:45, 4A125

Nils

Read full seminar details


Tuesday, January 23, 2024 11:45, 4A125

Mariam

Read full seminar details


Tuesday, December 19, 2023 11:45, 4A125

Rajaa

Read full seminar details


Tuesday, December 12, 2023 11:45, 4A125

Charbel-Raphaël Segerie

Read full seminar details


Tuesday, November 21, 2023 11:45, 4A301

Thomas & Simon D

Read full seminar details


Tuesday, September 26, 2023 11:45, 4A101

Ned

Read full seminar details


Tuesday, September 19, 2023 11:45

Julien Lie-Panis

Read full seminar details


Tuesday, June 13, 2023 11:45

Lihu Chen

Read full seminar details


Tuesday, June 06, 2023 11:45

Minh Huong Le Nguyen

Read full seminar details


Tuesday, May 23, 2023 11:45

Giovanni Sileno

Read full seminar details


Tuesday, April 18, 2023 11:45

Armand Boschin

Read full seminar details


Tuesday, April 11, 2023 11:45

Fabian

Read full seminar details


Tuesday, February 14, 2023 11:45

Lihu Chen

Read full seminar details