Seminars

The DIG team holds a seminar roughly every two weeks, with speakers either from the team or invited from outside.

You can add the seminars to your calendar with this ICS file, and receive emails about future seminars by subscribing to our mailing list.

If you would like to present your work at our seminar, please contact Nils.

Upcoming Seminars

Neuro-symbolic representation learning

Tuesday, March 10, 2026 11:45, 4A301

Boshko Koloski (Jožef Stefan Institute)

Abstract TBA

Read full seminar details


Title TBA

Tuesday, April 14, 2026 11:45, 1A312

Antoine Gourru (Télécom Saint-Etienne)

Abstract TBA

Read full seminar details


Data Integration: Remaining Challenges and Research Paths

Tuesday, May 19, 2026 11:45, 4A301

Robert Wrembel (Poznań University of Technology)

Data integration (DI) has been a cornerstone of computer science research for decades, resulting in a few established reference architectures. They generally fall into three categories: virtual (federated and mediated), physical (data warehouse), and hybrid (data lake, data lakehouse, and data mesh). Regardless of the paradigm, these architectures depend on an integration layer, implemented by means of sophisticated software designed to orchestrate and execute DI processes. The integration layer is responsible for ingesting data from various sources (typically heterogeneous and distributed) and for homogenizing data into formats suitable for future processing and analysis. On the one hand, in all business domains, large volumes of highly heterogeneous data are produced, e.g., medical systems, smart cities, smart agriculture, which require further advancements in the data integration technologies. On the other hand, the widespread adoption of artificial intelligence (AI) solutions is now extending towards DI, offering alternative solutions, opening new research paths, and generating new open problems. Emerging paradigms, such as Data Spaces and the Model Context Protocol, further advance DI.

Read full seminar details


Past Seminars

Computational analysis of news: methods and applications in keyword extraction, fake-news classification, sentiment and migrations discourse analysis

Tuesday, February 24, 2026 11:45, 4A301

Senja Pollak & Boshko Koloski (Jožef Stefan Institute)

With the growing volume and influence of digital news media, computational methods for analysing news content have become essential. This talk addresses interconnected challenges spanning keyword extraction, misinformation detection, sentiment classification, and the study of migration discourse. For keyword extraction, we propose SEKE, a mixture-of-experts architecture that achieves state-of-the-art performance while offering interpretability through expert specialisation in distinct linguistic components. To combat misinformation, we demonstrate that ensembling heterogeneous representations from bag-of-words to knowledge graph-enriched neural embeddings substantially improves fake news classification. Extending beyond English, we develop zero-shot cross-lingual methods for both offensive language detection and news sentiment analysis, introducing novel training strategies that significantly outperform prior approaches for less-resourced languages. We apply these computational tools to the socially critical domain of migration discourse, analysing dehumanisation patterns and news framing in Slovene media coverage of Syrian and Ukrainian migrants — uncovering that while discourse has grown more negative over time, it is notably less dehumanising toward Ukrainian migrants. These contributions advance NLP methodology for news analysis while demonstrating its power to illuminate media narratives around pressing societal issues.

Read full seminar details


Automatic Evaluation of Human-Written and Machine-Generated Text

Tuesday, January 13, 2026 11:45, 4A301

Yanzhu Guo

With the rapid expansion of digital content, automatic evaluation of textual information has become increasingly crucial. My research addresses the challenge of evaluating and enhancing the quality of both human-written and machine-generated texts. Beginning with human-written texts, we develop an argument extraction system to evaluate the substantiation of scientific peer reviews, providing insights into the quality of peer review in NLP conferences in recent years. Additionally, we assess the factuality of abstractive summarization datasets and propose a data refinement approach that enhances model performance while reducing computational demands. For machine-generated texts, we focus on the underexplored aspects of diversity and naturalness. We introduce a suite of metrics for measuring linguistic diversity and conduct a systematic evaluation across state-of-the-art LLMs, exploring how development and deployment choices influence their output diversity. We also investigate the impact of training LLMs on synthetic data produced by earlier models, demonstrating that recursive training loops lead to diminishing diversity. Finally, we explore the naturalness of multilingual LLMs, uncovering an English-centric bias and proposing an alignment method to mitigate it. These contributions advance evaluation methodologies for natural language generation and provide insights into the interaction between evaluation metrics, dataset quality and language model performance.

Read full seminar details


Contextual knowledge representation for neurosymbolic Artificial Intelligence reasoning

Thursday, December 11, 2025 12:15, 4A301

Simon Coumes

The field of Knowledge Representation and Reasoning is concerned with representing information about reality in a form that is both human-readable and machine-processable. It has been a part of artificial intelligence since its inception and has produced many important formalisms and systems. One key aspect of knowledge is the context in which it is expressed. This was identified early on in the field and matches our common experience: understanding a statement or judging its validity often requires knowing the context in which it was meant. Historically, there has been some work aiming to produce logics that implement a general notion of context. None of these saw wide adoption, in part because they lacked sufficient expressive power or were not sufficiently usable.

Read full seminar details


SPECTRA: Faster Large Language Model Inference with Optimized Internal and External Speculation

Tuesday, December 09, 2025 11:45, 1D19

Le-Minh Nguyen (Japan Advanced Institute of Science and Technology)

Inference with modern Large Language Models (LLMs) is both computationally intensive and time-consuming. While speculative decoding has emerged as a promising solution, existing approaches face key limitations. Training-based methods require the development of a draft model, which is often difficult to obtain and lacks generalizability. On the other hand, training-free methods provide only modest speedup improvements. In this work, we introduce SPECTRA — a novel framework designed to accelerate LLM inference without requiring any additional training or modifications to the original LLM. SPECTRA incorporates two new techniques that efficiently leverage both internal and external speculation, each independently outperforming corresponding state-of-the-art (SOTA) methods. When combined, these techniques deliver up to a 4.08× speedup across a variety of benchmarks and LLM architectures, significantly surpassing existing training-free approaches. The implementation of SPECTRA is publicly available.

Biography: Le-Minh Nguyen is a Professor at the School of Information Science and the director of the Interpretable AI Center at JAIST, where he leads the Machine Learning and Natural Language Understanding Laboratory. He is currently on sabbatical at Imperial College London, UK (until April 2026). His research interests include machine learning and deep learning, natural language processing, legal text processing, and explainable AI. He serves as an action editor of TACL (a leading journal in NLP), a board member of VLSP (Vietnamese Language and Speech Processing), and an editorial board member of AI & Law and the Journal of Natural Language Processing (Cambridge). He is a steering committee member of Juris-informatics (Jurisin) in Japan, a research area that studies legal issues from an informatics perspective.

Read full seminar details


Mining Expressive Cross-Table Dependencies in Relational Databases

Tuesday, December 02, 2025 11:45, 4A125

François Amat

This thesis addresses the gap between what relational database schemas declare and the richer set of cross-table rules that actually govern real-world data. It introduces MATILDA, the first deterministic system capable of mining expressive first-order tuple-generating dependencies (FO-TGDs) with multi-atom heads, existential witnesses, and recursion directly from arbitrary relational databases, using principled, database-native definitions of support and confidence. MATILDA uncovers hidden business rules, workflow constraints, and multi-relation regularities that schemas alone cannot capture, while ensuring reproducible results through canonicalized search and tractable pruning guided by a constraint graph. To understand when simpler formalisms suffice, the thesis also presents MAHILDA, a relational Horn-rule baseline equipped with disjoint semantics to prevent self-justifying recursion. Overall, the work shows that expressive rule mining on realistic databases is both feasible and insightful, enabling more systematic, explainable, and schema-grounded analyses of complex relational data.

Read full seminar details


Entity Linking and Relation Extraction for Historical Italian Texts: Challenges and Potential Solutions

Tuesday, October 28, 2025 11:45, 4A125

Cristian Santini (University of Macerata)

Entity Linking and Relation Extraction enable the automatic identification of named entities mentioned in texts, along with their relationships, by connecting them to external knowledge graphs such as Wikidata. While these techniques work well on modern documents, applying them to historical texts presents significant challenges due to the diachronic evolution of language and limited resources for training computational models. This seminar presents recent work on developing methods and datasets for processing historical Italian texts. It will discuss the creation of a new benchmark dataset extracted from digital scholarly editions covering two centuries of Italian literary and political writing. The talk will then present approaches that enhance entity disambiguation by incorporating temporal and contextual information from Wikidata. Finally, it will detail a method for automatically constructing knowledge graphs from historical correspondence that combines multiple language models in sequence, demonstrating how these technologies can facilitate the exploration and understanding of historical archives without requiring extensive manual annotation or model training.

Read full seminar details


FLORA: Unsupervised Knowledge Graph Alignment by Fuzzy Logic

Tuesday, October 21, 2025 11:45, 4A301

Yiwen Peng & Fabian Suchanek

Knowledge graph alignment is the task of matching equivalent entities (that is, instances and classes) and relations across two knowledge graphs. Most existing methods focus on pure entity-level alignment, computing the similarity of entities in some embedding space. They lack interpretable reasoning and need training data to work. To solve these issues, we introduce FLORA, a simple yet effective method that (1) is unsupervised, i.e., does not require training data, (2) provides a holistic alignment for entities and relations iteratively, (3) is based on fuzzy logic and thus delivers interpretable results, (4) provably converges, (5) allows dangling entities, i.e., entities without a counterpart in the other KG, and (6) achieves state-of-the-art results on major benchmarks.

Read full seminar details


Data- and knowledge-driven approaches for step-by-step guidance to differential diagnosis

Tuesday, October 07, 2025 11:45, 4A301

Adrien Coulet (INRIA)

Diagnosis guidelines provide recommendations based on expert consensus that cover the majority of the population, but often overlook patients with uncommon conditions or multiple morbidities. We will present and compare two alternative approaches that provide step-by-step guidance to the differential diagnosis of anemia and lupus. The first approach relies on reinforcement learning and observational data; the second on large language models and domain knowledge.

Read full seminar details


Meaning Representations and Reasoning in the Age of Large Language Models

Tuesday, September 30, 2025 11:45, 3A301

Zacchary Sadeddine

This thesis explores how to make large language models (LLMs) more reliable and transparent in their reasoning. It first examines around fifteen societal issues related to these models, such as disinformation or user overreliance, and then investigates symbolic structures from linguistics and how they can be used to improve the performance and transparency of LLMs. It presents VANESSA, a neuro-symbolic reasoning system that combines the power of neural models with the rigor of symbolic reasoning, achieving performance comparable to LLMs while remaining transparent. Finally, it addresses the problem of verifying LLM outputs by introducing a step-by-step verification benchmark, paving the way for more interpretable, controllable and trustworthy artificial intelligence systems.

Read full seminar details


Robust Knowledge Graph Cleaning

Tuesday, May 27, 2025 11:45, 4A301

Maximilian Egger

High data quality is needed to use the information represented in a dataset properly and reliably. The increasing volume of data makes data preparation and cleaning ever more difficult. Additionally, more diverse data structures for databases, such as graphs, are being used and need to be handled differently. This leads to the necessity of robust methods to increase data integrity, scalable approaches for finding and fixing errors, and locally oriented algorithms that can pinpoint attention where it is needed.

Read full seminar details


Synthesis & Augmentation of Tabular Data In the Age of Foundation Models

Tuesday, May 13, 2025 11:45, 4A301

Nikola Simidjievski

Foundation models - large, pre-trained, performant models - have shown remarkable success in applications that predominantly focus on vision, language, and sound data. On the other hand, tabular data - one of the most prevalent data modalities in many critical domains of business, science, and healthcare - has seen limited benefits from these advances. Tabular data poses unique challenges related to heterogeneity, dimensionality, and scarcity, as well as a lack of explicit symmetries, implicit structures, and incomplete prior knowledge, all of which limit how we construct, train, and apply or transfer large models for tabular data.

Read full seminar details


GPTKB: Comprehensively Materializing Factual LLM Knowledge

Tuesday, April 29, 2025 11:45, 4A301

Simon Razniewski (TU Dresden)

LLMs have greatly advanced NLP and AI, and alongside their ability to perform a wide range of procedural tasks, a major success factor is their internalized factual knowledge. Since (Petroni et al., 2019), analyzing this knowledge has gained attention. However, most approaches investigate one question at a time via modest-sized pre-defined samples, introducing an “availability bias” (Tversky and Kahneman, 1973) that prevents the discovery of knowledge (or beliefs) of LLMs beyond the experimenter’s predisposition. To address this challenge, we propose a novel methodology to comprehensively materialize an LLM’s factual knowledge through recursive querying and result consolidation. As a prototype, we employ GPT-4o-mini to construct GPTKB, a large-scale knowledge base (KB) comprising 101 million triples for over 2.9 million entities. This work marks a milestone in two areas: For LLM research, for the first time, it provides constructive insights into the scope and structure of LLMs’ knowledge (or beliefs), and its strengths and weaknesses. For KB construction, it pioneers new pathways for the long-standing challenge of general-domain KB construction. GPTKB is accessible at https://gptkb.org.

Read full seminar details


ProvSQL: Provenance and Probabilistic Querying in Uncertain Databases

Tuesday, April 08, 2025 11:45, 4A125

Pratik Karmakar

Probabilistic databases provide a powerful framework for managing and querying uncertain data, enabling principled reasoning under uncertainty. ProvSQL extends PostgreSQL to support provenance tracking and probability computation in probabilistic databases, leveraging provenance circuits to efficiently compute probabilities and Shapley-based data valuations. In this talk, we introduce ProvSQL, demonstrate its capabilities, and explore a key use case: content-based image retrieval from the COCO dataset. We show how probabilistic query evaluation and data valuation techniques enhance explainability and trust in AI-driven decision-making.

Read full seminar details


Tabular foundation models: priors for numbers and strings

Tuesday, March 25, 2025 11:45, 4A301

Gaël Varoquaux (INRIA)

Deep learning typically does not outperform tree-based models on tabular data, which may often be explained by the small size of such datasets. For images, sound, and text, the solution has been pretrained models, leading to foundation models that are adapted and reused for many tasks. I will discuss the challenges of bringing these ideas to tabular learning, and the progress we have made in building priors for tables, i.e., columns of different natures, with numbers and strings.

Read full seminar details


Neuro-symbolic approaches for the knowledge graph lifecycle

Tuesday, March 18, 2025 11:45, 4A301

Pierre Monnin (INRIA)

In the Web of Data, an increasing number of knowledge graphs (KGs) are concurrently published, edited, and accessed by human and software agents. Their wide adoption makes essential the tasks of their lifecycle: construction, refinement (e.g., matching, link prediction), mining, and usage to support applications (e.g., explainable AI, recommender systems). However, all these tasks require facing the inherent heterogeneity of KGs, e.g., in terms of granularities, vocabularies, and completeness. Besides, scalability issues arise due to their increasing size and combinatorial nature. In my talk, I will present my research on neuro-symbolic approaches for the KG lifecycle, intertwining domain knowledge from ontologies, deductive reasoning, analogical reasoning, and machine learning models. Throughout my presentation, I will show that such approaches enhance models by improving their semantic awareness, frugality, and the semantic interpretability of their latent representation space.

Read full seminar details


Tuesday, March 04, 2025 11:45, 4A301

Ken Satoh

Read full seminar details


Tuesday, February 04, 2025 11:45, 4A125

Fabian

Read full seminar details


Tuesday, January 21, 2025 11:45, 4A301

Simon Delarue

Read full seminar details


Tuesday, December 10, 2024 11:45, 4A125

Lanfang Kong

Read full seminar details


Tuesday, December 03, 2024 11:45, 4A125

Gabriel Damay

Read full seminar details


Tuesday, November 12, 2024 11:45, 4A125

Cyril Chhun

Read full seminar details


Tuesday, October 29, 2024 11:45, 4A125

Simon Coumes

Read full seminar details


Tuesday, October 15, 2024 11:45, 4A301

Yael Amsterdamer & Daniel Deutch

Read full seminar details


Tuesday, October 08, 2024 11:45, 4A125

Rajaa & Yiwen

Read full seminar details


Tuesday, September 24, 2024 11:45, 4A125

Ambroise Odonnat

Read full seminar details


Tuesday, September 10, 2024 11:45, 4A125

Samuel & Jean-Louis

Read full seminar details


Tuesday, July 09, 2024 11:45, 4A125

Peter Fratric

Read full seminar details


Tuesday, July 02, 2024 11:45, 4A301

Chadi

Read full seminar details


Tuesday, June 18, 2024 11:45, 4A125

Shady Elbassuoni

Read full seminar details


Tuesday, June 11, 2024 11:45, 4A301

Agneszka

Read full seminar details


Tuesday, May 28, 2024 11:45, 4A125

Concept AI

Read full seminar details


Tuesday, March 26, 2024 11:45, 4A125

Mehwish

Read full seminar details


Tuesday, February 13, 2024 11:45, 4A125

Fabian

Read full seminar details


Tuesday, January 30, 2024 11:45, 4A125

Nils

Read full seminar details


Tuesday, January 23, 2024 11:45, 4A125

Mariam

Read full seminar details


Tuesday, December 19, 2023 11:45, 4A125

Rajaa

Read full seminar details


Tuesday, December 12, 2023 11:45, 4A125

Charbel-Raphaël Segerie

Read full seminar details


Tuesday, November 21, 2023 11:45, 4A301

Thomas & Simon D

Read full seminar details


Tuesday, September 26, 2023 11:45, 4A101

Ned

Read full seminar details


Tuesday, September 19, 2023 11:45

Julien Lie-Panis

Read full seminar details


Tuesday, June 13, 2023 11:45

Lihu Chen

Read full seminar details


Tuesday, June 06, 2023 11:45

Minh Huong Le Nguyen

Read full seminar details


Tuesday, May 23, 2023 11:45

Giovanni Sileno

Read full seminar details


Tuesday, April 18, 2023 11:45

Armand Boschin

Read full seminar details


Tuesday, April 11, 2023 11:45

Fabian

Read full seminar details


Tuesday, February 14, 2023 11:45

Lihu Chen

Read full seminar details