2025 SEMINARS

Observers for data assimilation and parameter estimation

Didier Auroux, Université Côte d'Azur (Nice Sophia Antipolis)

Wednesday, 26 November 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

Nudging is a simple data assimilation method that uses dynamical relaxation to adjust a model towards observations. The standard nudging algorithm consists in adding a feedback term, proportional to the difference between the observations and the corresponding model state, to the model equations. Also known as the Luenberger (or asymptotic) observer, it theoretically requires an infinite time window to converge.

The Back and Forth Nudging algorithm has been introduced in order to extend the efficiency of nudging to finite/small time windows. It consists in alternately solving the model forwards and backwards in time, with a nudging term in both cases, over the assimilation window.

These approaches can be extended to more complex observers, for which non-observed variables can also be corrected with observed ones. We will give an overview of nudging, observers, and backward-forward algorithms, with applications to oceanography and fluid dynamics, for state and/or parameter estimation.
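The feedback term described in this abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the talk: the toy model (a harmonic oscillator), the forward Euler integrator, noise-free observations of position only, and the names `simulate_nudging` and `gain`.

```python
def simulate_nudging(gain, dt=0.01, steps=2000):
    """Run a nudged toy model next to a reference run and return the final error.

    Toy model (an assumption, not from the talk): x' = v, v' = -x.
    Only x is observed; the nudging term gain * (x_obs - x) is added
    to the x-equation of the model, as in a Luenberger-type observer.
    """
    x_true, v_true = 1.0, 0.0   # reference ("true") state
    x_mod, v_mod = 0.0, 1.0     # model starts from a wrong initial state

    for _ in range(steps):
        x_obs = x_true  # noise-free observation of position only

        # Model tendencies, with the feedback term on the observed variable.
        dx_mod = v_mod + gain * (x_obs - x_mod)
        dv_mod = -x_mod

        # Reference tendencies (no feedback).
        dx_true = v_true
        dv_true = -x_true

        # Forward Euler step for both systems.
        x_mod, v_mod = x_mod + dt * dx_mod, v_mod + dt * dv_mod
        x_true, v_true = x_true + dt * dx_true, v_true + dt * dv_true

    # Error in both the observed and the non-observed variable.
    return abs(x_mod - x_true) + abs(v_mod - v_true)


if __name__ == "__main__":
    print("error without nudging:", simulate_nudging(0.0))
    print("error with nudging:   ", simulate_nudging(2.0))
```

In this toy setting the velocity is never observed, yet its error also decays, because the model dynamics couple it to the corrected position. The decay is only asymptotic, which is the limitation on finite windows that the abstract mentions.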

Scalable Frameworks for Industrial-Scale Language Models: Applications in Code Generation and Multilingual Document Intelligence

Shreya Goyal, Applied Scientist, Amazon Web Services

Monday, 24 November 2025, 2:00 PM - 3:00 PM

Online via Zoom

Abstract:

Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning and content generation, yet scaling them for domain-specific, production-grade applications requires robust system design. In this talk, I will discuss scalable architectures for LLMs through two industrial deployments: (1) Multi-step PLC Code Generation, a generative pipeline that translates user-provided queries and structured specifications into production-grade automation PLC code; and (2) Multilingual Document Narrative Generation, an end-to-end LLM-based framework built on an AWS-based stack that generates localized narrative reports from multimodal documents while preserving context and structure. The discussion will highlight best practices for designing and deploying scalable LLM systems that bridge research innovation with real-world industrial impact.

Explainable XGBoost for Indian Meteorology

Kieran Hunt, University of Reading

Monday, 17 November 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

In this talk, I will present a single, explainable machine-learning framework – XGBoost with Shapley value attribution – which I will briefly introduce and then apply to two problems in Indian meteorology. In the first case, this explainable AI framework is applied to monsoon low-pressure systems (LPSs) in order to identify new hypotheses about their behaviour: preferential early-morning intensification coincident with the diurnal convection peak over ocean; suppression of further growth by vertical wind shear; a substantive role for large-scale barotropic instability in inland penetration and peak intensity; and propagation set by vortex depth, with shallow (deep) LPSs steered by low- (mid-) level winds. In the second case, state-level, population-weighted models are used to predict daily electricity demand from weather, achieving high skill (r^2 > 0.8 in half of the states). Shapley analysis is then used to quantify the effects of weekdays/holidays, overnight minimum temperature and longer-term means, as well as threshold responses. Extending with reanalysis (1979–2023) reveals that the largest demand–renewables deficits occur during/after monsoon withdrawal (Sept–Oct).

Learning Human Preferences: From Clicks to Conversations

Suryanarayana Sankagiri, EPFL, Switzerland

Wednesday, 12 November 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

People routinely reveal their preferences online, e.g., when choosing search results, videos, or products. Such data is used by algorithms to learn human tastes. Recently, curated datasets of human preferences have been used to fine-tune language models, substantially improving their alignment with human intent. These successes raise a natural question: can recommender systems learn more effectively from comparisons rather than ratings? The talk will trace a path from basic models of choice behaviour to new frameworks for recommender systems. The main focus will be on our theoretical result showing that personalised recommendations can be learned efficiently from comparison data, despite the underlying optimisation problem being nonconvex. I will then describe a bandit formulation that addresses the classical exploration-exploitation trade-off in a novel way. Finally, I’ll share empirical insights motivating richer models of human choice. I will conclude by arguing that learning from human preferences is key to building interactive AI systems that reliably serve human needs.

Refined transcription factor DNA-binding motif discovery from pangenomic ChIP-seq, ATAC-seq and similar datasets

Denis Thieffry, PSL University, Paris, France

Wednesday, 29 October 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

The development of high-throughput sequencing (HTS) techniques has opened up new avenues for identifying, modelling and predicting DNA motifs bound by transcription factors (TFs). On the one hand, provided that a good antibody is available, chromatin immunoprecipitation assays coupled with HTS (ChIP-seq) can capture most TF-bound sequences in a given cell type or tissue at the genomic scale. On the other hand, epigenomic assays, including whole-genome bisulphite sequencing (WGBS) and combinations of ChIP-seq assays targeting chromatin marks, can be used to identify potential promoter and enhancer regions.

Using these datasets, various types of computational analyses can be performed to deduce potentially related transcription factors. The most common approach is to analyse putative cis-regulatory sequences (promoters or enhancers) using collections of probabilistic models of transcription factor binding sites, typically in the form of position weight matrices (PWMs), which can be found in public databases such as JASPAR (https://jaspar.elixir.no/). However, this approach is inherently limited by the quality of the available PWM sets.

Another approach is to apply pattern discovery algorithms to regions presumed to be co-regulated, then compare the patterns obtained with public collections of PWMs. Pattern discovery algorithms (e.g., Gibbs samplers, MEME) typically perform multiple local alignments on a set of sequences, which requires pre-filtering and heuristic sampling to process large sets (thousands) of sequences, at the risk of missing subtle variations in the patterns.

To overcome the shortcomings of these multiple alignment approaches, Jacques van Helden initiated the development of a set of tools based on k-mer counting and multinomial statistics to identify words that are overrepresented in large sequence datasets and to construct refined PWMs (http://rsat.eu).

More recently, thanks to the accumulation of ChIP-seq data for various transcription factors, combined with WGBS data, in the same well-established cell lines, it has become possible to study in greater detail the impact of DNA methylation on transcription factor binding. By combining ChIP-seq datasets targeting various dimeric transcription factor partners in the same cell lines, Touati Benoukraf and collaborators were able to define refined PWMs for each dimer, containing higher information content than the degenerate motifs encoded in public databases. These refined motifs are now available in the MethMotif database (https://methmotif.org), while a series of functions written in the R programming language, grouped in the TFregulomeR package, is shared on github to ease the analysis of new ChIP-seq and WGBS datasets (https://github.com/benoukraflab/TFregulomeR).
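The k-mer counting idea mentioned in this abstract can be illustrated with a small sketch. It is not the RSAT implementation: the function names `kmer_counts` and `enrichment`, the pseudocount, and the simple foreground/background frequency ratio (used here in place of the multinomial statistics the abstract refers to) are all assumptions chosen for illustration.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count every overlapping word of length k across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def enrichment(foreground, background, k, pseudocount=1.0):
    """Rank foreground k-mers by their frequency ratio against a background set.

    The ratio with a pseudocount is a deliberately crude stand-in for a
    proper significance test; it is only meant to show the counting step.
    """
    fg = kmer_counts(foreground, k)
    bg = kmer_counts(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_words = 4 ** k  # number of possible DNA words of length k

    ratios = {}
    for word, count in fg.items():
        fg_freq = (count + pseudocount) / (fg_total + pseudocount * n_words)
        bg_freq = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        ratios[word] = fg_freq / bg_freq
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy sequences, not real ChIP-seq peaks.
    peaks = ["TTGACGTCATT", "AAGACGTCAGG"]
    control = ["ACACACACAC", "TTTTGGGGCC"]
    for word, ratio in enrichment(peaks, control, k=6)[:3]:
        print(word, round(ratio, 3))
```

Because counting is a single pass over the sequences, this style of analysis needs no alignment and no pre-filtering, which is the advantage over multiple-alignment methods that the abstract points to.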

Machine Learning for Climate Modeling: Parameterizing Sub-Grid Fluxes for the Ocean Surface Boundary Layer

Aakash Sane, Princeton University

Monday, 29 September 2025, 3:00 PM - 4:00 PM

Online via Zoom

Abstract:

The ocean surface boundary layer (OSBL) plays a crucial role in the ocean by modulating the exchange of mass and energy between the atmosphere and ocean interior via vertical turbulent mixing. The processes driving this mixing cannot be resolved in ocean models, necessitating the use of parameterizations that are uncertain. I will describe improvements to an existing energetics-based parameterization of vertical mixing for the OSBL in NOAA-GFDL's model. I will demonstrate how neural networks, trained to predict the eddy diffusivity profile from high-fidelity and expensive turbulence schemes, enhance the mixing scheme in the model. The enhanced scheme reduces biases in the mixed-layer depth and modestly improves the tropical upper ocean stratification in ocean-only global simulations. Interpretable equations that replace neural networks achieve similar improvements at lower computational cost, demonstrating the successful application of machine learning to improve a sub-grid parameterization of turbulent mixing in ocean climate models.

Combining Signal Processing and AI for Cognitive State Monitoring and Digital Health

Abhishek Tiwari, University of Quebec, Gatineau

Wednesday, 24 September 2025, 4:00 PM - 5:00 PM

Online via Zoom

Abstract:

Physiological signals collected from wearable devices, when combined with machine learning, offer a cost-effective pathway to advance both cognitive state monitoring and digital health applications. However, deploying such systems in real-world settings introduces significant challenges, including noisy and missing signals, variability across users, and the high cost of data collection and model training in healthcare contexts. This talk will highlight methods developed to address these challenges, including: (i) enhancing noise-robustness of signal representations through quality metrics, (ii) integrating multiple sensor modalities, (iii) extracting additional insights via feature engineering, and (iv) incorporating contextual information, such as physical activity, posture, and location, into AI models. Together, these strategies illustrate how combining signal processing with machine learning can lead to more reliable, practical, and scalable digital health solutions.

A random dynamical systems perspective on flow-based and score-based generative models

Nisha Chandramoorthy, University of Chicago

Monday, 15 September 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

A curious phenomenon observed in some dynamical generative models is the following: despite learning errors in the score function or the drift vector field, the generated samples appear to shift along the support of the data distribution but not away from it. We investigate this phenomenon of robustness of the support by taking a dynamical systems approach to the generating stochastic/deterministic process. Our perturbation analysis of the probability flow reveals that infinitesimal learning errors cause the predicted density to be different from the target density only on the data manifold for a wide class of generative models. Further, we ask, what is the dynamical mechanism that leads to the robustness of the support? Using a finite-time linear perturbation analysis on sample paths as well as probability flows, our work complements and extends existing works on obtaining theoretical guarantees for generative models from stochastic analysis, statistical learning and uncertainty quantification points of view. Our results motivate further applications of dynamical systems theory to generative models, such as control for rare event sampling.

Data and Physics Guided Deep Learning for Complex Systems

Sumant Kumar, Sorbonne University Abu Dhabi, UAE

Wednesday, 6 August 2025, 3:00 PM - 4:00 PM

Online via Zoom

Abstract:

Real-world systems in sustainable energy, engineering, and other scientific domains are often governed by complex, non-linear, physics-based mathematical models. However, these traditional models frequently face challenges related to accuracy and interpretability. Modeling complex physical systems, such as convective heat and fluid flow phenomena in porous media, requires an integration of physics and data to achieve more reliable and accurate model development. In this research talk, I will present developments in data- and physics-guided deep learning in industrial and other scientific domains. These hybrid techniques offer accurate and efficient methods for developing surrogate models that significantly reduce computational cost while preserving the interpretability of multiphysics influence on the complex system, particularly in areas such as solar power collectors and subsurface flow operations (oil production). This presentation will highlight ongoing efforts to enhance model generalization under a limited data scenario and to design scientifically grounded, computationally tractable models. Collectively, these works demonstrate a pathway toward building reliable, scalable, and interpretable data-driven solutions for computational modeling and engineering applications.

Learning Operators for Medical Imaging: Neural Networks, Bayesian Inversion, and Beyond

Anuj Abhishek, Case Western Reserve University

Wednesday, 23 July 2025, 4:15 PM - 5:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

In this talk, we explore the use of neural operator architectures—such as Deep Operator Networks (DeepONets) and Convolutional Neural Operators (CNOs)—for approximating operators that arise in medical imaging. While traditionally applied to mappings between function spaces, we show that these models can also approximate operators between more general Banach spaces, which naturally occur in imaging modalities like Electrical Impedance Tomography (EIT), Diffuse Optical Tomography (DOT), and Quantitative Photoacoustic Tomography (QPAT).

Building on recent theoretical advances, we present universal approximation theorems for two commonly used neural operator implementations tailored to these settings. We then demonstrate how these operator learning techniques can be employed for direct inversion of imaging problems, as well as for constructing fast surrogate models for likelihood evaluation in Bayesian inversion frameworks.

A particularly exciting application is the integration of these neural surrogates into MCMC algorithms, where we observe significant acceleration in posterior sampling and reconstruction. If time permits, I will also discuss recent results on end-to-end Bayesian inversion frameworks that incorporate generative models to approximate complex, non-Gaussian priors—leading to further improvements in efficiency and reconstruction quality.

This is based on joint works with Thilo Strauss (Xi’an Jiaotong-Liverpool University), Taufiquar Khan and Sudeb Majee (UNC Charlotte), and Sidharth Jindal (CWRU).

Structured Learning with Batched Bandits: From k-Nearest Neighbors to Single-Index Models

Sakshi Arya, Case Western Reserve University

Wednesday, 23 July 2025, 3:15 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

The multi-armed bandit (MAB) framework is a cornerstone of sequential decision-making, in which a decision-maker selects actions over time to maximize cumulative rewards. However, in many real-world applications such as clinical trials and adaptive experiments, data are naturally collected in batches rather than one observation at a time. For instance, clinical trials often allocate treatments to groups of patients in discrete phases, requiring policies that adapt only after each batch of outcomes is observed.

In this talk, we present algorithmic advances for batched bandits in both nonparametric and semiparametric regimes. For the nonparametric setting, we propose a k-nearest neighbor (kNN) based algorithm that flexibly adapts to the geometry of the context space. In the semiparametric regime, we introduce a single-index model approach that leverages successive binning and arm elimination, enabling both interpretability and statistical efficiency. We provide minimax-optimal regret guarantees for each setting and demonstrate the advantages of these methods through comprehensive simulations and real-data analyses. This talk includes joint work with Prof. Hyebin Song (Penn State University).

Persistence of stationary Gaussian processes, Generalized Restricted Isometry for sparse recovery, and Betti numbers of Gaussian excursions

Sunder Ram Krishnan, Amrita Vishwa Vidyapeetham, Amritapuri

Wednesday, 16 July 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

First, we will consider the problem of persistence of stationary real-valued Gaussian processes and estimate the probability that such a process does not cross zero in a long interval. We will see that the behaviour of this probability depends on the nature of the spectral measure near the origin. Next, we will define a generalized notion of the Restricted Isometry Property (RIP) in sparse recovery and estimate the number of rows needed for a random matrix with symmetric alpha-stable entries to satisfy this property with high probability. The result highlights certain limitations of the RIP framework. Time permitting, we will also touch upon limit theorems describing the phase transitions in the Betti numbers of Gaussian process excursions.

Scientific Computing with Neural Surrogates: Hype or Hope?

Ameya D. Jagtap, Worcester Polytechnic Institute (WPI), USA

Wednesday, 2 July 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

Classical numerical methods have long been the foundation of scientific computing, enabling the solution of complex physical systems. However, these methods often require precise knowledge of governing equations, boundary and initial conditions, and entail computationally intensive steps such as mesh generation and high-fidelity simulations. Moreover, they struggle with high-dimensional and parameterized partial differential equations (PDEs), limiting their scalability and real-time applicability. Physics-informed neural surrogates have emerged as a compelling alternative, offering data-efficient and physics-aware modeling strategies. This talk critically evaluates the effectiveness of neural surrogate models, with a focus on physics-informed neural networks (PINNs) and their recent extensions for large-scale and data-rich problems. We highlight the strengths and current limitations of these models in capturing complex physical phenomena. In particular, we discuss different performance improvement techniques in PINNs. Furthermore, we explore the transformative role of deep operator networks (neural architectures that learn mappings between infinite-dimensional function spaces) and showcase their ability to solve high-dimensional PDEs more efficiently than traditional solvers. Several applications are presented to illustrate the potential of operator learning in addressing nonlinear, multiscale systems. Overall, this talk offers a balanced perspective on the current landscape: Are neural surrogates the next frontier in scientific computing, or just another wave of hype? We aim to uncover whether the hope behind these methods is truly justified.

Democratizing Lifelike Avatars

Kiran Chhatre, KTH Royal Institute of Technology, Sweden

Wednesday, 18 June 2025, 3:00 PM - 4:00 PM

Seminar room 51, 4th Floor, Main Building, IISER Pune

Abstract:

In this seminar, I will explore how to combine real‐world data with synthetic data to create lifelike, emotionally expressive 3D avatars. I will discuss how we generate 3D synthetic data from video and motion capture to enrich behavior modeling, leading to controllable speech‐driven gestures and facial animations. Through VR‐based user studies, we find that affective states and animation diversity significantly enhance user engagement and presence. Finally, our goal is to democratize avatar creation, enabling more natural human–avatar interactions in immersive environments.

Rhythmic Neural Oscillations Underlying Birdsong

Ana Amador, University of Buenos Aires, Argentina

Friday, 25 April 2025, 4:00 PM - 5:00 PM

Seminar room 34, 2nd Floor, Main Building, IISER Pune

Abstract:

Birdsong is a complex motor activity that arises from the interaction between the central and peripheral nervous systems, the body, and the environment. Its striking similarities to human speech, both in production and learning, make songbirds powerful animal models for studying learned motor skills. In this talk, I will present neuronal recordings from a telencephalic region involved in sensorimotor integration, revealing well-defined oscillations in local field potentials (LFP) synchronized with the rhythmic structure of canary (Serinus canaria) song. I will also discuss the relation between individual neural activity, LFP and the behavior.

Listening to a Bird’s Dream: Probing Vocal Replay in the Sleeping Brain

Gabriel Mindlin, University of Buenos Aires, Argentina

Friday, 25 April 2025, 11:30 AM - 12:30 PM

Seminar Hall 51, 4th floor, Main Building, IISER Pune

Abstract:

Birdsong stands as one of the most sophisticated motor behaviors in the animal kingdom, requiring precise coordination between respiration, vocal muscle control, and neural timing. In this talk, I will present our recent efforts to eavesdrop on the dreams of birds, using a combination of electrophysiology, biomechanical modeling, and sound synthesis to decode vocal motor activity during sleep. By recording from the syringeal muscles and brain regions of both Taeniopygia guttata (zebra finch), a learned songbird, and Pitangus sulphuratus (great kiskadee), an innate caller, we explore the presence of sleep-associated motor replay. Our findings show that in the zebra finch, fragmented yet structured muscle activations during sleep resemble elements of daytime song—suggestive of silent rehearsal. In contrast, the kiskadee exhibits repetitive and simple vocal gestures during sleep. These results raise fundamental questions about memory consolidation, the role of REM and non-REM sleep in vocal learning, and the limits of interspecies dream interpretation. By bridging neuroscience, physics, and ethology, this work invites a new perspective on how animals may relive, reshape, or even rehearse their vocal expressions in the quiet hours of sleep.

4DVarNet: a neural network model for data assimilation

Shashank Kumar Roy, IMT Atlantique

Wednesday, 26th February 2025, 3:00 PM - 4:00 PM

Seminar Hall 51, 4th floor, Main Building, IISER Pune

Abstract:

Data assimilation combines numerical models of chaotic systems with sparse and noisy observations to estimate the system's hidden state. The 4DVar algorithm is a state-of-the-art classical data assimilation method used in numerical weather prediction for geophysical data. A crucial assumption in 4DVar is that the model state corresponding to the minimizer of the 4DVar cost function is close to the true state. Using a single-layer quasi-geostrophic (QG) model, I present scenarios where this assumption breaks down, particularly in the presence of model errors and suboptimal initializations, demonstrating the sensitivity of 4DVar. I will further introduce a recent deep learning-based model called 4DVarNet, an end-to-end neural network based on variational data assimilation and supervised learning. In the end, I will discuss the possibility of a hybrid model using physics and neural networks that can take advantage of trainable solvers for the learned 4DVar optimization problems.

Introduction to Randomized Linear Algebra (two-day short course)

S. Lakshmivarahan, University of Oklahoma

Wednesday, 5 February 2025 and Thursday 6 February 2025, 3:00 PM - 4:00 PM

Seminar Hall 51, 4th floor, Main Building, IISER Pune

Abstract:

Large-scale matrix problems naturally arise in many applications, such as image processing and text processing. This series of two lectures will provide an overview of the basic ideas for solving many of the standard problems (matrix-vector multiplication, matrix-matrix multiplication, sketching or low-rank approximation of a matrix, approximating the range space of a matrix, etc.) using randomized algorithms. The first is the data-dependent approach based on importance sampling, and the second is the data-independent approach based on random projection. We will discuss two ways of approximating the solution to large-scale linear least squares problems.
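As an illustration of the data-dependent (importance sampling) approach to matrix-matrix multiplication, here is a minimal pure-Python sketch. It follows the standard column/row sampling estimator; the function name `sampled_matmul`, the list-of-lists matrix representation, and the choice of sampling probabilities proportional to the product of column and row norms are assumptions for illustration and may differ from what the lectures present.

```python
import math
import random


def sampled_matmul(A, B, num_samples, seed=0):
    """Approximate A @ B by sampling rank-one terms A[:, i] * B[i, :].

    Index i is drawn with probability proportional to
    ||A[:, i]|| * ||B[i, :]||, and each sampled term is rescaled by
    1 / (num_samples * p_i) so that the estimate is unbiased.
    Assumes at least one column/row pair has nonzero norm.
    """
    rng = random.Random(seed)
    m, n, p = len(A), len(B), len(B[0])

    col_norms = [math.sqrt(sum(A[r][i] ** 2 for r in range(m))) for i in range(n)]
    row_norms = [math.sqrt(sum(x * x for x in B[i])) for i in range(n)]
    weights = [col_norms[i] * row_norms[i] for i in range(n)]
    total = sum(weights)
    probs = [w / total for w in weights]

    C = [[0.0] * p for _ in range(m)]
    for i in rng.choices(range(n), weights=probs, k=num_samples):
        scale = 1.0 / (num_samples * probs[i])
        for r in range(m):
            for s in range(p):
                C[r][s] += scale * A[r][i] * B[i][s]
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
    B = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(sampled_matmul(A, B, num_samples=5))
```

The sampling probabilities depend on the data, which is what distinguishes this from a random projection, where the sketching matrix is drawn without looking at A or B.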

Mathematics for big data analytics

S. Lakshmivarahan, University of Oklahoma

Wednesday, 5 February 2025, 3:00 PM - 4:00 PM

Seminar Hall 51, 4th floor, Main Building, IISER Pune

Abstract:

We are moving steadily from data-sparse to data-rich regimes, thanks to the confluence of sensor, wireless communication, large storage and computing technologies. This talk will present the impact/curse of high dimension (creation of empty space, concentration of mass and of distances) and its consequences for analysis.
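Two of the effects named in this abstract, empty space and concentration of mass, can be made concrete with a short calculation. The sketch below uses the standard volume formula for the d-dimensional unit ball; the function names `ball_fraction` and `shell_fraction` are hypothetical and chosen only for this illustration.

```python
import math


def ball_fraction(d):
    """Fraction of the cube [-1, 1]^d occupied by the inscribed unit ball.

    Uses the unit-ball volume pi^(d/2) / Gamma(d/2 + 1); the cube has
    volume 2^d. The fraction shrinks quickly as d grows, so most of
    the cube is "empty" corners.
    """
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball_volume / 2 ** d


def shell_fraction(d, eps):
    """Fraction of the unit ball's volume lying within eps of its surface.

    Volume scales like radius^d, so the inner ball of radius 1 - eps
    holds (1 - eps)^d of the total and the thin shell holds the rest.
    """
    return 1.0 - (1.0 - eps) ** d


if __name__ == "__main__":
    for d in (2, 10, 50):
        print(d, ball_fraction(d), shell_fraction(d, 0.05))
```

As d grows, the first quantity tends to zero and the second tends to one: almost none of the cube lies inside the ball, and almost all of the ball lies near its boundary.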

Scalability of Blockchain Systems

Kalpesh Kapoor, IIT Guwahati and IISER Pune

Wednesday, 22nd January 2025, 3:00 PM - 4:00 PM

Seminar Hall 51, 4th floor, Main Building, IISER Pune

Abstract:

This seminar will explore the potential of blockchain technology for data management. We will begin with an overview of key system properties and requirements for data management systems. Next, we will delve into the core concepts of blockchain technology, examining its decentralized and trustless architecture. By comparing blockchain with traditional centralized systems, we will highlight the fundamental trade-offs between decentralization, security, and performance. A second part of the discussion will focus on the scalability challenges facing current blockchain systems. To address the challenge, we'll review a range of potential solutions, starting with Layer 1 protocol-level enhancements. We'll then discuss Layer 2 off-chain mechanisms, with a particular focus on the Lightning Network as a potential solution. Finally, we'll discuss advanced routing algorithms designed to optimize transaction flow within payment channel networks. These algorithms aim to maximize concurrent transactions while addressing issues such as link saturation and channel exhaustion, ultimately improving the scalability and efficiency of a blockchain-based system.

Mapping India’s Decarbonization Pathways And The Way Forward: An Input-Output Framework

Kakali Mukhopadhyay, McGill University, Montreal

Wednesday, 8th January 2025, 3:00 PM - 4:00 PM

Seminar Room No 24, 1st floor, Main Building, IISER Pune

Abstract:

At the recently held COP29 event in Baku, one of the key takeaways on which 200 countries reached consensus was the urgency of developing equitable national climate plans and transitioning away from fossil fuels to remain on track to limit global warming to 1.5°C. This process has been ongoing over the past decade, aligned with SDGs 7 and 13. India has made several efforts toward sectoral decarbonization strategies, such as achieving 500 GW of renewable energy capacity by 2030, 20% ethanol blending by 2025 and a 30% share of EV sales by 2030. Results indicate that solar and wind energy have minimal economy-wide contribution, while the solid waste generated in the end-of-life phase has the potential to be recycled, reused and reinstalled as solar and wind capacity through the circular economy framework. The transition to EVs and E20 blended petrol in the transport sector has a positive macroeconomic impact due to its strong backward linkages. Addressing climate change involves immediate to mid-term financial expenses associated with technology transfers, development programs, and local manufacturing initiatives, among other factors. However, in the long run, the nation is expected to reap the benefits of alternative decarbonization pathways that will serve as a blueprint for countries in the 'Global South' region which are undergoing a similar energy transition trajectory.
