TDA for Data Scientists

Most introductions to topological data analysis stop at the picture: "look, persistence diagrams are pretty." This one does not. By the end of day three you will have built a TDA pipeline end-to-end on real data, understood why the stability theorem matters in practice, and produced a vectorization that goes into a downstream classifier you trained yourself.

The training is for working data scientists who want a tool that complements (not replaces) the methods they already use. We assume you can read mathematical notation when you need to, and we expect you to write code in every session. You will leave with a working repository, a graded project, and a defensible answer to the question your team will ask: "okay, but when does this actually help?"

Program Overview

Three consecutive days in Cotonou, six half-day sessions, every session anchored in code. We build the standard TDA pipeline — point clouds → filtrations → persistent homology → vectorization → downstream model — on the libraries the field actually uses: GUDHI, Ripser, and giotto-tda. No survey lectures; every concept lands with a piece of code you run yourself.

The cohort is capped at roughly 12 participants so that every project gets a real defense in front of the instructors on the third afternoon. Materials, code, problem sets, and the reading list are shared two weeks before the cohort starts and stay open-source after it ends.

Program Structure

Day 1 — From point clouds to filtrations. Simplicial complexes by hand; Vietoris–Rips, Čech, alpha. Persistent homology computed by hand on a 6-point example, then in GUDHI and Ripser. First exercises on noisy circles, the double torus, an MNIST projection.
Day 2 — Stability, interpretation, vectorization. Bottleneck and Wasserstein distances, the stability theorem in its formal and practitioner's forms, common failure modes. Persistence images, landscapes, Betti curves — plugged into scikit-learn via giotto-tda.
Day 3 — TDA on real data, final project. Three case studies (time-series volatility, transaction-network anomalies, image microstructure), each with an honest non-topological baseline. Afternoon: 30-minute defense of a final project in front of the instructors.

Faculty

Lead instructor: Yaé Ulrich Gaba, AIRINA Labs director, mathematician working in topology, geometry, and applied TDA. Co-author of The Shape of Data (No Starch Press) and 23+ papers across topology, TDA, and applied mathematics. Visiting lecturer at AIMS South Africa, AIMS Senegal, and AIMS Rwanda.
Teaching assistant: a senior AIRINA researcher or affiliated AIMS alumnus, named two weeks before each cohort.

Certificate

The certificate — AIRINA TDA Foundations · graded — is issued contingent on a passing final project: a working pipeline, a clear topological justification for the design choices, and a defensible answer on whether TDA was actually warranted on the chosen dataset.

Learning Outcomes

By the end of the program, participants will be able to:

Build Vietoris–Rips, Čech, and alpha complexes from point-cloud data, and explain why they chose one over the others for a given dataset.
Compute persistent homology with GUDHI and Ripser, read persistence diagrams and barcodes, and recognize artifacts of bad parameter choices.
State the stability theorem in their own words, and explain what it does and does not guarantee for downstream ML.
Vectorize persistence diagrams into ML-ready features using persistence images, landscapes, and Betti curves — and pick between them based on the downstream model.
Run a TDA pipeline in giotto-tda from raw data to a scikit-learn classifier, and benchmark it honestly against a non-topological baseline.
Identify cases where TDA is the right tool and cases where it is not. Both happen.

Program Curriculum

Day 1 · AM · Simplicial complexes and filtrations

Simplicial complexes by hand on a whiteboard. Vietoris–Rips, Čech, and alpha — what each one captures and where each one breaks. Live-coding a Rips filtration on a toy 2D dataset; the moment when the first hole appears.
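To give a feel for what the live-coding looks like, here is a minimal sketch of a Vietoris–Rips filtration in plain Python. It is not the course material itself: the function name `rips_filtration`, the unit-square dataset, and the convention that a simplex enters at the length of its longest edge are assumptions made for illustration.

```python
import itertools
import math

def rips_filtration(points, max_dim=2):
    """List (scale, simplex) pairs in the order they enter a Vietoris-Rips filtration.

    Convention assumed here: a simplex enters at the length of its longest edge.
    """
    simplices = []
    for k in range(1, max_dim + 2):  # k vertices span a (k - 1)-simplex
        for simplex in itertools.combinations(range(len(points)), k):
            scale = max(
                (math.dist(points[i], points[j])
                 for i, j in itertools.combinations(simplex, 2)),
                default=0.0,  # a single vertex has no edges and enters at scale 0
            )
            simplices.append((scale, simplex))
    # Sort by scale, then dimension, so every face precedes the simplices it bounds.
    simplices.sort(key=lambda item: (item[0], len(item[1]), item[1]))
    return simplices

# Hypothetical toy data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
filtration = rips_filtration(square)

for scale, simplex in filtration:
    print(f"{scale:.3f}  {simplex}")
```

On this toy input the four sides enter at scale 1 and close a loop; the loop is filled at scale √2, when the two diagonals and the four triangles enter. That interval between 1 and √2 is the "first hole" of the session.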

Day 1 · PM · Persistent homology in code

Persistent homology computed by hand on a 6-point example. Then the same computation in GUDHI and Ripser, and a discussion of why Ripser is fast. Reading persistence diagrams and barcodes. First exercises on synthetic and real point clouds: noisy circles, the double torus, and an MNIST projection.
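The by-hand computation can also be written out in a few lines of plain Python. The sketch below uses the textbook column reduction of the boundary matrix over Z/2; it is meant for reading, not for speed, and production libraries use far more efficient variants. The six points (a regular hexagon), the function names, and the rounding of edge lengths are assumptions chosen for illustration, not the example used in class.

```python
import itertools
import math

def rips_filtration(points, max_dim=2, digits=9):
    """(scale, simplex) pairs sorted so that faces come before cofaces."""
    simplices = []
    for k in range(1, max_dim + 2):
        for simplex in itertools.combinations(range(len(points)), k):
            # Rounding is a shortcut so that equal edge lengths tie exactly.
            scale = max(
                (round(math.dist(points[i], points[j]), digits)
                 for i, j in itertools.combinations(simplex, 2)),
                default=0.0,
            )
            simplices.append((scale, simplex))
    simplices.sort(key=lambda item: (item[0], len(item[1]), item[1]))
    return simplices

def persistence_bars(filtration):
    """Standard column reduction of the boundary matrix over Z/2.

    Returns (dimension, birth, death) triples with nonzero persistence.
    """
    position = {simplex: i for i, (_, simplex) in enumerate(filtration)}
    top_dim = max(len(simplex) for _, simplex in filtration) - 1
    reduced = {}   # column index -> reduced boundary column (set of row indices)
    owner = {}     # lowest row index -> column that has it as pivot
    paired = set()
    bars = []
    for j, (death, simplex) in enumerate(filtration):
        column = set()
        if len(simplex) > 1:
            column = {position[face]
                      for face in itertools.combinations(simplex, len(simplex) - 1)}
        # Add earlier columns (mod 2) until the pivot is new or the column is empty.
        while column and max(column) in owner:
            column ^= reduced[owner[max(column)]]
        if column:
            pivot = max(column)
            owner[pivot], reduced[j] = j, column
            paired.update((pivot, j))
            birth = filtration[pivot][0]
            if death > birth:
                bars.append((len(simplex) - 2, birth, death))
    # Unpaired simplices create classes that never die. Top-dimensional ones
    # are skipped: they are artifacts of truncating the complex at max_dim.
    for j, (birth, simplex) in enumerate(filtration):
        if j not in paired and len(simplex) - 1 < top_dim:
            bars.append((len(simplex) - 1, birth, math.inf))
    return sorted(bars)

# Hypothetical 6-point example: a regular hexagon on the unit circle.
hexagon = [(math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)]
bars = persistence_bars(rips_filtration(hexagon))
h0 = [(b, d) for dim, b, d in bars if dim == 0]
h1 = [(b, d) for dim, b, d in bars if dim == 1]
print("H0:", h0)
print("H1:", h1)
```

For the hexagon, H0 shows five components that merge at scale 1 plus one infinite bar, and H1 shows a single loop born at scale 1 that dies at scale √3. Bars of zero length are dropped, which is one of the parameter choices worth questioning when you read a diagram.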

Day 2 · AM · The stability theorem and what it buys you

The bottleneck distance, the Wasserstein distance on diagrams. Statement of the stability theorem in two ways — the formal one and the practitioner's one. Common failure modes: ignoring infinite bars, over-trusting short bars, conflating noise with signal. Diagnostic plots you should always look at.
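In its classical form for sublevel-set persistence, the stability theorem says that the bottleneck distance between the diagrams of two functions is at most the largest pointwise difference between the functions. To make the distance itself concrete, here is a brute-force sketch that tries every matching between two tiny diagrams. The function name and the two diagrams are invented for illustration, the approach only scales to a handful of points, and it assumes every bar is finite (infinite bars need separate handling, which is one of the failure modes listed above).

```python
import itertools
import math

def bottleneck_bruteforce(dgm_a, dgm_b):
    """Exact bottleneck distance between two tiny diagrams of finite bars."""
    # Any point may be matched to the diagonal. Model that by padding each
    # diagram with one "diagonal slot" (None) per point of the other diagram.
    a = list(dgm_a) + [None] * len(dgm_b)
    b = list(dgm_b) + [None] * len(dgm_a)

    def cost(p, q):
        if p is None and q is None:
            return 0.0                      # diagonal to diagonal is free
        if p is None:
            return (q[1] - q[0]) / 2        # L-infinity distance to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    best = math.inf
    for perm in itertools.permutations(range(len(b))):
        worst = max((cost(a[i], b[j]) for i, j in enumerate(perm)), default=0.0)
        best = min(best, worst)
    return best

# Hypothetical diagrams: one long bar that shifts slightly, one short bar that vanishes.
dgm_clean = [(0.0, 1.0), (0.2, 0.3)]
dgm_noisy = [(0.1, 1.1)]
distance = bottleneck_bruteforce(dgm_clean, dgm_noisy)
print(round(distance, 6))
```

In this example the long bar moves by 0.1 and the short bar is sent to the diagonal at a cost of 0.05, so the distance is about 0.1. The short bar barely registers, which is the quantitative reason not to over-trust short bars.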

Day 2 · PM · Vectorization for ML pipelines

Persistence images, landscapes, Betti curves, and persistence signatures — what each one is, what it preserves, and when to pick which. Plugging vectorizations into scikit-learn pipelines via giotto-tda. Hyperparameter sensitivity: what to grid-search and what to leave fixed.
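Two of these vectorizations are simple enough to write by hand, which helps when deciding what to grid-search. The sketch below computes a Betti curve and a persistence landscape on a fixed grid in plain Python. In the session this step is done with giotto-tda; the function names, the diagram, and the grid here are illustrative assumptions, and a bar is counted as alive on the half-open interval from birth to death.

```python
def betti_curve(diagram, grid):
    """Number of bars alive at each grid value t (birth <= t < death)."""
    return [sum(1 for birth, death in diagram if birth <= t < death) for t in grid]

def landscape(diagram, grid, k=1):
    """k-th persistence landscape: the k-th largest tent value at each t.

    Each bar contributes a tent function max(0, min(t - birth, death - t)).
    """
    values = []
    for t in grid:
        tents = sorted(
            (max(0.0, min(t - birth, death - t)) for birth, death in diagram),
            reverse=True,
        )
        values.append(tents[k - 1] if k <= len(tents) else 0.0)
    return values

# Hypothetical diagram with two finite bars, sampled on a coarse grid.
diagram = [(0.0, 1.0), (0.5, 2.0)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]

curve = betti_curve(diagram, grid)
first_layer = landscape(diagram, grid, k=1)

# A fixed-length feature vector that any scikit-learn estimator can consume.
features = curve + first_layer
print(features)
```

The grid resolution and the number of landscape layers are the obvious hyperparameters here. Both vectors have the same length for every diagram, which is the property a downstream classifier needs.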

Day 3 · AM · Three real-world case studies

Time series with cubical filtrations (volatility regime detection); graphs with persistent homology of weighted graphs (transaction-network anomalies); images with sublevel-set filtrations (texture and microstructure classification). Each case study with its honest baseline and a discussion of whether TDA actually helped.

Day 3 · PM · Final project + defense

Participants pick from three provided datasets (or bring their own, with prior approval) and build an end-to-end TDA pipeline. 30-minute defense in front of the instructors. Constructive criticism, suggestions for follow-on work, certificate issued contingent on a passing project.

Who Should Attend

This training is for working practitioners and graduate-level researchers who want a tool that complements the methods they already use.

Data scientists, statisticians, and ML engineers at banks, microfinance institutions, mobile-money operators, telcos, and other financial-sector employers.
Graduate students with a quantitative background — AIMS alumni, master's and PhD students in mathematics, statistics, computer science.
Researchers from adjacent fields (computational biology, materials science, neuroscience) who want a working introduction with code, not a survey lecture.

Prerequisites

Working Python. Comfortable with NumPy, pandas, scikit-learn at the level of "I can train a random forest on a CSV without looking it up."
Mathematics. Calculus through multivariate, linear algebra through eigendecomposition. We will not derive anything that needs more.
No prior topology required. We build what we need from scratch in the first morning.
Laptop. Your own laptop with a working Python environment. Setup instructions sent two weeks before the cohort.

Selection

Cohort capped at ~12 participants; minimum 8 for the cohort to run. Partner-site delivery is available for cohorts of 8+ from a single institution. Selection priority is given to applicants from the BCEAO zone and to underrepresented groups in technical fields.

Brochure

The detailed program brochure (PDF, EN/FR) is sent on request — including the full day-by-day curriculum, reading list, project briefs, and the cohort calendar.

To receive the current brochure, write to contact@airina.africa with "TDA for Data Scientists — brochure request" in the subject. The brochure is updated each cohort; we send the version current at the time of your request.