
What does this have to do with data science? Does this guy even know any data science?


Algebraic topology is a mathematical toolkit for analyzing connectedness/structure. Pretty applicable to data science.


> Pretty applicable to data science.

Could you expand on that?

Maybe I'm missing something but I browsed the PDF and am having trouble finding anything that would be deployed to a production system or produced as part of an analysis in a data science workflow.

I think there are some very niche applications in manifold learning (e.g. UMAP) that are somewhat useful, but in my decade in this space, people have tried to find topology applications and ultimately we end up going back to the basic tools of statistics and machine learning.
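
For what it's worth, the UMAP case really is a one-liner in a typical workflow. A rough sketch with the umap-learn and scikit-learn packages (library and dataset choices here are just illustrative):

    import umap  # pip install umap-learn
    from sklearn.datasets import load_digits

    # Stand-in dataset; any (n_samples, n_features) array works.
    X = load_digits().data

    # Project to 2-D; n_neighbors trades off local vs. global structure.
    embedding = umap.UMAP(n_neighbors=15, min_dist=0.1,
                          n_components=2).fit_transform(X)
    print(embedding.shape)  # (1797, 2)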


If you want to optimize parameters to maximize a function, calculus is the natural tool. If you want to treat data as having richer structure than just a point cloud in a metric space, topology is the natural tool. Network characterization (e.g. community/bottleneck/robustness measurement) is not that niche imo, but I guess data science defaults to meaning calculus because calculus is fast and cheap to compute compared to topological methods.
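
To make "network characterization" concrete, here is a rough sketch with networkx (my library choice, purely illustrative): community detection, bottleneck scoring via betweenness, and a crude robustness measure via node connectivity.

    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    # Toy graph; in practice this comes from your data.
    G = nx.karate_club_graph()

    # Community structure: partition nodes by modularity.
    communities = greedy_modularity_communities(G)
    print(len(communities), "communities")

    # Bottlenecks: nodes sitting on many shortest paths.
    bottlenecks = sorted(nx.betweenness_centrality(G).items(),
                         key=lambda kv: kv[1], reverse=True)[:3]
    print("top bottlenecks:", bottlenecks)

    # Robustness: minimum number of nodes whose removal disconnects G.
    print("node connectivity:", nx.node_connectivity(G))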

There are classes of answers that I don't see how you could find without resorting to something like TDA. Stuff like characterizing the Betti numbers of neuronal circuits to profile signal routing redundancy. The economics of applying TDA at scale (at least at one that does not require e.g. nation-state level compute) don't work well atm I think: small problems will quickly kill the beefiest CPU/RAM combo you can find if you try to do basic persistent homology on them, so either you scale your compute like crazy for results that probably won't give you jackpot margins wrt competitors, or you just do whatever is cheap and quick and good enough.
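
For a sense of what a basic persistent homology run looks like (and where the cost goes), a toy sketch with the ripser package, which is just one library choice among gudhi, dionysus, etc.:

    import numpy as np
    from ripser import ripser  # pip install ripser

    # 200 noisy samples from a circle: Betti_0 = 1, Betti_1 = 1.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 200)
    X = np.c_[np.cos(theta), np.sin(theta)] + rng.normal(0, 0.05, (200, 2))

    # Persistence diagrams up to dimension 1 (components and loops).
    dgms = ripser(X, maxdim=1)['dgms']

    # Long-lived features approximate Betti numbers: count 1-cycles
    # whose persistence (death - birth) dominates the noise.
    persistence = dgms[1][:, 1] - dgms[1][:, 0]
    print("prominent loops:", int(np.sum(persistence > 0.5)))  # expect 1

    # The scaling pain mentioned above: the Vietoris-Rips complex on n
    # points has O(n^2) candidate edges and O(n^3) candidate triangles,
    # so RAM explodes long before n gets interesting.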

A remark: I think TDA is in an era not too unlike deep learning after the proliferation of backprop and little deep nets, but before backprop-on-GPUs. People said all kinds of things about how deep learning was a gimmick, was too impractical to apply to real-world problems, etc. I remain curious either way.


That’s a fair point, particularly on the classes of answers it solves. I’m not so much interested in the “how” because we’ll always find a way if there is a “why”.

There are many applications of computational geometry today (though more in engineering than in data science). If we can work with topological objects, I can see us finding areas where we can derive value.

NN is a little different because it has always had a very strong why (it’s a very flexible, highly parameterized nonlinear regression model — a fitting function for everything) but was hampered by the how for many years.
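
To be concrete about the "nonlinear regression" framing, a toy sketch with scikit-learn (illustrative only): a small MLP fits a target that a linear model cannot.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # A target no linear model can fit: y = sin(x).
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, (500, 1))
    y = np.sin(X).ravel()

    # Stacked nonlinear transforms under a linear output layer.
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                         random_state=0).fit(X, y)
    print("R^2:", model.score(X, y))  # typically close to 1.0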

Topology’s whys are a bit less universal but I can see some very useful future applications.


Well, you are not alone. As a Math PhD, I want to believe that there is something less trivial than glorified linear regression to this whole AI/ML mess, but well, there isn't... It's non-linear transforms + Hail Marys all the way.


There is a decent set of publications describing data analysis with TDA/PH; a source with citations to several can be found in:

Otter, N., Porter, M.A., Tillmann, U. et al. A roadmap for the computation of persistent homology. EPJ Data Sci. 6, 17 (2017). https://doi.org/10.1140/epjds/s13688-017-0109-5



