Topological Data Analysis: Developments and Applications
Topological Data Analysis (TDA) and its mainstay computational device, persistent homology (PH), has established a strong track record of providing researchers across the data-driven sciences with new insights and methodologies by characterizing low-dimensional geometric structures in high-dimensional data. When combined with machine learning (ML) methods, PH is valued as a discriminating-feature extraction tool. This work highlights many of the recent successes at the intersection of TDA and ML, introduces some of the foundational mathematics underpinning TDA, and summarizes the efforts to strengthen the bridge between TDA and ML. Thus, this document is a launching point for experimentalists and theoreticians to consider what can be learned from the shape of their data.
This is a preview of subscription content, log in via an institution to check access.
Access this chapter
Similar content being viewed by others
Persistent Homology Techniques for Big Data and Machine Intelligence: A Survey
Chapter © 2020
Persistent Homology in Data Science
Chapter © 2021
Topology, Big Data and Optimization
Chapter © 2016
Notes
Properties preserved under homeomorphism: a continuous bijective function with continuous inverse.
The sublevel sets of a real-valued function, \(f: X \rightarrow \mathbb\) , are the subsets f −1 ((−∞, a]) = x ∈ X | f(x) ≤ a> ⊂ X.
The level sets of a real-valued function, \(f: X \rightarrow \mathbb\) , are the subsets f −1 (a>) = x ∈ X | f(x) = a> ⊂ X.
A set of vectors \(\ \subset \mathbb^\) is affinely independent if the set v i − v 0 | i = 1, …, n> is linearly independent.
Often called the 1-skeleton.
The trivial vector space over \(\mathbb\) consisting only of the 0 vector.
More generally, the sets of n-chains may be defined to be the free abelian groups with coefficients taken from a commutative ring. In this setting the boundary maps are homomorphisms (Hatcher, 2002).
If \(\mathbb\) is chosen to be a commutative ring, the boundaries and cycles form subgroups which explains the terminology homology groups.
References
- Adams, Henry, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. 2017. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research 18: 1–35 Google Scholar
- Adcock, Aaron, Erik Carlsson, and Gunnar Carlsson. 2016. The ring of algebraic functions on persistence bar codes. Homology, Homotopy and Applications 18(1): 381–402. ArticleGoogle Scholar
- Allili, M., K. Mischaikow, and A. Tannenbaum. Oct 2001. Cubical homology and the topological classification of 2D and 3D imagery. In Proceedings 2001 international conference on image processing (Cat. No.01CH37205), vol. 2, 173–176. Google Scholar
- Bendich, Paul, James S. Marron, Ezra Miller, Alex Pieloch, and Sean Skwerer. 2016. Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10(1): 198–218. ArticleGoogle Scholar
- Borsuk, Karol. 1948. On the imbedding of systems of compacta in simplicial complexes. Fundamenta Mathematicae 35(1): 217–234. ArticleGoogle Scholar
- Bubenik, Peter. 2015. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research 16(1): 77–102. Google Scholar
- Burges, Christopher JC. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2: 121–167. ArticleGoogle Scholar
- Carrière, Mathieu, Steve Y. Oudot, and Maks Ovsjanikov. 2015. Stable topological signatures for points on 3D shapes. Computer Graphics Forum 34: 1–12. ArticleGoogle Scholar
- Chazal, Frédéric, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Y. Oudot. 2009. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on computational geometry, SCG ’09, 237–246. New York, NY: ACM. Google Scholar
- Chung Moo K., Peter Bubenik, and Peter T. Kim. 2009a. Persistence diagrams of cortical surface data. In Information processing in medical imaging, vol. 21, 386–397. Berlin: Springer. ChapterGoogle Scholar
- Chung, Moo K., Vikas Singh, Peter T. Kim, Kim M. Dalton, and Richard J. Davidson. 2009b. Topological characterization of signal in brain images using min-max diagrams. In Medical image computing and computer-assisted intervention—MICCAI 2009, ed. Guang-Zhong Yang, David J. Hawkes, Daniel Rueckert, Alison Noble, and Chris Taylor, vol. 5762, 158–166. Berlin/Heidelberg: Springer. Google Scholar
- Cohen-Steiner, David, Herbert Edelsbrunner, and John Harer. 2007. Stability of persistence diagrams. Discrete and Computational Geometry 37(1): 103–120. ArticleGoogle Scholar
- Dabaghian, Yu, Facundo Memoli, Loren Frank, and Gunnar Carlsson. 2012. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8(8): e1002581. Google Scholar
- de Silva, Vin, and Robert Ghrist. 2007. Coverage in sensor networks via persistent homology. Algebraic and Geometric Topology 7: 339–358. ArticleGoogle Scholar
- Di Fabio, Barbara, and Massimo Ferri. 2015. Comparing persistence diagrams through complex vectors. In International conference on image analysis and processing 2015 part I, ed. Murino, V., and E. Puppo. Lecture Notes in Computer Science, vol. 9279, 294–305. Heidelberg: Springer. Google Scholar
- Donatini, Pietro, Patrizio Frosini, and Alberto Lovato. 1998. Size functions for signature recognition. In SPIE’s international symposium on optical science, engineering, and instrumentation, 178–183. Google Scholar
- Edelsbrunner, Herbert, and John Harer. 2008. Persistent homology—a survey. Contemporary Mathematics 453: 257–282. ArticleGoogle Scholar
- Edelsbrunner, Herbert, and John Harer. 2010. Computational topology: an introduction. Providence, RI: American Mathematical Society. Google Scholar
- Edwards, David A. 1975. The structure of superspace. In Studies in topology, ed. Nick M. Stavrakas and Keith R. Allen, 121–133. New York, NY: Academic. ChapterGoogle Scholar
- Ferri, Massimo, Patrizio Frosini, Alberto Lovato, and Chiara Zambelli. 1997. Point selection: a new comparison scheme for size functions (with an application to monogram recognition). In Computer vision ACCV’98, 329–337. Berlin/Heidelberg: Springer. Google Scholar
- Florek, K., J. Łukaszewicz, J. Perkal, H. Steinhaus, and S. Zubrzycki. 1951. Sur la liaison et la division des points d’un ensemble fini. Colloquium Mathematicae 2(3–4): 282–285. ArticleGoogle Scholar
- Ghrist, Robert. 2008. Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society 45(1): 61–75. ArticleGoogle Scholar
- Giusti, Chad, Eva Pastalkova, Carina Curto, and Vladimir Itskov. 2015. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences 112(44): 13455–13460. ArticleGoogle Scholar
- Hatcher, Allen. 2002. Algebraic topology. Cambridge: Cambridge University Press. Google Scholar
- Hiraoka, Yasuaki, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, and Yasumasa Nishiura. 2016. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences 113(26): 7035–7040. ArticleGoogle Scholar
- Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. 2008. Kernel methods in machine learning. Annals of Statistics 36(3): 1171–1220. ArticleGoogle Scholar
- Hopcroft, John E., and Richard M. Karp. 1971. A N5/2 algorithm for maximum matchings in bipartite. In Proceedings of the 12th annual symposium on switching and automata theory (Swat 1971), SWAT ’71, 122–125. Washington, DC: IEEE Computer Society. ChapterGoogle Scholar
- Kantz, Holger, and Thomas Schreiber. 1997. Nonlinear time series analysis. Cambridge Nonlinear Science Series. Cambridge/New York: Cambridge University Press. Originally published: 1997. Google Scholar
- Kerber, Michael, Dmitriy Morozov, and Arnur Nigmetov. 2016. Geometry helps to compare persistence diagrams. In 2016 Proceedings of the eighteenth workshop on algorithm engineering and experiments (ALENEX), 103–112. Google Scholar
- Kuhn, Harold W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2: 83–97. ArticleGoogle Scholar
- Kuo, Wei, Bill Wang, Cindy Bruyere, Tim Scheitlin, and Don Middleton. 2017. Hurricane Isabel data produced by the weather research and forecast (WRF) model. Courtesy of NCAR, and the U.S. National Science Foundation (NSF). http://www.vets.ucar.edu/vg/isabeldata/readme.html.
- Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, Ray R., R.S. Vose, B.E. Gleason, and T.G. Houston. 2017. Global historical climatology network - daily (GHCN-daily), version 3.22. NOAA National Climatic Data Center. ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt.
- Mileyko, Yuriy, Sayan Mukherjee, and John Harer. 2011. Probability measures on the space of persistence diagrams. Inverse Problems 27(12): 124007. ArticleGoogle Scholar
- Munch, Elizabeth, Katharine Turner, Paul Bendich, Sayan Mukherjee, Jonathan Mattingly, and John Harer. 2015. Probabilistic fréchet means for time varying persistence diagrams. Electronic Journal of Statistics 9(1): 1173–1204. ArticleGoogle Scholar
- Munro, John, Peter Landecker, and Martin Gale. Goes n data book section 3. Technical Report 2, National Aeronautics and Space Administration, Feb 2005. Copyright ©2006 Boeing. Unpublished work. Google Scholar
- Nicolau, Monica, Arnold J. Levine, and Gunnar Carlsson. 2011. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences 108(17): 7265–7270. ArticleGoogle Scholar
- Pachauri, Deepti, Christian Hinrichs, Moo K. Chung, Sterling C. Johnson, and Vikas Singh. 2011. Topology-based kernels with application to inference problems in Alzheimer’s disease. IEEE Transactions on Medical Imaging 30(10): 1760–1770. ArticleGoogle Scholar
- Pearson, Daniel A., R. Mark Bradley, Francis C. Motta, and Patrick D. Shipman. Dec 2015. Producing nanodot arrays with improved hexagonal order by patterning surfaces before ion sputtering. Physical Review E 92: 062401. ArticleGoogle Scholar
- Perea, Jose A., and John Harer. 2015. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics 15(3): 799–838. ArticleGoogle Scholar
- Perea, Jose A., Anastasia Deckard, Steve B. Haase, and John Harer. 2015. Sw1pers: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics 16(1): 257. ArticleGoogle Scholar
- Reininghaus, Jan, Stefan Huber, Ulrich Bauer, and Roland Kwitt. 2015. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4741–4748. Google Scholar
- Rouse, David, Adam Watkins, David Porter, John Harer, Paul Bendich, Nate Strawn, Elizabeth Munch, Jonathan DeSena, Jesse Clarke, Jeffrey Gilbert, Peter Chin, and Andrew Newman. 2015. Feature-aided multiple hypothesis tracking using topological and statistical behavior classifiers. In Signal processing, sensor/information fusion, and target recognition XXIV, vol. 9474, 94740L–94740L–12. Google Scholar
- Safavian, S.R., and D. Landgrebe. May 1991. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3): 660–674. ArticleGoogle Scholar
- Sibson, R. 1973. Slink: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1): 30–34. ArticleGoogle Scholar
- Singh, Gurjeet, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L. Ringach. 2008. Topological analysis of population activity in visual cortex. Journal of Vision 8(8): 11. ArticleGoogle Scholar
- Turner, Katharine, Yuriy Mileyko, Sayan Mukherjee, and John Harer. 2014. Fréchet means for distributions of persistence diagrams. Discrete and Computational Geometry 52(1): 44–70. ArticleGoogle Scholar
- Venkataraman, Vinay, Karthikeyan Natesan Ramamurthy, and Pavan Turaga. Sept 2016. Persistent homology of attractors for action recognition. In 2016 IEEE international conference on image processing (ICIP), 4150–4154. Washington, DC: IEEE. ChapterGoogle Scholar
- Wang, Yuan, Hernando Ombao, and Moo K. Chung. 2014. Persistence landscape of functional signal and its application to epileptic electroencaphalogram data (unpublished). Google Scholar
- Wood, Peter John, Adrian P. Sheppard, and Vanessa Robins. 2011. Theory and algorithms for constructing discrete morse complexes from grayscale digital images. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (undefined): 1646–1658. Google Scholar
- Zhang, Guoqiang Peter. 2000. Neural networks for classification: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 30(4): 451–462. ArticleGoogle Scholar
- Zomorodian, Afra, and Gunnar Carlsson. 2005. Computing persistent homology. Discrete and Computational Geometry 33(2): 249–274. ArticleGoogle Scholar
Author information
Authors and Affiliations
- Mathematics Department, Duke University, Durham, NC, USA Francis C. Motta
- Francis C. Motta