Topological Data Analysis: Developments and Applications

Topological Data Analysis (TDA) and its mainstay computational device, persistent homology (PH), has established a strong track record of providing researchers across the data-driven sciences with new insights and methodologies by characterizing low-dimensional geometric structures in high-dimensional data. When combined with machine learning (ML) methods, PH is valued as a discriminating-feature extraction tool. This work highlights many of the recent successes at the intersection of TDA and ML, introduces some of the foundational mathematics underpinning TDA, and summarizes the efforts to strengthen the bridge between TDA and ML. Thus, this document is a launching point for experimentalists and theoreticians to consider what can be learned from the shape of their data.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Persistent Homology Techniques for Big Data and Machine Intelligence: A Survey

Persistent Homology in Data Science

Topology, Big Data and Optimization

Notes

Properties preserved under homeomorphism: a continuous bijective function with continuous inverse.

The sublevel sets of a real-valued function, \(f: X \rightarrow \mathbb\) , are the subsets f −1 ((−∞, a]) = x ∈ X | f(x) ≤ a> ⊂ X.

The level sets of a real-valued function, \(f: X \rightarrow \mathbb\) , are the subsets f −1 (a>) = x ∈ X | f(x) = a> ⊂ X.

A set of vectors \(\ \subset \mathbb^\) is affinely independent if the set v _i − v ₀ | i = 1, …, n> is linearly independent. Often called the 1-skeleton. The trivial vector space over \(\mathbb\) consisting only of the 0 vector.

More generally, the sets of n-chains may be defined to be the free abelian groups with coefficients taken from a commutative ring. In this setting the boundary maps are homomorphisms (Hatcher, 2002).

If \(\mathbb\) is chosen to be a commutative ring, the boundaries and cycles form subgroups which explains the terminology homology groups.

References

Adams, Henry, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. 2017. Persistence images: A stable vector representation of persistent homology. Journal of Machine Learning Research 18: 1–35 Google Scholar
Adcock, Aaron, Erik Carlsson, and Gunnar Carlsson. 2016. The ring of algebraic functions on persistence bar codes. Homology, Homotopy and Applications 18(1): 381–402. ArticleGoogle Scholar
Allili, M., K. Mischaikow, and A. Tannenbaum. Oct 2001. Cubical homology and the topological classification of 2D and 3D imagery. In Proceedings 2001 international conference on image processing (Cat. No.01CH37205), vol. 2, 173–176. Google Scholar
Bendich, Paul, James S. Marron, Ezra Miller, Alex Pieloch, and Sean Skwerer. 2016. Persistent homology analysis of brain artery trees. The Annals of Applied Statistics 10(1): 198–218. ArticleGoogle Scholar
Borsuk, Karol. 1948. On the imbedding of systems of compacta in simplicial complexes. Fundamenta Mathematicae 35(1): 217–234. ArticleGoogle Scholar
Bubenik, Peter. 2015. Statistical topological data analysis using persistence landscapes. The Journal of Machine Learning Research 16(1): 77–102. Google Scholar
Burges, Christopher JC. 1998. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2: 121–167. ArticleGoogle Scholar
Carrière, Mathieu, Steve Y. Oudot, and Maks Ovsjanikov. 2015. Stable topological signatures for points on 3D shapes. Computer Graphics Forum 34: 1–12. ArticleGoogle Scholar
Chazal, Frédéric, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Y. Oudot. 2009. Proximity of persistence modules and their diagrams. In Proceedings of the twenty-fifth annual symposium on computational geometry, SCG ’09, 237–246. New York, NY: ACM. Google Scholar
Chung Moo K., Peter Bubenik, and Peter T. Kim. 2009a. Persistence diagrams of cortical surface data. In Information processing in medical imaging, vol. 21, 386–397. Berlin: Springer. ChapterGoogle Scholar
Chung, Moo K., Vikas Singh, Peter T. Kim, Kim M. Dalton, and Richard J. Davidson. 2009b. Topological characterization of signal in brain images using min-max diagrams. In Medical image computing and computer-assisted intervention—MICCAI 2009, ed. Guang-Zhong Yang, David J. Hawkes, Daniel Rueckert, Alison Noble, and Chris Taylor, vol. 5762, 158–166. Berlin/Heidelberg: Springer. Google Scholar
Cohen-Steiner, David, Herbert Edelsbrunner, and John Harer. 2007. Stability of persistence diagrams. Discrete and Computational Geometry 37(1): 103–120. ArticleGoogle Scholar
Dabaghian, Yu, Facundo Memoli, Loren Frank, and Gunnar Carlsson. 2012. A topological paradigm for hippocampal spatial map formation using persistent homology. PLoS Computational Biology 8(8): e1002581. Google Scholar
de Silva, Vin, and Robert Ghrist. 2007. Coverage in sensor networks via persistent homology. Algebraic and Geometric Topology 7: 339–358. ArticleGoogle Scholar
Di Fabio, Barbara, and Massimo Ferri. 2015. Comparing persistence diagrams through complex vectors. In International conference on image analysis and processing 2015 part I, ed. Murino, V., and E. Puppo. Lecture Notes in Computer Science, vol. 9279, 294–305. Heidelberg: Springer. Google Scholar
Donatini, Pietro, Patrizio Frosini, and Alberto Lovato. 1998. Size functions for signature recognition. In SPIE’s international symposium on optical science, engineering, and instrumentation, 178–183. Google Scholar
Edelsbrunner, Herbert, and John Harer. 2008. Persistent homology—a survey. Contemporary Mathematics 453: 257–282. ArticleGoogle Scholar
Edelsbrunner, Herbert, and John Harer. 2010. Computational topology: an introduction. Providence, RI: American Mathematical Society. Google Scholar
Edwards, David A. 1975. The structure of superspace. In Studies in topology, ed. Nick M. Stavrakas and Keith R. Allen, 121–133. New York, NY: Academic. ChapterGoogle Scholar
Ferri, Massimo, Patrizio Frosini, Alberto Lovato, and Chiara Zambelli. 1997. Point selection: a new comparison scheme for size functions (with an application to monogram recognition). In Computer vision ACCV’98, 329–337. Berlin/Heidelberg: Springer. Google Scholar
Florek, K., J. Łukaszewicz, J. Perkal, H. Steinhaus, and S. Zubrzycki. 1951. Sur la liaison et la division des points d’un ensemble fini. Colloquium Mathematicae 2(3–4): 282–285. ArticleGoogle Scholar
Ghrist, Robert. 2008. Barcodes: The persistent topology of data. Bulletin of the American Mathematical Society 45(1): 61–75. ArticleGoogle Scholar
Giusti, Chad, Eva Pastalkova, Carina Curto, and Vladimir Itskov. 2015. Clique topology reveals intrinsic geometric structure in neural correlations. Proceedings of the National Academy of Sciences 112(44): 13455–13460. ArticleGoogle Scholar
Hatcher, Allen. 2002. Algebraic topology. Cambridge: Cambridge University Press. Google Scholar
Hiraoka, Yasuaki, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Matsue, and Yasumasa Nishiura. 2016. Hierarchical structures of amorphous solids characterized by persistent homology. Proceedings of the National Academy of Sciences 113(26): 7035–7040. ArticleGoogle Scholar
Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. 2008. Kernel methods in machine learning. Annals of Statistics 36(3): 1171–1220. ArticleGoogle Scholar
Hopcroft, John E., and Richard M. Karp. 1971. A N5/2 algorithm for maximum matchings in bipartite. In Proceedings of the 12th annual symposium on switching and automata theory (Swat 1971), SWAT ’71, 122–125. Washington, DC: IEEE Computer Society. ChapterGoogle Scholar
Kantz, Holger, and Thomas Schreiber. 1997. Nonlinear time series analysis. Cambridge Nonlinear Science Series. Cambridge/New York: Cambridge University Press. Originally published: 1997. Google Scholar
Kerber, Michael, Dmitriy Morozov, and Arnur Nigmetov. 2016. Geometry helps to compare persistence diagrams. In 2016 Proceedings of the eighteenth workshop on algorithm engineering and experiments (ALENEX), 103–112. Google Scholar
Kuhn, Harold W. 1955. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly 2: 83–97. ArticleGoogle Scholar
Kuo, Wei, Bill Wang, Cindy Bruyere, Tim Scheitlin, and Don Middleton. 2017. Hurricane Isabel data produced by the weather research and forecast (WRF) model. Courtesy of NCAR, and the U.S. National Science Foundation (NSF). http://www.vets.ucar.edu/vg/isabeldata/readme.html.
Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, Ray R., R.S. Vose, B.E. Gleason, and T.G. Houston. 2017. Global historical climatology network - daily (GHCN-daily), version 3.22. NOAA National Climatic Data Center. ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/readme.txt.
Mileyko, Yuriy, Sayan Mukherjee, and John Harer. 2011. Probability measures on the space of persistence diagrams. Inverse Problems 27(12): 124007. ArticleGoogle Scholar
Munch, Elizabeth, Katharine Turner, Paul Bendich, Sayan Mukherjee, Jonathan Mattingly, and John Harer. 2015. Probabilistic fréchet means for time varying persistence diagrams. Electronic Journal of Statistics 9(1): 1173–1204. ArticleGoogle Scholar
Munro, John, Peter Landecker, and Martin Gale. Goes n data book section 3. Technical Report 2, National Aeronautics and Space Administration, Feb 2005. Copyright ©2006 Boeing. Unpublished work. Google Scholar
Nicolau, Monica, Arnold J. Levine, and Gunnar Carlsson. 2011. Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival. Proceedings of the National Academy of Sciences 108(17): 7265–7270. ArticleGoogle Scholar
Pachauri, Deepti, Christian Hinrichs, Moo K. Chung, Sterling C. Johnson, and Vikas Singh. 2011. Topology-based kernels with application to inference problems in Alzheimer’s disease. IEEE Transactions on Medical Imaging 30(10): 1760–1770. ArticleGoogle Scholar
Pearson, Daniel A., R. Mark Bradley, Francis C. Motta, and Patrick D. Shipman. Dec 2015. Producing nanodot arrays with improved hexagonal order by patterning surfaces before ion sputtering. Physical Review E 92: 062401. ArticleGoogle Scholar
Perea, Jose A., and John Harer. 2015. Sliding windows and persistence: An application of topological methods to signal analysis. Foundations of Computational Mathematics 15(3): 799–838. ArticleGoogle Scholar
Perea, Jose A., Anastasia Deckard, Steve B. Haase, and John Harer. 2015. Sw1pers: Sliding windows and 1-persistence scoring; discovering periodicity in gene expression time series data. BMC Bioinformatics 16(1): 257. ArticleGoogle Scholar
Reininghaus, Jan, Stefan Huber, Ulrich Bauer, and Roland Kwitt. 2015. A stable multi-scale kernel for topological machine learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4741–4748. Google Scholar
Rouse, David, Adam Watkins, David Porter, John Harer, Paul Bendich, Nate Strawn, Elizabeth Munch, Jonathan DeSena, Jesse Clarke, Jeffrey Gilbert, Peter Chin, and Andrew Newman. 2015. Feature-aided multiple hypothesis tracking using topological and statistical behavior classifiers. In Signal processing, sensor/information fusion, and target recognition XXIV, vol. 9474, 94740L–94740L–12. Google Scholar
Safavian, S.R., and D. Landgrebe. May 1991. A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics 21(3): 660–674. ArticleGoogle Scholar
Sibson, R. 1973. Slink: An optimally efficient algorithm for the single-link cluster method. The Computer Journal 16(1): 30–34. ArticleGoogle Scholar
Singh, Gurjeet, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L. Ringach. 2008. Topological analysis of population activity in visual cortex. Journal of Vision 8(8): 11. ArticleGoogle Scholar
Turner, Katharine, Yuriy Mileyko, Sayan Mukherjee, and John Harer. 2014. Fréchet means for distributions of persistence diagrams. Discrete and Computational Geometry 52(1): 44–70. ArticleGoogle Scholar
Venkataraman, Vinay, Karthikeyan Natesan Ramamurthy, and Pavan Turaga. Sept 2016. Persistent homology of attractors for action recognition. In 2016 IEEE international conference on image processing (ICIP), 4150–4154. Washington, DC: IEEE. ChapterGoogle Scholar
Wang, Yuan, Hernando Ombao, and Moo K. Chung. 2014. Persistence landscape of functional signal and its application to epileptic electroencaphalogram data (unpublished). Google Scholar
Wood, Peter John, Adrian P. Sheppard, and Vanessa Robins. 2011. Theory and algorithms for constructing discrete morse complexes from grayscale digital images. IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (undefined): 1646–1658. Google Scholar
Zhang, Guoqiang Peter. 2000. Neural networks for classification: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 30(4): 451–462. ArticleGoogle Scholar
Zomorodian, Afra, and Gunnar Carlsson. 2005. Computing persistent homology. Discrete and Computational Geometry 33(2): 249–274. ArticleGoogle Scholar

Author information

Authors and Affiliations

Mathematics Department, Duke University, Durham, NC, USA Francis C. Motta

Francis C. Motta