Showing posts with label machine-learning. Show all posts

December 23, 2022

Learning on Graphs (LoG) conference recap

 


The Learning on Graphs (LoG) conference took place from December 9-12 and featured a broad diversity of research on Graph Neural Networks (GNNs). GNNs [1] constitute a relatively new area of machine learning research with a number of interesting connections to applied math and network science. The daily sessions (keynote talks and oral presentations), along with the seven workshop sessions, are available from the conference YouTube channel.

GNNs are a way to take data that yield graphical relationships in the real world and analyze them using the power of neural networks. While GNNs are specialized for problems that can be represented as a graph (discrete, interconnected systems), any problem with a set of complex geometric relationships is appropriate for GNNs. While the outputs of GNNs are typically embeddings (graph topologies embedded in the feature space), some problems require different approaches, such as functions or more formal representations.

It is the analysis of these graphical relationships that makes GNNs a useful analytical approach. In all their forms, GNNs yield useful representations of graph data partly because they take into consideration the intrinsic symmetries of graphs, such as invariance and equivariance of graph topology with respect to a relabeling of the nodes [2]. Based on what was featured at LoG, GNNs have many potential applications in the biological arena, including precision medicine, drug discovery, and characterizing molecular systems (see, for example, Stephan Gunnemann's (Technical University of Munich) talk in the Friday session).


GNNs can be evaluated using the isomorphism (or k-WL) test, which evaluates whether two graphs are isomorphic. Given that a graph can be drawn from the source data, the source data graph should be isomorphic with the output graph. The Weisfeiler-Lehman heuristic for graph isomorphism can be summarized in the 1-D case as the color refinement algorithm. A related issue in GNN research is algorithmic expressiveness. Expressivity is the breadth of ideas that can be represented and communicated using a particular type of representation. One current challenge for GNNs as they are applied to various problem domains is their ability to be functionally robust. One solution is to use GNNs as generative models. Generating alternate graph representations allows us to use graphons [3], functions that capture different GNN topologies of the same type. The collection of graphs associated with a graphon can then be evaluated. Soledad Villar's (Johns Hopkins) presentation during the Sunday session featured an in-depth discussion of expressiveness and graphons as they relate to GNN performance.
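The 1-WL color refinement heuristic mentioned above can be sketched in a few lines of plain Python. This is a minimal sketch rather than a full isomorphism tester (and the function names are illustrative): differing color histograms certify that two graphs are non-isomorphic, while matching histograms are inconclusive.

```python
# 1-WL color refinement: iteratively recolor each node by its own
# color plus the multiset of its neighbors' colors.
# Graphs are adjacency lists (dict of node -> list of neighbors).

def wl_colors(adj, rounds=3):
    colors = {v: 0 for v in adj}  # start with a uniform color
    for _ in range(rounds):
        # a node's signature: its color + sorted multiset of neighbor colors
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        # relabel signatures with small integers for the next round
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: palette[signatures[v]] for v in adj}
    return colors

def wl_histogram(adj, rounds=3):
    hist = {}
    for c in wl_colors(adj, rounds).values():
        hist[c] = hist.get(c, 0) + 1
    return sorted(hist.items())

# a triangle vs. a path on three nodes: one refinement round tells them apart
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_histogram(triangle) != wl_histogram(path))  # True
```

Since the refinement only inspects neighbor multisets, relabeling the nodes of a graph leaves its histogram unchanged, which is exactly the permutation symmetry discussed above.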



GNNs can be combined with various analytical techniques traditionally used in complex network analysis. One of these involves the analysis of graphical models using tools from network science. These include the use of random graphs and stochastic block models to uncover the presence of topological structure and community formation, respectively. GNNs have ties to category theory as well. The cats.for.ai workshop (October 2022) featured applications of category theory to GNNs. In the Saturday session, Taco Cohen (Qualcomm AI) discussed how the techniques of category theory, monads in particular, can be applied to GNNs. GNNs can also form directed acyclic graphs (DAGs), which are amenable to causal models. 
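As an illustration of the network-science side, here is a minimal stochastic block model sampler in plain Python (the block sizes and probabilities are illustrative choices): with a high within-block probability and a low between-block probability, the sampled graph exhibits the community structure these models are used to detect.

```python
# Stochastic block model: nodes in the same block connect with
# probability p_in, nodes in different blocks with probability p_out.
import random

def sample_sbm(block_sizes, p_in, p_out, seed=0):
    rng = random.Random(seed)
    # assign each node a block label
    blocks = []
    for b, size in enumerate(block_sizes):
        blocks.extend([b] * size)
    n = len(blocks)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if blocks[i] == blocks[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return blocks, edges

blocks, edges = sample_sbm([20, 20], p_in=0.5, p_out=0.02)
within = sum(1 for i, j in edges if blocks[i] == blocks[j])
between = len(edges) - within
print(within > between)  # dense within blocks, sparse between them
```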



GNNs are constructed using a series of inferential techniques. One technique discussed at LoG is message passing neural networks (MPNNs). Discrete forward passes from node to node (along edges) allow the true, original network topology to be approximated. MPNN is a standard technique that lends itself to a wide variety of problem domains. The MPNN approach [4] can be extended to directed multigraphs and other types of graphs that capture complex systems, but can suffer from shortcomings such as over-smoothing, over-squashing, and under-reaching. While message passing has been the standard in the GNN field, continuous methods using approaches inspired by differential geometry and algebraic topology might serve as powerful alternatives [5]. Aside from approximations of real-world networks and graph-like structures, we can also think of GNN outputs in terms of time (capturing delays) and space (capturing translations). GNNs are also well-suited to mapping problems from algorithmic domains, in particular dynamic programming [6].
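A single round of message passing can be sketched without any GNN library. The weight matrices below are hand-set and hypothetical, standing in for learned parameters; each node sums its neighbors' feature vectors (a permutation-invariant aggregation) and combines them with its own features through a nonlinearity.

```python
# One message-passing round: h_v' = ReLU(W_self h_v + W_neigh * sum of neighbor h_u)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(a, b):
    return [ai + bi for ai, bi in zip(a, b)]

def relu(x):
    return [max(0.0, xi) for xi in x]

def message_passing_step(adj, features, W_self, W_neigh):
    new_features = {}
    for v in adj:
        # aggregate: sum of neighbor features (order does not matter)
        agg = [0.0] * len(features[v])
        for u in adj[v]:
            agg = vadd(agg, features[u])
        # update: combine self features and aggregated messages
        new_features[v] = relu(vadd(matvec(W_self, features[v]),
                                    matvec(W_neigh, agg)))
    return new_features

# toy graph: a path 0-1-2 with 2-d features and identity weights
adj = {0: [1], 1: [0, 2], 2: [1]}
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
out = message_passing_step(adj, features, W, W)
print(out[1])  # node 1 combines both neighbors: [2.0, 2.0]
```

Stacking k such rounds lets information travel k hops, which is also why deep stacks run into the over-smoothing and under-reaching issues mentioned above.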


GNNs are particularly useful for task-specific architectures. The DevoWorm group’s D-GNN work (DevoGraph) is an example of this, being specialized for embryogenetic image processing or capturing biological growth and differentiation processes. But GNNs can also engage in transfer learning, which is the transfer of learned information from one context to another. Successful graph transfer learning is characterized by the reproduction of a graph of a similar but different size, or problems that require changes in network size over time.


From "Do we need deep graph neural networks?" by Michael Bronstein, Towards Data Science, July 20, 2020.


Workshops

Several of the workshops were particularly interesting with respect to some of the points mentioned above. There were also a number of outstanding oral presentations and posters not discussed here, but they are worth checking out in the daily session recordings or on OpenReview.


Neural Algorithmic Reasoning (video). GNNs serve as excellent processors (neural networks in latent space) that can be aligned with more traditional algorithms [7]. This recasts many optimization problems as neural representation learning, particularly in cases where optimization algorithms do not represent the system being analyzed in a realistic manner.



Expressive GNNs (video). This tutorial covers a range of techniques that can be used to increase the expressivity of GNNs. Borrowing from areas such as topological data analysis and group theory, there is great potential for a variety of highly effective strategies for improving GNN architectures for a host of problems.


Graph Rewiring (video, web). Graph rewiring is presented as a way to overcome the limitations of the MPNN approach. Rewiring is based on the reconstruction of graph edges from iterative adaptive sampling of the input data. There are a number of different techniques that allow us to evaluate edge relevance using techniques such as diffusion and spectral approaches.
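As a toy illustration of the diffusion idea (a sketch in the spirit of diffusion-based rewiring, not the workshop's exact method), one can propagate a short random walk over the input graph and keep node pairs with large diffusion weight as the rewired edge set. The walk length and threshold below are illustrative choices.

```python
# Diffusion-style rewiring: average a few powers of the random-walk
# matrix, then keep node pairs whose diffusion weight is large.

def walk_matrix(adj, nodes):
    # row-stochastic transition matrix of a simple random walk
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    P = [[0.0] * n for _ in range(n)]
    for v in nodes:
        for u in adj[v]:
            P[idx[v]][idx[u]] = 1.0 / len(adj[v])
    return P

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rewire(adj, steps=2, threshold=0.2):
    nodes = sorted(adj)
    P = walk_matrix(adj, nodes)
    power, total = P, P
    for _ in range(steps - 1):
        power = matmul(power, P)
        total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, power)]
    D = [[x / steps for x in row] for row in total]  # average over walk lengths
    # keep pairs whose averaged diffusion weight exceeds the threshold
    return {(nodes[i], nodes[j])
            for i in range(len(nodes)) for j in range(len(nodes))
            if i < j and max(D[i][j], D[j][i]) > threshold}

# a path 0-1-2-3: diffusion adds edges between nodes two hops apart
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(rewire(adj)))
```

On the path graph, the rewired edge set gains shortcuts like (0, 2) and (1, 3), which is the kind of relief from over-squashing that rewiring aims for.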


GNNs on TensorFlow (video). This tutorial introduces nascent modelers to implementing their own GNN models in the open-source TF-GNN framework. The tutorial uses heterogeneous input data to show how to implement the GNN and deal with missing label and edge information.


References

[1] Sanchez-Lengeling, B., Reif, E., Pearce, A., and Wiltschko, A.B. (2021). A Gentle Introduction to Graph Neural Networks. Distill, doi:10.23915/distill.00033.


[2] Chen, Z., Villar, S., Chen, L., and Bruna, J. (2019). On the equivalence between graph isomorphism testing and function approximation with GNNs. Proceedings of Neural Information Processing Systems, 32.

[3] Ruiz, L., Chamon, L.F.O., and Ribeiro, A. (2020). Graphon Neural Networks and the Transferability of Graph Neural Networks. arXiv, 2006.03548.

[4] Heydari, S. and Livi, L. (2022). Message Passing Neural Networks for Hypergraphs. arXiv, 2203.16995.

[5] Bronstein, M. (2022). Beyond Message Passing: a Physics-Inspired Paradigm for Graph Neural Networks. The Gradient, May 7.

[6] Dudzik, A. and Velickovic, P. (2022). Graph Neural Networks are Dynamic Programmers. arXiv, 2203.15544.

[7] Velickovic, P. and Blundell, C. (2021). Neural Algorithmic Reasoning. arXiv, 2105.02761.

June 15, 2022

Google Summer of Code 2022 in the OpenWorm Community (DevoWorm)



Welcome to Google Summer of Code 2022! I am pleased to announce that this year, we have two funded projects: D-GNNs and Digital Microspheres! These projects will both take place in conjunction with the DevoWorm part of the OpenWorm community. DevoWorm is an interdisciplinary group engaged in both computational and biological data analysis. We have weekly meetings on Jitsi, and are a part of the OpenWorm Foundation.

This year, we were able to fund two students per project. They will be working on complementary solutions to each problem, and we will see how far they get by the end of the Summer. 

D-GNNs (Developmental Graph Neural Networks)

The description for this project is as follows:

Biological development features many different types of networks: neural connectomes, gene regulatory networks, interactome networks, and anatomical networks. Using cell tracking and high-resolution microscopy, we can reconstruct the origins of these networks in the early embryo. Building on our group's past work in deep learning and pre-trained models, we look to apply graph neural networks (GNNs) to developmental biological analysis.

The contributor will create graph embeddings that resemble actual biological networks found throughout development. Potential activities include growing graph embeddings using biological rules, differentiation of nodes in the network, and GNNs that generate different types of movement output based on movement seen in microscopy movies. The goal is to create a library of GNNs that can simulate developmental processes by analyzing time-series microscopy data.

When completed, D-GNNs will become part of the DevoWorm AI library. Ultimately, we will be integrating the GNN work with the DevoLearn (open-source pre-trained deep learning) software. 


Jiahang Li

Jiahang Li is a first-year MPhil candidate in the Computing Department at Hong Kong Polytechnic University. His research interests cover graph representation learning and its applications. Jiahang's approach to the project is to provide a pipeline that converts microscopic video data of C. elegans and other organisms into graph structures, on which advanced network analysis techniques and graph neural networks will be employed to obtain high-level representations of embryogenesis and to solve applied problems.




Wataru Kawakami

Wataru is a student at Kyoto University with interests in Machine Learning (in particular Graph Neural Networks) and Neuroimaging.

Digital Microspheres

The description for this problem is as follows: 

This project will build upon the specialized microscopy techniques to develop a shell composed of projected microscopy images, arranged to represent the full external surface of a sphere. This will allow us to create an atlas of the embryo’s outer surface, which in some species (e.g. Axolotl) enables us to have a novel perspective on neural development.

The contributor will build a computational tool that allows us to visualize 4D data derived from the surface of an Axolotl embryo. The spatial model and animation (4th dimension) of microscopy image data can be created in a 3-D modeling software of your choice.

This project is based on previous research by DevoWorm contributors Richard Gordon and Susan Crawford-Young. The flipping and ball microscopy research involve the design and fabrication of specialized microscopes to image embryos in a 4-D context (3 dimensions of space plus time).

Spherical Embryo Maps: Gordon, R. (2009). Google Embryo for Building Quantitative Understanding of an Embryo As It Builds Itself. II. Progress Toward an Embryo Surface Microscope. Biological Theory, 4, 396–412.

Flipping Microscopy: Crawford-Young, S., Dittapongpitch, S., Gordon, R., and Harrington, K. (2018). Acquisition and reconstruction of 4D surfaces of axolotl embryos with the flipping stage robotic microscope. Biosystems, 173, 214-220.

Ball Microscopy: Crawford-Young, S.J. and Young Williment, J.L. (2021). A ball microscope for viewing the entire surface of amphibian embryos. Biosystems, 208, 104498.

Karan Lohaan

Karan is a student at Amrita Vishwa Vidyapeetham University, and is a member of the AMFoss program there. He is interested in Machine Learning and Image Processing. 

Harikrishna Pillai

I am Harikrishna, pursuing my B.Tech in Computer Science and Artificial Intelligence at Amrita Vishwa Vidyapeetham University. I completed my schooling in Mumbai. I started with Python as my first language and eventually developed an interest in AI. Due to my interest in Android apps, I have done Android development in Kotlin. I have also been interested in open source for some time, and therefore wanted to start my open-source journey with GSoC.

We also have two GSoC mentors for these projects: Bradly Alicea is a mentor for D-GNNs and Digital Microspheres, and Jesse Parent is a mentor for D-GNNs. Richard Gordon and Susan Crawford-Young are serving as collaborators for the Digital Microspheres project.

If you would like to check on their progress, please check out our weekly meetings available on our YouTube channel.

September 29, 2021

OpenWorm Annual Meeting 2021 (DevoWorm update)

This week we had our OpenWorm Annual Meeting for 2021, which featured administrative business as well as updates from our research groups and educational initiatives. Much activity going on inside of the OpenWorm Foundation -- join the OpenWorm Slack or follow OpenWorm on Twitter for more information. Below are the slides I presented on progress and the latest activities in the DevoWorm group. If anything looks interesting to you, and you would like to contribute, please let me know. Click on any slide to enlarge.













The last slide is in recognition of OpenWorm's 10th anniversary, or at least the first release of OpenWorm 10 years ago this month. Looking forward to what the next 10 years will bring!

But in terms of what the next month will bring (for DevoWorm), we are hosting our second annual Hacktoberfest! Check out README files in our pinned repositories on DevoLearn and devoworm/digital-bacillaria to get started!

May 2, 2021

DevoLearn (Open-source) Maintenance and Evangelism

2021 has been a busy year for the DevoLearn initiative. Not only has Mayukh Deb been busy maintaining and generating new versions of the DevoLearn pre-trained model, but I (Bradly Alicea) have been engaging in technology evangelism to advance awareness and involvement in the initiative. The DevoLearn pre-trained model software (for C. elegans embryogenesis) is now at version 0.3.0, and has garnered 12 contributors making 165 commits (largely since January 2021). Our involvement in Google Summer of Code has bolstered many of these contributions. While our popularity is currently limited, we are trying to spread the word.

To that end, we have presented two versions of a promotional talk on using DevoLearn for facilitating Computational Developmental Biology research and education. The first presentation (DevoLearn: Engaging learners with Computational Developmental Biology) is a flash talk given to the OSF Education Un-conference on Open Scholarship Practices in Education Research, held in February. The second presentation was a longer (15-minute) presentation to the INCF Assembly (DevoLearn: a platform for open Developmental Data Science, Machine Learning, and Education), held in April.


As for developing the broader platform, Ujjwal Singh and I will be working this Summer to develop algorithms for colony morphogenesis and behavior in the Diatom genus Bacillaria. This will be added to the platform in a manner similar to the DevoLearn pre-trained model. In addition to the software development activities, Mayukh, myself, and Krishna Katyal have been the main contributors to the DevoWorm Onboarding Guide. Looking forward to an exciting future!


Update, 8/19:

All of this work has paid off! From the #devolearn Slack channel (OpenWorm Slack team).



September 30, 2020

OpenWorm Annual Meeting -- DevoWorm Slides

Today we held the OpenWorm Annual Meeting, which is a time for the Board of Directors and Senior Contributors to meet and discuss the latest developments within the foundation (in this case, activities over the past 1.5 years). Overall, a very inspiring meeting! Here are the slides I presented on progress and the latest activities in the DevoWorm group. If anything looks interesting to you, and you would like to contribute, please let me know.



Click to enlarge.




October 30, 2019

Pre-trained Models for Developmental Biology

Authors: Bradly Alicea, Richard Gordon, Abraham Kohrmann, Jesse Parent, Vinay Varma
This content is cross-posted to The Node Developmental Biology blog. 


Our virtual discussion group (DevoWormML) has been exploring a number of topics related to the use of pre-trained models in machine learning (specifically deep learning). Pre-trained models such as GPT-2 [1], pix2pix [2], and OpenPose [3] are used for analyzing many specialized types of data (linguistics, image to image translation, and human body features, respectively) and have a number of potential uses for the analysis of biological data in particular. It may be challenging to find large, rich, and specific datasets for training a more general model. This is often the case in the fields of Bioinformatics or Medical Image analysis. Data acquisition in such fields is often restricted due to the following factors:

* privacy restrictions inhibit public access to personal information, and may impose limits on data use.

* a lack of labels and effective metadata for describing cases, variables, and context.

* missing data points, which require a normalization strategy and can render the input data unusable.

We can use these pre-trained models to extract a general description of classes and features without requiring a prohibitive amount of training data. We estimate that the amount of required training data may be reduced by an order of magnitude. To get this advantage, pre-trained models must be suitable to the type of input data. There are a number of models specialized for language processing and general use, but options are fewer within the unique feature space of developmental biology, in particular. In this post, we will propose that developmental biology requires a specialized pre-trained model. 


This vision for a developmental biology-specific pre-trained model would be specialized for image data. Whereas molecular data might be better served with existing models specialized for linguistic- and physics-based models, we seek to address several features of developmental biology that might be underfit using current models:

* cell division and differentiation events.

* features demonstrating the relationship between growth and motion.

* mapping between spatial and temporal context.

Successful application of pre-trained models is contingent on our research problem. Most existing pre-trained models operate on two-dimensional data, while data types such as medical images are three-dimensional. A study by Raghu et al. [4] suggests that techniques associated with pre-trained models (such as transfer learning from the ImageNet model), when applied to a data set of medical images, provide little benefit to performance. In this case, performance can be improved using data augmentation techniques. Data augmentation, such as adding versions of the images that have undergone transformations such as magnification, translation, rotation, or shearing, can be used to add variability to our data and improve the generalizability of a given model.
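A minimal sketch of such label-preserving augmentations on a toy 2-D image (represented as a nested list of pixel values; the function names are illustrative):

```python
# Each transform yields a new training example from an existing one,
# leaving the label unchanged.

def rotate90(img):
    # 90-degree clockwise rotation: transpose, then reverse each row
    return [list(row)[::-1] for row in zip(*img)]

def hflip(img):
    # horizontal flip: reverse each row
    return [row[::-1] for row in img]

def translate_right(img, shift, fill=0):
    # shift pixels right, padding the left edge with a fill value
    return [[fill] * shift + row[:-shift] for row in img] if shift else img

image = [[1, 2],
         [3, 4]]
augmented = [image, rotate90(image), hflip(image), translate_right(image, 1)]
print(len(augmented))  # one original plus three augmented variants = 4
```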


One aspect of pre-trained models we would like to keep in mind is that models are not perfect representations of the phenomenology we want to study. Models can be useful, but are often not completely accurate. A model of the embryo, for example, might be based on the mean behavior of the phenomenology. Transitional states [5], far-from-equilibrium behaviors [6], and rare events are not well-suited to such a model. By contrast, a generative model that considers many of these features might generally underfit the mean behavior. We will revisit this distinction in the context of “blobs” and “symbols”, but for now, it appears that models are expected to be both imperfect and incomplete.

The inherent imperfection of models is both good and bad news for our pursuit. On the one hand, specialized models cannot be too specific, lest they overfit some aspects of development but not others. Conversely, highly generalized models assume that there are universal features that transcend all types of systems, from physical to social, and from artificial to natural. One example of this is found in complex network models, widely used to represent everything from proteomes to brains to societies. In their general form, complex network models are not customized for specific problems, relying instead on the node and edge formalism to represent interactions between discrete units. But this also requires that the biological system be represented in a specific way to enforce the general rules of the model. For example, a neural network’s focus on connectivity requires representations of a nervous system to be simplified down to nodes and arcs. As opposed to universality, particularism is an approach that favors the particular features of a given system, and does not require an ill-suited representation of the data. Going back to the complex networks example, there are specialized models such as multi-level networks and hybrid models (dynamical systems and complex networks) that solve the problem of universal assumptions.

Another aspect of pre-trained models is in balancing the amount of training data needed to produce an improvement in performance. How much training data can we save by applying a pre-trained model to our data set? We can reformulate this question more specifically to match our specific phenomenon and research interests. To put this in concrete terms, let us consider a hypothetical set of biological images. These images can represent discrete points in developmental time, or a range of biological diversity. Now let us suppose a developmental phenotype for which we want to extract multiple features. What features might be of interest, and are those features immediately obvious? 

In the DevoWorm group (where we mostly deal with embryogenetic data), we have approached this in two ways. The first is to model the embryo as a mass of cells, so that the major features of interest are the shape, size, and position of cells in an expanding and shifting whole. Last summer, we worked on applying deep learning to

* Caenorhabditis elegans embryogenesis. Github: https://github.com/devoworm/GSOC-2019.

* colonies of the diatom Bacillaria paradoxa. Github: https://github.com/devoworm/Digital-Bacillaria.

While these models were effective for discovering discrete structural units (cells, filaments), they were not as effective at directly modeling movement, currents, or transformational processes. The second way we have approached this is to model the process of cell division and differentiation as a spatial and discrete temporal process. This includes the application of representational models such as game theory [7] and cellular automata [8]. This allows us to identify more subtle features that are not directly observable in the phenotype, but are less useful for predicting specific events or defining a distinct feature space. 

Our model must be capable of modeling multiple structural features concurrently, but also sensitive to scenarios where single sets of attributes might yield more information. Ideally, we desire a training dataset that perfectly balances “biologically-typical” motion and transformations with clearly masked shapes representing cells and other phenotypic structures. Generally speaking, the greater degree of natural variation in the training dataset, the more robust the pre-trained model will turn out to be. More robust models will generally be easier to use during the testing phase, and result in a reduction in the need for subsequent training. 


Finally, specialized pre-trained models bring up the issue of how to balance rival strategies for analyzing complex processes and data features. Conventional artificial intelligence techniques have relied on the manipulation of symbols, or a symbolic layer that results from transforming raw data into a mental framework. By contrast, modern machine learning methods rely on data to build a series of relationships that inform a classificatory system. While a combination of these two strategies might seem obvious, it is by no means a simple matter of implementation [9]. The notion of “blobs” (data) versus “symbols” (representations) draws on the current debate over data-intensive representations versus formal (innate) representations [10-12], which demonstrates the timeliness of our efforts. Balancing these competing strategies in a pre-trained model allows us to more easily bring expert knowledge or complementary data (e.g. gene expression data in an analysis of embryonic phenotypes) to bear.

We will be exploring the details of pre-trained models in future discussions and meetings of the DevoWormML group. Please feel free to join us on Wednesdays at 1pm UTC at https://tiny.cc/DevoWorm or find us on Github (https://github.com/devoworm/DW-ML) if you are interested in discussing this further. You can also view our previous discussions on the DevoWorm YouTube channel, DevoWormML playlist (https://bit.ly/2Ni7Fs2).

References:
[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. (2019). Language Models are Unsupervised Multitask Learners. OpenAI, https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf.

[2] Isola, P., Zhu, J-Y., Zhou, T., Efros, A.A. (2017). Image-to-Image Translation with Conditional Adversarial Nets. Proceedings of Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Cao, Z., Hidalgo, G., Simon, T., Wei, S-E., and Sheikh, Y. (2018). OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv, 1812.08008.

[4] Raghu, M., Zhang, C., Kleinberg, J.M., and Bengio, S. (2019). Transfusion: Understanding Transfer Learning for Medical Imaging. arXiv, 1902.07208.

[5] Antolovic, V., Lenn, T., Miermont, A., Chubb, J.R. (2019). Transition state dynamics during a stochastic fate choice. Development, 146, dev173740. doi:10.1242/dev.173740.

[6] Goldenfeld, N. and Woese, C. (2011). Life is Physics: Evolution as a Collective Phenomenon Far From Equilibrium. Annual Review of Condensed Matter Physics, 2, 375-399. doi:10.1146/annurev-conmatphys-062910-140509.

[7] Stone, R., Portegys, T., Mikhailovsky, G., and Alicea, B. (2018). Origins of the Embryo: Self-organization through cybernetic regulation. Biosystems, 173, 73-82. doi:10.1016/j.biosystems.2018.08.005.

[8] Portegys, T., Pascualy, G., Gordon, R., McGrew, S., and Alicea, B. (2016). Morphozoic: cellular automata with nested neighborhoods as a metamorphic representation of morphogenesis. Chapter 3 in "Multi-Agent-Based Simulations Applied to Biological and Environmental Systems", IGI Global.

[9] Garnelo, M. and Shanahan, M. (2019). Reconciling deep learning with symbolic artificial intelligence: representing objects and relations. Current Opinion in Behavioral Sciences, 29, 17–23.

[10] Zador, A.M. (2019). A critique of pure learning and what artificial neural networks can learn from animal brains. Nature Communications, 10, 3770.

[11] Brooks, R.A. (1991). Intelligence without representation. Artificial Intelligence, 47, 139–159.

[12] Marcus, G. (2018). Innateness, AlphaZero, and Artificial Intelligence. arXiv, 1801.05667.

Resources:
* Model Zoo: pre-trained models for various platforms: https://modelzoo.co/

* DevoZoo: developmental data for model training and analysis: https://devoworm.github.io/


* Popular papers on medical image segmentation along with code:  https://paperswithcode.com/area/medical/medical-image-segmentation




September 2, 2019

Introducing: DevoWormML

This has been cross-posted to The Node blog.


I am pleased to announce a new collaborative interest initiative called DevoWormML, based on work being done in the DevoWorm group. DevoWormML will meet on a weekly basis, and explore the application of machine learning and artificial intelligence to problems in developmental biology. These applications can be geared towards the analysis of imaging data, gaining a better understanding of thought experiments, or anything else relevant to the community.

While "ML" stands for machine learning, participation can include various types of intelligent systems approaches. Our goal is to stimulate interest in new techniques, discover new research domains, and establish new collaborations. Guests are welcome to attend, so if you know an interested colleague, feel free to direct them our way.

Meetings will be Wednesdays at 1pm UTC on Google Meet. Discussions will also take place on the #devowormml channel of OpenWorm Slack (request an invitation). We will discuss organizational details at our first meeting on September 4. If you cannot make this time but are still interested in participating, please contact me. Hope to see you there!

February 26, 2016

Kluged Curiosities and Network Connectivity, February 2016

While this blog has matured past the stamp-collecting phase of inquiry, we will nevertheless review a series of curiosities from the last few months. This includes a few readings from the network science literature that have been percolating (pardon the pun) through my reading queue.

The first of these is a game that relies on your pattern recognition skills as well as a keen eye for outliers. The "Guess the Correlation" game trains you to see the signal through the noise, provided that signal is a linear correlation amongst fewer than 100 datapoints.

Visual approximation of an embedded signal. COURTESY: guessthecorrelation.com

This is also a nice example of domain expertise versus the precision of statistical techniques [1], and perhaps a lesson in naive feature creation.
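The quantity the game asks the player to eyeball is the Pearson correlation coefficient, which is easy to compute directly (a pure-Python sketch; the noisy sample below is synthetic):

```python
# Pearson correlation: covariance normalized by the two standard deviations.
import random

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# fewer than 100 points with a linear signal buried in Gaussian noise
rng = random.Random(42)
xs = [rng.random() for _ in range(80)]
ys = [0.6 * x + rng.gauss(0, 0.2) for x in xs]
print(0 < pearson_r(xs, ys) < 1)  # positive but imperfect correlation
```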

Crowdfunding, or raising modest amounts of money from large numbers of individuals, is emerging as an alternative way of raising money for side projects and attaining short-term research goals. A new paper on crowdfunding in PLoS Biology [2] called "A Guide to Scientific Crowdfunding" gives excellent tips for starting your own crowdfunding campaign and a bibliography for further reading on scientific crowdfunding.

Last year marked the 55th anniversary of the discovery of the giant component (a key foundation in the area we now call network science) by Erdos and Renyi [3]. Disrupting this giant component by reducing the connectivity of a complex network has many practical applications [5]. Kovacs and Barabasi [4] introduce us to the concept of connective destruction, which refers to the selective removal of nodes that partitions a network into smaller, disconnected components (effectively isolating subnetworks).

Morone and Makse [6] attempt to solve this NP-hard problem by developing the collective influence algorithm. Collective influence takes into account a node's extended degree: the degree of a central node as well as the degrees of its connections up to several links away. By taking into account both the strong and weak links of a given high-degree node, the computational complexity of optimal network partitioning can be reduced to O(n log n).
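The collective influence score itself is simple to state: CI(i) = (k_i - 1) times the sum of (k_j - 1) over nodes j on the boundary of a ball of radius ℓ around node i. Below is a pure-Python sketch (the toy graph and radius are illustrative choices; the full algorithm repeatedly removes the top-scoring node and recomputes):

```python
# BFS-based collective influence score on an adjacency-list graph.
from collections import deque

def ball_boundary(adj, source, ell):
    # nodes at shortest-path distance exactly ell from source
    dist = {source: 0}
    q = deque([source])
    while q:
        v = q.popleft()
        if dist[v] == ell:
            continue
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return [v for v, d in dist.items() if d == ell]

def collective_influence(adj, node, ell=2):
    k = len(adj[node])
    return (k - 1) * sum(len(adj[j]) - 1 for j in ball_boundary(adj, node, ell))

# a hub (node 0) attached to a 4-cycle: the hub dominates the score
adj = {0: [1, 2, 3, 4], 1: [0, 2, 4], 2: [0, 1, 3], 3: [0, 2, 4], 4: [0, 1, 3]}
scores = {v: collective_influence(adj, v, ell=1) for v in adj}
print(max(scores, key=scores.get))  # 0
```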

Explosive percolation, now officially famous. COURTESY: Allen Beattie and Nature Physics.

A related (and perhaps in some ways inverse) problem is that of explosive percolation. Explosive percolation is the sudden emergence of large-scale connectivity in networks [7]. Whereas connective destruction makes it easier to control, say, a disease outbreak, explosive percolation makes it harder. Fortunately, we are discovering ways to control this transition [8]. For example, approaches based on Achlioptas processes (a form of competitive graph evolution) can be successful at delaying or otherwise controlling the onset of explosive percolation [9].
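An Achlioptas process is easy to simulate with a union-find structure: at each step, draw two candidate edges and add only the one that minimizes the product of the sizes of the components it would merge (the "product rule"). A seeded pure-Python sketch (the node and step counts are illustrative) comparing it with purely random edge addition:

```python
# Product-rule Achlioptas process vs. random edge addition, via union-find.
import random

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(parent, size, a, b):
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[ra] = rb
        size[rb] += size[ra]

def grow(n, steps, product_rule, seed=1):
    rng = random.Random(seed)
    parent, size = list(range(n)), [1] * n
    for _ in range(steps):
        e1 = rng.randrange(n), rng.randrange(n)
        if product_rule:
            e2 = rng.randrange(n), rng.randrange(n)
            # score each candidate by the product of its component sizes
            def score(e):
                return size[find(parent, e[0])] * size[find(parent, e[1])]
            e1 = min((e1, e2), key=score)
        union(parent, size, *e1)
    # size of the largest connected component
    return max(size[find(parent, v)] for v in range(n))

n, steps = 2000, 1200
print(grow(n, steps, False) > grow(n, steps, True))  # random percolates first
```

At 0.6 edges per node, the random process is past the classical percolation threshold while the product rule keeps all components small, so the comparison illustrates the delayed onset.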

Recently, I got into a series of conversations about the use of Lena Soderberg's image as a standard computer vision benchmark. Apparently, one reason it is used is that it is the visual equivalent of a pangram [10]. Regardless, here is the historical background on one of the most benchmarked photos in Computer Science history [11].

The oft-benchmarked photo (circa 1972).

NOTES:
[1] Driscoll, M.E. The Data Science Debate: domain expertise or machine learning? Data Utopian blog. Accessed on 2/26/2016.

[2] Vachelard, J., Gambarra-Soares, T., Augustini, G., Riul, P., and Maracaja-Coutinho, V. 2016. A Guide to Scientific Crowdfunding. PLoS Biology, 14(2), e1002373.

[3] Spencer, J. 2010. The Giant Component: the golden anniversary. Notices of the AMS, June/July, 720-724.

* discusses interesting historical links between discovery of the giant component and Galton-Watson processes (the mathematics of branching processes in biology).

[4] Kovacs, I.A. and Barabasi, A-L. 2015. Destruction perfected. Nature News and Views, 524, 38-39.

[5] Keeling, M.J. and Eames, K.T.D. 2005. Networks and epidemic models. Journal of the Royal Society Interface, 2(4), 295–307.

[6] Morone, F. and Makse, H.A. 2015. Influence maximization in complex networks through optimal percolation. Nature, 524, 65-68.

[7] Ouellette, J. 2015. The New Laws of Explosive Networks. Quanta Magazine, July 14.

[8] Achlioptas, D., D'Souza, R.M., and Spencer, J. 2009. Explosive Percolation in Random Networks. Science, 323(5920), 1453-1455.

[9] D'Souza, R.M. and Nagler, J. 2015. Anomalous critical and supercritical phenomena in explosive percolation. Nature Physics, 11, 531-538.



[10] A phrase that uses all of the letters in the available alphabet. One example: "the quick brown fox jumps over the lazy dog".

[11] Hutchinson, J. 2001. Culture, Communication, and an Information-Age Madonna. IEEE Professional Communication Society Newsletter, 45(3).

