The Sequencer algorithm
Orders data to reveal structure & trends
The Sequencer is an algorithm designed to automatically reveal the main trend or sequence in a dataset, if one exists. To do so, it reorders a collection of objects to produce the most elongated manifold describing their similarities, which are estimated in a multi-scale and multi-metric manner. Because it does not rely on a single choice of metric or scale, this process can reveal the main trend in arbitrary datasets.
For a quick overview, see simple examples and scientific discoveries made with the algorithm.
Most of the time, objects in a dataset are not ordered in any interesting manner. If these objects follow a trend, it should be possible to order the set meaningfully and, as a result, reveal the underlying trend. The algorithm searches for this ordering automatically.
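As a toy illustration of the idea that ordering a set can reveal a hidden trend, the sketch below shuffles a family of curves driven by one parameter and then reorders them from their pairwise distances alone. It is not the Sequencer's own procedure: it assumes a single metric (Euclidean distance) and a simple greedy nearest-neighbour chain, and the names greedy_order and power_curve are hypothetical.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def greedy_order(objects, distance=euclidean):
    """Return indices that chain each object to its nearest unvisited neighbour.

    Illustrative heuristic only; it is not the Sequencer's procedure.
    """
    n = len(objects)
    dist = [[distance(objects[i], objects[j]) for j in range(n)] for i in range(n)]
    # Start from a member of the most dissimilar pair: a plausible end of the trend.
    start = max(range(n), key=lambda i: max(dist[i]))
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        nearest = min(remaining, key=lambda j: dist[order[-1]][j])
        order.append(nearest)
        remaining.remove(nearest)
    return order

def power_curve(exponent, size=40):
    """Toy object: the curve x**exponent sampled on (0, 1]."""
    return [((k + 1) / size) ** exponent for k in range(size)]

# Hypothetical dataset: one hidden parameter (the exponent) drives every object.
hidden = [0.5 + 0.25 * k for k in range(12)]
random.seed(0)
random.shuffle(hidden)                     # objects arrive in no useful order
objects = [power_curve(e) for e in hidden]

order = greedy_order(objects)
recovered = [hidden[i] for i in order]     # hidden parameter along the new order
print(recovered)
```

On this clean toy dataset the printed exponents come out monotonic (ascending or descending, depending on which end the chain starts from), even though the code never sees the exponents themselves. Real data are noisier, and a single metric at a single scale may miss the trend entirely.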
One challenge in finding a trend in a dataset is being able to “look” at the data from the right point of view. For example, computing a distance matrix for a dataset requires the use of a metric and the choice of a scale. In order to be as generic as possible, one should (i) consider an ensemble of metrics and scales, (ii) identify which of them carry relevant information about the existence of an underlying trend and (iii) combine them meaningfully.
This approach is designed to define a view of the data, through the combination of different metrics and scales, that leads to the most elongated manifold.
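The combination of metrics and scales described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the published implementation: it assumes that a "scale" means splitting each object into equal chunks, that the manifold is approximated by a minimum spanning tree, and that elongation can be scored as the tree diameter divided by (n - 1). All function names are hypothetical.

```python
import math
import random
from collections import deque

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def mst_adjacency(dist):
    """Minimum spanning tree of a dense distance matrix (Prim's algorithm)."""
    n = len(dist)
    best, parent = [math.inf] * n, [-1] * n
    best[0] = 0.0
    todo = set(range(n))
    adj = [[] for _ in range(n)]
    while todo:
        u = min(todo, key=lambda i: best[i])
        todo.remove(u)
        if parent[u] >= 0:
            adj[u].append(parent[u])
            adj[parent[u]].append(u)
        for v in todo:
            if dist[u][v] < best[v]:
                best[v], parent[v] = dist[u][v], u
    return adj

def bfs(adj, src):
    """Breadth-first walk: returns the visit order and each node's depth."""
    depth, order, queue = {src: 0}, [], deque([src])
    while queue:
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in depth:
                depth[w] = depth[u] + 1
                queue.append(w)
    return order, depth

def sequence_and_elongation(dist):
    """Order nodes along the MST and score how chain-like the tree is.

    The score (tree diameter / (n - 1)) is an assumed proxy: 1.0 for a
    perfect chain, small for a star-shaped tree.
    """
    adj = mst_adjacency(dist)
    end = bfs(adj, 0)[0][-1]        # a node farthest from an arbitrary start
    order, depth = bfs(adj, end)    # walk the tree from that end
    return order, max(depth.values()) / (len(dist) - 1)

def combine_views(objects, metrics, scales):
    """Elongation-weighted average of normalised distance matrices,
    one per (metric, scale, chunk) view of the data."""
    n, length = len(objects), len(objects[0])
    combined = [[0.0] * n for _ in range(n)]
    weights = {}
    for metric in metrics:
        for n_chunks in scales:
            step = length // n_chunks
            for c in range(n_chunks):
                parts = [obj[c * step:(c + 1) * step] for obj in objects]
                dist = [[metric(a, b) for b in parts] for a in parts]
                peak = max(map(max, dist))
                if peak == 0.0:
                    continue  # this view cannot distinguish any objects
                w = sequence_and_elongation(dist)[1]
                weights[(metric.__name__, n_chunks, c)] = w
                for i in range(n):
                    for j in range(n):
                        combined[i][j] += w * dist[i][j] / peak
    total = sum(weights.values())
    return [[d / total for d in row] for row in combined], weights

# Hypothetical dataset: the first half of each object follows a hidden
# parameter, the second half is pure noise.
random.seed(1)
hidden = [0.5 + 0.25 * k for k in range(12)]
random.shuffle(hidden)
objects = [[((k + 1) / 20) ** e for k in range(20)] +
           [random.random() for _ in range(20)] for e in hidden]

combined, weights = combine_views(objects, [euclidean, manhattan], scales=(1, 2, 4))
order, elongation = sequence_and_elongation(combined)
for view, w in sorted(weights.items()):
    print(view, round(w, 2))
print("hidden parameter along the sequence:", [hidden[i] for i in order])
```

Views restricted to the trend-carrying half of each object produce a perfectly chain-like tree and therefore the maximum weight of 1.0, while views dominated by noise are expected to score lower and contribute less to the combined matrix. With this crude elongation score the final sequence is not guaranteed to match the hidden parameter exactly when noise is present; a steeper weighting would favour the informative views more strongly.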
Example showing the automatic ordering of time series of the number of Google searches for different smartphones.
Example showing the automatic reconstruction of an image whose rows were randomly shuffled.
Concept & algorithm: Dalya Baron (1) & Brice Ménard (2)
Online platform: Manuchehr Taghizadeh-Popp (2)
Affiliations: (1) Tel Aviv University, (2) Johns Hopkins University
Support: This project has been supported by the Packard Foundation and the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program. This online platform uses resources provided by the SciServer project from the Institute for Data Intensive Engineering and Science.