t-Distributed Stochastic Neighbor Embedding (t-SNE)

Machine learning algorithm for dimensionality reduction and visualization, developed by Laurens van der Maaten and Geoffrey Hinton

Reduces the dimensionality of data with a nonlinear, probabilistic approach so that high-dimensional data can be plotted in two or three dimensions

Models each high-dimensional object as a two- or three-dimensional point so that similar objects appear as nearby points and dissimilar objects as distant points

Plotted distances reflect similarity probabilities derived from the Euclidean distances between the high-dimensional objects

The t-SNE algorithm comprises two main stages

A probability distribution over pairs of high-dimensional objects is constructed so that similar objects have a high probability of being picked, while dissimilar objects have a small probability of being picked

A similar probability distribution over the points in the low-dimensional map is defined, then the Kullback–Leibler divergence between the two distributions is minimized with respect to the locations of the points in the map
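
The two stages can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not a full implementation: it uses one fixed Gaussian bandwidth (sigma) for every point, whereas t-SNE normally tunes a bandwidth per point from a perplexity setting, and it only evaluates the Kullback–Leibler divergence for two hand-made maps instead of minimizing it by gradient descent. The function names and the toy data are hypothetical.

```python
import math

def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]

def high_dim_affinities(points, sigma=1.0):
    """Stage 1: joint probabilities P from Gaussian kernels.
    Assumption: one fixed sigma instead of a per-point perplexity search."""
    n = len(points)
    d2 = pairwise_sq_dists(points)
    cond = []
    for i in range(n):
        row = [0.0 if i == j else math.exp(-d2[i][j] / (2 * sigma ** 2))
               for j in range(n)]
        total = sum(row)
        cond.append([v / total for v in row])  # conditional p(j | i)
    # Symmetrize so that P sums to 1 over all pairs.
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]

def low_dim_affinities(embedding):
    """Stage 2: joint probabilities Q from a Student-t kernel (1 degree of freedom)."""
    n = len(embedding)
    d2 = pairwise_sq_dists(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + d2[i][j]) for j in range(n)]
         for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all pairs with nonzero p."""
    return sum(pij * math.log((pij + eps) / (qij + eps))
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0)

# Toy data: two tight clusters in three dimensions.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (3.0, 3.0, 3.0), (3.1, 3.0, 3.0)]
P = high_dim_affinities(data)

# A map that keeps the clusters together, and one that mixes them up.
good_map = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
bad_map = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]

kl_good = kl_divergence(P, low_dim_affinities(good_map))
kl_bad = kl_divergence(P, low_dim_affinities(bad_map))
print(kl_good, kl_bad)
```

The map that preserves the cluster structure gives the lower divergence, which is the quantity the optimization drives down by moving the map points.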

t-SNE has been used for visualization in a wide range of applications

We are interested in whether t-SNE can reveal visual groupings in quantitative data that have not been observed before

Example: Is the number of crater layers distinguishable given location, thermal inertia, and crater diameter?