GAMMA: Galactic Attributes of Mass, Metallicity and Age

We introduce the GAMMA (Galactic Attributes of Mass, Metallicity, and Age) dataset, a comprehensive collection of galaxy data tailored for machine learning (ML) applications. The dataset provides detailed 2D maps and 3D cubes of 11 727 galaxies, capturing three essential attributes: stellar age, metallicity, and mass. Alongside the dataset, we publish the code used to extract any other stellar or gaseous property from the raw simulation suite, so that the dataset can be extended beyond these initial properties for a variety of computational tasks. Well suited to feature extraction, clustering, and regression, GAMMA offers a way to explore galactic structure with computational methods and serves as a bridge between astrophysical simulations and scientific ML. As a first benchmark, we apply Principal Component Analysis (PCA) to the dataset and find that it captures the key morphological features of galaxies with a small number of components. We achieve a dimensionality reduction by a factor of ∼200 (∼3650) for 2D images (3D cubes) with a reconstruction error below 5%. All code to generate the dataset and load the data structure will be publicly available on GitHub, with an additional documentation page hosted on ReadTheDocs.

Baseline

We apply Principal Component Analysis (PCA) to the GAMMA dataset as a baseline.
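
To make the baseline concrete, here is a minimal sketch of PCA compression and reconstruction on a stack of 2D maps. It is not the GAMMA pipeline: the images are synthetic stand-ins rather than data loaded from the dataset, the function names (fit_pca, reconstruct, relative_error) are hypothetical, and the relative L2 error used here is only one plausible way to define the reconstruction error.

```python
import numpy as np


def fit_pca(images, n_components):
    """Fit PCA on images of shape (n_samples, height, width)."""
    # Flatten every map into one row vector.
    X = images.reshape(len(images), -1).astype(np.float64)
    mean = X.mean(axis=0)
    # Thin SVD of the centred data; the rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    components = vt[:n_components]
    # Scores are the coordinates of each image in the reduced space.
    scores = (X - mean) @ components.T
    return mean, components, scores


def reconstruct(mean, components, scores, image_shape):
    """Map scores back to image space."""
    X_hat = scores @ components + mean
    return X_hat.reshape((-1, *image_shape))


def relative_error(original, reconstructed):
    """Per-image relative L2 error (an assumed error definition)."""
    ref = original.reshape(len(original), -1)
    diff = ref - reconstructed.reshape(len(original), -1)
    return np.linalg.norm(diff, axis=1) / np.linalg.norm(ref, axis=1)


# Synthetic stand-in for galaxy maps: low-rank structure plus weak noise.
rng = np.random.default_rng(0)
n_samples, size, n_latent = 200, 16, 5
latent = rng.normal(size=(n_samples, n_latent))
basis = rng.normal(size=(n_latent, size * size))
images = (latent @ basis).reshape(n_samples, size, size)
images += 0.001 * rng.normal(size=images.shape)

mean, components, scores = fit_pca(images, n_components=n_latent)
recon = reconstruct(mean, components, scores, (size, size))
errors = relative_error(images, recon)

# Compression factor: pixels per image divided by scores per image.
compression = (size * size) / n_latent
print(f"compression factor: {compression:.1f}")
print(f"max relative error: {errors.max():.4f}")
```

The compression factor is the number of pixels (or voxels) per galaxy divided by the number of retained components, which is how a reduction factor like the ones quoted above can be read. For the full dataset, an incremental or randomized PCA would likely be preferable to a dense SVD.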

UMAP

We run UMAP on the PCA scores to visualize the lower-dimensional image space in two dimensions.

Below is an interactive plot of the UMAP results. You can zoom in and out, and hover over a point to see the galaxy's mass map and the corresponding TNG halo ID.