MPA :: Current Research Highlight

The public dissemination of a large and complex data set such as the Millennium Run brings challenges that differ from, and go beyond, those faced when setting up a public archive for observational data. Many of these result from the great variety of relations between the objects in the database, as well as from the many properties that can be assigned to each one. In practice, most users are interested in the properties of dark matter halos and galaxies, objects created from the simulation output through post-processing. Dark matter halos are the basic nonlinear units of the simulated universe. They have properties such as mass, size and position, in addition to internal substructures (subhalos), which are the remnants of objects that fell into them during their growth. The Millennium Archive contains information for about 750 million halos and subhalos, all linked in a tree structure which describes how each object was built from those present at the immediately preceding time. This is the data structure used by the galaxy formation algorithms.
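As an illustration of the tree structure described above, the following Python sketch links a handful of toy halos through a descendant pointer and follows the most massive progenitor of a halo back in time. The class name Halo, the field names (halo_id, snapshot, mass, descendant_id) and all values are assumptions made for this example; they are not the archive's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Halo:
    """Toy halo record; all field names are hypothetical."""
    halo_id: int
    snapshot: int                        # output time index, larger means later
    mass: float                          # arbitrary units
    descendant_id: Optional[int] = None  # halo this object becomes at the next snapshot


def build_progenitor_index(halos: List[Halo]) -> Dict[int, List[Halo]]:
    """Map each halo id to the halos at the preceding time that flow into it."""
    index: Dict[int, List[Halo]] = {}
    for halo in halos:
        if halo.descendant_id is not None:
            index.setdefault(halo.descendant_id, []).append(halo)
    return index


def main_branch(halo_id: int,
                halos_by_id: Dict[int, Halo],
                progenitors: Dict[int, List[Halo]]) -> List[Halo]:
    """Walk back in time, always stepping to the most massive progenitor."""
    branch = [halos_by_id[halo_id]]
    while True:
        candidates = progenitors.get(branch[-1].halo_id, [])
        if not candidates:
            return branch
        branch.append(max(candidates, key=lambda h: h.mass))


# Toy data: three snapshots, with two mergers leading to halo 6.
HALOS = [
    Halo(1, 0, 1.0, 4), Halo(2, 0, 0.5, 4), Halo(3, 0, 0.3, 5),
    Halo(4, 1, 1.6, 6), Halo(5, 1, 0.4, 6),
    Halo(6, 2, 2.1, None),
]
BY_ID = {h.halo_id: h for h in HALOS}
PROGS = build_progenitor_index(HALOS)

if __name__ == "__main__":
    print([h.halo_id for h in main_branch(6, BY_ID, PROGS)])
```

With this toy data, the main branch of halo 6 passes through halos 6, 4 and 1 in turn. Storing only a pointer to the descendant keeps each record small, and the progenitor index can be rebuilt from it in a single pass.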

Galaxy formation is a complicated and uncertain process, and many physical models for its various aspects must be tried in order to establish those which best describe observed phenomena. A principal goal of the Millennium Run project is to provide a framework for comparing different galaxy formation models to observational data. It is thus important to make available simulated galaxy catalogues with a variety of assumptions about the physics of galaxy formation so that users can get a feel for the uncertainties involved. A galaxy catalogue for the full Millennium Run has about 1 billion entries. For each of these galaxies many properties can be calculated by the formation model and must be stored in the database.

In addition, pointers are needed to connect the galaxies present at different times, and these produce a tree data structure which gives the merger history of each galaxy and which parallels (but is different from) the halo formation trees.

An important issue arises because users wish to use the Millennium Run for a wide variety of purposes, and the most convenient view of the data depends on their project. This requires that the data be delivered in a manner that is more flexible than the traditional download of "flat files". To this end the MPA/MPE/GAVO group decided to use a relational database for storing the post-processing results of the Millennium Run. The main reason for this choice is that relational databases support a flexible and intuitive query language (SQL), which allows users to select the objects that are of interest to them, in a form of their own choice and without requiring knowledge of the physical storage of the data. In the database this language is implemented by query engines that interpret potentially complex requests and execute them as efficiently as they can.
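To make the contrast with flat files concrete, the sketch below runs a small SQL selection against an in-memory database. Python's built-in sqlite3 module stands in for the archive's database server here, and the table name halo, its columns and its rows are hypothetical; the real Millennium schema is described in the archive's own documentation.

```python
import sqlite3

# Hypothetical toy rows: (halo_id, snapshot, mass, x, y, z)
TOY_ROWS = [
    (1, 0, 1.0, 10.0, 20.0, 30.0),
    (4, 1, 1.6, 10.5, 20.1, 30.2),
    (5, 1, 0.4, 40.0, 41.0, 42.0),
    (6, 2, 2.1, 10.7, 20.3, 30.1),
    (7, 2, 0.2, 80.0, 81.0, 82.0),
]


def make_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with one illustrative halo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE halo ("
        "halo_id INTEGER PRIMARY KEY, snapshot INTEGER, "
        "mass REAL, x REAL, y REAL, z REAL)"
    )
    conn.executemany("INSERT INTO halo VALUES (?, ?, ?, ?, ?, ?)", TOY_ROWS)
    return conn


def massive_halos(conn: sqlite3.Connection, snapshot: int, min_mass: float):
    """Return (halo_id, mass) for halos at one snapshot above a mass cut."""
    cursor = conn.execute(
        "SELECT halo_id, mass FROM halo "
        "WHERE snapshot = ? AND mass >= ? "
        "ORDER BY mass DESC",
        (snapshot, min_mass),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = make_toy_db()
    print(massive_halos(connection, snapshot=2, min_mass=1.0))
```

The query states only which rows and columns are wanted; how the rows are laid out on disk, and in what order they are scanned, is left to the query engine.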

Online access to the Millennium database is provided through a web-based query interface (see Fig. 1). Apart from providing documentation and example queries, the interface lets users type in their own SQL queries and execute them. The results can be returned directly to the user, plotted online (see Fig. 2), or stored for further analysis in a private database that is assigned to registered users. This approach is directly modelled on the highly successful SDSS SkyServer database (http://cas.sdss.org/dr6/en/). At the time of writing there are over 160 registered users of the Millennium Archive site with local disk space allocated for storage and manipulation of the results of their queries. About 80% of these have successfully executed queries on the main databases. Roughly half appear to be already carrying out significant research programmes (more than 50 successful queries), while about 20% can be characterised as heavy users (more than 1000 successful queries). On average over 500 million rows of data are being downloaded from the site per week. The user group is still growing rapidly and it will probably be several years before the archive's success in generating new science from the Millennium Simulation can be properly assessed.

Supplying simulation data to the world.