The Chimera Virtual Data System

Overview

Much scientific data is not obtained from measurements but rather derived from other data by the application of computational procedures. The explicit representation of these procedures can enable documentation of data provenance, discovery of available methods, and on-demand data generation (so-called “virtual data”). To explore this idea, we have developed the Chimera Virtual Data System, which combines a virtual data catalog, for representing data derivation procedures and derived data, with a virtual data language interpreter that translates user requests into data definition and query operations on the database. This paradigm is described in a paper presented at the 14th International Conference on Scientific and Statistical Database Management (SSDBM 2002):
Chimera: A Virtual Data System for Representing, Querying, and Automating Data Derivation
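The catalog idea can be made concrete with a small sketch. The Python below models the two record types named on this page, transformations and derivations, and adds a provenance query that walks back from a derived file to the derivations that produced it. The class names, fields, and the `provenance` method are hypothetical illustrations. They are not Chimera's virtual data language or its catalog schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transformation:
    """A registered application program (hypothetical record layout)."""
    name: str
    executable: str


@dataclass(frozen=True)
class Derivation:
    """One recorded invocation of a transformation (hypothetical layout)."""
    id: str
    transformation: str
    inputs: tuple
    outputs: tuple


class VirtualDataCatalog:
    """Toy in-memory catalog; not Chimera's actual schema."""

    def __init__(self):
        self.transformations = {}  # name -> Transformation
        self.producer = {}         # logical file name -> Derivation that produces it

    def add_transformation(self, tr):
        self.transformations[tr.name] = tr

    def add_derivation(self, dv):
        # Only derivations of known transformations may be recorded.
        if dv.transformation not in self.transformations:
            raise KeyError(f"unknown transformation: {dv.transformation}")
        for lfn in dv.outputs:
            self.producer[lfn] = dv

    def provenance(self, lfn):
        """Return ids of derivations needed to produce lfn, upstream steps first."""
        order, visited = [], set()

        def visit(name):
            dv = self.producer.get(name)
            # No producer: raw (measured) data. Already visited: shared upstream step.
            if dv is None or dv.id in visited:
                return
            visited.add(dv.id)
            for inp in dv.inputs:
                visit(inp)
            order.append(dv.id)  # post-order, so producers of inputs come first

        visit(lfn)
        return order


cat = VirtualDataCatalog()
cat.add_transformation(Transformation("calibrate", "calib"))
cat.add_transformation(Transformation("reconstruct", "reco"))
cat.add_derivation(Derivation("d1", "calibrate", ("raw.dat",), ("cal.dat",)))
cat.add_derivation(Derivation("d2", "reconstruct", ("cal.dat",), ("events.dat",)))
print(cat.provenance("events.dat"))  # ['d1', 'd2']
```

Because every derived file points back to the derivation that produced it, the same records answer both "where did this file come from?" and "what must run to regenerate it?", which is the dual use of provenance and on-demand generation described above.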

We couple the Chimera system with distributed “Data Grid” services to enable on-demand execution of computation schedules constructed from database queries. We have applied this system to the reconstruction of simulated collision event data from a high-energy physics experiment, and the search of digital sky survey data for galactic clusters, with promising results.

The Chimera Virtual Data System (VDS) provides a catalog that application environments can use to describe a set of application programs ("transformations") and then track the executions of those programs ("derivations") and the data files they produce. Chimera contains the mechanism to locate the "recipe" for producing a given logical file, in the form of an abstract program execution graph. The Pegasus planner, which is bundled with the VDS code release, then turns these abstract graphs into an executable DAG for the Condor DAGMan meta-scheduler.
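The planning step can be sketched in a few lines of Python. Starting from a requested logical file, the code walks back through a table of recipes to collect only the jobs that are needed, then writes them out using Condor DAGMan's `JOB` and `PARENT ... CHILD` syntax. This is a hypothetical illustration, not Pegasus code. The `recipes` table, the function names, and the `<job>.sub` submit-file names are assumptions. A real planner must also choose execution sites and arrange data transfers, and the sketch ignores both.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def abstract_graph(recipes, target):
    """Map each job needed for `target` to the set of jobs it depends on.

    `recipes` maps a logical file name to (job_name, input_files). Files with
    no recipe are treated as already available, so no job is planned for them.
    """
    producer = {lfn: job for lfn, (job, _) in recipes.items()}
    deps, stack = {}, [target]
    while stack:
        lfn = stack.pop()
        if lfn not in recipes:
            continue  # raw or already-materialized data
        job, inputs = recipes[lfn]
        if job in deps:
            continue  # job already planned (e.g. it has several outputs)
        deps[job] = {producer[i] for i in inputs if i in producer}
        stack.extend(inputs)
    return deps


def to_dagman(deps):
    """Render the job graph as DAGMan JOB and PARENT ... CHILD lines."""
    order = list(TopologicalSorter(deps).static_order())  # parents before children
    lines = [f"JOB {job} {job}.sub" for job in order]  # hypothetical submit files
    for job in order:
        for parent in sorted(deps[job]):
            lines.append(f"PARENT {parent} CHILD {job}")
    return "\n".join(lines)


recipes = {
    "cal.dat": ("calibrate", ["raw.dat"]),
    "events.dat": ("reconstruct", ["cal.dat"]),
    "hist.dat": ("histogram", ["events.dat"]),
}
print(to_dagman(abstract_graph(recipes, "events.dat")))
```

Requesting `events.dat` plans `calibrate` and `reconstruct` but not `histogram`, because nothing downstream of the requested file is needed. Keeping the abstract graph separate from the DAGMan output mirrors the split described above between locating a recipe and producing an executable DAG.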

Plans for the evolution of the virtual data system are outlined in a paper presented in January 2003 at CIDR 2003 (Conference on Innovative Data Systems Research):
The Virtual Data Grid: A New Model and Architecture for Data-Intensive Collaboration

Talks

Some talks that describe the Chimera system and related architecture and design issues were presented at:
CIDR 2003, Pacific Grove, January 7, 2003
Supercomputing 2002, Baltimore, November 21, 2002
SSDBM Conference, Edinburgh, July 23, 2002
CMS Production Grid Workshop, CERN, June 9-13, 2002

Code

The latest Chimera release is available in both source and binary form.
An archive of prior releases and future release candidates is also available.

Complete instructions on installing and using this release are contained in the Chimera Virtual Data System User Guide.
A copy of the user guide can be found inside the distributions in the doc subdirectory.

Support

Support questions for Chimera can be sent to chimera-support@griphyn.org.

Tutorials

Chimera Tutorial – GriPhyN Meeting, ISI, December 16, 2002
Pegasus Tutorial – GriPhyN Meeting, ISI, December 16, 2002
Getting Started with Chimera – University of Chicago, September 4, 2002

Applications

An example of applying virtual data techniques is described in the Supercomputing 2002 paper
Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey.

Information and instructions for using the Chimera 1.0 Beta release to run LIGO applications can be found in the document:
“HOW TO for Running LIGO examples through Chimera VDS”

A demonstration of using the ROOT Interactive Analysis System to explore virtual data is presented in a talk from the PPDG Interactive Data Analysis Workshop at Caltech on December 19, 2002.

Team

The Chimera team members are Ian Foster, Michael Milligan, Jens Vöckler, Michael Wilde (Argonne), and Yong Zhao, based at the University of Chicago, Department of Computer Science.

Acknowledgements

Chimera is part of the Grid Physics Network (GriPhyN) project, supported by the National Science Foundation under Information Technology Research Grant ITR-0086044. GriPhyN uses middleware developed with the support of the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38 (SciDAC Data Grid Middleware).

Grid execution of Chimera is enabled by the Pegasus planner, a GriPhyN project by Ewa Deelman, Gaurang Mehta, and Karan Vahi of ISI.