

User Applications

This page provides some highlights regarding our use cases. If you arrived here looking for the SPRUCE user guide and other documentation, please go here.

Linked Environments for Atmospheric Discovery (LEAD)

Excerpt from "LEAD Cyberinfrastructure to Track Real-Time Storms Using SPRUCE Urgent Computing," Marru, S., Gannon, D., Nadella, S., Beckman, P., Weber, D. B., Brewster, K. A., Droegemeier, K. K. CTWatch Quarterly, Volume 4, Number 1, March 2008.

The Linked Environments for Atmospheric Discovery (LEAD) project is pioneering new approaches for integrating, modeling, and mining complex weather data and cyberinfrastructure systems to enable faster-than-real-time forecasts of mesoscale weather systems, including those that can produce tornadoes and other severe weather. Funded by the National Science Foundation Large Information Technology Research program, LEAD is a multidisciplinary effort involving nine institutions and more than 100 scientists, students, and technical staff.

LEAD applied some of its technology, in real time, for on-demand forecasting of severe weather during the 2007 National Oceanic and Atmospheric Administration (NOAA) Hazardous Weather Test Bed (HWT), a multi-institutional program designed to study future analysis and prediction technologies in the context of daily operations. The HWT 2007 spring experiment was a collaboration among university faculty and students, government scientists, and NOAA and private forecasters to further our understanding and use of storm-scale numerical weather prediction in weather forecasting. LEAD researchers, in coordination with the SPRUCE urgent computing team, were in a unique position to work with HWT participants and expose this technology to real-time forecasters, students, and research scientists.


HWT 2007 spring experiments

Each day, one or more six- to nine-hour nested-grid forecasts at 2 km grid spacing were launched automatically over regions of expected severe weather, as determined by mesoscale discussions at the Storm Prediction Center (SPC) and/or tornado watches. In addition, one six- to nine-hour nested-grid forecast per day, also at 2 km grid spacing, was launched manually when and where deemed most appropriate. The production workflows were submitted to the computing resources at the National Center for Supercomputing Applications (NCSA). Because of the load on that machine, including other 2007 HWT computing needs, the workflow often waited in queues for several hours before 80 processors became available to allocate to it. Moreover, the on-demand forecasts were launched based only on the severity of the weather, so their timing could not be planned in advance. Obtaining a quick turnaround therefore requires computing resources to be pre-reserved and left idle, which wastes CPU cycles and decreases throughput on a busy resource.


Launching an urgent computing workflow using a SPRUCE token

To tackle this problem, LEAD and SPRUCE researchers collaborated with the University of Chicago/Argonne National Laboratory (UC/ANL) TeraGrid site to perform real-time, on-demand severe weather modeling. The UC/ANL IA64 machine currently supports preemption, so urgent jobs with the highest priority can displace running jobs. As an incentive to use the platform even though jobs may be killed, users are given a 10% discount from the standard CPU service unit billing. Which jobs are preempted is decided by an internal scheduler algorithm that considers several factors, such as the elapsed time of the running job, its number of nodes, and the number of jobs per user. LEAD was given a limited number of tokens for use throughout the tornado season. The LEAD web portal allows users to configure and run a variety of complex forecast workflows. The user initiates workflows by selecting forecast simulation parameters and a region of the country where severe weather is expected. This selection is done graphically through a mash-up of Google Maps and the current weather. SPRUCE was integrated directly into the existing LEAD workflow by adding a SPRUCE Web service call and interface to the LEAD portal. The above figure shows how LEAD users can simply enter a SPRUCE token at the required urgency level to activate a session and then submit urgent weather simulations.
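The actual UC/ANL preemption algorithm is internal and not described here, so the following is only a minimal sketch of how a scheduler might weigh the factors named above (elapsed time, node count, jobs per user) when choosing which running jobs to kill, plus the 10% billing discount. The class `RunningJob`, the functions `select_preemption_victims` and `billed_service_units`, and the ordering rule (least lost work first, then users with more running jobs) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RunningJob:
    job_id: str
    user: str
    elapsed_hours: float
    nodes: int


def select_preemption_victims(running, nodes_needed):
    """Choose running jobs to kill so an urgent job can get `nodes_needed` nodes.

    Hypothetical policy: prefer jobs that would lose the least work
    (elapsed hours x nodes), and break ties against users with more running jobs.
    """
    jobs_per_user = Counter(job.user for job in running)

    def preemption_cost(job):
        return (job.elapsed_hours * job.nodes, -jobs_per_user[job.user])

    victims, freed = [], 0
    for job in sorted(running, key=preemption_cost):
        if freed >= nodes_needed:
            break
        victims.append(job)
        freed += job.nodes
    if freed < nodes_needed:
        raise ValueError("not enough preemptible nodes for the urgent job")
    return victims


def billed_service_units(service_units, preemptible, discount=0.10):
    """Apply the discount offered for running on the preemptible platform."""
    return service_units * (1 - discount) if preemptible else service_units


running = [
    RunningJob("a", "alice", elapsed_hours=5.0, nodes=4),
    RunningJob("b", "bob", elapsed_hours=0.5, nodes=8),
    RunningJob("c", "alice", elapsed_hours=1.0, nodes=2),
]
print([j.job_id for j in select_preemption_victims(running, nodes_needed=8)])  # ['c', 'b']
```

A greedy choice like this is easy to reason about, but it can kill more nodes than strictly necessary; a production scheduler would likely also consider job size fit and queue policy.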

The SURA Coastal Ocean Observing and Prediction Program (SCOOP)

Excerpt from "Cyberinfrastructure for Coastal Hazard Prediction," Allen, G., Bogden, P., Kosar, T., Kulshrestha, A., Namala, G., Tummala, S., Seidel, E. CTWatch Quarterly, Volume 4, Number 1, March 2008.

The SCOOP Program is creating an open, integrated network of distributed sensors, data, and computer models to provide a broad array of services for applications and research involving coastal environmental prediction. The SCOOP community currently engages in distributed coastal modeling across the southeastern US, including both the Atlantic and Gulf of Mexico coasts. Various coastal hydrodynamic models are run on both an on-demand and an operational (24/7/365) basis to study physical phenomena such as wave dynamics, storm surge, and current flow. The computational models include Wave Watch 3 (WW3), Wave Model (WAM), Simulating Waves Nearshore (SWAN), the ADvanced CIRCulation (ADCIRC) model, ElCIRC, and CH3D [4]. In the on-demand scenario, advisories from the National Hurricane Center (NHC) detailing impending tropical storms or hurricanes trigger automated workflows consisting of the appropriate hydrodynamic models. The resulting data fields are analyzed and the results are published on a community portal; they are also distributed to the SCOOP partners for local visualization and further analysis, and stored in a highly available archive for later use.

The time it takes for a low-pressure area to develop into a hurricane can vary from a few hours to a few days. In a worst-case scenario, advance notice could be less than 12 hours, which makes it difficult to quickly obtain resources for an extensive set of investigatory model runs and makes it imperative to be able to rapidly deploy models and analysis data. At the SuperComputing 2007 conference in Reno, Nevada, SCOOP and SPRUCE demonstrated how resources can be obtained quickly, using the resources of SURAgrid and LONI. The demo illustrated how a hurricane event triggered the use of on-demand resources, and how the priority-aware scheduler was able to schedule the runs on the appropriate queues in the appropriate order. Guaranteeing that a job runs as soon as its input data has been generated makes it possible to guarantee that the runs chosen as high priority will complete before the deadline.
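The deadline argument in the paragraph above can be made concrete with a small model. This is a minimal sketch under stated assumptions, not the SCOOP priority-aware scheduler: high-priority runs are placed on a fixed number of on-demand slots in priority order, each starting as soon as both its input data and a slot are available, and everything else is sent to a best-effort queue with unknown wait. The names `ModelRun` and `plan_runs`, the priority scale, and the example runs are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    priority: int          # hypothetical scale: higher means more important
    ready_hour: float      # hour at which the run's input data has been generated
    runtime_hours: float


def plan_runs(runs, on_demand_slots, cutoff, deadline_hour):
    """Place high-priority runs on on-demand slots; return (plan, on_time)."""
    high = sorted((r for r in runs if r.priority >= cutoff),
                  key=lambda r: (-r.priority, r.ready_hour))
    if high and on_demand_slots < 1:
        raise ValueError("no on-demand slots for high-priority runs")
    free_at = [0.0] * on_demand_slots     # min-heap: when each slot is next free
    plan = {}
    for run in high:
        slot_free = heapq.heappop(free_at)
        start = max(slot_free, run.ready_hour)   # no queue wait once data and slot exist
        finish = start + run.runtime_hours
        heapq.heappush(free_at, finish)
        plan[run.name] = ("on-demand", start, finish)
    # Only the on-demand runs have known finish times, so only they are checked.
    on_time = all(finish <= deadline_hour for _, _, finish in plan.values())
    for run in runs:
        if run.priority < cutoff:
            plan[run.name] = ("best-effort", None, None)  # start depends on queue wait
    return plan, on_time


runs = [
    ModelRun("storm-surge", priority=3, ready_hour=0.0, runtime_hours=4.0),
    ModelRun("wave-ensemble", priority=2, ready_hour=1.0, runtime_hours=3.0),
    ModelRun("extra-member", priority=1, ready_hour=0.0, runtime_hours=5.0),
]
plan, on_time = plan_runs(runs, on_demand_slots=1, cutoff=2, deadline_hour=12.0)
print(plan["wave-ensemble"], on_time)  # ('on-demand', 4.0, 7.0) True
```

Because on-demand runs never wait in a queue, their finish times are known in advance, which is what makes the deadline check possible at all; the best-effort runs can only be reported without a start time.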


(a) - Ensemble member execution and wait times for best-effort execution


(b) - Ensemble member execution and wait times for on-demand execution

SPRUCE was used to acquire on-demand processors on some of the resources, and it demonstrated several advantages over negotiating special access for particular users on each resource a priori. SPRUCE gave resource owners the ability to restrict use of the system in on-demand mode while still providing on-demand resources to anyone who needs them. In the past, this could only be done by adding and deleting user access on a case-by-case basis.
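The contrast above, between per-user access lists and a token that the resource owner's policy can still constrain, can be sketched as an authorization check. This is not SPRUCE's real token format or API: the classes `UrgentToken` and `ResourcePolicy`, the function `may_submit_urgent`, the 24-hour lifetime, and the yellow/orange/red ordering of urgency levels are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

URGENCY = {"yellow": 1, "orange": 2, "red": 3}   # assumed ordering of levels


@dataclass
class UrgentToken:
    token_id: str
    max_urgency: str
    lifetime: timedelta
    members: set = field(default_factory=set)   # users added to the session
    activated_at: Optional[datetime] = None

    def activate(self, now):
        if self.activated_at is None:      # the session clock starts at first activation
            self.activated_at = now

    def is_active(self, now):
        return (self.activated_at is not None
                and now < self.activated_at + self.lifetime)


@dataclass
class ResourcePolicy:
    """Set by the resource owner: which urgency levels this machine honours."""
    allowed_levels: set


def may_submit_urgent(policy, token, user, urgency, now):
    """No per-user account changes: the active token plus the owner's policy decide."""
    return (token.is_active(now)
            and user in token.members
            and URGENCY[urgency] <= URGENCY[token.max_urgency]
            and urgency in policy.allowed_levels)


t0 = datetime(2000, 1, 1, 12, 0)
token = UrgentToken("tok-1", max_urgency="orange", lifetime=timedelta(hours=24),
                    members={"coastal-modeler"})
token.activate(t0)
policy = ResourcePolicy(allowed_levels={"yellow", "orange"})
print(may_submit_urgent(policy, token, "coastal-modeler", "orange", t0 + timedelta(hours=1)))  # True
```

The design point is that the owner edits one policy object per machine instead of adding and removing individual user accounts for each emergency.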

Figures (a) and (b) show the execution and wait times for the various stages of the SCOOP workflow. Figure (a) shows the execution with only best-effort resources. The pink bars depict the execution and queue wait times of the core Wave Watch III execution on eight processors; the queue wait times account for most of the total time. Figure (b) depicts the ensemble execution using on-demand resources. In this case, 16 processors were available for on-demand use, so two ensemble members ran simultaneously while the others waited for them to finish.
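The on-demand case in Figure (b) is simple arithmetic: 16 processors divided by eight processors per member allows two concurrent members, so the ensemble proceeds in groups of two. The sketch below computes that timeline under the assumptions of identical member runtimes and no queue wait; the function name `ensemble_timeline`, the six-member ensemble, and the unit runtime are hypothetical, since the figure's actual member count and runtimes are not given in the text.

```python
def ensemble_timeline(members, procs_available, procs_per_member, runtime):
    """Start/end times for ensemble members sharing a fixed on-demand pool.

    Assumes identical runtimes and no queue wait, so members run in
    back-to-back groups of `procs_available // procs_per_member`.
    """
    concurrent = procs_available // procs_per_member
    if concurrent < 1:
        raise ValueError("pool too small for even one member")
    timeline = []
    for member in range(members):
        group = member // concurrent        # which back-to-back group this member is in
        start = group * runtime
        timeline.append((member, start, start + runtime))
    return concurrent, timeline


concurrent, timeline = ensemble_timeline(members=6, procs_available=16,
                                         procs_per_member=8, runtime=1.0)
print(concurrent)      # 2
print(timeline[-1])    # (5, 2.0, 3.0)
```

With these assumptions the makespan is ceil(members / concurrent) x runtime, so doubling the on-demand pool to 32 processors would roughly halve the ensemble's wall-clock time.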

Grid Enabled Neurosurgical Imaging Using Simulation (GENIUS)

Excerpt from "Life or Death Decision-making: The Medical Case for Large-scale, On-demand Grid Computing," Manos, S., Zasada, S., Coveney, P. V. CTWatch Quarterly, Volume 4, Number 1, March 2008.

Patient-specific medicine is the tailoring of medical treatments based on the characteristics of an individual patient. Decision support systems based on patient-specific simulation have the potential to revolutionise the way clinicians plan courses of treatment for various conditions, such as viral infections and lung cancer, and the planning of surgical procedures, for example in the treatment of arterial abnormalities. Since patient-specific data can be used as the basis of simulation, treatments can be assessed for their effectiveness with respect to the patient in question before being administered, saving the potential expense of ineffective treatments and reducing, if not eliminating, lengthy lab procedures that typically involve animal testing. It thereby promises tailored medical treatment based on the particular characteristics of an individual patient and/or an associated pathogen.

The key factor common to all current patient-specific medical simulation scenarios is the need to turn simulations around fast enough for the results to be clinically relevant. For neurosurgical treatments, this is of the order of 15 to 20 minutes; for HIV or cancer pathology reports, it is of the order of 24 to 48 hours. When used as part of the clinical decision-making process, computational resources often need to support more exotic scheduling policies than the simple first-come, first-served batch scheduling typically seen in high-performance research computing today. These simulations cannot be run in a resource's normal batch mode; they need to be given higher priority and require some form of on-demand computing to succeed.
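Why first-come, first-served scheduling fails a 15 to 20 minute window can be shown with a toy calculation comparing it with the urgent policies mentioned on this page (next-to-run and preemption). This is a minimal sketch that collapses the machine to a single job slot, which real clusters are not; the function names `urgent_wait` and `clinically_relevant`, the queued job runtimes, and the eight-minute simulation runtime are hypothetical.

```python
def urgent_wait(current_remaining, queued_ahead, policy):
    """Minutes an urgent job waits on a (simplified) machine running one job at a time."""
    if policy == "fcfs":
        return current_remaining + sum(queued_ahead)   # waits behind everyone
    if policy == "next-to-run":
        return current_remaining      # jumps the queue, but no running job is killed
    if policy == "preempt":
        return 0                      # the running job is killed immediately
    raise ValueError(f"unknown policy: {policy}")


def clinically_relevant(wait, runtime, window):
    """True if the result arrives within the clinical decision window."""
    return wait + runtime <= window


queue = [60, 120, 30]   # hypothetical runtimes (minutes) of jobs already queued
for policy in ("fcfs", "next-to-run", "preempt"):
    wait = urgent_wait(current_remaining=10, queued_ahead=queue, policy=policy)
    print(policy, wait, clinically_relevant(wait, runtime=8, window=20))
# fcfs 220 False
# next-to-run 10 True
# preempt 0 True
```

Next-to-run avoids killing anyone's work but still depends on how long the current job has left; preemption removes that dependency at the cost of lost work for other users.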


A real-time simulation and visualisation of neurovascular blood flow using HemeLB showing the pressure field (increasing from yellow to blue).

One example application is HemeLB, a lattice-Boltzmann (LB) code used for blood flow modeling and simulation. A major feature of HemeLB is real-time rendering and computation: fluid flow data is rendered in situ on the same processors as the LB code and sent in real time to a lightweight client on a clinical workstation, as shown above. The client is also used to steer the computation in real time, allowing the adjustment of physical parameters of the neurovascular system, along with visualisation-specific parameters associated with volume rendering, isosurface rendering, and streamline visualisation. HemeLB is intended to yield patient-specific information that helps plan embolisation of arterio-venous malformations and aneurysms, amongst other neuropathologies. Using this methodology, patient-specific models can be used to address issues with pulsatile blood flow, phase differences, and the effects of treatment, all of which are potentially very powerful both in terms of understanding neurovascular pathophysiology and in planning patient treatment.
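For readers unfamiliar with the lattice-Boltzmann method, it evolves particle distribution functions on a regular lattice by alternating a local collision step with a streaming step to neighbouring sites. The sketch below is a generic two-dimensional D2Q9 scheme with the BGK collision operator on a small periodic grid; it is not HemeLB's implementation, which is a three-dimensional code for sparse vascular geometries. The grid size, relaxation time `tau`, and the initial density bump are arbitrary illustrative choices.

```python
# D2Q9 lattice: rest velocity, 4 axis-aligned, 4 diagonal, with standard weights.
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for the 9 lattice directions."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def moments(cell):
    """Density and velocity of one lattice site from its 9 populations."""
    rho = sum(cell)
    ux = sum(fi * ex for fi, (ex, _) in zip(cell, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(cell, E)) / rho
    return rho, ux, uy


def lbm_step(f, tau):
    """One BGK collide-and-stream step on a periodic grid indexed f[x][y][i]."""
    nx, ny = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            cell = f[x][y]
            feq = equilibrium(*moments(cell))
            for i, (ex, ey) in enumerate(E):
                post = cell[i] - (cell[i] - feq[i]) / tau       # collide: relax toward equilibrium
                new[(x + ex) % nx][(y + ey) % ny][i] = post     # stream to the neighbouring site
    return new


def density_bump(nx, ny, bump):
    """Fluid at rest with slightly higher density in the centre site."""
    centre = (nx // 2, ny // 2)
    return [[equilibrium(1.0 + (bump if (x, y) == centre else 0.0), 0.0, 0.0)
             for y in range(ny)] for x in range(nx)]


def total_mass(f):
    return sum(sum(cell) for column in f for cell in column)


f = density_bump(8, 8, bump=0.1)
mass_before = total_mass(f)
for _ in range(10):
    f = lbm_step(f, tau=0.8)
print(abs(total_mass(f) - mass_before) < 1e-9)   # True: collision and streaming conserve mass
```

Because each site's collision only needs that site's own populations, the method parallelises well across processors, which is one reason LB codes suit large-scale, in-situ simulation.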


The workflow cycle for the simulation of neurovascular blood flow from the viewpoint of a clinician

The neurovascular blood-flow simulator HemeLB has been used with SPRUCE in a next-to-run fashion on the large-scale Lonestar cluster at the Texas Advanced Computing Center (TACC). It was demonstrated live on the show floor at SuperComputing 2007, where real-time visualisation and steering were used to control HemeLB within an urgent computing session.