Eucalyptus 1.6.2 with SPRUCE support


This document describes SPRUCE support in Eucalyptus cloud resources.

Overview

Eucalyptus version 1.6.2 was modified to support SPRUCE. Five urgent computing policies were targeted and added to the Eucalyptus software. This page provides some details about those changes, as well as some installation notes for the patched code, which is available from the software page. This code and information is provided "as is" without warranty of any kind, either expressed or implied.


Policies

All of the Eucalyptus modifications are available from the Software page, including the patch for the Eucalyptus software and the modified euca2ools, which include the new tools for submitting SPRUCE jobs.

Eucalyptus was modified to include the following policies: restricted access, preemption, suspension, migration, and dynamic VM QoS. Each of these policies, along with some implementation details, is described below.

Restricted Access

When a SPRUCE token associated with the restricted access policy is activated, the cloud is notified and will no longer accept non-urgent requests. Only urgent requests submitted by users with an activated SPRUCE token will be accepted, assuming the necessary resources are available. This policy does not affect currently running virtual machines; if there are insufficient resources for an incoming urgent request, that request will be denied. On a batch queue resource, this policy would be similar to a queue drain.

Preemption

The preemption policy is very similar to the preemption policy on a traditional batch queue resource. If there are insufficient resources to host an incoming urgent request, non-urgent virtual machines are preempted (without warning) to free up the necessary resources. The current targeting algorithm first attempts to minimize the number of non-urgent VMs to preempt (e.g., targeting one large VM over two smaller VMs) and then to minimize the size of the targeted VMs (e.g., targeting two small VMs over two large VMs). In practice, this policy can be implemented with very little overhead, as it essentially amounts to a single call to the hypervisor to terminate the targeted VM followed by cleanup of the disk space that was allocated to it.
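On a Xen node, preempting a targeted VM reduces to roughly the following; this is a minimal sketch, where the instance ID is hypothetical and the instance directory is an assumption (it depends on the INSTANCE_PATH setting in eucalyptus.conf).

  # Terminate the targeted non-urgent VM immediately, without warning.
  xm destroy i-3F2A0651

  # Reclaim the disk space that was allocated to the instance
  # (path is an assumption; it follows INSTANCE_PATH in eucalyptus.conf).
  rm -rf /var/lib/eucalyptus/instances/admin/i-3F2A0651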

Suspension

The suspension policy stops a running VM, writes its memory contents to disk, and then releases its memory and CPUs so that they can be allocated to an urgent VM. This policy is potentially less intrusive than preemption in that the suspended VM can be automatically resumed from the point it was suspended once the necessary memory and cores are again available on the node. Of course, there may be some problems with resuming VMs, such as TCP timeouts. However, even in those cases, suspension may be preferable to preemption because a user will still be able to recover data from the VM once it is resumed, which is not possible in a simple preemption scheme. As with preemption, the algorithm for selecting targets first tries to minimize the number of targeted VMs and then their size. The suspension policy has more overhead than preemption because it has to write the memory contents of the targeted VM to disk.
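Under Xen, suspension and resumption can be driven with the save/restore commands; the following is a minimal sketch, with a hypothetical instance ID and an assumed state-file location inside the instance directory.

  # Suspend the targeted VM: its memory contents are written to disk
  # and its memory and virtual CPUs are released.
  xm save i-3F2A0651 /var/lib/eucalyptus/instances/admin/i-3F2A0651/mem.save

  # Later, once memory and cores are free again, resume the VM from the saved state.
  xm restore /var/lib/eucalyptus/instances/admin/i-3F2A0651/mem.save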

Migration

It may be possible that the necessary resources to support an incoming VM are available globally throughout the cloud but not on any single node. The migration policy seeks to collect these resources by migrating smaller, non-urgent VMs between nodes. The Xen hypervisor (which was used for this work) supports a feature called live migration, which claims to offer migration with virtually no observable interruption to the user of the migrated VM. However, this feature requires shared storage between the source and destination nodes, which was not available on our cloud resources. Instead, we implemented an "offline" migration, which essentially involves suspending a targeted VM, transferring all of the instance data (disk, memory contents, metadata, etc.) to the destination node, resuming the VM on the destination node, and then cleaning up the source node. As one would expect, this process is significantly more expensive than the other policies. This policy also included a simple heuristic that limited each node to migrating at most one VM. Future work should look at more robust experimentation with this policy (e.g., how many concurrent migrations can a cloud support before network performance degrades?).
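The offline migration described above roughly corresponds to the following command sequence run from the source node; this is only a sketch, with a hypothetical instance ID, destination host, and instance path.

  # Suspend the targeted VM and capture its memory contents.
  xm save i-3F2A0651 /var/lib/eucalyptus/instances/admin/i-3F2A0651/mem.save

  # Transfer the instance data (disk images, memory image, metadata) to the destination node.
  scp -r /var/lib/eucalyptus/instances/admin/i-3F2A0651 node02:/var/lib/eucalyptus/instances/admin/

  # Resume the VM on the destination node...
  ssh node02 xm restore /var/lib/eucalyptus/instances/admin/i-3F2A0651/mem.save

  # ...and clean up the source node.
  rm -rf /var/lib/eucalyptus/instances/admin/i-3F2A0651

  # (With shared storage, "xm migrate --live i-3F2A0651 node02" would avoid the interruption.)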

Dynamic VM QoS

Virtualization also provides the ability to dynamically manipulate the access a VM has to the underlying resources. In this policy, a non-urgent VM may have its memory and/or number of CPUs cut in half and those resources made available to an incoming urgent VM. Similar to preemption, this policy has very little overhead. However, there are a few caveats. Some simple experiments using the NAS Parallel Benchmarks as a case-study application revealed that dynamically halving the number of CPUs for a running benchmark roughly doubled the wall clock time. This is expected, given that two processes were then running on each virtual CPU rather than just one. Halving the amount of memory typically had a much smaller impact, though three of the benchmarks actually failed; the Xen documentation notes that reducing the memory of a running VM may result in instability. Nevertheless, this policy has the potential for the smallest impact on targeted users in that their VMs are allowed to continue running without interruption.
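Under Xen, this kind of dynamic adjustment maps onto the memory balloon and vCPU controls; a minimal sketch (hypothetical instance ID and sizes) follows.

  # Halve the memory of a running non-urgent VM (e.g., from 2048 MB to 1024 MB).
  xm mem-set i-3F2A0651 1024

  # Halve its number of virtual CPUs (e.g., from 4 to 2).
  xm vcpu-set i-3F2A0651 2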


Installation

All required source files (patched and unpatched) are available from our software page. As mentioned, all modifications were made to Eucalyptus 1.6.2. To start, you will need to download and install the Eucalyptus source dependencies; no modifications were made to these files. This version was created and tested using Xen 4.0 with Linux kernel 2.6.31.13 and libvirt 0.7.0. The port numbers used by Eucalyptus were also modified:

  • Cloud Controller: 9773
  • Cluster Controller: 9774
  • Node Controller: 9775
  • Web portal front-end: 9773

Finally, note that all testing was done using the static networking mode.
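Before building, it may be worth confirming that a node matches the tested environment; a quick check along these lines should suffice (exact output formats vary by distribution).

  # Tested environment: Xen 4.0, Linux kernel 2.6.31.13, libvirt 0.7.0.
  xm info | grep -E 'xen_(major|minor)'
  uname -r
  virsh --version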

Once the dependencies are installed, our version of Eucalyptus can be patched and built. There is a patch file in the root directory; apply it by running 'patch -p1 -i euca-spruce.patch'. After the patch is applied, you will need to create two empty directories: clc/modules/spruce/src/main/resources and clc/modules/spruce/conf. Once that is complete, you can configure and build the Eucalyptus code by following the usual Eucalyptus instructions.
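Put together, the patch-and-build step looks roughly like the following, run from the top of the Eucalyptus 1.6.2 source tree; the configure options shown are placeholders for whatever your usual Eucalyptus build uses.

  # Apply the SPRUCE patch from the root of the source tree.
  patch -p1 -i euca-spruce.patch

  # Create the two empty directories the patched build expects.
  mkdir -p clc/modules/spruce/src/main/resources
  mkdir -p clc/modules/spruce/conf

  # Configure and build as for a stock Eucalyptus 1.6.2 install
  # (the prefix below is a placeholder; use your normal options).
  ./configure --prefix=/opt/eucalyptus
  make && sudo make install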


Euca2ools 1.3.1

In order to interact with the modified Eucalyptus cloud, the euca2ools were extended with SPRUCE support. This included modifying existing tools, such as euca-describe-instances, to report urgency (e.g., "red"), as well as creating the new tools listed below (a brief usage sketch follows the list):

  • euca-spruce-activate-session: activate a restricted access session.
  • euca-spruce-deactivate-session: deactivate a restricted access session.
  • euca-spruce-session-status: check whether a restricted access session is active.
  • euca-spruce-run-instances: submit an urgent request (include urgency parameter).
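The following is a hypothetical session showing how the new tools fit together; the image ID and the urgency option name are assumptions, so check each tool's help output for the exact syntax.

  # Activate a restricted-access session and confirm it took effect.
  euca-spruce-activate-session
  euca-spruce-session-status

  # Submit an urgent request (image ID and urgency flag are hypothetical).
  euca-spruce-run-instances emi-12345678 --urgency red

  # Deactivate the session once the urgent work is finished.
  euca-spruce-deactivate-session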

In addition to creating these new tools, the boto-1.8d dependency was also modified. The patch files for boto and the new tool set are available from the software page.

Installation

First, download the euca2ools-src-deps package and unzip it, along with the two zip files in its root directory. Next, apply the patch (patch -p1 -i euca2ools-1.3.1-src-deps.patch). Then install according to the euca2ools install notes (run "sudo python setup.py install" in both the boto and M2Crypto directories).
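As a concrete sketch (the archive and directory names below are placeholders for the actual package contents):

  # Unpack the source-dependency package and the two archives inside it.
  unzip euca2ools-1.3.1-src-deps.zip
  cd euca2ools-1.3.1-src-deps
  unzip boto-1.8d.zip
  unzip M2Crypto-*.zip

  # Apply the SPRUCE patch to the dependencies.
  patch -p1 -i euca2ools-1.3.1-src-deps.patch

  # Install boto and M2Crypto per the euca2ools install notes.
  (cd boto-1.8d && sudo python setup.py install)
  (cd M2Crypto-* && sudo python setup.py install)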

Next, download the euca2ools package and unzip it. Then cd into the unpacked directory and apply the patch (patch -p1 -i euca2ools-1.3.1.patch). Finally, build the tools (sudo make).
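Again as a sketch, with placeholder archive and directory names:

  # Unpack the euca2ools source.
  unzip euca2ools-1.3.1.zip
  cd euca2ools-1.3.1

  # Apply the SPRUCE patch and build the tools.
  patch -p1 -i euca2ools-1.3.1.patch
  sudo make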
