Eucalyptus 1.6.2 with SPRUCE support
This document describes SPRUCE support in Eucalyptus cloud resources.
Eucalyptus version 1.6.2 was modified to support SPRUCE. Five urgent computing policies were targeted and added to the Eucalyptus software. This page provides some details on those changes, as well as some installation notes for the patched code, which is available from the software page. This code and information is provided "as is" without warranty of any kind, either expressed or implied.
All of the Eucalyptus modifications are available from the software page, including the patch for the Eucalyptus software and the modified euca2ools, which include the new tools for submitting SPRUCE jobs.
Eucalyptus was modified to include the following policies: restricted access, preemption, suspension, migration, and dynamic VM QoS. Each of these policies, along with some of its implementation details, is described below.
Restricted Access

When a SPRUCE token associated with the restricted access policy is activated, the cloud is notified and will no longer accept non-urgent requests. Only urgent requests submitted by users with an activated SPRUCE token will be accepted, assuming the necessary resources are available. This policy does not affect currently running virtual machines; if there are insufficient resources for an incoming urgent request, that request is denied. In a batch queue resource, this policy would be similar to a queue drain.
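The admission logic under this policy can be sketched as follows. This is a minimal illustration of the decision described above; the function and parameter names are placeholders, not actual Eucalyptus identifiers:

```python
def admit_request(session_active: bool, is_urgent: bool,
                  resources_available: bool) -> bool:
    """Decide whether an incoming VM request is accepted while a
    restricted access session may be active.

    Hypothetical sketch of the policy -- not actual Eucalyptus code.
    """
    if session_active and not is_urgent:
        # Restricted access: only urgent (SPRUCE) requests are considered.
        return False
    # Even urgent requests are denied if the cloud lacks free resources;
    # running VMs are never disturbed under this policy.
    return resources_available
```

For example, with a session active, a non-urgent request is rejected even when resources are free, while an urgent request still fails if no resources are available.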
Preemption

The preemption policy is very similar to the preemption policy in a traditional batch queue resource. If there are insufficient resources to host an incoming urgent request, non-urgent virtual machines are preempted (without warning) to free up the necessary resources. The targeting algorithm first attempts to minimize the number of non-urgent VMs to preempt (e.g., target one large VM over two smaller VMs) and second to minimize the size of the targeted VMs (e.g., target two small VMs over two large VMs). In practice, this policy has very little overhead, as it essentially amounts to a single call to the hypervisor to terminate each targeted VM, followed by cleaning up the disk space that was allocated to it.
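The targeting heuristic can be sketched as a search over candidate sets, ordered first by set size and then by combined VM size. This is an illustrative sketch under the assumption that per-node VM counts are small enough for enumeration, not the actual Eucalyptus implementation:

```python
from itertools import combinations

def select_preemption_targets(vms, needed):
    """Pick non-urgent VMs to preempt so that at least `needed` units
    (e.g., cores or MB of memory) are freed.

    Follows the heuristic described above: minimize the number of
    targeted VMs first, then their combined size.  Hypothetical sketch.

    vms: list of (vm_id, size) tuples; needed: capacity to free.
    Returns a list of targeted (vm_id, size) tuples, or None if the
    node cannot free enough capacity.
    """
    for count in range(1, len(vms) + 1):
        candidates = [c for c in combinations(vms, count)
                      if sum(size for _, size in c) >= needed]
        if candidates:
            # Among sets of the same (minimal) count, prefer the
            # smallest total size, i.e., waste the least capacity.
            return list(min(candidates,
                            key=lambda c: sum(s for _, s in c)))
    return None
```

Given VMs of sizes 1, 1, and 4 and a request for 2 units, the sketch targets the single size-4 VM rather than the two size-1 VMs, matching the "one large over two small" preference above.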
Suspension

The suspension policy stops a running VM, writes its memory contents to disk, and then releases its memory and CPUs so that they can be allocated to an urgent VM. This policy is potentially less intrusive than preemption in that the suspended VM can be automatically resumed from the point it was suspended once the necessary memory and cores are again available on the node. Of course, there may be some problems with resuming VMs, such as TCP timeouts. However, even in those cases, suspension may be preferable to preemption because a user will still be able to recover data from the VM once it is resumed, which is not possible in a simple preemption scheme. As with preemption, the algorithm for selecting targets first tries to minimize the number of VMs and then tries to minimize their size. The suspension policy has more overhead than preemption because it must write the memory contents of the targeted VM to disk.
Migration

It may be that the resources needed to support an incoming VM are available globally throughout the cloud but not on any single node. The migration policy seeks to collect these resources by migrating smaller, non-urgent VMs between nodes. The Xen hypervisor (which was used for this work) supports a feature called live migration, which claims to offer migration with virtually no observable interruption to the user of the migrated VM. However, this feature requires shared storage between the source and destination nodes, which was not available on our cloud resources. Instead, we implemented an "offline" migration, which essentially involves suspending a targeted VM, transferring all of the instance data (disk, memory contents, metadata, etc.) to the destination node, resuming the VM on the destination node, and then cleaning up the source node. As one would expect, this process is significantly more expensive than the other policies. This policy also included a simple heuristic that limited each node to migrating at most one VM. Future work should include more thorough experimentation with this policy (e.g., how many concurrent migrations can a cloud support before network performance degrades?).
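The precondition this policy tests for — resources available globally but on no single node — can be sketched as below. This is our own illustration, not code from the patch; `node_free` maps node names to free capacity in whatever unit the scheduler tracks:

```python
def migration_applicable(node_free, request):
    """Return True when the migration policy is the right tool:
    no single node can host `request`, but the cloud as a whole can.

    Illustrative sketch only -- the real scheduler must also verify
    that a migration plan exists (e.g., at most one VM moved per node).
    """
    if any(free >= request for free in node_free.values()):
        # Some node already fits the request; no migration needed.
        return False
    # Otherwise, migration can only help if global free capacity suffices.
    return sum(node_free.values()) >= request
```

For example, with two nodes each holding 2 and 3 free units, a 4-unit request triggers migration, while a 6-unit request is simply infeasible.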
Dynamic VM QoS
Virtualization also provides us with the ability to dynamically
manipulate the access a VM has to the underlying resources. In this
policy, a non-urgent VM may have its memory and/or number of CPUs cut
in half and those resources made available to an incoming urgent VM.
Similar to preemption, this policy has very little overhead. However,
there are a few caveats. Some simple experiments using the NAS
Parallel Benchmarks as a case study application revealed that
dynamically halving the number of CPUs for a running benchmark had a
doubling effect on the wall clock time. This is expected given that
two processes were now running on each virtual CPU rather than just
one. Halving the amount of memory typically had a much smaller
impact, though three of the benchmarks actually failed. Xen notes in
their manual that reducing the memory of a running VM may result in
instability. However, this policy has the potential for the smallest
impact on targeted users in that their VMs are allowed to continue to
run with no interruption.
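The observed slowdown is consistent with a simple saturation model: wall-clock time for a CPU-bound job grows with the number of processes sharing each virtual CPU. The following is our own illustration of that reasoning, not code or data from the patch:

```python
def estimated_runtime(base_runtime, procs, vcpus):
    """Rough model of the slowdown seen in the NAS benchmark runs:
    wall-clock time scales with the number of processes sharing each
    virtual CPU.  Illustrative only; ignores memory effects, which the
    experiments showed are smaller but can cause outright failures.
    """
    procs_per_vcpu = max(1, -(-procs // vcpus))  # ceiling division
    return base_runtime * procs_per_vcpu
```

Under this model, halving a 4-vCPU allocation beneath a 4-process benchmark puts two processes on each remaining vCPU and doubles the runtime, matching the observed behavior.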
Installation Notes

All required source files (patched and unpatched) are available from our software page. As mentioned, all modifications were made to Eucalyptus-1.6.2. To start, you will need to download and install the Eucalyptus source dependencies. No modifications were made to these files. This version was created and tested using Xen 4.0 with Linux kernel 188.8.131.52 and libvirt-0.7.0. The port numbers used by Eucalyptus were also modified:

- Cloud Controller: 9773
- Cluster Controller: 9774
- Node Controller: 9775
- Web portal front-end: 9773

Finally, all testing was done using the static networking mode.
Once the dependencies are installed, our version of Eucalyptus can be patched and built. There is a patch file in the root directory. The patch can be applied by running the following command: 'patch -p1 -i euca-spruce.patch'. After the patch is applied, you will need to create two empty directories: clc/modules/spruce/src/main/resources and clc/modules/spruce/conf. Once that is complete, you can configure and build the Eucalyptus code by following the usual Eucalyptus build instructions.
SPRUCE euca2ools

In order to interact with the modified Eucalyptus cloud, the euca2ools were extended with SPRUCE support. This included modifying existing tools, such as euca-describe-instances, to include urgency (e.g., "red"), and creating new tools. The new tools are:
- euca-spruce-activate-session: activate a restricted access session.
- euca-spruce-deactivate-session: deactivate a restricted access session.
- euca-spruce-session-status: check whether a restricted access session is active.
- euca-spruce-run-instances: submit an urgent request (include urgency parameter).
In addition to these new tools, the boto-1.8d dependency was also modified. The patch files for boto and the new tool set are available from the software page.
First, download the euca2ools-src-deps package and unzip it along with the two zip files in the root directory. Next, apply the patch (patch -p1 -i euca2ools-1.3.1-src-deps.patch). Then, install according to the euca2ools install notes (run the command "sudo python setup.py install" in both the boto and M2Crypto directories).