Octopus is a software service designed to provide a highly available, cloud-based storage solution.
The following table lists some of the existing S3-compatible public cloud service offerings and private cloud solutions:
Service | Public/Private Cloud | Note |
---|---|---|
Amazon S3 | Public | |
Cloudian | Public | |
Connectria Cloud Storage | Public | It is unclear if this service is still available |
Dunkel Cloud Storage | Public | |
Google Cloud Storage | Public | |
Host Europe Cloud Storage | Public | Defunct since end 2014 |
HP Helion Public cloud | Public | Defunct since January 2016 |
Nirvanix | Public | Defunct since September 2013 |
Apache CloudStack | Private | |
Ceph | Private | |
Cumulus (Nimbus) | Private | |
Minio | Private | |
pWalrus | Private | Parallel version of Walrus |
Riak Cloud Storage | Private | |
S3ninja | Private | Emulates the S3 API for development and testing purposes |
Swift (OpenStack) | Private | |
Walrus (Eucalyptus) | Private | |
Octopus aims to support multiple S3-compatible services. Support for Amazon S3 and Walrus is currently implemented. See the list of already implemented features.
- Octopus - A Redundant Array of Independent Services (RAIS). Christian Baun, Marcel Kunze, Denis Schwab, Tobias Kurze. Proceedings of the 3rd International Conference on Cloud Computing and Services Science (CLOSER 2013) in Aachen. SCITEPRESS. ISBN: 978-989-8565-52-5, pp. 321-328.
- Redundant Cloud Storage with Octopus. Christian Baun, Marcel Kunze. This unpublished paper from August 2011 summarizes the features and design of Octopus.
Octopus is designed to run inside a PaaS such as Google App Engine, AppScale or typhoonAE.
One benefit of a cloud platform is that users do not need to install any software on the client side.
Users can import their credentials for S3 and Walrus services into Octopus. Octopus checks whether a bucket with the naming scheme octopus_storage-at- exists. If not, the bucket is created. Users can then upload files (called objects in the S3 world) with one click into the Octopus bucket of each connected storage service.
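The check-then-create step can be sketched as follows. This is a minimal illustration, not the Octopus source: it assumes a connection object that behaves like a boto `S3Connection`, where `lookup()` returns `None` for a missing bucket and `create_bucket()` creates one. `ensure_bucket` and `FakeConnection` are hypothetical names, and the bucket name is passed in by the caller because the full naming scheme is not given here.

```python
class FakeConnection:
    """In-memory stand-in for a storage connection, for demonstration only."""

    def __init__(self):
        self.buckets = {}

    def lookup(self, bucket_name):
        # Mirrors boto's behaviour: None when the bucket does not exist.
        return self.buckets.get(bucket_name)

    def create_bucket(self, bucket_name):
        self.buckets[bucket_name] = {"name": bucket_name}
        return self.buckets[bucket_name]


def ensure_bucket(conn, bucket_name):
    """Return the bucket named bucket_name, creating it if it is missing."""
    bucket = conn.lookup(bucket_name)
    if bucket is None:
        bucket = conn.create_bucket(bucket_name)
    return bucket


conn = FakeConnection()
ensure_bucket(conn, "demo-bucket")   # first call creates the bucket
ensure_bucket(conn, "demo-bucket")   # second call finds the existing one
```

With a real boto connection in place of `FakeConnection`, the same function would be called once per imported credential set.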
The following figure shows the steps to upload an object. After the customer logs in, the client requests (1) the Octopus website with the HTML form and the list of objects.
The object list is requested (2) from the storage services and transferred (3) to Octopus. Octopus then checks (4) the synchronicity of the objects using checksums. All S3-compatible services store an MD5 checksum for each object. These checksums are transferred automatically when a list of objects is requested, and they make it possible to verify whether the objects located at the different storage services are synchronized. Whenever a list of objects is requested, Octopus checks whether the objects are still synchronized across the storage services. After the synchronicity check, the website with the HTML form is transferred (5) to the customer's browser. After the customer has selected the local file and started the upload with the submit button, the object is transferred (6) to the first storage service. If the upload was successful, a confirmation message is sent (7) back to the browser. Steps 6 and 7 are repeated for each additional storage service used.
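The synchronicity check in step 4 can be illustrated with a short sketch. It is not the Octopus implementation: it assumes the object lists have already been fetched and reduced to plain dictionaries, and the names `normalize_md5` and `find_unsynchronized` are hypothetical. Double quotes around checksums are stripped before comparing, because S3 and Google Storage enclose the MD5 checksum in quotes while Walrus does not.

```python
def normalize_md5(checksum):
    """Strip surrounding double quotes and lowercase the hex digest."""
    return checksum.strip('"').lower()


def find_unsynchronized(listings):
    """Return the sorted object names whose checksums differ between services.

    listings maps a service name to a dict of {object name: MD5 checksum}.
    An object that is missing from a service counts as unsynchronized.
    """
    all_names = set()
    for objects in listings.values():
        all_names.update(objects)

    out_of_sync = []
    for name in sorted(all_names):
        checksums = set()
        for objects in listings.values():
            checksum = objects.get(name)
            checksums.add(None if checksum is None else normalize_md5(checksum))
        if len(checksums) > 1:
            out_of_sync.append(name)
    return out_of_sync


listings = {
    "s3": {
        "a.txt": '"9e107d9d372bb6826bd81d3542a419d6"',
        "b.txt": '"e4d909c290d0fb1ca068ffaddf22cbd0"',
    },
    "walrus": {
        "a.txt": "9e107d9d372bb6826bd81d3542a419d6",
    },
}
print(find_unsynchronized(listings))  # ['b.txt']
```

Here `a.txt` is reported as synchronized despite the different quoting, and `b.txt` is flagged because it exists on only one service.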
A drawback of Octopus is that files to be uploaded to the cloud storage services cannot be cached by Octopus itself, because applications inside the PaaS cannot store files. This causes a second drawback: every file needs to be transferred to each connected storage service. If a user has credentials for multiple storage services, the file is transferred from the client (browser) to the storage services one after another.
Because each object is transferred directly from the customer's browser to all connected storage services, the amount of data to be transferred increases linearly with each additional storage service used. Using multiple storage services therefore leads to correspondingly longer transfer times.
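The linear growth can be made concrete with a small calculation. The sketch below is only a back-of-the-envelope model: it assumes uploads run one after another, a constant uplink rate, and no protocol overhead. The function names are hypothetical.

```python
def total_upload_bytes(object_size, num_services):
    """Bytes the browser must send when the object goes to every service."""
    if object_size < 0 or num_services < 0:
        raise ValueError("object_size and num_services must be non-negative")
    return object_size * num_services


def total_upload_seconds(object_size, num_services, uplink_bytes_per_second):
    """Transfer time for sequential uploads over a constant-rate uplink."""
    if uplink_bytes_per_second <= 0:
        raise ValueError("uplink_bytes_per_second must be positive")
    return total_upload_bytes(object_size, num_services) / uplink_bytes_per_second


# A 100 MB object sent to two services over a 1 MB/s uplink.
print(total_upload_seconds(100 * 10**6, 2, 10**6))  # 200.0
```

Under these assumptions, each additional storage service adds one full copy of the object to the upload.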
Octopus is written in Python and JavaScript. Communication with the S3-compatible storage services is handled by boto, a Python interface to Amazon Web Services. The user interface consists of HTML (generated with Django) and some JavaScript (jQuery).
- Import of credentials for Amazon S3 and Walrus.
- RAID-1 mode. Upload to one or two storage services with a single click.
- Check for synchronicity with the help of the MD5 checksums.
- Erase objects inside different storage services with a single click.
- Erase all objects in all storage services with a single click.
- Alter Access Control List (ACL) of objects inside one or two storage services with a single click.
- Implementation of automatic repair when the synchronicity check fails.
- Currently, each user can import credentials for only one Amazon S3 account and a single Walrus Private Cloud storage service.
- Implement support for Google Blobstore. Objects (called blobs) of up to 2 GB can be uploaded into the Blobstore via HTTP POST and then accessed from App Engine applications. Blobstore could be used as a proxy for Octopus to avoid multiple uploads from the browser to the storage services.
- Google Storage could be used as a proxy for Octopus too because objects inside Google Storage can be accessed from applications running inside the App Engine.
- Implementation of a RAID-5 mode. The benefits would be that no provider has a full (working) copy of the customer's data and that the data remains available if one provider is no longer operational.
- Cumulus does not support uploading objects via POST yet. Future releases may add this feature, which would allow Octopus to use Cumulus.
- In S3 and Google Storage, the MD5 checksum is enclosed in double quotes. In Walrus it is not.
- If no submit button inside a form is used to upload an object into Walrus, some bytes of garbage data are appended to the object.
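The RAID-5 idea listed above is not implemented in Octopus. The following sketch only illustrates the underlying principle under simplifying assumptions: the object is split into equal-sized chunks plus one XOR parity chunk, and each piece would go to a different provider. Real RAID-5 additionally rotates the parity across devices, which is omitted here. All function names are hypothetical.

```python
def xor_bytes(a, b):
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_with_parity(data, num_data_chunks):
    """Split data into equal-sized chunks and append one XOR parity chunk.

    Returns (pieces, original_length); pieces has num_data_chunks + 1 entries.
    """
    if num_data_chunks < 2:
        raise ValueError("at least two data chunks are required")
    chunk_size = -(-len(data) // num_data_chunks)  # ceiling division
    padded = data.ljust(chunk_size * num_data_chunks, b"\0")
    chunks = [padded[i * chunk_size:(i + 1) * chunk_size]
              for i in range(num_data_chunks)]
    parity = bytes(chunk_size)  # all zero bytes
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return chunks + [parity], len(data)


def recover(pieces, original_length):
    """Rebuild the data when at most one piece is None (provider lost)."""
    missing = [i for i, piece in enumerate(pieces) if piece is None]
    if len(missing) > 1:
        raise ValueError("cannot recover from more than one missing piece")
    pieces = list(pieces)
    if missing:
        present = [piece for piece in pieces if piece is not None]
        rebuilt = bytes(len(present[0]))
        for piece in present:
            rebuilt = xor_bytes(rebuilt, piece)
        pieces[missing[0]] = rebuilt
    # The last piece is the parity chunk; drop it and remove the padding.
    return b"".join(pieces[:-1])[:original_length]


pieces, length = split_with_parity(b"hello octopus", 2)
pieces[0] = None  # simulate one provider becoming unavailable
print(recover(pieces, length))  # b'hello octopus'
```

With two data chunks and one parity chunk, three providers are needed, and no single provider holds the complete object.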