A New Data Archive for Gemini – Fast, Cheap and in the Cloud

Friday, November 20, 2015 13:32

(Before It's News)

I must congratulate my colleague Paul Hirst at Gemini – he is to my knowledge the first astronomer to use “cheap” and “cloud” in the same title. His talk on building a new archive for the Gemini telescopes was presented at ADASS XXV in Sydney.

Gemini operates two 8.1 m optical/IR telescopes, one on Mauna Kea, Hawaii, and the other on Cerro Pachón, Chile. The data sets these telescopes produce are not especially large by modern standards (5 GB of raw FITS files per night, with a total volume to date of 27.5 TB of raw FITS), yet they are diverse. They cover imaging and spectroscopy (long-slit, cross-dispersed, fiber-fed and integral field), polarimetry and adaptive optics, over the 0.3 µm to 25 µm wavelength range.

The archive architecture looks like this:

[Figure: Gemini archive architecture diagram]

The interesting part of the diagram is the “AWS S3” block, which represents the S3 storage system of Amazon Web Services (AWS), where the data are housed. The data flow from telescope to cloud may be described as follows:

  • Local installations on the summit at each telescope ingest files from local disk during observing and export them (via HTTP POST) to the archive.
  • An archive server on Amazon's Elastic Compute Cloud (EC2) stores the data on S3 and ingests them into the archive database. The server is a single EC2 instance with 4 cores and 16 GB of RAM (an M3.xlarge, in Amazon's terms).
  • The latency from the time a file is written at the telescope to its being available for user download from the archive is typically 20 to 60 seconds.
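The summit-side step can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not Gemini's actual code. The post says only that files are exported via HTTP POST. The upload URL, the `*.fits` file pattern and the helper names `find_new_files`, `build_upload_request` and `export_file` are all hypothetical.

```python
import pathlib
import urllib.request

# Hypothetical endpoint: the post does not give the archive's real upload URL.
ARCHIVE_UPLOAD_URL = "https://archive.example.org/upload_file"


def find_new_files(directory: pathlib.Path, already_sent: set[str]) -> list[pathlib.Path]:
    """Return FITS files in `directory` that have not been exported yet, sorted by name."""
    return sorted(p for p in directory.glob("*.fits") if p.name not in already_sent)


def build_upload_request(path: pathlib.Path, url: str = ARCHIVE_UPLOAD_URL) -> urllib.request.Request:
    """Wrap one raw FITS file in an HTTP POST addressed to the (hypothetical) archive endpoint."""
    return urllib.request.Request(
        f"{url}/{path.name}",
        data=path.read_bytes(),
        method="POST",
        headers={"Content-Type": "application/fits"},  # MIME type registered for FITS (RFC 4047)
    )


def export_file(path: pathlib.Path, url: str = ARCHIVE_UPLOAD_URL, timeout: float = 30.0) -> int:
    """Send the file and return the HTTP status code reported by the server."""
    with urllib.request.urlopen(build_upload_request(path, url), timeout=timeout) as resp:
        return resp.status
```

A summit-side loop would call `find_new_files` periodically during observing and pass each result to `export_file`. It would add the file name to `already_sent` once the server answers with a success status. Retries and authentication are deliberately left out of this sketch.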

AWS offers many options for scaling performance on demand.

The really interesting part of this concerns the cost. From Paul’s slides:

  • S3 storage: $0.03/GB/month = $2,880/yr for 8 TB
  • EC2: M4.xl (4 CPU, 16 GB) = $2,470 for 3 years
    • = $0.09 per hour = $823/yr
    • Hilo power: $0.40/kWh. Say 250 W => $0.10/hour just for power and cooling, before even buying the actual hardware!
  • Data transfer in is free
  • Data transfer internally (e.g. S3 to EC2) is free
  • Data transfer out to the internet: < $90/TB
    • Expecting ~200 GB/month => $200/yr
  • EBS SSD: $0.10/GB/month. Say 100 GB = $120/yr
  • Glacier backup: approx. 0.25 × cost(S3), say $1,000/yr
  • Allow, say, $250/yr for extra CPU and EBS time for rebuilds and tests, doubling up during upgrades, storing snapshots…
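As a cross-check, the slide figures can be turned into annual totals with a few lines of Python. This is a sketch with the following assumptions:

* The S3, EC2 and EBS lines are recomputed from the unit prices listed above.
* The data-transfer, Glacier and extras lines use the slide's own rounded "say" estimates rather than being recomputed.
* 8 TB is counted as 8,000 GB, which is what the $2,880 figure implies.

```python
def annual_costs() -> dict[str, float]:
    """Annual AWS cost items in USD, reconstructed from the slide figures."""
    return {
        "S3 storage (8 TB at $0.03/GB/month)": 0.03 * 8_000 * 12,
        "EC2 M4.xl ($2,470 per 3 years)": 2470 / 3,
        "Data transfer out (slide estimate)": 200.0,
        "EBS SSD (100 GB at $0.10/GB/month)": 0.10 * 100 * 12,
        "Glacier backup (slide estimate)": 1000.0,
        "Extra CPU/EBS allowance (slide estimate)": 250.0,
    }


if __name__ == "__main__":
    costs = annual_costs()
    for item, usd in costs.items():
        print(f"{item:42s} ${usd:8,.0f}")
    print(f"{'Total':42s} ${sum(costs.values()):8,.0f}")
```

Run as a script, it prints each item and a total of about $5,273/yr. That is a little under the roughly $6,000/yr quoted in the post.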

These charges total roughly $6,000/yr. The bottom line is that, under Amazon's current pricing, the hourly cost of running the archive server on EC2 ($0.09/hour) is about the same as the power and cooling cost alone of a local machine in Hilo ($0.10/hour), before even buying the hardware. The cost-benefit analysis above is a fine example of the analysis that needs to be done if you are thinking of migrating a project to the cloud.
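The hourly comparison behind that bottom line can also be checked with a short calculation. The sketch below follows the slide in assuming a 250 W draw that covers power and cooling at the Hilo tariff of $0.40/kWh. It amortizes the $2,470 three-year EC2 price over 8,760 hours per year. The function names are illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def ec2_cost_per_hour(three_year_price: float = 2470.0) -> float:
    """Amortized hourly cost of the EC2 instance from its 3-year price (USD)."""
    return three_year_price / (3 * HOURS_PER_YEAR)


def local_power_cost_per_hour(watts: float = 250.0, usd_per_kwh: float = 0.40) -> float:
    """Hourly electricity bill (USD) for a machine whose total draw, cooling included, is `watts`."""
    return watts / 1000 * usd_per_kwh


if __name__ == "__main__":
    print(f"EC2: ${ec2_cost_per_hour():.3f}/h, Hilo power and cooling: ${local_power_cost_per_hour():.3f}/h")
    # prints: EC2: $0.094/h, Hilo power and cooling: $0.100/h
```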

Moreover, the archive is fast, with a typical search-page response time of under 1 second, and new data are generally available less than 1 minute after readout. Staffing consumed 3 FTEs over the 3 years from project start to deployment.

I wish to thank Dr Paul Hirst for supplying his charts and supporting the preparation of this blog post.



Source: https://astrocompute.wordpress.com/2015/11/20/a-new-data-archive-for-gemini-fast-cheap-and-in-the-cloud/
