Petascale Data Management in Solar Physics: Approach of the DKIST Data Center

Tuesday, November 10, 2015 18:08

While attending the ADASS XXV meeting in Sydney, I heard this excellent presentation by Steve Berukoff on data management at the National Solar Observatory’s Daniel K. Inouye Solar Telescope (DKIST), now under construction on Haleakala, Maui, Hawaii. The telescope is due for completion in Q4 2019 and will then be the world’s largest-aperture national facility solar telescope. It has a planned lifetime of 44 years, will award time based on competitive proposals, and will maintain a long-term archive under an Open Data policy. The DKIST science program has identified an ambitious 26 topical areas, to be served by a suite of high-resolution visible and near-IR imaging and spectropolarimetry instruments.

The DKIST under construction. Image courtesy of National Solar Observatory/AURA/NSF

This ambitious program gives rise to some serious challenges in data management. This chart summarizes the scale of the storage and curation challenges:

[Chart: scale of the DKIST data storage and curation challenges]

The data management team is preparing to handle the data and metadata when the telescope sees first light. This includes developing adaptive metadata-handling algorithms that will enable the semi-autonomous production of calibrated data sets.
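
To make the idea of metadata-driven calibration concrete, here is a purely hypothetical Python sketch; the header key, instrument names, and calibration steps are my own illustration, not the DKIST design. The point is that each frame’s own metadata selects the processing it receives, rather than an operator configuring the pipeline for each observing run.

```python
import numpy as np

def dark_subtract(frame, cal):
    # Remove the detector's thermal signal, measured with the shutter closed.
    return frame - cal["dark"]

def flat_field(frame, cal):
    # Divide out pixel-to-pixel sensitivity variations.
    return frame / cal["flat"]

def demodulate(frame, cal):
    # Stub: real polarimetric demodulation combines modulated exposures.
    return frame

# Hypothetical mapping from instrument metadata to the steps a frame needs.
PLANS = {
    "imager": [dark_subtract, flat_field],
    "spectropolarimeter": [dark_subtract, flat_field, demodulate],
}

def calibrate(frame, header, cal):
    """Apply the calibration plan chosen by the frame's own metadata."""
    for step in PLANS[header["INSTRUME"]]:
        frame = step(frame, cal)
    return frame

# The pipeline adapts to each incoming frame without per-run configuration:
hdr = {"INSTRUME": "spectropolarimeter"}
raw = np.full((4, 4), 100.0)
cal = {"dark": np.full((4, 4), 2.0), "flat": np.full((4, 4), 0.98)}
print(calibrate(raw, hdr, cal))
```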

The part of the talk I particularly enjoyed was the discussion of scalable storage approaches. The traditional RAID approach offers bit-loss protection and fault tolerance at the expense of slow rebuilds and poor scalability per dollar. The team has instead started looking at object- and block-based storage. This chart compares them side by side:

[Chart: side-by-side comparison of object-based and block-based storage]

Object storage looks very promising. It is widely used in cloud storage systems such as OpenStack Swift. In particular, it supports Erasure Coding and “Information Dispersal,” whereby data are divided into m fragments and then recoded into n fragments (n > m). Each fragment is assigned an object ID and dispersed across the storage system, and the original data can be recovered from any m of the n fragments (a toy sketch follows the list below). The technique offers:

  • Better storage costs at scale than RAID
  • Extreme scalability
  • Excellent aggregate bandwidth (tens of Gbit/s)
  • “Self-healing,” in which the system detects bit loss and re-replicates the affected data correctly.
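
As a toy illustration of information dispersal (my own sketch, not the scheme DKIST evaluated), the Python below treats each m-byte block as the coefficients of a polynomial over the prime field GF(257) and evaluates it at n distinct points to make n fragments; any m fragments determine the polynomial, so the data survive the loss of any n − m of them. Production systems use optimized Reed–Solomon codes over GF(2^8) instead, but the recovery principle is the same.

```python
P = 257  # smallest prime above 255, so every byte value is a field element

def encode(data: bytes, m: int, n: int):
    """Disperse `data` into n fragments such that any m reconstruct it."""
    if len(data) % m:
        data += b"\x00" * (m - len(data) % m)   # zero-pad to a multiple of m
    blocks = [data[i:i + m] for i in range(0, len(data), m)]
    # The fragment at point x carries p(x) for every block's polynomial,
    # where the block's bytes are the polynomial's coefficients.
    return [(x, [sum(c * pow(x, i, P) for i, c in enumerate(blk)) % P
                 for blk in blocks])
            for x in range(1, n + 1)]

def decode(fragments, m: int):
    """Rebuild the data from any m fragments by solving each block's
    m-by-m Vandermonde system (Gauss-Jordan elimination mod P)."""
    xs = [x for x, _ in fragments[:m]]
    cols = [col for _, col in fragments[:m]]
    out = bytearray()
    for k in range(len(cols[0])):               # one linear solve per block
        rows = [[pow(x, i, P) for i in range(m)] + [cols[j][k]]
                for j, x in enumerate(xs)]
        for c in range(m):
            piv = next(r for r in range(c, m) if rows[r][c])
            rows[c], rows[piv] = rows[piv], rows[c]
            inv = pow(rows[c][c], -1, P)        # modular inverse (Python 3.8+)
            rows[c] = [v * inv % P for v in rows[c]]
            for r in range(m):
                if r != c and rows[r][c]:
                    f = rows[r][c]
                    rows[r] = [(a - f * b) % P
                               for a, b in zip(rows[r], rows[c])]
        out.extend(rows[i][m] for i in range(m))
    return bytes(out)

# Six fragments are dispersed; here three are lost and the data still recover:
frags = encode(b"DKIST!", m=3, n=6)
print(decode([frags[0], frags[3], frags[5]], m=3))  # b'DKIST!'
```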

I hope the team continues to report on progress as they develop their data management system, as I think they will learn much of value to the rest of astronomy.

I wish to thank Dr Steve Berukoff for his assistance in preparing this post.



Source: https://astrocompute.wordpress.com/2015/11/10/petascale-data-management-in-solar-physics-approach-of-the-dkist-data-center/
