X-Message-Number: 5562
Subject: Data Storage
Date: Thu, 11 Jan 1996 11:45:38 -0500
From: "Perry E. Metzger" <>

> From: Joseph Strout <>

[Comments about one-off CDs and recordable CDs elided]

Let me make something clear once and for all, speaking as a professional:

CD-R is NOT NOT NOT an archival medium. I almost guarantee you
substantial data loss from CD-Rs in the long run. They use a
photochemical process to record the data, and the compounds in
question are not stable over decades. All of Edgar Swank's and everyone
else's comments on the subject must be viewed in light of the fact that
one-off CDs are of no use whatsoever for our purposes.

As an example, take this comment...

> > OTOH, CDROM recorders for use with PC's now start in price around
> > $1000, so anyone with a PC system already and especially if he already
> > has a CDROM recorder for other purposes might want to assist on this.
> 
> ...so rather than pay the above company forty bucks a disc, an alert 
> cryonics provider might just invest in their own recorder.

These recorders are absolutely useless for our purposes. I wouldn't
even bother to think about them.

> > 1) Once you get to a small enough level, "physical" and "chemical" are
> >    the same thing.
> 
> Yes, but the CD-ROM pits aren't on a level that small.  I don't have the 
> figures, but I have the impression that they're at least a few microns 
> wide and deep, which is MUCH bigger than a molecular scale.  Can anyone 
> give us the exact numbers?

It's not so big a scale that oxidation or other chemical processes
can't completely wipe out the data. This is not like storing bits in
yellow and red bricks on the side of a house. We are talking about
sensitive stuff.

CD-R is even worse, because the compounds involved in its photochemical
recording process are even less stable.

> >    Some early CDs sold at the beginning of the CD era have
> >    become useless because of glues decaying or opaquing.
> 
> I think you think that the data is gone when you can't put the disk in 
> your home CD reader and read it.  In fact, however, the data is almost 
> certainly there; you'd just need a microscope to see it, and more 
> sophisticated automation to read it.

It's possible that you could read the stuff given enough money for
expensive labs or nanotechnology -- it's also possible that the data is
gone. Do you want to take that risk when better methods are available?

I personally am not a fan of the Nanorapture point of view. Just
because the data may be in there somewhere doesn't mean we shouldn't
do our best to keep it from becoming inaccessible in the first place. We
have no idea what the future might hold.

[Commenting on my notes on film]
> 
> This is not comparable, because you're not attempting to read this data 
> digitally.  If you did, they would not have demonstrated nearly this 
> lifetime.  Instead, you look at them visually, as *analog* image data, 
> which your brain (being very good at such things) can interpret despite a 
> great deal of fading and noise.  To digital data, fading and noise mean 
> "loss" to an ordinary reader.  So you're not judging the two by the same 
> criteria at all.

Actually, this is quite thoroughly untrue. Recorded with error
correction schemes, digital data can handle far more degradation than
analog data. The difference is that digital data has a non-smooth
decay: the number of bits you can extract from a decaying analog
picture goes down smoothly, whereas the number of bits you can extract
from a heavily ECC'd digital recording (say, machine-readable patterns
on film with heavy error correction) stays steady and then falls off
nearly to zero all at once.
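
As a rough illustration of that cliff, here is a toy Python calculation
(mine, with made-up numbers -- a 25-copy majority vote standing in for a
real ECC scheme, which would be far more efficient):

from math import comb

def recovered_fraction(p, n):
    """Probability a bit is recovered when stored as n copies and decoded
    by majority vote, each copy independently flipped with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1))

for p in (0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65):
    print(f"raw error rate {p:.2f}:  no ECC {1 - p:.2f}   "
          f"25-copy majority {recovered_fraction(p, 25):.3f}")

The protected column sits at essentially 1.000 long after the
unprotected copy has visibly started losing bits, then drops steeply
around the code's threshold. Real archival codes (Reed-Solomon and its
relatives) buy a much sharper threshold for far less than 25x overhead.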

> > 2) Put the stuff on multiple redundant machine readable storage media,
> >    use fiendish and expensive error correcting codes that would not
> >    normally be used, and read and re-record the information onto the
> >    most survivable known archival media every couple of years.
> 
> Yes, but I don't think this is as hard as you make it sound.  A simple 
> checksum per block of data would suffice to detect errors.

Oh, really? Read up on statistics some time. A simple checksum will
miss whole classes of error patterns, and even when it does catch a
problem it gives you no mechanism for fixing it. If you really want the
data safe, you need a lot better than simple checksums.
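
For what it is worth, here is a trivial, entirely made-up Python
illustration of both problems with a bare checksum:

def sum_checksum(block: bytes) -> int:
    """The sort of 'simple checksum' under discussion: add the bytes mod 256."""
    return sum(block) % 256

original  = bytes([10, 20, 30, 40])
corrupted = bytes([11, 19, 30, 40])   # two errors that happen to cancel

print(sum_checksum(original))    # 100
print(sum_checksum(corrupted))   # 100 -- the corruption is invisible to the checksum

# Even when the checksum does mismatch, it cannot tell you which byte is
# wrong, let alone what it used to be: detection without correction.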

> Keep two copies of the data, and when an error is detected (due to a
> checksum mismatch), make a fresh copy from the good one.

Please learn a bit about information theory. This is NOT a way to keep
your data safe. The odds of losing a block of data completely this way
are very high. The odds of failing to detect an error in one of the
redundant copies are high, because you are using a simple checksum; and
because you are using simple duplication instead of some sort of error
correcting code, there is no way to regenerate a block once both of its
copies have gone bad. If you are willing to double your storage space,
you can do far, far better than this.
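
To put a rough number on "far, far better" -- a toy comparison, assuming
independent failures at an invented per-copy rate, of spending the same
2x storage on mirroring versus on a Reed-Solomon-style erasure code:

from math import comb

def p_loss_mirrored(p, k):
    """Stripe of k blocks, each stored twice: data is lost if both copies
    of any one block fail."""
    return 1 - (1 - p * p) ** k

def p_loss_erasure(p, k):
    """k blocks encoded into 2k shares, any k of which suffice to rebuild
    everything: data is lost only if more than k of the 2k shares fail."""
    n = 2 * k
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1, n + 1))

p, k = 0.01, 16   # invented per-copy failure probability and stripe size
print(f"mirrored, 2x storage        : {p_loss_mirrored(p, k):.1e}")   # ~1.6e-03
print(f"(32,16) erasure, 2x storage : {p_loss_erasure(p, k):.1e}")    # ~5e-26

Same storage bill, a difference of better than twenty orders of
magnitude, because the erasure code can rebuild from any 16 surviving
shares while mirroring is sunk as soon as both copies of one particular
block go bad. Real media fail in correlated ways (same batch, same
fire), so the real-world gap is smaller, but it is still enormous.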

Basically, with modern techniques, you have to ask yourself what
probability of data loss is acceptable, and then pick a scheme to meet
those standards.

Remember that perfect fidelity is hard. If you want to ensure that
there is no loss in a multi-terabyte database over a period of
thousands of years with a fairly normal media failure rate, you have
to get quite radical. Even if the odds of losing an individual block
are low, you have billions of blocks multiplying that probability.
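
Some back-of-the-envelope arithmetic, with numbers invented purely for
illustration, shows how that multiplication bites:

blocks = 2_000_000_000        # a couple of terabytes in 1 KB blocks
p_block_loss = 1e-6           # assumed chance a given block is lost over the period

expected_losses = blocks * p_block_loss
p_any_loss = 1 - (1 - p_block_loss) ** blocks   # chance the archive is not perfect

print(f"expected lost blocks: {expected_losses:.0f}")   # 2000
print(f"chance of any loss  : {p_any_loss:.6f}")        # effectively 1.000000

# To have, say, a 99% chance of losing nothing at all, the per-block loss
# probability would have to be pushed below roughly 0.01 / blocks = 5e-12.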

Incidentally, the reason for multiple copies is to make sure that you
can survive a total disaster (e.g., a fire or what have you) and not to
provide normal corrective redundancy.

Perry

