X-Message-Number: 5562
Subject: Data Storage
Date: Thu, 11 Jan 1996 11:45:38 -0500
From: "Perry E. Metzger" <>

> From: Joseph Strout <>

[Comments about one-off CDs and recordable CDs elided]

Let me make something clear once and for all, speaking as a professional:
CD-R is NOT NOT NOT an archival medium. I can almost guarantee you
substantial data loss from CD-Rs in the long run. They use a photochemical
process to record the data, and the compounds in question are not stable
over decades. All of Edgar Swank's and everyone else's comments on the
subject must be viewed in the light that one-off CDs are of no use
whatsoever for our purposes. As an example, take this comment...

> > OTOH, CDROM recorders for use with PC's now start in price around
> > $1000, so anyone with a PC system already and especially if he already
> > has a CDROM recorder for other purposes might want to assist on this.
>
> ...so rather than pay the above company forty bucks a disc, an alert
> cryonics provider might just invest in their own recorder.

These recorders are absolutely useless for our purposes. I wouldn't even
bother to think about them.

> > 1) Once you get to a small enough level, "physical" and "chemical" are
> > the same thing.
>
> Yes, but the CD-ROM pits aren't on a level that small. I don't have the
> figures, but I have the impression that they're at least a few microns
> wide and deep, which is MUCH bigger than a molecular scale. Can anyone
> give us the exact numbers?

It's not so big a scale that oxidation or other chemical processes can't
completely wipe out the data. This is not like storing bits in yellow and
red bricks on the side of a house. We are talking about sensitive stuff.
CD-R is even worse, because the photochemical processes in question are
even more unstable.

> > Some early CDs sold at the beginning of the CD era have
> > become useless because of glues decaying or opaquing.
> I think you think that the data is gone when you can't put the disk in
> your home CD reader and read it. In fact, however, the data is almost
> certainly there; you'd just need a microscope to see it, and more
> sophisticated automation to read it.

It's possible that you could read the stuff given enough money for
expensive labs or nanotechnology -- it's also possible that the data is
gone. Do you want to take that risk when better methods are available? I
personally am not a fan of the Nanorapture point of view. Just because the
data may be in there somewhere doesn't mean we shouldn't do our best to
keep it from becoming inaccessible in the first place. We have no idea
what the future might hold.

[Commenting on my notes on film]

> This is not comparable, because you're not attempting to read this data
> digitally. If you did, they would not have demonstrated nearly this
> lifetime. Instead, you look at them visually, as *analog* image data,
> which your brain (being very good at such things) can interpret despite a
> great deal of fading and noise. To digital data, fading and noise mean
> "loss" to an ordinary reader. So you're not judging the two by the same
> criteria at all.

Actually, this is quite thoroughly untrue. Recorded with error correction
schemes, digital data can handle far more degradation than analog data.
The difference is that digital data has a non-smooth decay: the number of
bits you can extract from a decaying picture goes down smoothly, whereas
the number of bits you can extract from a heavily ECCed digital recording
(say, machine-readable patterns on film with heavy ECC) stays steady and
then falls off nearly to zero all at once.

> > 2) Put the stuff on multiple redundant machine readable storage media,
> > use fiendish and expensive error correcting codes that would not
> > normally be used, and read and re-record the information onto the
> > most survivable known archival media every couple of years.
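The steady-then-cliff decay of error-corrected digital data can be
sketched in a few lines. This is only a toy illustration -- a 3x
repetition code with majority voting standing in for the far stronger
codes one would actually use -- but it shows the shape of the curve:
essentially zero loss while corruption is light, then a collapse once
corruption overwhelms the code's margin.

```python
import random

def encode(bits, r=3):
    # Repetition code: store each bit r times (a toy stand-in for real ECC).
    return [b for b in bits for _ in range(r)]

def decode(coded, r=3):
    # Majority vote over each group of r copies recovers the original bit
    # as long as fewer than half the copies in the group were corrupted.
    return [int(sum(coded[i:i + r]) > r // 2) for i in range(0, len(coded), r)]

def corrupt(coded, p, rng):
    # Flip each stored bit independently with probability p (media decay).
    return [b ^ (rng.random() < p) for b in coded]

rng = random.Random(42)
data = [rng.randint(0, 1) for _ in range(10000)]
coded = encode(data)

# Light corruption decodes nearly perfectly; heavy corruption collapses.
for p in (0.01, 0.05, 0.2, 0.5):
    recovered = decode(corrupt(coded, p, rng))
    errors = sum(a != b for a, b in zip(data, recovered))
    print(f"flip prob {p:.2f}: {errors} of {len(data)} bits lost after decode")
```

A real archival scheme would use something like a Reed-Solomon code rather
than crude repetition, but the steady-then-cliff profile is the same.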
> Yes, but I don't think this is as hard as you make it sound. A simple
> checksum per block of data would suffice to detect errors.

Oh, really? Read up on statistics some time. A simple checksum lets whole
classes of errors slip through undetected, and it provides you with no
mechanism for fixing the errors it does catch. If you really want the
data safe, you need a lot better than simple checksums.

> Keep two copies of the data, and when an error is detected (due to a
> checksum mismatch), make a fresh copy from the good one.

Please learn a bit about information theory. This is NOT a way to keep
your data safe. The odds of losing a block of data completely this way are
very high: because you are using a simple checksum, an error in either of
the redundant blocks can go undetected, and because you are using simple
duplication instead of some sort of error correcting code, there is no way
to regenerate data after a failure. If you are willing to double your
storage space, you can do far, far better than this.

Basically, with modern techniques, you have to ask yourself what
probability of data loss is acceptable, and then pick a scheme to meet
that standard. Remember that perfect fidelity is hard. If you want to
assure that there is no loss in a multi-terabyte database over a period of
thousands of years with a fairly normal media failure rate, you have to
get quite radical: even if the odds of losing an individual block are low,
you have billions of blocks to multiply that loss by.

Incidentally, the reason for multiple copies is to make sure that you can
survive a total disaster (i.e., a fire or what have you), not to provide
normal corrective redundancy.

Perry
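P.S. To make the checksum point concrete, here is a minimal sketch (a toy
one-byte additive checksum and RAID-style XOR parity, purely illustrative,
not any real archival format). It shows two things: a simple checksum
failing to notice compensating corruption, and a parity block -- unlike a
bare second copy -- regenerating a block that is known to be lost.

```python
def checksum(block):
    # Toy checksum: sum of the bytes mod 256.
    return sum(block) % 256

good = bytes([10, 20, 30, 40])
# Two corruptions that cancel out: the checksum cannot see this damage.
bad = bytes([11, 19, 30, 40])
assert checksum(good) == checksum(bad)  # corruption goes undetected

def xor_blocks(a, b):
    # Bytewise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))

# XOR parity across N data blocks: if any ONE block is known to be lost,
# XORing the survivors with the parity regenerates it exactly -- something
# two bare copies cannot do when you can't tell which copy is corrupt.
blocks = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
parity = bytes([0, 0, 0])
for blk in blocks:
    parity = xor_blocks(parity, blk)

# Simulate losing block 1, then rebuild it from the others plus parity.
rebuilt = parity
for i, blk in enumerate(blocks):
    if i != 1:
        rebuilt = xor_blocks(rebuilt, blk)
print(rebuilt == blocks[1])  # parity regenerated the lost block
```

Real systems would use a cryptographic hash or CRC for detection and a
proper error correcting code for repair, but the asymmetry is the same:
detection alone cannot fix anything, and duplication alone cannot tell you
which copy to trust.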