X-Message-Number: 5571
Date: Fri, 12 Jan 1996 08:51:34 -0800 (PST)
From: Joseph Strout <>
Subject: Re: Data storage [#5562]

Comments to Perry:

I don't understand why you feel it necessary to get nasty. Perhaps you felt you had asserted your authority already, and when I continued to question or qualify your statements, sarcastic retorts were your only option. I assure you, it isn't so: a far better option would be to explain things patiently, to help us understand, rather than trying to ram your opinion down our throats.

General reply:

> > Yes, but I don't think this is as hard as you make it sound. A simple
> > checksum per block of data would suffice to detect errors.
>
> [sarcastic comment deleted] A simple checksum will
> detect a tiny fraction of errors and provides you with no mechanism
> for fixing them. If you really want the data safe, you need a lot
> better than simple checksums.

As I'm sure you know (or should be able to guess), I do have training in statistics. And yes, I should have said "suffice to detect most errors", but we do sometimes leave details implied. I don't think "a tiny fraction of errors" is accurate, though -- you'll have to show me a reference to make me believe that. I actually used checksums for several years' worth of data entry, and I never made an error that wasn't immediately caught by one. The longer the block of data, the more sophisticated (or longer) the checksum needs to be, of course.

> > Keep two copies of the data, and when an error is detected (due to a
> > checksum mismatch), make a fresh copy from the good one.
>
> [nasty irrelevancy deleted] This is NOT a way to keep
> your data safe. The odds of losing a block of data completely this way
> are very high.

Naturally there are more sophisticated error-correction methods, but this extremely simple approach is not as bad as all that.
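[A minimal sketch of the two-copy scheme described above, in Python. The posts don't name a checksum algorithm, so CRC-32 (via the standard zlib module) stands in here; the block size is likewise an arbitrary choice for illustration.]

```python
import zlib

BLOCK_SIZE = 1024  # bytes per block; arbitrary choice for illustration

def make_blocks(data):
    """Split data into blocks and record a CRC-32 checksum for each."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    sums = [zlib.crc32(b) for b in blocks]
    return blocks, sums

def verify_and_repair(copy_a, copy_b, sums):
    """Check every block of both copies against its stored checksum;
    refresh a damaged block from the copy that still matches."""
    for i, s in enumerate(sums):
        a_ok = zlib.crc32(copy_a[i]) == s
        b_ok = zlib.crc32(copy_b[i]) == s
        if a_ok and not b_ok:
            copy_b[i] = copy_a[i]  # repair B from the good copy
        elif b_ok and not a_ok:
            copy_a[i] = copy_b[i]  # repair A from the good copy
        elif not (a_ok or b_ok):
            # Both copies failed in the same check period: unrecoverable
            # with this simple scheme.
            raise RuntimeError("block %d corrupt in both copies" % i)

# Demo: corrupt one copy, then repair it from the other.
data = bytes(range(256)) * 16
blocks, sums = make_blocks(data)
copy_a, copy_b = list(blocks), list(blocks)
copy_b[2] = b"\x00" * len(copy_b[2])  # simulate a media error in copy B
verify_and_repair(copy_a, copy_b, sums)
assert copy_b[2] == blocks[2]  # the damaged block was restored
```

As the thread notes, this only loses data when the same block goes bad in both copies between checks; the next message works out that probability.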
If the probability of an error in a block within any given check period is p, then the probability of an error happening in both copies of a block simultaneously somewhere in a set of N blocks is P = 1-(1-p*p)^N. If we let p = 0.1% (one error in a thousand, which is quite high), then P doesn't reach even a 1% chance until we have ten thousand blocks of data. (And note, the result of such a happenstance would be an uncorrectable bit-flip somewhere in a block, NOT "losing a block of data completely".) With a more reasonable error rate of p = 0.001%, you'd need a hundred million blocks of data to get a 1% chance of failure.

I'm not suggesting it actually be done this way; I merely pointed it out as a simplest-case proof of concept. I apologize for wasting the reader's time with the numbers, but they were apparently necessary.

> Basically, with modern techniques, you have to ask yourself what
> probability of data loss is acceptable, and then pick a scheme to meet
> those standards.

Agreed. Why don't you suggest something more sophisticated?

> Remember that perfect fidelity is hard. If you want to assure that
> there is no loss in a multi-terabyte database over a period of
> thousands of years with a fairly normal media failure rate, you have
> to get quite radical. Even if the odds of losing an individual block
> are low, you have billions of blocks to multiply that loss by.

Right, but it would take a lot of cryonicists to come up with multiple terabytes of notes and letters. My data is probably 10 MB of text, and maybe 100 MB of images if I gathered them all together. Give me 250 MB to be safe. And I suspect that this is more than most people have.

Also, perfect fidelity isn't necessary. Anything would be better than what we have now, which is almost certain complete loss of data (except for the papers or whatever that CI keeps for its patients). Similarly, my suggestions may be primitive, but they are better than no suggestions at all.
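[The formula above is easy to check numerically. A short Python sketch, using the same P = 1-(1-p*p)^N from the text:]

```python
def p_double_failure(p, n_blocks):
    """P = 1 - (1 - p^2)^N: the chance that, among N blocks, at least one
    block suffers an error in BOTH copies within the same check period,
    assuming independent errors with per-block probability p."""
    return 1.0 - (1.0 - p * p) ** n_blocks

# p = 0.1% per block per check period: roughly a 1% overall chance
# of an unrecoverable block at ten thousand blocks.
print(p_double_failure(0.001, 10_000))        # ~0.0099

# p = 0.001%: it takes a hundred million blocks to reach the same ~1% risk.
print(p_double_failure(0.00001, 100_000_000))  # ~0.0099
```

Both numbers come out just under 1%, matching the figures quoted in the text.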
Unless I have missed it, you have so far suggested only microfilm. Can you (or anyone) tell us more about this option -- how expensive is it to make? Can we store data on it digitally to be more compact and allow lossless copying? It does have advantages in dealing with info which is currently only in hardcopy; I have some old magazine articles, for example, which I would have to scan in to store by computer methods. Microfilm might be more straightforward and compact. Are there other options available?

Let's work together on this, rather than against each other.

,------------------------------------------------------------------.
|  Joseph J. Strout            Department of Neuroscience, UCSD    |
|               http://www-acs.ucsd.edu/~jstrout/                  |
`------------------------------------------------------------------'