X-Message-Number: 5571
Date: Fri, 12 Jan 1996 08:51:34 -0800 (PST)
From: Joseph Strout <>
Subject: Re: Data storage [#5562]

Comments to Perry:

	I don't understand why you feel it necessary to get nasty.  
Perhaps you felt you had asserted your authority already, and when I 
continued to question or qualify your statements, sarcastic retorts were 
your only option.  I assure you, it isn't so: a far better option would 
be to explain things patiently, to help us understand, rather than trying 
to ram your opinion down our throats.

General reply:

> > Yes, but I don't think this is as hard as you make it sound.  A simple 
> > checksum per block of data would suffice to detect errors.
> 
> [sarcastic comment deleted]  A simple checksum will
> detect a tiny fraction of errors and provides you with no mechanism
> for fixing them. If you really want the data safe, you need a lot
> better than simple checksums.

As I'm sure you know (or should be able to guess), I do have training in 
statistics.  And yes, I should have said "suffice to detect most errors", 
but we do sometimes leave details implied.  I don't think "a tiny 
fraction of errors" is accurate, though -- you'll have to show me a 
reference to make me believe that.  I actually used checksums for several 
years' worth of data entry, and I never made an error that wasn't 
immediately caught by them.  The longer the block of data, the more 
sophisticated (or longer) the checksum needs to be, of course.
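
To make the idea concrete, here is a rough sketch in Python of the sort 
of per-block checking I mean.  The block size and the choice of CRC-32 
are just illustrative assumptions, not a recommendation:

    import zlib

    BLOCK_SIZE = 4096  # illustrative block size, not a recommendation

    def checksum_blocks(data):
        # Split the data into fixed-size blocks and record a CRC-32
        # for each one; store the checksums alongside the data.
        blocks = [data[i:i + BLOCK_SIZE]
                  for i in range(0, len(data), BLOCK_SIZE)]
        return blocks, [zlib.crc32(b) for b in blocks]

    def find_bad_blocks(blocks, stored_checksums):
        # Any block whose checksum no longer matches has (almost
        # certainly) been damaged since the checksum was recorded.
        return [i for i, (b, c) in enumerate(zip(blocks, stored_checksums))
                if zlib.crc32(b) != c]

A 32-bit checksum misses a random corruption of a block only about once 
in four billion times, which is rather more than detecting "a tiny 
fraction of errors".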

> > Keep two copies of the data, and when an error is detected (due to a
> > checksum mismatch), make a fresh copy from the good one.
> 
> [nasty irrelevancy deleted]  This is NOT a way to keep
> your data safe. The odds of losing a block of data completely this way
> are very high.

Naturally there are more sophisticated error-correction methods, but this
extremely simple approach is not as bad as all that.  If the
probability of an error in a block within any given check period is p,
then the probability of an error happening in both copies of a block 
simultaneously somewhere in a set of N blocks is P = 1-(1-p*p)^N.  If we 
let p=0.1% (one error in a thousand, which is quite high), then P doesn't 
reach even a 1% chance until we have ten thousand blocks of data.  (And 
note, the result of such a happenstance would be an uncorrectable 
bit-flip somewhere in a block, NOT "losing a block of data completely".)  
With a more reasonable error rate of p=0.001%, you'd need a hundred 
million blocks of data to get a 1% chance of failure.
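
For anyone who wants to check the arithmetic, the same calculation takes 
only a few lines of Python (the two calls just reproduce the round 
numbers above):

    def p_failure(p, n):
        # Chance that, within one check period, at least one of n blocks
        # is corrupted in BOTH copies at once, given a per-block error
        # probability of p.
        return 1.0 - (1.0 - p * p) ** n

    print(p_failure(0.001, 10**4))    # ~0.01  (p = 0.1%, ten thousand blocks)
    print(p_failure(0.00001, 10**8))  # ~0.01  (p = 0.001%, a hundred million blocks)

The key point is that restoring the damaged copy from the good one after 
each check is what keeps p*p, rather than p, in the formula.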

I'm not suggesting it actually be done this way; I merely pointed it out 
as a simplest-case proof of concept.  I apologize for wasting the 
reader's time with the numbers, but they were apparently necessary.

> Basically, with modern techniques, you have to ask yourself what
> probability of data loss is acceptable, and then pick a scheme to meet
> those standards.

Agreed.  Why don't you suggest something more sophisticated?

> Remember that perfect fidelity is hard. If you want to assure that
> there is no loss in a multi-terabyte database over a period of
> thousands of years with a fairly normal media failure rate, you have
> to get quite radical. Even if the odds of losing an individual block
> are low, you have billions of blocks to multiply that loss by.

Right, but it would take a lot of cryonicists to come up with multiple 
terabytes of notes and letters.  My data is probably 10 MB of text, and 
maybe 100 MB of images if I gathered them all together.  Give me 250 MB to 
be safe.  And I suspect that this is more than most people have.

Also, perfect fidelity isn't necessary.  Anything would be better than
what we have now, which is almost certain complete loss of data (except
for the papers or whatever that CI keeps for its patients).  Similarly, my
suggestions may be primitive, but they are better than no suggestions at
all.  Unless I have missed it, you have so far suggested only microfilm. 
Can you (or anyone) tell us more about this option -- how expensive is it
to make?  Can we store data on it digitally to be more compact and allow
lossless copying?  It does have advantages in dealing with info which is
currently only in hardcopy; I have some old magazine articles, for
example, which I would have to scan in to store by computer methods. 
Microfilm might be more straightforward and compact. 

Are there other options available?  Let's work together on this, rather 
than against each other.

,------------------------------------------------------------------.
|    Joseph J. Strout           Department of Neuroscience, UCSD   |
|                 http://www-acs.ucsd.edu/~jstrout/                |
`------------------------------------------------------------------'
