X-Message-Number: 26124
From: 
Date: Tue, 3 May 2005 07:45:10 EDT
Subject: Uploading (3.iii.0) The linked neurons.

 Uploading (3.iii.0) The linked neurons.
 
 A first organization for fast signals processed by FPGA would exploit the
speed of electronic devices. At the synapse level, the shortest signal takes
from two to five milliseconds. A digitised form suitable for FPGA processing
would be sampled at twice the corresponding frequency to recover all the
information content: a 2 ms signal corresponds to 500 Hz, so the sampling
rate is 1 kHz. This is merely the Shannon sampling theorem. That puts the
FPGA clock for one neuron at 1 kHz, which leaves time to reconfigure the
circuit and simulate another neuron or neuron block. Assuming half the time
is spent in such a reconfiguration process, one neuron block would need a
frequency near 2 kHz. Now the real clock frequency of a current FPGA is
beyond 300 MHz, or 150,000 times faster. If the processing takes 1,000 clock
cycles, a single physical circuit can then simulate 150,000/1,000 neuron
blocks, that is: 150.
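
The arithmetic above can be checked with a short sketch. The function and
parameter names are illustrative only; the 2 ms signal, the 300 MHz clock,
the half-time reconfiguration and the 1,000 cycles per block are the figures
quoted in the text.

```python
def multiplexing_budget(shortest_signal_ms=2, fpga_clock_hz=300_000_000,
                        cycles_per_block=1_000, compute_share=0.5):
    """Return how many neuron blocks one physical circuit can time-share."""
    # A 2 ms signal corresponds to 500 Hz; sampling at twice that gives 1 kHz.
    sample_rate_hz = 2 * 1_000 / shortest_signal_ms
    # Only half of the time is available for computing, so the block must
    # run as if it were clocked at 2 kHz.
    block_rate_hz = sample_rate_hz / compute_share
    # Ratio between the real FPGA clock and the rate needed by one block.
    speedup = fpga_clock_hz / block_rate_hz
    # Each block costs cycles_per_block clock cycles of real processing.
    blocks = speedup / cycles_per_block
    return {
        "sample_rate_hz": sample_rate_hz,
        "block_rate_hz": block_rate_hz,
        "speedup": speedup,
        "blocks_per_circuit": blocks,
    }


budget = multiplexing_budget()
print(budget)
```

Changing cycles_per_block or compute_share shows how quickly the number of
time-shared blocks moves with these two assumptions.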
 
What if a neuron, say X1, in block X must communicate with another, say Y1,
in block Y on the same FPGA?
The output of X must be sent to a memory, and after each reconfiguration the
memory must broadcast the output of neuron X1. The memory unit doesn't know
when Y1 will be simulated: it may be one reconfiguration after X, or 149
later, or anything between these values. How will a synapse of Y1, say
Syn1Y1, know that the signal it receives is from neuron X1? The only way it
can select it is for the signal to be tagged with a code giving its origin.
At the synapse entry there must be a filter looking at all incoming codes
and selecting the right one.
The code number is set at the computer level, the slow processing element.
Changing the code would wire the dendrite to another axon terminal from X1 or
from another neuron.
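
As a software analogy of the tagged broadcast, here is a minimal sketch. The
class names, the numeric codes and the signal values are hypothetical; in
the scheme described above the filter would be FPGA logic, not Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedSignal:
    source_code: int  # code identifying the emitting axon terminal
    value: int        # sampled output carried by the signal


class Synapse:
    def __init__(self, name, accepted_code):
        self.name = name
        self.accepted_code = accepted_code  # set by the slow computer level
        self.received = []

    def listen(self, broadcast):
        # The entry filter: look at every broadcast code, keep only ours.
        for signal in broadcast:
            if signal.source_code == self.accepted_code:
                self.received.append(signal.value)

    def rewire(self, new_code):
        # Changing the code "wires" the dendrite to another axon terminal.
        self.accepted_code = new_code


X1_CODE, X2_CODE = 1, 2  # hypothetical origin codes
memory = [TaggedSignal(X1_CODE, 1), TaggedSignal(X2_CODE, 0)]

syn1y1 = Synapse("Syn1Y1", X1_CODE)
syn1y1.listen(memory)          # picks up only the signal tagged X1
first_pass = list(syn1y1.received)

syn1y1.rewire(X2_CODE)
syn1y1.listen(memory)          # now picks up only the signal tagged X2
second_pass = list(syn1y1.received)
print(first_pass, second_pass)
```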
 
 
The big problem in that scheme is that the broadcasting system must send the
information 150 times faster than it receives it, because at each simulated
block it must broadcast all the information from the 149 other blocks. The
real situation is even worse: a single FPGA doesn't contain all the neurons,
so there are many chips, and a neuron on chip T must be able to send its
information to chip U. If there are four billion neurons, each address will
have 32 bits, and in each simulation cycle of duration 1/150,000 second, all
4 billion addresses, each with 32 bits, must be broadcast. This is
unworkable, so there must be a way to reduce the data flow.
 
The first step is to see that a single neuron may be linked with at most
10,000 others, not four billion. If at a given time there are 1,000 simulated
neurons on the FPGA, at most 10 million addresses must be broadcast. With 32
bits each, that is 320 Mbits or 40 MBytes, something far more manageable, it
seems. But don't forget that this must be done 150,000 times a second!
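
To compare the full-brain broadcast with the fan-out limited one, the two
loads can be computed side by side. This is only a sketch: the function name
is illustrative, and the inputs are the figures from the text (32-bit
addresses, 150,000 time slots per second).

```python
ADDRESS_BITS = 32
SLOTS_PER_SECOND = 150_000


def broadcast_load(addresses_per_slot, address_bits=ADDRESS_BITS,
                   slots_per_second=SLOTS_PER_SECOND):
    """Data volume for one time slot and the resulting bit rate."""
    bits_per_slot = addresses_per_slot * address_bits
    return {
        "bits_per_slot": bits_per_slot,
        "bytes_per_slot": bits_per_slot // 8,
        "bits_per_second": bits_per_slot * slots_per_second,
    }


# Every one of the 4 billion addresses broadcast in every slot.
naive = broadcast_load(4_000_000_000)
# 1,000 simulated neurons, each linked with at most 10,000 others.
reduced = broadcast_load(1_000 * 10_000)

print("naive  :", naive)
print("reduced:", reduced)
```

The reduced case gives 320 Mbits, that is 40 MBytes, per slot, 400 times
less than the naive case but still a very large rate once multiplied by
150,000 slots per second.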
 
 From block to block, the signal will not be the same. The output of a neuron
is no longer a yes/no action potential; it is the neuron address. The
receptor will look up in a table which neurons receive that signal. For
example, neuron X sends its address, ADDR_X, and the receptor will load the
signal into ADDR_Y1, ADDR_Z124, ADDR_X28, and so on. ADDR_Y1 will be put in
the table broadcast to the same chip 41 blocks later, ADDR_Z124 will be sent
to the manager of another chip where it will be put in a table 12 blocks
later, and so on. There is one restriction in this system: neurons simulated
at the same time on different chips can't communicate. It is the job of the
computer level management not to allocate linked neurons to the same time
slot on different chips.
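
The table lookup and the time-slot restriction can be sketched as follows.
Everything here is hypothetical: the chip names, the slot numbers and the
two helper functions are made up for illustration, with slots chosen so that
Y1 runs 41 blocks after X on the same chip and Z124 runs 12 blocks after X
on another chip, as in the example above.

```python
from collections import defaultdict

# Hypothetical placement: neuron -> (chip, time slot in the 150-block cycle).
placement = {
    "X": ("chip0", 3),
    "Y1": ("chip0", 44),    # same chip, 41 blocks after X
    "Z124": ("chip1", 15),  # another chip, 12 blocks after X
}
# Hypothetical connection table: emitting neuron -> receiving neurons.
fanout = {"X": ["Y1", "Z124"]}


def route(source, placement, fanout):
    """Group the targets of one address by the (chip, slot) that needs it."""
    deliveries = defaultdict(list)
    for target in fanout.get(source, []):
        deliveries[placement[target]].append((source, target))
    return dict(deliveries)


def slot_conflicts(placement, fanout):
    """Linked pairs sharing a time slot on different chips: they could not
    communicate, so the computer level must avoid such placements."""
    conflicts = []
    for source, targets in fanout.items():
        src_chip, src_slot = placement[source]
        for target in targets:
            dst_chip, dst_slot = placement[target]
            if src_chip != dst_chip and src_slot == dst_slot:
                conflicts.append((source, target))
    return conflicts


routes = route("X", placement, fanout)
ok = slot_conflicts(placement, fanout)

# A bad placement: Z124 moved to the same slot as X on another chip.
bad_placement = dict(placement, Z124=("chip1", 3))
bad = slot_conflicts(bad_placement, fanout)
print(routes, ok, bad)
```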
 
The next round of data reduction comes from the fact that very few signals
are "on" in a given computing cycle, and even fewer display a change of
state. This may be the case for one in one hundred at most. Only these
addresses have to be broadcast. The signal for each block then contains
32 bits x 100,000 active addresses, or 3.2 Mbits; the rate is 150,000 times
greater, or 480 Gbits per second. With a 32-bit bus that is a "mere" 15 GHz,
something at the limit of HEMT technology.
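
The bus clock follows directly from these numbers. A minimal sketch, with
illustrative names and the one-in-one-hundred activity taken as the upper
bound given in the text:

```python
def sparse_bus_clock(fanout_addresses=10_000_000, active_percent=1,
                     address_bits=32, slots_per_second=150_000,
                     bus_width_bits=32):
    """Bus clock needed when only the active addresses are broadcast."""
    active = fanout_addresses * active_percent // 100   # active addresses
    bits_per_slot = active * address_bits                # bits per block
    bits_per_second = bits_per_slot * slots_per_second   # serial bit rate
    bus_clock_hz = bits_per_second // bus_width_bits     # parallel bus clock
    return active, bits_per_slot, bits_per_second, bus_clock_hz


active, bits_per_slot, bits_per_second, bus_clock_hz = sparse_bus_clock()
print(active, bits_per_slot, bits_per_second, bus_clock_hz)
```

A wider bus divides the clock further: with bus_width_bits=64 the same
traffic needs 7.5 GHz.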
 
There may be more gain with reduced address length: an addressing power in
the 10 million range is sufficient, with no need for a full brain address at
each chip. In addition, only the difference from one address to the next may
be sent, another way to gain a factor between 2 and 4. With these well-known
schemes, the frequency could be brought back to what is done in the current
microprocessor generation.
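
Both schemes can be illustrated in a few lines. This is a sketch under
assumptions: the addresses are sorted before sending, the sample addresses
are invented, and no particular variable-length code is implied for the
differences.

```python
def bits_needed(value):
    """Number of bits needed to write a non-negative integer."""
    return max(1, value.bit_length())


def delta_encode(addresses):
    """Sort the addresses and send each one as the gap from the previous."""
    deltas, previous = [], 0
    for address in sorted(addresses):
        deltas.append(address - previous)
        previous = address
    return deltas


def delta_decode(deltas):
    """Rebuild the sorted addresses by accumulating the gaps."""
    addresses, current = [], 0
    for delta in deltas:
        current += delta
        addresses.append(current)
    return addresses


# A local address space of 10 million entries fits in 24 bits, not 32.
local_bits = bits_needed(10_000_000 - 1)

# Invented active addresses inside that local space.
active = [9_000_123, 17, 4_500_000, 4_500_090, 250]
deltas = delta_encode(active)
restored = delta_decode(deltas)
print(local_bits, deltas, restored)
```

With 100,000 active addresses spread over a 10 million range, the average
gap is 100, so most differences are small numbers; how much is really gained
depends on the code used to send them.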
 
It is common to read that wiring a full brain is impossible; here this
problem has been solved for the most part.
 
Yvan Bozzonetti.

