X-Message-Number: 26124
From:
Date: Tue, 3 May 2005 07:45:10 EDT
Subject: Uploading (3.iii.0) The linked neurons.

A first organization for fast signals processed by FPGA would exploit the speed of electronic devices. At the synapse level, the shortest signal lasts from two to five milliseconds; a digitized form suitable for FPGA processing would have to be sampled at twice the corresponding frequency to recover all the information content. This is merely the Shannon sampling theorem. For a 2 ms signal (500 Hz), that puts the FPGA clock for one neuron at 1 kHz. That would leave some time to reconfigure the circuit and simulate another neuron or neuron block. Assuming half the time is spent in such a reconfiguration process, one neuron block would need a frequency near 2 kHz.

Now the real clock frequency of a current FPGA is beyond 300 MHz, or 150,000 times faster. If the processing takes 1,000 clock cycles, a single physical circuit can then simulate 150,000/1,000 neuron blocks, that is, 150.

What if a neuron, say X1, in block X must communicate with another, say Y1, in block Y on the same FPGA? The output of X1 must be sent to a memory, and after each reconfiguration the memory must broadcast the output of neuron X1. The memory unit does not know when Y1 will be simulated: it may be one reconfiguration after X or 149 later, or anything in between. How will a synapse of Y1, say Syn1Y1, know that the signal it receives comes from neuron X1? The only way it can select it is for the signal to be tagged with a code giving its origin. At the synapse entry there must be a filter looking at all incoming codes and selecting the right one. The code number is set at the computer level, the slow processing element. Changing the code would wire the dendrite to another axon terminal from X1 or from another neuron.
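
As a rough illustration of the two ideas above (the time-sharing budget and the tagged broadcast with a filter at the synapse entry), here is a minimal Python sketch. The figures are those quoted in the text; the names blocks_per_fpga, Synapse, select and the sample tags X2 and Z7 are hypothetical and chosen only for this example. It is not meant as an FPGA design.

```python
# Hypothetical sketch: figures come from the text, all names are assumed.

def blocks_per_fpga(fpga_clock_hz=300_000_000, block_rate_hz=2_000,
                    cycles_per_block=1_000):
    """Number of neuron blocks one physical circuit can time-share."""
    speedup = fpga_clock_hz // block_rate_hz   # 150,000 times faster
    return speedup // cycles_per_block         # 150 blocks


class Synapse:
    """Synapse entry that accepts only signals tagged with its source code."""

    def __init__(self, name, source_tag):
        self.name = name
        self.source_tag = source_tag   # set by the slow computer level
        self.received = []

    def select(self, broadcast):
        """Scan every (tag, value) pair in the broadcast, keep the matches."""
        for tag, value in broadcast:
            if tag == self.source_tag:
                self.received.append(value)


# The memory re-broadcasts all tagged outputs after each reconfiguration.
memory = [("X1", 1), ("X2", 0), ("Z7", 1)]

syn1y1 = Synapse("Syn1Y1", source_tag="X1")
syn1y1.select(memory)

print(blocks_per_fpga())   # 150
print(syn1y1.received)     # [1]

# Rewiring: changing the tag connects the dendrite to another axon terminal.
syn1y1.source_tag = "Z7"
```

The filter here scans the whole broadcast, which is exactly the cost the next part of the discussion tries to reduce.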

The big problem in that scheme is that the broadcasting system must send information 150 times faster than it receives it, because at each simulated block it must broadcast all the information from the 149 other blocks. The real situation is even worse: a single FPGA does not contain all the neurons, so there are many chips, and a neuron on chip T must be able to send its information to chip U. If there are four billion neurons, each address will have 32 bits, and in each simulation cycle, lasting 1/150,000 second, all 4 billion addresses of 32 bits each must be broadcast. This is unworkable, so there must be a way to reduce the data flow.

The first step is to see that a single neuron may be linked with at most 10,000 others, not four billion. If at a given time there are 1,000 simulated neurons on the FPGA, at most 10 million addresses must be broadcast. With 32 bits each, that is 320 Mbits or 40 MBytes, something far more manageable, it seems. But don't forget that this must be done 150,000 times a second!

From block to block, the signal will not be the same. The output of a neuron is no longer a yes/no action potential, it is the neuron address. The receiver will look up in a table which neurons receive that signal. For example, neuron X sends its address, ADDR_X; the receiver will load the signal into ADDR_Y1, ADDR_Z124, ADDR_X28, ... and so on. ADDR_Y1 will be put in the table broadcast to the same chip 41 blocks later, ADDR_Z124 will be sent to the manager of another chip where it will be put in a table 12 blocks later, and so on. There is one restriction in this system: neurons simulated at the same time on different chips can't communicate. It is the job of the computer-level management not to allocate linked neurons to the same time slot on different chips.

The next round of data reduction comes from the fact that very few signals are "on" in a given computing cycle, and even fewer display a change of state. This may be the case for one in one hundred at most.
Only these addresses have to be broadcast. The signal for each block contains 32 bits x 100,000 active addresses, or 3.2 Mbits; the frequency is 150,000 times greater, or 480 Gbit/s (480 GHz on a single serial line). With a 32-bit bus that is a "mere" 15 GHz, something at the limit of HEMT technology. There may be more gain with a reduced address length: an addressing power in the 10 million range is sufficient, with no need for a full brain address at each chip. Alternatively, only the difference from one address to the next may be sent, another way to gain a factor between 2 and 4. With these well-known schemes, the frequency could be brought back to what is done in the current microprocessor generation.

It is common to read about the wiring impossibility of a full brain; here this problem has been solved for the most part.

Yvan Bozzonetti.
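
The data-flow arithmetic of this message can be checked with a short Python sketch. The figures are those quoted in the text (1,000 neurons per slot, 10,000 links each, 32-bit addresses, 150,000 slots per second, one signal in one hundred active). The variable names and the delta_encode/delta_decode helpers are assumptions made for illustration; the sketch does not try to reproduce the estimated gain factor of 2 to 4.

```python
from itertools import accumulate

# Figures quoted in the text; all names are assumed for this sketch.
neurons_per_slot = 1_000
links_per_neuron = 10_000
address_bits = 32
slots_per_second = 150_000
active_one_in = 100          # at most one signal in one hundred changes
bus_width_bits = 32

addresses = neurons_per_slot * links_per_neuron     # 10,000,000
bits_full = addresses * address_bits                # 320 Mbits
bytes_full = bits_full // 8                         # 40 MBytes

active = addresses // active_one_in                 # 100,000
bits_active = active * address_bits                 # 3.2 Mbits per block
bit_rate = bits_active * slots_per_second           # 480 Gbit/s
bus_clock_hz = bit_rate // bus_width_bits           # 15 GHz

# A local address space of about 10 million entries fits in 24 bits.
local_address_bits = (addresses - 1).bit_length()   # 24


def delta_encode(sorted_addresses):
    """Replace each address by its difference from the previous one."""
    previous = 0
    deltas = []
    for address in sorted_addresses:
        deltas.append(address - previous)
        previous = address
    return deltas


def delta_decode(deltas):
    """Recover the addresses as the running sum of the differences."""
    return list(accumulate(deltas))


sample = [1_000, 1_004, 1_050, 9_000_000]
assert delta_decode(delta_encode(sample)) == sample

print(bytes_full, bit_rate, bus_clock_hz, local_address_bits)
```

Delta encoding only pays off when the active addresses are sent in sorted order, so that most differences are small numbers needing fewer bits than a full address.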