Random Ramblings II

X-Message-Number: 4393
Date: Fri, 12 May 1995 17:27:44 +0200 (MET DST)
From: Eugen Leitl <>
Subject: Random Ramblings II 

Random Ramblings II.

0. Motivation and Apology.

I promised to continue some of my Random Ramblings, sorry it
took so long. This post strives even farther from the cryonics
mainstream, even connections to uploading it purports to contain
might not be instantly detectable. Nevertheless, they are there.
The picture will be clearer in the final part yet to come, so
hang on. I could not mention all topics, the text is incomplete
while certainly too large already. Again, I apologize for
wasting bandwidth of this group.

I tried to be succinct without introducing intolerably high bit
rates. I didn't use any mathematics (there should be lots) since
I have trouble with it. Moreover its ASCII transcription would
be hard to read. Go for Penrose and Kauffman and (mostly) their
references to obtain the real hog.

Things still to go:

- what is a packet switched network; how the internet routes
- what is a hypergrid and why is it a good network topology
- what are CA', what is their relation to NNs, how can one
  do computations with CAs, how does one implement them
  efficiently in silicon
- an abstract CA-like NN model in EBNF
- efficient implementation of it in silicon, estimations of performance
- efficient Si implementation of a hypergrid router
- how does one implement CAs in molecular circuitry, Soft Circuits (SC),
  hypergrid mapping upon 2d/3d grid, difficulties, perspectives, etc.

1. The internet, an instance of a packet switched network

1.1 Why mentioning it at all?

Good grief, what has the bloody internet to do with uploading? -
you will ask. Simple. The architecture of the emulation engine
to hold and process the structure representing your mind has to
be a massively parallel (maspar) fine-grained machine, a
deeply-intertwined conglomerate of active nodes communicating
via a packet-switched network. For many essential reasons not to
be discussed here due to lack of space.

I chose internet as our starting point for several reasons. Most
readers will be familiar to internet, why, this very text has
been distributed and delivered by it. A packet-swiched (PS)
network needs its own language and imagery, so the internet will
be a model case to start with. Moreover, analyzing its features
and deficiences we will be able to extrapolate a better indexing
and routing scheme. An overambitious goal? You bet. An arrogant
attitude? As hell. But woosh, here we go.

1.2 Introduction to PS Networks

Contrary to the old phone network, there is usually no direct
physical connection between sender/target in the packet switched
network. The venerable snail mail is a much better metaphor: the
data stream is hacked up into pieces, which are packed up in
envelopes (provided with a time stamp, address, etc.), and
thrown into the net. Jumping from node to node, the packets find
their destination via different routes like intelligent beings,
get sorted according to the logical sequence of departure, strip
their envelopes and thus regenerate the original continous data
stream, as issued by individual talking objects.

But first things first. The ISO Open System Interconnection (OSI)
model describes all information transfer in terms of seven layers.

  7 Application  - end-user services
  6 Presentation - formatting, encryption
  5 Session	 - user interface
  4 Transport	 - end-to-end message control
  3 Network	 - message exchange control (e.g. Routing)
  2 Data link	 - subnetwork data transmission
  1 Physical	 - electrical interface (e.g. RS 232 definition)

Only the lowest layer, where the signal levels, pin layout etc.
are defined, is responsible for direct information transfer.
Actually, OSI is a hierarchy of virtual channels, each defined
in terms of the underlying layer. Only.

What for?

Because of this information insulation/protection not the whole beast
has to be redesigned if one layer gets changed.

1.3 Some features of a good packet-switched network:

The sender as well as the destination must have an _identity_,
which ideally should not change over time. This
uniquity/distinguishablity is usually achieved through
assignation of an unique ID/address to each node. The set of all
valid IDs is called ID space. Currently, all networks nodes' IDs
are assigned manually using a following scheme: a central
administrating authority cuts up ID space into segments, to be
distributed to several subadministrators, and these... all the
way down to the local node. Sometimes, the original ID space
cake turns out to be too small (as recently happened to the
internet) and has to be expanded.

Ideally, each node should derive it's ID from it's wiring
constraints and the portion of the net (more correctly, its IDs)
it perceives directly, instead of being assigned by a central
authority. This labeling scheme should be robust and
expandable/scalable up to wide area network (WAN) scale. Imagine
it as a phase transition, where each node strives to obtain it's
position in ID space utilizing only locally available
information (purely local information flow). In a finite
iterative process, each node can obtain enough information to be
able to pinpoint his position in the ID space to perfect
certainty (zero fuzzy factor). Note, that this network
description bears striking resemblance to the cellular automaton
(CA) information processing paradigm, to be discussed ad nausem
below. This is deliberate. Of course administratory packet
overhead should not reduce bandwidth noticeably.

Since the net will mapped to realspace (2d for a wafer, 3d for a
module of several), we can say where a particular ID will be
(predict its physical coordinates). This gives us the minimum
value for wire lengths.

Since the total net can be very big, each node should hold only
a small fraction of it in its memory (local view, overlapping
spheres). This requires the node ID to provide sufficient
information for the data packet which local path to choose.
This, in turn, requires the network topology to be regular, a
kind of a higher-dimensional lattice, where data packets route
via a higher-dimensional Bresenham homologue. In the Bresenham,
a 2d/3d line-drawing algorithm, a decision which pixel from
several adjacent ones to choose from is derived from a)
knowledge where we are, b) where we want to go and c) from
intrinsic regularity of the grid/lattice. Blockage due to
defective cells is virtually nonexistant in higher-n spaces.

I refer to this higher-dimensional Bresenham homologue routing
as grassrouting due to its low overhead. See n-cube and n-grid
for instances of grassrouting.

Very high degrees of defectivity should be tolerated without
impairing routing efficiency. Higher-order protocols should take
care of packet lossage, do routine encryption and provide
authenisation/authorisation, a major security issue.

Of course, none of it is relevant on an island single-user
implementation. The neural engine for uploading is one. I/O to
realspace/VR is done via defined interfaces (sensomotoric I/O).
Read/Write to the "me" data structure is forbidden/restricted
e.g. to you/a group of your clones or to your virtual physician.
A link to uploading: the individual objects in the net are
neurons (there might be several in a single node), they have a
state at time t. They change the states of other neurons at t+1
by sending them information packets, modulated by synapse
strengths assotiated with each link/connection/edge. In turn,
incoming packets will influence the local state at t+1, a
function of local state at t and the sum of incoming packet
influences. Cryptic? Read the CA part, then it will be obvious.

1.4 Glossary

Caution: these are my definitions and may differ from those in
other sources. A lot is missing. Read at your own risk. Due to
many xrefs this should have been done as hypertext, actually.

adaptive routing (generic term)
  a smart routing scheme, which watches/benchmarks
  the network and utilizes this information to
  increase efficiency. Opposite non-adaptive (dumb)
  routing. Administrative information uses bandwidth
  but increases effective point-to-point bandwidth.
  Obviously, the gains should be greater than expenses
  in terms of complexity and fragility.

address/name (generic)
  an unique ID. Member of address/name/ID space
  Makes nodes discernible; unique.

Backward learning (generic)
  a routing scheme using a packet delay list
  in each node. Delta-updates entries continously
  examining incoming packet's time stamps.
  Can begin with no prior knowledge (without map).
  Is sometimes utilized by intelligent bridges.

Bandwidth (information theory)
  the amount of information per second a device
  can handle. The information flow rate in bits/s (bps) or
  packets/s (pps). The total network bandwidth is
  integrated over the individual
  channel bandwidth it consists of.

Bridge (device)
  A device to bridge two otherwise unconnected
  networks. Addressing scheme has to be compatible.
  Can be somewhat smart.

Channel (information theory)
  a channel to move information. It can be mono or bidirectional.
  It has a latency or a signal delay. It has a bandwidth, usually
  measured in bits/s and an error rate (errors/bits) or frequency
  (errors/s).

Cluster (generic)
  an additional information allowing objects to be
  classified in groups. Alternatively, an object's property.
  There is a binary metric (a predicate)
  telling which objects belong to which cluster. Each object
  can belong to several clusters. In security lingo, they
  are usually referred to as groups. Each member of a group
  has the same rights. A mapping of vector space element pairs
  to boolean space. Fuzzy clustering is possible utilizing
  nonboolean metrics.

Connectivity matrix
  a boolean matrix. Aij being TRUE means the
  nodes i j are connected (have an edge). An
  implementation of the boolean connectivity
  metric.
  See also map, cluster, metric, edge, node.

Connected nodes
  two nodes having at least one path between them

Connectivity
  a list of neighbour node's IDs

DNS
  internet's domain name system. Maps the string
  namespace to node address space. Isolating
  abstractional layer, allows transparent change
  of the underlying layer.

Domain
  a cluster of nodes. Used by hierarchical routing.
  n-grids uses an orthogonal domain addressing,
  a binary clustering of subdomains ((n-1)-subgrids,
  connected by (n-1) links).

Edge
  a boolean metric (predicate) for node pairs
  in a given space (graph).

Global Optimization
  A routing scheme utlizing fixed path tables.
  Needs a central agency. Performs bad at
  high loads.

Group
  see cluster.

Hierarchical routing
  due to limited node grain size, in
  WANs spatially neighboured nodes are
  clustered in domains. Each domain is
  connected to each others via a single
  link. A node can only route within its
  domain. Moreover, it knows which
  nodes within his domain have links to
  the other domains. A domain is
  nonpenetrable to
  foreign domain' packets.

Flooding
  A routing scheme where several cloned packets
  traverse different paths concurrently.

  Plus: automagically circumvents defects
	finds fastest/shortest path automatically

  Minus: reduces overall bandwidth
	 clones must be recognizable
	 packets must have limited life time to
	 prevent indefinitely looping packets

  Geographical flooding: flooding utilizing
  position information.

Hot-potato routing
  a local-knowledge routing scheme. Tries to
  get rid of data packets as fast as possible
  giving it to the link with minimum queue
  length. Infinite packet lifetime. Can
  produce oscillating data packets. Performs
  better with some extensions (Random link selection,
  etc.).

Information
  the raw stuff we are pumping. The amount of
  knowledge necessary to select a single case
  from a set. Is measured in bits.

Link
  Connection between two nodes. Synonym: edge,
  vertice.

Loop
  a path having the same node at least twice

Lossage
  ratio of packets not reaching its destination
  due to hardware or routing trouble.
  lossage=arrived/issued.

Map
  a global list of connectivities for each
  node (see Connectivity matrix).

Metric
  assigns a scalar (real or integer) to each two
  elements of n-space. E.g. euclidian metric (geometric
  distance d(a,b)=sqr(dx^2+dy^2+dz^2) for 3-space) or
  the Hamming distance for binary vector pairs (number
  of bits where two vectors do differ). Special case:
  boolean metric (predicate). Maps n-space vector pairs to
  the boolean (=true,false) space.

Node
  Graph theory lingo (acc. vertice, edge)..
  Source and sink of data packets which travel
  the edges (links, connections) between two nodes.

Neigbour nodes
  two nodes having at least one edge between them.

Path
  a list of nodes, each pair of them connected with
  least one edge

Packet
  Atom of information in a packet switched network. Can
  be fixed/variable sized (up to a maximum). Consists
  of an envelope which auxiliary information and
  the data cargo.

Ping (program)
  a internet troubleshoot program. Gives benchmark
  data on packet throughput/lossage between two
  TCP/IP addresses.

Queue
  a FIFO (first in first out) buffer

Routing
  3,4 OSI layer. The art of finding the right path
  for the data packets.

Repeater (device)
  A device to discern signals from noise
  and ampflify to increase signalling length.
  Pretty dumb.

Stafette routing
  each node benchmarks the node periodically.
  Empirical values are exchanged between direct
  neighbours. Information spreads successively
  over the net. A dead link has infinite delay.
  In the beginning internet worked thus. Now each
  node has a total map. Benchmarked results
  are spread by flooding. To prevent packet
  oscillation IP uses maximum packet life time
  of 255 secs.

TCP/IP
  transfer control protocol/internet protocol
  is data-moving internet layer. OSI 3..4 (?)

Virtual transfer
  finds a good path, then allocates
  the resources for route to be used
  during whole transfer.

WAN
  Wide Area Network, the big brother of LAN
  (Local Area Network). Internet is a WAN.
  Note that TCP/IP can also be used in a LAN.

Weight
  a number attached to an edge. Can mean a
  delay or mean packet throughput (speed), queue
  length etc.
  An edge can have several weights.

Wire length
  the total physical wire length in the network.
  For financial/space cost reasons, has to be kept
  at a minimum at maximum bandwidth.

1.5 How the Internet Routes

The internet utilizes a hierarchical routing with staffete
routing (+flooding).

For a variety of reasons, internet's addressing and thus also
the routing is clearly suboptimal. Especially, the high
administratory overhead in address assignment and necessary
size/complexity quantization of the routing grain (very coarse)
are negative. Use of n-grid network topology allows the router
to become simple enough to be on-die integrable. This is
qualitatively different approach to current networks, where
routers are 100 k$ hardware boxes. Implementations of failure
tolerant ultra high performance wafer-scale-integrated (WSI)
MISC-CPUs/DRAMs/Routers or special-purpose-engines/Routers (all
on single die) appears possible and desirable. Unfortunately,
this is not mainstream (but instances there are) as the market
seems not to demand it.

2. A better mousetrap

Cryonics-irrelevant stuff about the gloomy future of the net
ommited.

3. The Boolean Hypercube and the Hypergrid

Imagine a standard three-dimensional cube with an edge of length
1, the first node lying at coordinates (0,0,0), the second at
(0,0,1), the third ... (0,1,0), (0,1,1), (1,0,0), (1,0,1),
(1,1,0) and the eighth at (1,1,1) (see picture). It is a member

 Y  ^
    |	    Z
    |	   /
 011| o---/----o 111
    |/|  /    /|
010 o-+-+----o | 110   A 3-cube with binary labeled
    | |/     | |       nodes in a cartesian coordinate
 001| o------|-o 101   system
    |/	     |/
    o--------o------->
  000	   100	     X

of the boolean n-cube homologue sequence with total number of
nodes being 2^n. Thus one can label the nodes with a binary
integer 0,..,(2^n)-1. Alternatively, one can say there a space
of IDs for each n-cube. If we consider a single reference node
in a boolean n-cube, it is connected to n nodes _with binary
offsets_ (2^i). The sign of the offset is derived from the
binary count pattern bit sequence. This is a constructive
definition. (Look up connectivity matrix of boolean 5-cube way
down. Note that it contains _four_ 3-cubes).

 This table represents the 3-cube:

 ref. Binary Signs  binary     connected
 ID   Count	    Offsets    IDs
 +---+-----+------+-----------+-------+  (alt.:
 | 0 | 000 |  +++ |  +4 +2 +1 | 4 2 1 |   perfect
 | 1 | 001 |  ++- |  +4 +2 -1 | 5 3 0 |   shuffle
 | 2 | 010 |  +-+ |  +4 -2 +1 | 6 0 3 |   stages (1,2,3)
 | 3 | 011 |  +-- |  +4 -2 -1 | 7 1 2 |   of the initial
 | 4 | 100 |  -++ |  -4 +2 +1 | 1 6 5 |   ref. ID)
 | 5 | 101 |  -+- |  -4 +2 -1 | 2 7 4 |
 | 6 | 110 |  --+ |  -4 -2 +1 | 3 4 7 |
 | 7 | 111 |  --- |  -4 -2 -1 | 4 5 6 |
 +---+-----+------+-----------+-------+

This boolean n-cube is a peculiar beast. On the one hand it is a
very orthogonal structure, easily reasoned upon mathematically.
The signs/offsets look after themselves, it is a
recursive/fractal structure (each n-cube consist of 2
interconnected (via (n-1) number of links) (n-1)-type subcubes),
etc. Routing in it resembles binary search: the first step
brings one halfway there, the second halfway again, etc. We
descend the binary cluster hierarchy until we arrive at the
0-cube (single node). The "distance distribution versus
dimensionality" table is the Pascal triangle, by the way. It can
derive its own ID from wiring constraints and the other IDs it
is directly connected to. (This is a crucial point). Etc.
Routing in a perfect n-cube runs roughly: "going from the left,
find the first bit in destination ID differing from source ID.
Move message packet to the according subcube via direct link
(there is only one). Repeat until you reach outer (rightmost)
bit. Bingo." This can be coded in some 10 assembly statements or
implemented via a very primitive logic. If a part of the links
is missing (the n-cube is defective) we have to visit all the
nodes which should have links to the second subcube until we can
prove the task impossible.

The n-grid, on the other hand, can be considered a better n-cube
(is a superset of it). We can derive it from the n-cube by
making link offsets symmetrical. Links which would reach out of
ID space can be either wrapped around (circular space) or
considered void/free (open space/grid region). See connectivity
table below. Routing is again completely trivial. You just
choose the link which distance (arithmetical ID difference) is
smallest and shove the packet through. In a perfect n-grid, this
works every time and is moreover the best possible algorithm.
Especially, if the near-range order (shortest offsets) is highly
preserved, this scheme works also for severely defective n-grids
(there is a lot of space in those higher dimensions). If the
message _blocks_ (its path has a periode/loop) we can circumvent
the blockage either giving the message packet a random kick or
executing a plan, subsequently visiting all the nodes which
should have links in this direction in ever-increasing
distances.

Computer simulations I did have shown this routing scheme to
work and, even better, to work well.

Connectivity Tables of 5-cube and 5-grid (2^5=32 nodes each)

# means a link,
- means there is no link.

Note that the n-cube is contained in the n-grid (as it is a
superset). The n-cube has a fractal/recursive makeup: it
contains two interconnected (n/2 links) (n-1)-cubes. A n-grid
node has (2n-1) links as compared to the n-cube (free links are
not shown). Two (n-1) subgrids are connected by (2n-1) links as
compared by n links in the n-cube. The simple n-grid routing
algorithm does not work for the n-cube (it blocks/loops in most
cases). Moreover, if the _farthest_ link of each node (n-cube)
is cut, the cube collapses into 2 unconnected (n-1) cubes. This
does not happen to the n-grid. Notice that a defective structure
(some links missing) still routes all right because of built-in
redundancy (which is higher in the n-grid).

The n-grid is a superset of the n-cube. Each node has

boolean 5-cube			  5-grid (open-space version.
					  free links not shown)

-##-#---#-------#---------------  -##-#---#-------#---------------
#--#-#---#-------#--------------  #-##-#---#-------#--------------
#--#--#---#-------#-------------  ##-##-#---#-------#-------------
-##----#---#-------#------------  -##-##-#---#-------#------------
#----##-----#-------#-----------  #-##-##-#---#-------#-----------
-#--#--#-----#-------#----------  -#-##-##-#---#-------#----------
--#-#--#------#-------#---------  --#-##-##-#---#-------#---------
---#-##--------#-------#--------  ---#-##-##-#---#-------#--------
#--------##-#-----------#-------  #---#-##-##-#---#-------#-------
-#------#--#-#-----------#------  -#---#-##-##-#---#-------#------
--#-----#--#--#-----------#-----  --#---#-##-##-#---#-------#-----
---#-----##----#-----------#----  ---#---#-##-##-#---#-------#----
----#---#----##-------------#---  ----#---#-##-##-#---#-------#---
-----#---#--#--#-------------#--  -----#---#-##-##-#---#-------#--
------#---#-#--#--------------#-  ------#---#-##-##-#---#-------#-
-------#---#-##----------------#  -------#---#-##-##-#---#-------#
#----------------##-#---#-------  #-------#---#-##-##-#---#-------
-#--------------#--#-#---#------  -#-------#---#-##-##-#---#------
--#-------------#--#--#---#-----  --#-------#---#-##-##-#---#-----
---#-------------##----#---#----  ---#-------#---#-##-##-#---#----
----#-----------#----##-----#---  ----#-------#---#-##-##-#---#---
-----#-----------#--#--#-----#--  -----#-------#---#-##-##-#---#--
------#-----------#-#--#------#-  ------#-------#---#-##-##-#---#-
-------#-----------#-##--------#  -------#-------#---#-##-##-#---#
--------#-------#--------##-#---  --------#-------#---#-##-##-#---
---------#-------#------#--#-#--  ---------#-------#---#-##-##-#--
----------#-------#-----#--#--#-  ----------#-------#---#-##-##-#-
-----------#-------#-----##----#  -----------#-------#---#-##-##-#
------------#-------#---#----##-  ------------#-------#---#-##-##-
-------------#-------#---#--#--#  -------------#-------#---#-##-##
--------------#-------#---#-#--#  --------------#-------#---#-##-#
---------------#-------#---#-##-  ---------------#-------#---#-##-

Where's the beef?

you might ask. What's the reason to introduce a new topology at
all? There are several reasons:

- n-cube and n-grid have the best routing performance at
  minimum wire length (when mapped to real physical 2- and 3-space).

- the amount of computation (space/time complexity) for
  efficient routing is very small. On-chip routers for
  wafer-local high-speed serial networks are possible.

- n-grids tolerate a high degree of defectivity at still good
  performance.

- the IDs can be derived from (directly visible) neighbour's IDs
  without any need for administration. I call it dynamic ID
  derivation (DID).

- the total communication bandwidth is very high and increases
  with increasing n-grid size.

- the addressing and routing scheme is scalable to extremely
  big (WAN) nets.

Finally, I should say there seems to be a way of relaxing wiring
constraints even further without sacrificing too much
efficiency. Random local wirings with decreasing wiring density
(as realspace projection) can be harnessed with an ID subspace
ID derivation and routing scheme. This I call fuzzy routing.

4. Cellular Automata

Below is a somewhat theoretical section. It is still a
worthwhile read since presentation is somewhat unusual.
Important concepts and emphasized words are marked _with
underscores_. Skip the section if you are interested only in
applications of CAs.

4.1 What Are They?

Finite-grid _cellular automata_ (CA) are finite state machines
with discrete timespace. Differently put, CAs are n-dimensional
(n = 1,2,3; usually) cell arrays of a finite size, each cell
having a distinct state (chosen from set of defined states;
usually quite small) at time t.

_Central_, or _reference_ cell's state at time t+1 (future/next)
is determined by its and its _neighbourhood_ (cells in the
immediate vicinity, neigbourhood size being usually small in
relation to the total array size) at time t (current/present).

The function which performs this mapping of the
"central-cell+neighbourhood-current-states" vector (base of the
_light cone_) upon "central-cell-future-state" vector (tip of
the light cone, usually a single cell) is called a _rule_. The
rule is applied upon upon all cells _simultaneously_ at each
time step. The set of all possible rules, to map all
permutations of a given light cone base vector to the light cone
tip vector is called the _rule space_.

Let us restrict the cell states to binary integers each of n
bits. Since we are be interested in practical, also
implementable, systems, this does not introduce any
restrictions. Obviously each cell can assume one of 2^n states,
ranging from 0..(2^n)-1. The neigbourhood size is z cells,
including the reference cell. If we consider the compound
neighbourhood vector (consisting of individual cells in a given
order) as a new integer, sized zn bit, it will have i=2^(zn)
states. Obviously, the number grows very fast both with
increasing number of states and the bigger neighbourhood size.
Imagine this integer being an index to a table T_i of n-bit
integers, the rule. Obviously, the space of all possible tables
(having j=((2^n)^i)=(((2^n)^2)^(zn)) elements) is equivalent to
the rule space. Let this rule space be a new table R_j with
individual tables T_i being its entries. Thus R[n,z] is the
generic rule space for any CA.

Obviously, if T_i is truly random, there is no method
(algorithm) to collapse the table. Particulary, the attempt of
substituting it by code is prone to fail. T_i is a superset of
any code solution, provided, code size < T_i. On the other hand,
if the table is smooth/highly repetitive there is clearly a way
to put a much simpler/smaller ersatz automaton in its place.
Particularly, the main class of collapsible CAs is those with an
_isotropic_, a high symmetry neighbourhood. This means, similiar
configuration subpatterns in the neighbourhood sphere have about
the same impact upon the reference cell, whenever there are
located. (Particularly, in CAs of high dimensionality, this
constrains rule complexity dramatically). If similiar
configurations appearing at the light cone base are treated in a
known simple way, e.g. additively, a further very noticeable
reduction is possible.

Note that this definition encompasses CAs of all dimensionality,
which is given by the neighbourhood definition. Lattices of
every dimensionality and symmetry, and even networks, can be
defined in terms of e.g. offsets to the reference cell. Of
course, individual neighbourhoods for each cell are possible. A
high-neighbourhood CA is always a superset of a
small-neigbourhood one (provided the neighbourhood is a true
superset and the number of states is the same). The
everything-is-influencing-everything-totally-holistic-yes-please
CA (which is, admittedly, a somewhat perverse case) has the
whole of the cell space as it neighbourhood. All finite-state
neural nets in existance are comfortably embedded in its
allencompassing depths.

The only loose end left to define is whether this cellular space
is _open_ or _circular_ (wrapped-around). Since in open spaces
boundary cells neighbourhood reaches out into cell-free "void",
they must be handled separately.

This definition is constructive. Apart from being
instrumentative for an straighforward look-up implementation
(some MByte-sized look-up for very modest n and z; z being
hard-coded or defined as list of offsets relative to the
reference cell or absolute pointers. Both codings allow varying
neighbourhoods.) it states that for every mapping of any cell
space a upon b, denoted b=T_j1[a] there is an a=T_j2[b], the
inverse mapping. T_j2[T_j1[a]]=T_j1[T_j2[a]]=a. Which also
means, that after n iterations b=nT_j1[a] upon given space
vector a, nT_j2[b] restores the original pattern a. How's that
for a public-key cryptosystem? In this context, an efficient way
of deriving T_j2 from T_j1 is of interest. (Even more so is a
mapping T_j3, which T_j3[nT_j1[a]]=a and especially a convenient
;) way of finding it, using only (partial) knowledge of a.). But
I got carried away.

The term light cone stems from the maximum information
propagation velocity (the _light velocity_) in a given CA which
is limited by the neighbourhood size. Consider our usual
physical space: information about one _event_ (e.g. snipping of
one's fingers; detonation of a nuke) spreads at light velocity
in all directions as a _sphere of knowledge_, a four-dimensional
light cone. For the sake of illustration: assume our world to
one-dimensional. A stack of this _world sheets_ in the 2nd
dimension, the time, is called the _timespace_. Godlike, we can
scan the future and the past in a single glance:


 The Arrow of Time

       ^
       |
       |  \			 /
 t+n   |---\--------------------/------ World sheet at t+n (Future)
       |    \		       /
       |     \		      /
       |      \ 	     /
       |       \	    /	<------ future light cone
       |	\	   /		(maximal area of knowledge/
       |	 \	  /		 maximum area to be influenced
       |	  \	 /		 by the event)
       |	   \	/
       |	    \  /
       |	     \/ event!
 t     +-------------------------------  World sheet at t (Present)
       |	     /\
       |	    /  \
       |	   /	\
       |	  /	 \
       |	 /	  \	<------ past light cone
       |	/	   \		(maximum amount of knowledge/
       |       /	    \		 area necessary to predict
       |      / 	     \		 the event)
       |     /		      \
       |    /		       \
 t-n   |---/--------------------\------ World sheet at t-n (Past)
       |  /			 \

(Read "A Brief History of Time", by Stephen W. Hawking (1989 Bantam)
and (especially) Penrose's "The Emperor's New Mind" for an in-depth
light cone discussion and _much_ more).

We can predict the future of a pattern lying within its light
cone. Eventually, disturbances from outside creeping in with
light speed, let the area of predictability dwindle to a point.
The special case of the light cone base is the reference cell
with neighbourhood z, future pattern is the reference cell.

To be able to predict/foresee the state of a given _pattern_
(a set of cells with their individual states at time t) at time t+n
we need:

- the space (cell vector pattern) and its size
- the number of generations (integer)
- the neighbourhood (list of cells)
- the index function (algorithm to compute the integer
  index from neighbourhood)
- the mapping function (rule)
- knowledge whether the space is open (edge artifacts) or
  closed (toroidal space)

time flow

   ^  w
   |  o 	  end of predictability
   |  r 	  /\
   |  l 	 /  \
   |  d 	/    \
   |	       /------\ half-height future pattern
   |  s       /        \			    ^ function/
   |  h      /		\			    | mapping
   |  e     /------------\ current pattern	    |
   |  e
   |  t
   |  s

Actually, this has been the logical base of a very efficient
implementation of a special edge-of-chaos 2d cellular
automation: Life (see Moravec, "Mind children", Harvard
University Press, pp. 154). In 1982, Bill Gosper, a former MIT
Life group hacker, wrote a Life implementation, which was able
to simulate cellular spaces one billion cells on a side. Using a
recursive light-cone hash coding, hash table being filled in by
the program at run time, the acceleration was spectacular. A
typical run would take 1 min for first 100 generations, then 10
seconds for 10 000, then a sustainable rate of 1 000 000
generations within _one_ second. (If you have ever tried to
implement a CA on a PC you know what that means).

						       /\
						      /  \
						     /	  \
\   /\	 /\   /\   /\	/	  /\		    /\ /\ /\
 \ /  \ /  \ /	\ /  \ /	 /  \		   /  X  X  \
--X----X----X----X----X---	/    \		  /  / \/ \  \
 / \  / \  / \	/ \  / \       /      \ 	 /\ /\ /\ /\ /\
/   \/	 \/   \/   \/	\     /        \	/  X  X  X  X  \
					       /  / \/ \/ \/ \	\

Light cone mosaic,	     A single cone     A recursive cone
overlapping at half-height

To explain the reason why hashing worked so well we need a rough
classification criteria for CAs. A well-known classification to
to walk through the whole rule space R[n,z], investigating how
each individual rule T maps a random vector over some
generations. Obviously, an exhaustive investigation is possible
only for very small n,z. For higher n,z values, representative
sampling must do.

Surprisingly, only four qualitatively different CA categories
are discovered:

- very dull_1 (maps everything to one state)  ="solid"
- dull (stable isles of (oszillating) states) ="(melting) solid"
- interesting (particles/gliders all kinds of fascinating
  complex patterns, wow!)		      ="liquid"
- very dull_2 (produces only high entropy patterns) ="gas"

Obviously, the light cone hashing could work only so well
because the probability of certain patterns to appear/persist
was much higher then the others in the Life automaton. Starting
as a Kaspar Hauser, the program rapidly learned to predict the
future of this frequently recurring features, effectively making
bigger and bigger leaps into the future.

We can consider each rule vector T to be a point in the rule
space R[n,z]. If we map this space e.g. with the euclidic
distance metric, we notice that these areas form contigous isles
in the space, very like physic's phase diagrams for substances.
Moreover, as n,z go up we notice that very dull_2-type automata
begin to dominate. Actually, even for very modest n,z the
probability of finding a CA of "interesting" or even "dull" by
pure random walk is nil. The liquid phase at the edge of chaos
("chaos", (Greek): gas), where the most promising (read:
computationally universal) CAs are found, is only a _very_ thin
layer on the boundary between solid and gas.

Above fact appears very interesting especially in the context of
Stuart Kaufmann's "edge of chaos" hypothesis. On a even more
grandiose scale, Penrose's pin-equipped creator aiming for a tiny
universe phase space comes to mind. In light of recent "fluctuating
crazy quilt multiverse" models (Scientific American) an evolution-
driven drift of the universe population, each featuring a different
rule towards edge of chaos ones. Of course there must be a)
inheritance b) mutation c) fitness, a function of rule, which
increases the individual's propagation probability. Enough science
fiction.

One final effort: the (strange) attractors in CA spaces.

In physics, the concept of _state_ or _phase space_ is used to
visualize the temporal _evolution_ of a system. The total state
of a system at time t is entirely represented by a single point
in n-space, e.g. 6N-dimensional (three position coordinates and
three momentum coordinates for each particle) for N
unconstrained particles. A _Hamiltonian_ (the rule in our
terminology) defines a vector field, a mapping	on that system.
Differently put, the Hamiltonian is an _energy gradient_, a
pathway for a system to slide towards a (local) minimum. In our
version of finite binary integer CAs with quantized time, each
vector at t is mapped by the rule to the future vector at t+1.

Nothing new, so far. Now consider a initial _region_, a
contigous block of vectors. Each element of the block at t gets
mapped by the rule to another set of elements at t+1. Obviously,
the _volume_, the total number of cells in a given region cannot
grow bigger, it can shrink or remain constant, at most. Due to
the Liouville theorem a true Hamiltonian system's region volume
remains invariant (constant) during temporal evolution (note
that the CA rule is a superset of a Hamiltonian as it does
_also_ allow volume reduction). This has profound implications:
for a majority of systems the elements of an contigous regions
diverge with each evolution step, loosing their coherence they
are getting spread over increasingly larger volumes. The extreme
cases of this system behaviour is referred to as _ergodic_. The
"gas" phase type CAs are excellent specimens of this behaviour.
As a minimal disturbance in the starting vector produces a
maximal deviation within a few iterations, the gaseous CAs are
thus an excellent illustration of the "butterfly" effect.

Another important theorem states that exists only one
_trajectory_ to pass through each point of the state space. The
_periode_ of the trajectory, the number of transitions until the
randomly walking vector hits upon itself by chance is _very_
long in high-dimensional CAs (there are a great many directions
to go). This property, coupled with the lack of attractors/high
path entropy at equidistributed statespace point visit
probability makes this class for CAs particularly attractive for
cryptography.

A truly ergodic system performs a random walk through its state
space with the probability of visiting each point being
equidistributed. Additionally, the systems's evolution velocity
(the fraction of transition length over time step) along its
trajectory using the euclidian distance metric (square root of
the sum of quadrates of the element's differences) gets
increasingly evened out with increasing size of the space, the
mean translational velocity is roughly a constant. The bigger
the system, the less detectable is the individual's contribution
to the whole picture, an obvious parallele to the "billiard
table" ideal gas picture. Though subsystem components may
experience vigorous brownian motion, the whole system stands
rock firm.

One should remark, however, that for
Morse-modulated-potential-type ideal gases boxes there _are_
constraints, as introduced by the shape of the potential
function. One can calculate some kind of crazy
Schroedinger-square, telling which points are likely to be
visited and which not.

A system can also be seen as a state-space balett of interacting
subsystems of smaller dimensionality, choreographed by
higher-order rule. Particularly, using binary subsystem
clustering may appear constructive in implementation/analysis.

The dull_1 (solid) CAs are remain trapped in single points or
constrained to tiny volumes ping-pongying to and fro. If we
start to melt the solid rule, thus drawing nearer and nearer to
the liquid phase edge, the traversed regions become increasingly
larger and larger, the paths more and more complex. A liquid
system may stay liquid for a long while or make an expedition
into the chaos region, either to stay there indefinitely (there
is no way to tell) or returning back, eventually. It might also
freeze out, remaining trapped in a solid forever.

A typical liquid CA will also show special points (typically,
many), called _attractors_, around which on complex trajectories
the system will revolve. In a finite system, several convergent
paths will congeal into a single one, which might eventually be
trapped in a single point, the attractor. (This would never
happen in a continous system, of course).

The entire volume belonging to converging trajectories is called
the _basin of attraction_, a simile to basins of drainage in a
montainous massive. A system, released within its basin of
attraction, will remain there forever, unless disturbed by an
outside force. Two basins of attraction are meet at a ridge,
called _separatrix_, a region with one degree of freedom fewer
than the number of variables in the system.

Instead of being a point, an attractor might be a loop, a toroid
(of course, it's higher-dimensional analogues) or any such
thing. Moreover, there is a class of _strange_ or _chaotic_
attractors. Their main property is that two very close systems
let loose within the basin remain on the attractor surface but
float forth _diverging from each other_. Tiny initial deviations
gets amplified with time. For obvious reasons, real systems
featuring these strange attractor (or ergodic) property are
virtually noncomputable as they would require infinite precision
both at measurement and simulation stage. Both are unattainable.
The main distinction between ergodic systems and these
containing strange attractors is that a strange attractor/its
basin is tiny in relation to the total statespace size.

By now you must think I must be strangely attracted to those
attractors, but these beasts are actually useful. There is
_heavy_ experimental/theoretical evidence (both direct and
indirect) that the brain uses state-spaces and generic mappings
(coordinate transforms) to represent and process information
(see Churchland for an excellent introduction). Particularly,
the existance of attractors is ascertained (Nicolis/Prigogine).
Most, maybe all, part of machinery of the brain does
calculations utilizing trajectories and attractors in state
space (see Churchland).

Actually, I believe any system (including us) which does any
kind of computation/life must be an edge-of-chaos one to be
efficient. Our computers, alas, reside quite firmly in the solid
region.

4.2 What Are They Good For?

A good question. Are CAs just another irrelevant theoretical
construct? I think not.

- CAs are great for applied cryptography, both for rnd-stream
  and public-key. Sadly, CA are largely unknown among
  cryptographers as yet.

- CAs are great for efficient modeling (e.g. fluid dynamics),
  reducing the amount of necessary computation by many orders of
  magnitude. A modest fluid dynamic package will run on a
  PC in realtime. (I've seen some). Systems with purely
  local interactions (essentially all systems fall within
  this class) and (best) a noticeable number of constraints are
  prior candidates for CA modeling. Sadly again, most scientists
  haven't heard about CAs yet.

- neural nets (NNs) are special cases of CAs, which are
  special cases of automata networks (ANs). Knowledge transfer
  seems to be possible.

- CAs and ANs are great for self-organization and non-linearity
  science. Intuitive reasoning and/or a well-developed
  body of mathematical machinery is readily available.

- CAs can help up to understand embryonal morphogenesis.

- CAs are great toy systems for artificial life (AL).

- A family of CAs is computationally universal. They compare
  favourably (in time/space-complexity) to the Turing machine/von
  Neumann architectures. CAs might be computationally equivalent
  to Turing engines, but not in terms of time complexity. Their
  implementation in silicon is straightforward. Achievable
  computing power with the same transistor resources is several
  orders of magnitude higher than current architectures. Being
  redundant arrays of identical, extremely primitive elements,
  they are the major candidates for molecular electronics
  implementations. Achievable integration density and computation
  performance are of all architectures the highest theoretically
  possible.

- CAs are interesting candidates for universe theories. They
  mimic a lot (reversibility of the time arrow, particles,
  light speed, anthropic principle, edge of chaos theories,
  information conservation, etc.) of the real-world properties.

4.3 A Recipe for a CA XOR-Rnd-Stream Cryptosystem.

- Choose a CA which does very good rands. Most CAs from the gas
  region will do nicely.
- Define all the boundary conditions. Code it.
- Choose a random space vector as initial key. Give the key to
  your partner via a safe channel.
- XOR your data stream with CA output stream to obtain cypher
  text. Your partner does the same to retrieve plain.
  Of course, this works also the other way round.

Why bother?

CA's form a _family_ of random generators, which are
well-behaved, produce excellent rands and run fast even on
von-Neumann machines. Internet nodes will be able to crypt
TCP/IP packets individually via a different crypt for each
target node. Routine eavesdropping will become much more
difficult this way, as the cryptoattacker will have to know the
right crypt method for each packet. Since crypt is transparent
(hidden somewhere in the lower OSI levels), top level
applications could (and should) use a different/better
cryptosystem, such as:

4.4 A Recipe for a CA Public Key Cryptosystem.

- Search CA (automaton network space, actually)  space (_not_
  rule space) for a cryptography-suitable automaton family. Choose
  one member. Implement it in C, give away code. Invite people to
  try cryptoattacks.

- Fill rule table with very good rands. Do _not_ use compiler
  built-int rands. Hardware rnd generator or another CA may do.

- Construct the inverse rule.

- Decide, which rule table to be your public key. Give it into
  the net. Keep the other in your safe.

All data sent to you, transformed n times (you must agree upon
how many) with your public key can be only decoded applying your
private key n times.

All data encoded by your private key, can be only read with
your public key.

You want to  know whether the sender is the person it purports.
Choose a big pseudo random number (best CA rnds, again), encode
it with your friend's public key and send cypher to the
pretender. Pretender codes the number with its private key, then
sends it back to you. You decode it using your friend's public
key. Is the obtained the same random number you chose initially?
If yes: Pretender=Friend. Bingo! No? Pretender=bad guy. Call the
police!

Why bother, again?

Robust trapdoor cryptosystems with two inverse keys are not
exactly frequent in cryptology. The only one I know of is based
on large prime factoring, is supposed to be breakable with a
finite effort (and might become even more fragile with further
advances in number theory). Moreover, it takes a lot of
horsepower to compute. Using CAs might come quite handy here.

4.5 Two Simulations  \
4.5.1 Fluid Dynamics  \ though extremely interesting
4.5.2 Protein Folding / these chapters are ommited due
		     /	to lack of relevance to uploading
			If there are doubts CAs to be
			instrumental for large-scale
			scientific simulations I will
			supplement.

--- to be continued ----------------------------------------------

Literature/References

[1]  Roger Penrose, "The Emperor's New Mind", Vintage (1990).
[2]  Stephen Hawking, "A Brief History of Time", Bantam (1989).
[3]  Hans Moravec, "Mind Children", Harvard University Press (1988).
[4]  Stuart A. Kaufmann, "The Origins of Order", Oxford University
     Press (1993)
[5]  Gregoire Nicolis, Ilya Prigogine (ed.), "Die Erforschung des
     Komplexen", Piper (1987)
[6]  Dawkins, "The Blind Watchmaker", (1986)
[7]  R. Vollmar, "Algorithmen in Zellularautomaten", B.G. Teubner
     Stuttgart (1979).
[8]  G. E. Gorelik, "Rasmernost' Prostranstva", Isdatel'stvo
     Moskovskogo Universiteta (1983).
[9]  Andrew Hodges, "Alan Turing: the enigma", Vintage (1992).
[10] Koch/Segev (ed.), "Methods in Neuronal Modeling", MIT Press,
     2nd printing (1990)
[11] Yoh-Han Pao, "Adaptive Pattern Recognition and Neural Networks",
     Addison Wesley (1989).
[12] Igor Aleksander (ed.), "Neural Computer Architectures",
     North Oxford Academic (1989)
[13] Paul M. Churchland, "Representation and high-speed computation
     in neural networks" in "Foundation of artificial intelligence,
     a sourcebook", by Derek Patridge and Yorick Wilks, (ed.),
     Cambridge University Press (1990).
[14] P.W. Atkins, "Physical Chemistry", Oxford University Press (1987)


Rate This Message: http://www.cryonet.org/cgi-bin/rate.cgi?msg=4393