X-Message-Number: 4393 Date: Fri, 12 May 1995 17:27:44 +0200 (MET DST) From: Eugen Leitl <> Subject: Random Ramblings II Random Ramblings II. 0. Motivation and Apology. I promised to continue some of my Random Ramblings, sorry it took so long. This post strives even farther from the cryonics mainstream, even connections to uploading it purports to contain might not be instantly detectable. Nevertheless, they are there. The picture will be clearer in the final part yet to come, so hang on. I could not mention all topics, the text is incomplete while certainly too large already. Again, I apologize for wasting bandwidth of this group. I tried to be succinct without introducing intolerably high bit rates. I didn't use any mathematics (there should be lots) since I have trouble with it. Moreover its ASCII transcription would be hard to read. Go for Penrose and Kauffman and (mostly) their references to obtain the real hog. Things still to go: - what is a packet switched network; how the internet routes - what is a hypergrid and why is it a good network topology - what are CA', what is their relation to NNs, how can one do computations with CAs, how does one implement them efficiently in silicon - an abstract CA-like NN model in EBNF - efficient implementation of it in silicon, estimations of performance - efficient Si implementation of a hypergrid router - how does one implement CAs in molecular circuitry, Soft Circuits (SC), hypergrid mapping upon 2d/3d grid, difficulties, perspectives, etc. 1. The internet, an instance of a packet switched network 1.1 Why mentioning it at all? Good grief, what has the bloody internet to do with uploading? - you will ask. Simple. The architecture of the emulation engine to hold and process the structure representing your mind has to be a massively parallel (maspar) fine-grained machine, a deeply-intertwined conglomerate of active nodes communicating via a packet-switched network. For many essential reasons not to be discussed here due to lack of space. I chose internet as our starting point for several reasons. Most readers will be familiar to internet, why, this very text has been distributed and delivered by it. A packet-swiched (PS) network needs its own language and imagery, so the internet will be a model case to start with. Moreover, analyzing its features and deficiences we will be able to extrapolate a better indexing and routing scheme. An overambitious goal? You bet. An arrogant attitude? As hell. But woosh, here we go. 1.2 Introduction to PS Networks Contrary to the old phone network, there is usually no direct physical connection between sender/target in the packet switched network. The venerable snail mail is a much better metaphor: the data stream is hacked up into pieces, which are packed up in envelopes (provided with a time stamp, address, etc.), and thrown into the net. Jumping from node to node, the packets find their destination via different routes like intelligent beings, get sorted according to the logical sequence of departure, strip their envelopes and thus regenerate the original continous data stream, as issued by individual talking objects. But first things first. The ISO Open System Interconnection (OSI) model describes all information transfer in terms of seven layers. 7 Application - end-user services 6 Presentation - formatting, encryption 5 Session - user interface 4 Transport - end-to-end message control 3 Network - message exchange control (e.g. Routing) 2 Data link - subnetwork data transmission 1 Physical - electrical interface (e.g. RS 232 definition) Only the lowest layer, where the signal levels, pin layout etc. are defined, is responsible for direct information transfer. Actually, OSI is a hierarchy of virtual channels, each defined in terms of the underlying layer. Only. What for? Because of this information insulation/protection not the whole beast has to be redesigned if one layer gets changed. 1.3 Some features of a good packet-switched network: The sender as well as the destination must have an _identity_, which ideally should not change over time. This uniquity/distinguishablity is usually achieved through assignation of an unique ID/address to each node. The set of all valid IDs is called ID space. Currently, all networks nodes' IDs are assigned manually using a following scheme: a central administrating authority cuts up ID space into segments, to be distributed to several subadministrators, and these... all the way down to the local node. Sometimes, the original ID space cake turns out to be too small (as recently happened to the internet) and has to be expanded. Ideally, each node should derive it's ID from it's wiring constraints and the portion of the net (more correctly, its IDs) it perceives directly, instead of being assigned by a central authority. This labeling scheme should be robust and expandable/scalable up to wide area network (WAN) scale. Imagine it as a phase transition, where each node strives to obtain it's position in ID space utilizing only locally available information (purely local information flow). In a finite iterative process, each node can obtain enough information to be able to pinpoint his position in the ID space to perfect certainty (zero fuzzy factor). Note, that this network description bears striking resemblance to the cellular automaton (CA) information processing paradigm, to be discussed ad nausem below. This is deliberate. Of course administratory packet overhead should not reduce bandwidth noticeably. Since the net will mapped to realspace (2d for a wafer, 3d for a module of several), we can say where a particular ID will be (predict its physical coordinates). This gives us the minimum value for wire lengths. Since the total net can be very big, each node should hold only a small fraction of it in its memory (local view, overlapping spheres). This requires the node ID to provide sufficient information for the data packet which local path to choose. This, in turn, requires the network topology to be regular, a kind of a higher-dimensional lattice, where data packets route via a higher-dimensional Bresenham homologue. In the Bresenham, a 2d/3d line-drawing algorithm, a decision which pixel from several adjacent ones to choose from is derived from a) knowledge where we are, b) where we want to go and c) from intrinsic regularity of the grid/lattice. Blockage due to defective cells is virtually nonexistant in higher-n spaces. I refer to this higher-dimensional Bresenham homologue routing as grassrouting due to its low overhead. See n-cube and n-grid for instances of grassrouting. Very high degrees of defectivity should be tolerated without impairing routing efficiency. Higher-order protocols should take care of packet lossage, do routine encryption and provide authenisation/authorisation, a major security issue. Of course, none of it is relevant on an island single-user implementation. The neural engine for uploading is one. I/O to realspace/VR is done via defined interfaces (sensomotoric I/O). Read/Write to the "me" data structure is forbidden/restricted e.g. to you/a group of your clones or to your virtual physician. A link to uploading: the individual objects in the net are neurons (there might be several in a single node), they have a state at time t. They change the states of other neurons at t+1 by sending them information packets, modulated by synapse strengths assotiated with each link/connection/edge. In turn, incoming packets will influence the local state at t+1, a function of local state at t and the sum of incoming packet influences. Cryptic? Read the CA part, then it will be obvious. 1.4 Glossary Caution: these are my definitions and may differ from those in other sources. A lot is missing. Read at your own risk. Due to many xrefs this should have been done as hypertext, actually. adaptive routing (generic term) a smart routing scheme, which watches/benchmarks the network and utilizes this information to increase efficiency. Opposite non-adaptive (dumb) routing. Administrative information uses bandwidth but increases effective point-to-point bandwidth. Obviously, the gains should be greater than expenses in terms of complexity and fragility. address/name (generic) an unique ID. Member of address/name/ID space Makes nodes discernible; unique. Backward learning (generic) a routing scheme using a packet delay list in each node. Delta-updates entries continously examining incoming packet's time stamps. Can begin with no prior knowledge (without map). Is sometimes utilized by intelligent bridges. Bandwidth (information theory) the amount of information per second a device can handle. The information flow rate in bits/s (bps) or packets/s (pps). The total network bandwidth is integrated over the individual channel bandwidth it consists of. Bridge (device) A device to bridge two otherwise unconnected networks. Addressing scheme has to be compatible. Can be somewhat smart. Channel (information theory) a channel to move information. It can be mono or bidirectional. It has a latency or a signal delay. It has a bandwidth, usually measured in bits/s and an error rate (errors/bits) or frequency (errors/s). Cluster (generic) an additional information allowing objects to be classified in groups. Alternatively, an object's property. There is a binary metric (a predicate) telling which objects belong to which cluster. Each object can belong to several clusters. In security lingo, they are usually referred to as groups. Each member of a group has the same rights. A mapping of vector space element pairs to boolean space. Fuzzy clustering is possible utilizing nonboolean metrics. Connectivity matrix a boolean matrix. Aij being TRUE means the nodes i j are connected (have an edge). An implementation of the boolean connectivity metric. See also map, cluster, metric, edge, node. Connected nodes two nodes having at least one path between them Connectivity a list of neighbour node's IDs DNS internet's domain name system. Maps the string namespace to node address space. Isolating abstractional layer, allows transparent change of the underlying layer. Domain a cluster of nodes. Used by hierarchical routing. n-grids uses an orthogonal domain addressing, a binary clustering of subdomains ((n-1)-subgrids, connected by (n-1) links). Edge a boolean metric (predicate) for node pairs in a given space (graph). Global Optimization A routing scheme utlizing fixed path tables. Needs a central agency. Performs bad at high loads. Group see cluster. Hierarchical routing due to limited node grain size, in WANs spatially neighboured nodes are clustered in domains. Each domain is connected to each others via a single link. A node can only route within its domain. Moreover, it knows which nodes within his domain have links to the other domains. A domain is nonpenetrable to foreign domain' packets. Flooding A routing scheme where several cloned packets traverse different paths concurrently. Plus: automagically circumvents defects finds fastest/shortest path automatically Minus: reduces overall bandwidth clones must be recognizable packets must have limited life time to prevent indefinitely looping packets Geographical flooding: flooding utilizing position information. Hot-potato routing a local-knowledge routing scheme. Tries to get rid of data packets as fast as possible giving it to the link with minimum queue length. Infinite packet lifetime. Can produce oscillating data packets. Performs better with some extensions (Random link selection, etc.). Information the raw stuff we are pumping. The amount of knowledge necessary to select a single case from a set. Is measured in bits. Link Connection between two nodes. Synonym: edge, vertice. Loop a path having the same node at least twice Lossage ratio of packets not reaching its destination due to hardware or routing trouble. lossage=arrived/issued. Map a global list of connectivities for each node (see Connectivity matrix). Metric assigns a scalar (real or integer) to each two elements of n-space. E.g. euclidian metric (geometric distance d(a,b)=sqr(dx^2+dy^2+dz^2) for 3-space) or the Hamming distance for binary vector pairs (number of bits where two vectors do differ). Special case: boolean metric (predicate). Maps n-space vector pairs to the boolean (=true,false) space. Node Graph theory lingo (acc. vertice, edge).. Source and sink of data packets which travel the edges (links, connections) between two nodes. Neigbour nodes two nodes having at least one edge between them. Path a list of nodes, each pair of them connected with least one edge Packet Atom of information in a packet switched network. Can be fixed/variable sized (up to a maximum). Consists of an envelope which auxiliary information and the data cargo. Ping (program) a internet troubleshoot program. Gives benchmark data on packet throughput/lossage between two TCP/IP addresses. Queue a FIFO (first in first out) buffer Routing 3,4 OSI layer. The art of finding the right path for the data packets. Repeater (device) A device to discern signals from noise and ampflify to increase signalling length. Pretty dumb. Stafette routing each node benchmarks the node periodically. Empirical values are exchanged between direct neighbours. Information spreads successively over the net. A dead link has infinite delay. In the beginning internet worked thus. Now each node has a total map. Benchmarked results are spread by flooding. To prevent packet oscillation IP uses maximum packet life time of 255 secs. TCP/IP transfer control protocol/internet protocol is data-moving internet layer. OSI 3..4 (?) Virtual transfer finds a good path, then allocates the resources for route to be used during whole transfer. WAN Wide Area Network, the big brother of LAN (Local Area Network). Internet is a WAN. Note that TCP/IP can also be used in a LAN. Weight a number attached to an edge. Can mean a delay or mean packet throughput (speed), queue length etc. An edge can have several weights. Wire length the total physical wire length in the network. For financial/space cost reasons, has to be kept at a minimum at maximum bandwidth. 1.5 How the Internet Routes The internet utilizes a hierarchical routing with staffete routing (+flooding). For a variety of reasons, internet's addressing and thus also the routing is clearly suboptimal. Especially, the high administratory overhead in address assignment and necessary size/complexity quantization of the routing grain (very coarse) are negative. Use of n-grid network topology allows the router to become simple enough to be on-die integrable. This is qualitatively different approach to current networks, where routers are 100 k$ hardware boxes. Implementations of failure tolerant ultra high performance wafer-scale-integrated (WSI) MISC-CPUs/DRAMs/Routers or special-purpose-engines/Routers (all on single die) appears possible and desirable. Unfortunately, this is not mainstream (but instances there are) as the market seems not to demand it. 2. A better mousetrap Cryonics-irrelevant stuff about the gloomy future of the net ommited. 3. The Boolean Hypercube and the Hypergrid Imagine a standard three-dimensional cube with an edge of length 1, the first node lying at coordinates (0,0,0), the second at (0,0,1), the third ... (0,1,0), (0,1,1), (1,0,0), (1,0,1), (1,1,0) and the eighth at (1,1,1) (see picture). It is a member Y ^ | Z | / 011| o---/----o 111 |/| / /| 010 o-+-+----o | 110 A 3-cube with binary labeled | |/ | | nodes in a cartesian coordinate 001| o------|-o 101 system |/ |/ o--------o-------> 000 100 X of the boolean n-cube homologue sequence with total number of nodes being 2^n. Thus one can label the nodes with a binary integer 0,..,(2^n)-1. Alternatively, one can say there a space of IDs for each n-cube. If we consider a single reference node in a boolean n-cube, it is connected to n nodes _with binary offsets_ (2^i). The sign of the offset is derived from the binary count pattern bit sequence. This is a constructive definition. (Look up connectivity matrix of boolean 5-cube way down. Note that it contains _four_ 3-cubes). This table represents the 3-cube: ref. Binary Signs binary connected ID Count Offsets IDs +---+-----+------+-----------+-------+ (alt.: | 0 | 000 | +++ | +4 +2 +1 | 4 2 1 | perfect | 1 | 001 | ++- | +4 +2 -1 | 5 3 0 | shuffle | 2 | 010 | +-+ | +4 -2 +1 | 6 0 3 | stages (1,2,3) | 3 | 011 | +-- | +4 -2 -1 | 7 1 2 | of the initial | 4 | 100 | -++ | -4 +2 +1 | 1 6 5 | ref. ID) | 5 | 101 | -+- | -4 +2 -1 | 2 7 4 | | 6 | 110 | --+ | -4 -2 +1 | 3 4 7 | | 7 | 111 | --- | -4 -2 -1 | 4 5 6 | +---+-----+------+-----------+-------+ This boolean n-cube is a peculiar beast. On the one hand it is a very orthogonal structure, easily reasoned upon mathematically. The signs/offsets look after themselves, it is a recursive/fractal structure (each n-cube consist of 2 interconnected (via (n-1) number of links) (n-1)-type subcubes), etc. Routing in it resembles binary search: the first step brings one halfway there, the second halfway again, etc. We descend the binary cluster hierarchy until we arrive at the 0-cube (single node). The "distance distribution versus dimensionality" table is the Pascal triangle, by the way. It can derive its own ID from wiring constraints and the other IDs it is directly connected to. (This is a crucial point). Etc. Routing in a perfect n-cube runs roughly: "going from the left, find the first bit in destination ID differing from source ID. Move message packet to the according subcube via direct link (there is only one). Repeat until you reach outer (rightmost) bit. Bingo." This can be coded in some 10 assembly statements or implemented via a very primitive logic. If a part of the links is missing (the n-cube is defective) we have to visit all the nodes which should have links to the second subcube until we can prove the task impossible. The n-grid, on the other hand, can be considered a better n-cube (is a superset of it). We can derive it from the n-cube by making link offsets symmetrical. Links which would reach out of ID space can be either wrapped around (circular space) or considered void/free (open space/grid region). See connectivity table below. Routing is again completely trivial. You just choose the link which distance (arithmetical ID difference) is smallest and shove the packet through. In a perfect n-grid, this works every time and is moreover the best possible algorithm. Especially, if the near-range order (shortest offsets) is highly preserved, this scheme works also for severely defective n-grids (there is a lot of space in those higher dimensions). If the message _blocks_ (its path has a periode/loop) we can circumvent the blockage either giving the message packet a random kick or executing a plan, subsequently visiting all the nodes which should have links in this direction in ever-increasing distances. Computer simulations I did have shown this routing scheme to work and, even better, to work well. Connectivity Tables of 5-cube and 5-grid (2^5=32 nodes each) # means a link, - means there is no link. Note that the n-cube is contained in the n-grid (as it is a superset). The n-cube has a fractal/recursive makeup: it contains two interconnected (n/2 links) (n-1)-cubes. A n-grid node has (2n-1) links as compared to the n-cube (free links are not shown). Two (n-1) subgrids are connected by (2n-1) links as compared by n links in the n-cube. The simple n-grid routing algorithm does not work for the n-cube (it blocks/loops in most cases). Moreover, if the _farthest_ link of each node (n-cube) is cut, the cube collapses into 2 unconnected (n-1) cubes. This does not happen to the n-grid. Notice that a defective structure (some links missing) still routes all right because of built-in redundancy (which is higher in the n-grid). The n-grid is a superset of the n-cube. Each node has boolean 5-cube 5-grid (open-space version. free links not shown) -##-#---#-------#--------------- -##-#---#-------#--------------- #--#-#---#-------#-------------- #-##-#---#-------#-------------- #--#--#---#-------#------------- ##-##-#---#-------#------------- -##----#---#-------#------------ -##-##-#---#-------#------------ #----##-----#-------#----------- #-##-##-#---#-------#----------- -#--#--#-----#-------#---------- -#-##-##-#---#-------#---------- --#-#--#------#-------#--------- --#-##-##-#---#-------#--------- ---#-##--------#-------#-------- ---#-##-##-#---#-------#-------- #--------##-#-----------#------- #---#-##-##-#---#-------#------- -#------#--#-#-----------#------ -#---#-##-##-#---#-------#------ --#-----#--#--#-----------#----- --#---#-##-##-#---#-------#----- ---#-----##----#-----------#---- ---#---#-##-##-#---#-------#---- ----#---#----##-------------#--- ----#---#-##-##-#---#-------#--- -----#---#--#--#-------------#-- -----#---#-##-##-#---#-------#-- ------#---#-#--#--------------#- ------#---#-##-##-#---#-------#- -------#---#-##----------------# -------#---#-##-##-#---#-------# #----------------##-#---#------- #-------#---#-##-##-#---#------- -#--------------#--#-#---#------ -#-------#---#-##-##-#---#------ --#-------------#--#--#---#----- --#-------#---#-##-##-#---#----- ---#-------------##----#---#---- ---#-------#---#-##-##-#---#---- ----#-----------#----##-----#--- ----#-------#---#-##-##-#---#--- -----#-----------#--#--#-----#-- -----#-------#---#-##-##-#---#-- ------#-----------#-#--#------#- ------#-------#---#-##-##-#---#- -------#-----------#-##--------# -------#-------#---#-##-##-#---# --------#-------#--------##-#--- --------#-------#---#-##-##-#--- ---------#-------#------#--#-#-- ---------#-------#---#-##-##-#-- ----------#-------#-----#--#--#- ----------#-------#---#-##-##-#- -----------#-------#-----##----# -----------#-------#---#-##-##-# ------------#-------#---#----##- ------------#-------#---#-##-##- -------------#-------#---#--#--# -------------#-------#---#-##-## --------------#-------#---#-#--# --------------#-------#---#-##-# ---------------#-------#---#-##- ---------------#-------#---#-##- Where's the beef? you might ask. What's the reason to introduce a new topology at all? There are several reasons: - n-cube and n-grid have the best routing performance at minimum wire length (when mapped to real physical 2- and 3-space). - the amount of computation (space/time complexity) for efficient routing is very small. On-chip routers for wafer-local high-speed serial networks are possible. - n-grids tolerate a high degree of defectivity at still good performance. - the IDs can be derived from (directly visible) neighbour's IDs without any need for administration. I call it dynamic ID derivation (DID). - the total communication bandwidth is very high and increases with increasing n-grid size. - the addressing and routing scheme is scalable to extremely big (WAN) nets. Finally, I should say there seems to be a way of relaxing wiring constraints even further without sacrificing too much efficiency. Random local wirings with decreasing wiring density (as realspace projection) can be harnessed with an ID subspace ID derivation and routing scheme. This I call fuzzy routing. 4. Cellular Automata Below is a somewhat theoretical section. It is still a worthwhile read since presentation is somewhat unusual. Important concepts and emphasized words are marked _with underscores_. Skip the section if you are interested only in applications of CAs. 4.1 What Are They? Finite-grid _cellular automata_ (CA) are finite state machines with discrete timespace. Differently put, CAs are n-dimensional (n = 1,2,3; usually) cell arrays of a finite size, each cell having a distinct state (chosen from set of defined states; usually quite small) at time t. _Central_, or _reference_ cell's state at time t+1 (future/next) is determined by its and its _neighbourhood_ (cells in the immediate vicinity, neigbourhood size being usually small in relation to the total array size) at time t (current/present). The function which performs this mapping of the "central-cell+neighbourhood-current-states" vector (base of the _light cone_) upon "central-cell-future-state" vector (tip of the light cone, usually a single cell) is called a _rule_. The rule is applied upon upon all cells _simultaneously_ at each time step. The set of all possible rules, to map all permutations of a given light cone base vector to the light cone tip vector is called the _rule space_. Let us restrict the cell states to binary integers each of n bits. Since we are be interested in practical, also implementable, systems, this does not introduce any restrictions. Obviously each cell can assume one of 2^n states, ranging from 0..(2^n)-1. The neigbourhood size is z cells, including the reference cell. If we consider the compound neighbourhood vector (consisting of individual cells in a given order) as a new integer, sized zn bit, it will have i=2^(zn) states. Obviously, the number grows very fast both with increasing number of states and the bigger neighbourhood size. Imagine this integer being an index to a table T_i of n-bit integers, the rule. Obviously, the space of all possible tables (having j=((2^n)^i)=(((2^n)^2)^(zn)) elements) is equivalent to the rule space. Let this rule space be a new table R_j with individual tables T_i being its entries. Thus R[n,z] is the generic rule space for any CA. Obviously, if T_i is truly random, there is no method (algorithm) to collapse the table. Particulary, the attempt of substituting it by code is prone to fail. T_i is a superset of any code solution, provided, code size < T_i. On the other hand, if the table is smooth/highly repetitive there is clearly a way to put a much simpler/smaller ersatz automaton in its place. Particularly, the main class of collapsible CAs is those with an _isotropic_, a high symmetry neighbourhood. This means, similiar configuration subpatterns in the neighbourhood sphere have about the same impact upon the reference cell, whenever there are located. (Particularly, in CAs of high dimensionality, this constrains rule complexity dramatically). If similiar configurations appearing at the light cone base are treated in a known simple way, e.g. additively, a further very noticeable reduction is possible. Note that this definition encompasses CAs of all dimensionality, which is given by the neighbourhood definition. Lattices of every dimensionality and symmetry, and even networks, can be defined in terms of e.g. offsets to the reference cell. Of course, individual neighbourhoods for each cell are possible. A high-neighbourhood CA is always a superset of a small-neigbourhood one (provided the neighbourhood is a true superset and the number of states is the same). The everything-is-influencing-everything-totally-holistic-yes-please CA (which is, admittedly, a somewhat perverse case) has the whole of the cell space as it neighbourhood. All finite-state neural nets in existance are comfortably embedded in its allencompassing depths. The only loose end left to define is whether this cellular space is _open_ or _circular_ (wrapped-around). Since in open spaces boundary cells neighbourhood reaches out into cell-free "void", they must be handled separately. This definition is constructive. Apart from being instrumentative for an straighforward look-up implementation (some MByte-sized look-up for very modest n and z; z being hard-coded or defined as list of offsets relative to the reference cell or absolute pointers. Both codings allow varying neighbourhoods.) it states that for every mapping of any cell space a upon b, denoted b=T_j1[a] there is an a=T_j2[b], the inverse mapping. T_j2[T_j1[a]]=T_j1[T_j2[a]]=a. Which also means, that after n iterations b=nT_j1[a] upon given space vector a, nT_j2[b] restores the original pattern a. How's that for a public-key cryptosystem? In this context, an efficient way of deriving T_j2 from T_j1 is of interest. (Even more so is a mapping T_j3, which T_j3[nT_j1[a]]=a and especially a convenient ;) way of finding it, using only (partial) knowledge of a.). But I got carried away. The term light cone stems from the maximum information propagation velocity (the _light velocity_) in a given CA which is limited by the neighbourhood size. Consider our usual physical space: information about one _event_ (e.g. snipping of one's fingers; detonation of a nuke) spreads at light velocity in all directions as a _sphere of knowledge_, a four-dimensional light cone. For the sake of illustration: assume our world to one-dimensional. A stack of this _world sheets_ in the 2nd dimension, the time, is called the _timespace_. Godlike, we can scan the future and the past in a single glance: The Arrow of Time ^ | | \ / t+n |---\--------------------/------ World sheet at t+n (Future) | \ / | \ / | \ / | \ / <------ future light cone | \ / (maximal area of knowledge/ | \ / maximum area to be influenced | \ / by the event) | \ / | \ / | \/ event! t +------------------------------- World sheet at t (Present) | /\ | / \ | / \ | / \ | / \ <------ past light cone | / \ (maximum amount of knowledge/ | / \ area necessary to predict | / \ the event) | / \ | / \ t-n |---/--------------------\------ World sheet at t-n (Past) | / \ (Read "A Brief History of Time", by Stephen W. Hawking (1989 Bantam) and (especially) Penrose's "The Emperor's New Mind" for an in-depth light cone discussion and _much_ more). We can predict the future of a pattern lying within its light cone. Eventually, disturbances from outside creeping in with light speed, let the area of predictability dwindle to a point. The special case of the light cone base is the reference cell with neighbourhood z, future pattern is the reference cell. To be able to predict/foresee the state of a given _pattern_ (a set of cells with their individual states at time t) at time t+n we need: - the space (cell vector pattern) and its size - the number of generations (integer) - the neighbourhood (list of cells) - the index function (algorithm to compute the integer index from neighbourhood) - the mapping function (rule) - knowledge whether the space is open (edge artifacts) or closed (toroidal space) time flow ^ w | o end of predictability | r /\ | l / \ | d / \ | /------\ half-height future pattern | s / \ ^ function/ | h / \ | mapping | e /------------\ current pattern | | e | t | s Actually, this has been the logical base of a very efficient implementation of a special edge-of-chaos 2d cellular automation: Life (see Moravec, "Mind children", Harvard University Press, pp. 154). In 1982, Bill Gosper, a former MIT Life group hacker, wrote a Life implementation, which was able to simulate cellular spaces one billion cells on a side. Using a recursive light-cone hash coding, hash table being filled in by the program at run time, the acceleration was spectacular. A typical run would take 1 min for first 100 generations, then 10 seconds for 10 000, then a sustainable rate of 1 000 000 generations within _one_ second. (If you have ever tried to implement a CA on a PC you know what that means). /\ / \ / \ \ /\ /\ /\ /\ / /\ /\ /\ /\ \ / \ / \ / \ / \ / / \ / X X \ --X----X----X----X----X--- / \ / / \/ \ \ / \ / \ / \ / \ / \ / \ /\ /\ /\ /\ /\ / \/ \/ \/ \/ \ / \ / X X X X \ / / \/ \/ \/ \ \ Light cone mosaic, A single cone A recursive cone overlapping at half-height To explain the reason why hashing worked so well we need a rough classification criteria for CAs. A well-known classification to to walk through the whole rule space R[n,z], investigating how each individual rule T maps a random vector over some generations. Obviously, an exhaustive investigation is possible only for very small n,z. For higher n,z values, representative sampling must do. Surprisingly, only four qualitatively different CA categories are discovered: - very dull_1 (maps everything to one state) ="solid" - dull (stable isles of (oszillating) states) ="(melting) solid" - interesting (particles/gliders all kinds of fascinating complex patterns, wow!) ="liquid" - very dull_2 (produces only high entropy patterns) ="gas" Obviously, the light cone hashing could work only so well because the probability of certain patterns to appear/persist was much higher then the others in the Life automaton. Starting as a Kaspar Hauser, the program rapidly learned to predict the future of this frequently recurring features, effectively making bigger and bigger leaps into the future. We can consider each rule vector T to be a point in the rule space R[n,z]. If we map this space e.g. with the euclidic distance metric, we notice that these areas form contigous isles in the space, very like physic's phase diagrams for substances. Moreover, as n,z go up we notice that very dull_2-type automata begin to dominate. Actually, even for very modest n,z the probability of finding a CA of "interesting" or even "dull" by pure random walk is nil. The liquid phase at the edge of chaos ("chaos", (Greek): gas), where the most promising (read: computationally universal) CAs are found, is only a _very_ thin layer on the boundary between solid and gas. Above fact appears very interesting especially in the context of Stuart Kaufmann's "edge of chaos" hypothesis. On a even more grandiose scale, Penrose's pin-equipped creator aiming for a tiny universe phase space comes to mind. In light of recent "fluctuating crazy quilt multiverse" models (Scientific American) an evolution- driven drift of the universe population, each featuring a different rule towards edge of chaos ones. Of course there must be a) inheritance b) mutation c) fitness, a function of rule, which increases the individual's propagation probability. Enough science fiction. One final effort: the (strange) attractors in CA spaces. In physics, the concept of _state_ or _phase space_ is used to visualize the temporal _evolution_ of a system. The total state of a system at time t is entirely represented by a single point in n-space, e.g. 6N-dimensional (three position coordinates and three momentum coordinates for each particle) for N unconstrained particles. A _Hamiltonian_ (the rule in our terminology) defines a vector field, a mapping on that system. Differently put, the Hamiltonian is an _energy gradient_, a pathway for a system to slide towards a (local) minimum. In our version of finite binary integer CAs with quantized time, each vector at t is mapped by the rule to the future vector at t+1. Nothing new, so far. Now consider a initial _region_, a contigous block of vectors. Each element of the block at t gets mapped by the rule to another set of elements at t+1. Obviously, the _volume_, the total number of cells in a given region cannot grow bigger, it can shrink or remain constant, at most. Due to the Liouville theorem a true Hamiltonian system's region volume remains invariant (constant) during temporal evolution (note that the CA rule is a superset of a Hamiltonian as it does _also_ allow volume reduction). This has profound implications: for a majority of systems the elements of an contigous regions diverge with each evolution step, loosing their coherence they are getting spread over increasingly larger volumes. The extreme cases of this system behaviour is referred to as _ergodic_. The "gas" phase type CAs are excellent specimens of this behaviour. As a minimal disturbance in the starting vector produces a maximal deviation within a few iterations, the gaseous CAs are thus an excellent illustration of the "butterfly" effect. Another important theorem states that exists only one _trajectory_ to pass through each point of the state space. The _periode_ of the trajectory, the number of transitions until the randomly walking vector hits upon itself by chance is _very_ long in high-dimensional CAs (there are a great many directions to go). This property, coupled with the lack of attractors/high path entropy at equidistributed statespace point visit probability makes this class for CAs particularly attractive for cryptography. A truly ergodic system performs a random walk through its state space with the probability of visiting each point being equidistributed. Additionally, the systems's evolution velocity (the fraction of transition length over time step) along its trajectory using the euclidian distance metric (square root of the sum of quadrates of the element's differences) gets increasingly evened out with increasing size of the space, the mean translational velocity is roughly a constant. The bigger the system, the less detectable is the individual's contribution to the whole picture, an obvious parallele to the "billiard table" ideal gas picture. Though subsystem components may experience vigorous brownian motion, the whole system stands rock firm. One should remark, however, that for Morse-modulated-potential-type ideal gases boxes there _are_ constraints, as introduced by the shape of the potential function. One can calculate some kind of crazy Schroedinger-square, telling which points are likely to be visited and which not. A system can also be seen as a state-space balett of interacting subsystems of smaller dimensionality, choreographed by higher-order rule. Particularly, using binary subsystem clustering may appear constructive in implementation/analysis. The dull_1 (solid) CAs are remain trapped in single points or constrained to tiny volumes ping-pongying to and fro. If we start to melt the solid rule, thus drawing nearer and nearer to the liquid phase edge, the traversed regions become increasingly larger and larger, the paths more and more complex. A liquid system may stay liquid for a long while or make an expedition into the chaos region, either to stay there indefinitely (there is no way to tell) or returning back, eventually. It might also freeze out, remaining trapped in a solid forever. A typical liquid CA will also show special points (typically, many), called _attractors_, around which on complex trajectories the system will revolve. In a finite system, several convergent paths will congeal into a single one, which might eventually be trapped in a single point, the attractor. (This would never happen in a continous system, of course). The entire volume belonging to converging trajectories is called the _basin of attraction_, a simile to basins of drainage in a montainous massive. A system, released within its basin of attraction, will remain there forever, unless disturbed by an outside force. Two basins of attraction are meet at a ridge, called _separatrix_, a region with one degree of freedom fewer than the number of variables in the system. Instead of being a point, an attractor might be a loop, a toroid (of course, it's higher-dimensional analogues) or any such thing. Moreover, there is a class of _strange_ or _chaotic_ attractors. Their main property is that two very close systems let loose within the basin remain on the attractor surface but float forth _diverging from each other_. Tiny initial deviations gets amplified with time. For obvious reasons, real systems featuring these strange attractor (or ergodic) property are virtually noncomputable as they would require infinite precision both at measurement and simulation stage. Both are unattainable. The main distinction between ergodic systems and these containing strange attractors is that a strange attractor/its basin is tiny in relation to the total statespace size. By now you must think I must be strangely attracted to those attractors, but these beasts are actually useful. There is _heavy_ experimental/theoretical evidence (both direct and indirect) that the brain uses state-spaces and generic mappings (coordinate transforms) to represent and process information (see Churchland for an excellent introduction). Particularly, the existance of attractors is ascertained (Nicolis/Prigogine). Most, maybe all, part of machinery of the brain does calculations utilizing trajectories and attractors in state space (see Churchland). Actually, I believe any system (including us) which does any kind of computation/life must be an edge-of-chaos one to be efficient. Our computers, alas, reside quite firmly in the solid region. 4.2 What Are They Good For? A good question. Are CAs just another irrelevant theoretical construct? I think not. - CAs are great for applied cryptography, both for rnd-stream and public-key. Sadly, CA are largely unknown among cryptographers as yet. - CAs are great for efficient modeling (e.g. fluid dynamics), reducing the amount of necessary computation by many orders of magnitude. A modest fluid dynamic package will run on a PC in realtime. (I've seen some). Systems with purely local interactions (essentially all systems fall within this class) and (best) a noticeable number of constraints are prior candidates for CA modeling. Sadly again, most scientists haven't heard about CAs yet. - neural nets (NNs) are special cases of CAs, which are special cases of automata networks (ANs). Knowledge transfer seems to be possible. - CAs and ANs are great for self-organization and non-linearity science. Intuitive reasoning and/or a well-developed body of mathematical machinery is readily available. - CAs can help up to understand embryonal morphogenesis. - CAs are great toy systems for artificial life (AL). - A family of CAs is computationally universal. They compare favourably (in time/space-complexity) to the Turing machine/von Neumann architectures. CAs might be computationally equivalent to Turing engines, but not in terms of time complexity. Their implementation in silicon is straightforward. Achievable computing power with the same transistor resources is several orders of magnitude higher than current architectures. Being redundant arrays of identical, extremely primitive elements, they are the major candidates for molecular electronics implementations. Achievable integration density and computation performance are of all architectures the highest theoretically possible. - CAs are interesting candidates for universe theories. They mimic a lot (reversibility of the time arrow, particles, light speed, anthropic principle, edge of chaos theories, information conservation, etc.) of the real-world properties. 4.3 A Recipe for a CA XOR-Rnd-Stream Cryptosystem. - Choose a CA which does very good rands. Most CAs from the gas region will do nicely. - Define all the boundary conditions. Code it. - Choose a random space vector as initial key. Give the key to your partner via a safe channel. - XOR your data stream with CA output stream to obtain cypher text. Your partner does the same to retrieve plain. Of course, this works also the other way round. Why bother? CA's form a _family_ of random generators, which are well-behaved, produce excellent rands and run fast even on von-Neumann machines. Internet nodes will be able to crypt TCP/IP packets individually via a different crypt for each target node. Routine eavesdropping will become much more difficult this way, as the cryptoattacker will have to know the right crypt method for each packet. Since crypt is transparent (hidden somewhere in the lower OSI levels), top level applications could (and should) use a different/better cryptosystem, such as: 4.4 A Recipe for a CA Public Key Cryptosystem. - Search CA (automaton network space, actually) space (_not_ rule space) for a cryptography-suitable automaton family. Choose one member. Implement it in C, give away code. Invite people to try cryptoattacks. - Fill rule table with very good rands. Do _not_ use compiler built-int rands. Hardware rnd generator or another CA may do. - Construct the inverse rule. - Decide, which rule table to be your public key. Give it into the net. Keep the other in your safe. All data sent to you, transformed n times (you must agree upon how many) with your public key can be only decoded applying your private key n times. All data encoded by your private key, can be only read with your public key. You want to know whether the sender is the person it purports. Choose a big pseudo random number (best CA rnds, again), encode it with your friend's public key and send cypher to the pretender. Pretender codes the number with its private key, then sends it back to you. You decode it using your friend's public key. Is the obtained the same random number you chose initially? If yes: Pretender=Friend. Bingo! No? Pretender=bad guy. Call the police! Why bother, again? Robust trapdoor cryptosystems with two inverse keys are not exactly frequent in cryptology. The only one I know of is based on large prime factoring, is supposed to be breakable with a finite effort (and might become even more fragile with further advances in number theory). Moreover, it takes a lot of horsepower to compute. Using CAs might come quite handy here. 4.5 Two Simulations \ 4.5.1 Fluid Dynamics \ though extremely interesting 4.5.2 Protein Folding / these chapters are ommited due / to lack of relevance to uploading If there are doubts CAs to be instrumental for large-scale scientific simulations I will supplement. --- to be continued ---------------------------------------------- Literature/References [1] Roger Penrose, "The Emperor's New Mind", Vintage (1990). [2] Stephen Hawking, "A Brief History of Time", Bantam (1989). [3] Hans Moravec, "Mind Children", Harvard University Press (1988). [4] Stuart A. Kaufmann, "The Origins of Order", Oxford University Press (1993) [5] Gregoire Nicolis, Ilya Prigogine (ed.), "Die Erforschung des Komplexen", Piper (1987) [6] Dawkins, "The Blind Watchmaker", (1986) [7] R. Vollmar, "Algorithmen in Zellularautomaten", B.G. Teubner Stuttgart (1979). [8] G. E. Gorelik, "Rasmernost' Prostranstva", Isdatel'stvo Moskovskogo Universiteta (1983). [9] Andrew Hodges, "Alan Turing: the enigma", Vintage (1992). [10] Koch/Segev (ed.), "Methods in Neuronal Modeling", MIT Press, 2nd printing (1990) [11] Yoh-Han Pao, "Adaptive Pattern Recognition and Neural Networks", Addison Wesley (1989). [12] Igor Aleksander (ed.), "Neural Computer Architectures", North Oxford Academic (1989) [13] Paul M. Churchland, "Representation and high-speed computation in neural networks" in "Foundation of artificial intelligence, a sourcebook", by Derek Patridge and Yorick Wilks, (ed.), Cambridge University Press (1990). [14] P.W. Atkins, "Physical Chemistry", Oxford University Press (1987) Rate This Message: http://www.cryonet.org/cgi-bin/rate.cgi?msg=4393