Artificial Neural Networks 3
Jiří Kubalík
Department of Cybernetics, CTU Prague
http://labe.felk.cvut.cz/~posik/xe33scp/
Contents
:: Hopfield Neural Network
topology, Hebb learning rule, energy function and capacity, example - character recognition.
:: Self-Organization
unsupervised learning, vector quantization, Lloyd's algorithm, Kohonen learning, Kohonen Self-Organizing Map, example - building a model of a corridor from robotic data.
Hopfield Neural Network
:: Linear associative memory
a record in the memory is indexed by partial knowledge
auto-associative - the information given on the input is refined
  Ex.: input: b/w portrait, output: corresponding colors
hetero-associative - knowledge related to the knowledge on the input is retrieved
  Ex.: input: b/w portrait, output: name of the person
:: Topological structure
all n neurons are of I/O type,
bipolar neuron's output: y_j ∈ {-1, 1}, neuron's potential: ξ_j ∈ Z,
w_{ji} ∈ Z, w_{ii} = 0, bias_i = 0.
Hopfield Neural Network: Adaptation
:: Training set
τ = {x_k | x_k = (x_{k1}, ..., x_{kn}) ∈ {-1, 1}^n, k = 1, ..., p}
:: Hebb rule
named after the neurophysiologist Donald Hebb, who described how conditional reflexes are established.
The change in the synaptic weight of the connection between two neurons is proportional to their mutual activity, i.e. to the product of the neurons' states.
If two neurons have the same value, the synaptic weight is strengthened; otherwise it is weakened.
  the first neuron represents a condition, the second neuron represents an action.
:: Adaptation
1. t = 0: all weights are set to 0, w_{ji} = 0 (j = 1, ..., n, i = 1, ..., n).
2. t = 1, ..., p (p is the number of training samples): the k-th training sample is presented to the network and the weights are adapted according to the Hebb rule:
   w_{ji}^{(t)} = w_{ji}^{(t-1)} + x_{kj} x_{ki},   1 ≤ j ≠ i ≤ n.
Hopfield Neural Network: Adaptation cont.
This results in the final configuration
   w_{ji} = \sum_{k=1}^{p} x_{kj} x_{ki},   1 ≤ j ≠ i ≤ n.
The network is symmetric, as w_{ji} = w_{ij}.
Voting - the training samples vote for the links between neurons.
The weight w_{ji} = w_{ij} represents the difference between the number of consistent states x_{kj} = x_{ki}, each of which contributes +1 (x_{kj} x_{ki} = 1) to the final value of w_{ji}, and the number of inconsistent states x_{kj} ≠ x_{ki} (each of which contributes x_{kj} x_{ki} = -1).
  the sign of w_{ji} indicates the result of the voting, |w_{ji}| is the strength of the winning alternative.
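A minimal NumPy sketch of the Hebbian weight construction described above (the function name and array layout are illustrative, not taken from the slides):

    import numpy as np

    def hebb_weights(patterns):
        """Build the Hopfield weight matrix from bipolar training patterns.

        patterns: array of shape (p, n) with entries in {-1, +1}.
        """
        X = np.asarray(patterns)
        # Sum of outer products x_k x_k^T over all training samples ("voting").
        W = X.T @ X
        # No self-connections: w_ii = 0.
        np.fill_diagonal(W, 0)
        return W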
Hopfield Neural Network: Active Mode
:: Sequential mode
1. t = 0: y_i^{(0)} = x_i (i = 1, ..., n)
2. t > 0: neuron j is updated. First, its inner potential is calculated as
   ξ_j^{(t-1)} = \sum_{i=1}^{n} w_{ji} y_i^{(t-1)},
   then its new bipolar state is determined as
   y_j^{(t)} =  1            if ξ_j^{(t-1)} > 0
                y_j^{(t-1)}  if ξ_j^{(t-1)} = 0
               -1            if ξ_j^{(t-1)} < 0
Neurons are taken one by one; the j-th neuron is updated at time t = τn + j, where τ is the so-called macroscopic time, the number of periods in which all neurons have been updated. The other neurons stay intact.
Hopfield Neural Network: Active Mode
3. The calculation stops at time t* when the network reaches a stable state (the states of the neurons do not change any more):
   y_j^{(t*+n)} = y_j^{(t*)}   (j = 1, ..., n).
Given that the weights are symmetric, the sequential process stops for any input data. Thus, the Hopfield network realizes a function y(w): {-1, 1}^n → {-1, 1}^n.
The output depends on the configuration w as well as on the order in which the neurons are updated.
:: Parallel mode
in each time step, multiple neurons are updated; this may result in unstable behavior where the network switches between two different states.
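A sketch of the sequential active mode in NumPy, assuming the weight matrix built above; the stability test (one full sweep without a change) and the sweep cap are illustrative choices:

    import numpy as np

    def hopfield_recall(W, x, max_sweeps=100):
        """Asynchronous recall: update neurons one by one until a stable state."""
        y = np.array(x, dtype=int)
        n = len(y)
        for _ in range(max_sweeps):            # one sweep ~ one macroscopic time step
            changed = False
            for j in range(n):
                xi = W[j] @ y                  # inner potential of neuron j
                new = 1 if xi > 0 else (-1 if xi < 0 else y[j])
                if new != y[j]:
                    y[j] = new
                    changed = True
            if not changed:                    # states no longer change -> stable
                break
        return y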
Hopfield Neural Network: Energy Function
:: The Hopfield network resembles simple models of magnetic materials (spin glasses) in statistical physics.
:: Energy function E(y) assigns a potential energy to every state of the network according to
   E(y) = -\frac{1}{2} \sum_{j=1}^{n} \sum_{i=1}^{n} w_{ji} y_j y_i.
low E(y) ... stable states; the sign of w_{ji} corresponds to the mutual relation between the states of y_i and y_j.
high E(y) ... unstable states.
:: Minimization of E(y)
in the active mode, the network starts with y^{(0)}, which has energy E(y^{(0)}); the energy is iteratively decreased, E(y^{(t)}) ≥ E(y^{(t+1)}), until the process stops at time t* in some local minimum of the energy function, E(y^{(t*)}).
This resembles the minimization of an error function by a gradient method, since the new state y_j^{(t)} has the sign opposite to the gradient of the energy function at y_j^{(t-1)}:
   \frac{\partial E}{\partial y_j}(y^{(t-1)}) = -\sum_{i=1}^{n} w_{ji} y_i^{(t-1)}.
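A small sketch of the energy function (function name is illustrative); with a symmetric, zero-diagonal W, each asynchronous update can only keep or lower this value:

    import numpy as np

    def hopfield_energy(W, y):
        """E(y) = -1/2 * sum_{j,i} w_ji * y_j * y_i"""
        y = np.asarray(y)
        return -0.5 * y @ W @ y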
Hopfield Neural Network: Energy Function cont.
:: Goal of the adaptation is to find a configuration w such that the network realizes an auto-associative memory
this means that for any input that is close to some training sample, the output will correspond to that training sample; so, every training sample should represent a local minimum of E(y) (a stable state, in other words).
The surface of the energy function splits into several regions of attraction
  each region of attraction represents all input states of the network that converge to the same local minimum.
phantoms - stable states (local minima) that do not correspond to any training sample.
Removing a phantom x: w'_{ji} = w_{ji} - x_j x_i for 1 ≤ j ≠ i ≤ n.
Hopfield Neural Network: Capacity
:: Capacity of the network is defined as the ratio p/n of the number of training samples p to the number of neurons n.
  it determines the network's capability of reproducing the training samples.
Given that the states of the neurons in the training samples are chosen at random with equal probability, the probability P that the state of a given neuron in a training sample will be stable is
   P = \frac{1}{2} + \frac{1}{\sqrt{\pi}} \int_{0}^{\sqrt{n/2p}} e^{-x^2} \, dx.
for p = 0.185n we can assume that the number of unstable neuron states in the training patterns will not be greater than 1%;
  this says nothing about whether the network will converge to a stable state that is close to the corresponding training pattern.
:: Capacity analysis
For p ≤ 0.138n the training patterns correspond to local minima of the energy function. So, the network can be used as an auto-associative memory.
Ex.: In order for the network to work well for 10 training patterns, 200 neurons would have to be used, which implies 40,000 connections.
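A quick numerical check of the stability probability. It assumes the reconstruction of P above, which equals (1 + erf(sqrt(n/(2p))))/2, and uses illustrative names:

    import math

    def stable_state_probability(p, n):
        """Probability that a given neuron state in a stored pattern is stable."""
        return 0.5 * (1.0 + math.erf(math.sqrt(n / (2.0 * p))))

    # p = 0.185 * n gives P ~ 0.99, i.e. roughly 1% unstable neuron states
    print(stable_state_probability(p=0.185 * 200, n=200))   # ~0.99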
Hopfield Neural Network: Example
:: Character recognition
Characters are represented by a matrix of 12 × 10 pixels.
Each pixel corresponds to one neuron, whose states y_j = 1 and y_j = -1 represent black and white color, respectively.
The training set consists of 8 training patterns.
The trained network was tested on a picture of the character 3 that was partially damaged by changing 25% of its pixels.
Question: What would be the output of the network if the input vector were the inversion of one of the training patterns?
© J. Šíma and R. Neruda: Teoretické otázky neuronových sítí.
Self-Organization and Vector Quantization
:: Competitive learning
output neurons compete for being active (only one neuron is active at a time).
:: Goal is to find a set of representatives such that each of them has the same probability of being the closest one to an input pattern chosen randomly from the same distribution as the distribution of the training patterns.
  the representatives have the same probability of being selected.
:: Vector Quantization (VQ)
a problem from the field of signal processing and approximation.
The goal of VQ is to approximate the probability density p(x) of real input vectors x ∈ R^n by means of a finite number of representatives w_i ∈ R^n, i = 1, ..., h.
Given a set of representatives, we can find for every vector x ∈ R^n the closest representative w_c:
   c = \arg\min_{l=1,...,h} ||x - w_l||.
Vector Quantization
One way to find a solution to this problem is to minimize the error of VQ defined as
   E = \int ||x - w_c||^2 \, p(x) \, dx,
when the probability density p(x) is known, or
   E = \frac{1}{k} \sum_{t=1}^{k} ||x^{(t)} - w_c||^2,
when the problem is given by a finite set of training patterns. Here ||·|| is the Euclidean norm and c is defined as c = \arg\min_{l=1,...,h} ||x - w_l||.
The formulas look simple, but c depends on both the patterns x and the representatives w, so it is not easy to express the gradient of the error function w.r.t. the parameters w and use it in a standard minimization procedure. Instead, heuristic iterative procedures have been proposed for finding a solution.
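A short NumPy sketch of the empirical VQ error for a finite training set; the rows of X are the patterns x^{(t)}, the rows of W the representatives (names are illustrative):

    import numpy as np

    def vq_error(X, W):
        """E = (1/k) * sum_t ||x_t - w_c(t)||^2, c(t) = nearest representative."""
        d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)  # (k, h) distances
        c = d.argmin(axis=1)                                       # nearest representative
        return np.mean(np.sum((X - W[c]) ** 2, axis=1))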
Self-Organization: Lloyd's Algorithm
Lloyd's algorithm, also known as Voronoi iteration or relaxation, is a method for evenly distributing samples or objects, usually points.
Input: Training set T = {x^{(t)}; t = 1, ..., k} and a parameter h that specifies the number of representatives w_i.
Output: Weights of the representatives w_j, j = 1, ..., h.
1. Initialization: Set the representatives at random.
2. Assign the representative w_c to each vector x^{(t)} ∈ T according to c = \arg\min_{l=1,...,h} ||x^{(t)} - w_l||.
3. Calculate the error E = \frac{1}{k} \sum_{t=1}^{k} ||x^{(t)} - w_c||^2.
4. If E < ε, then stop.
5. For each j = 1, ..., h calculate the centroid t_j according to t_j = \frac{1}{|T_j|} \sum_{x ∈ T_j} x, where T_j is the set of training vectors assigned to w_j.
6. Assign w_j = t_j.
7. Go to step 2.
The algorithm iterates until the distribution is good enough. Another common termination criterion is when the maximum distance a representative moves in one iteration is below some set limit.
The representatives are updated after the whole training set has been processed (see the sketch below).
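A compact NumPy sketch of the steps above; the random initialization from the data, the fixed ε, and the iteration cap are assumptions of this sketch:

    import numpy as np

    def lloyd(X, h, eps=1e-4, max_iter=100, seed=0):
        """Lloyd's algorithm: alternate assignment (step 2) and centroid update (steps 5-6)."""
        rng = np.random.default_rng(seed)
        W = X[rng.choice(len(X), size=h, replace=False)].astype(float)  # step 1
        for _ in range(max_iter):
            d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
            c = d.argmin(axis=1)                              # step 2: nearest representative
            E = np.mean(np.sum((X - W[c]) ** 2, axis=1))      # step 3: quantization error
            if E < eps:                                       # step 4: stop if good enough
                break
            for j in range(h):                                # steps 5-6: move w_j to its centroid
                if np.any(c == j):
                    W[j] = X[c == j].mean(axis=0)
        return W                                              # step 7 is the loop itself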
Self-Organizing Network: Kohonen Learning
:: Topological structure
2-layer network
n input neurons (x ∈ R^n), h output neurons (representatives),
each representative j is connected to all input neurons; w_j = (w_{j1}, ..., w_{jn}), j = 1, ..., h.
:: Active mode - winner-takes-all strategy
Output neurons take values y_j ∈ {0, 1}, and just one output neuron is active. The output of each neuron, with respect to its distance to the input vector x^{(t)}, is calculated as
   y_j = 1 if j = \arg\min_{l=1,...,h} ||x^{(t)} - w_l||, and y_j = 0 otherwise.
:: Adaptation - Kohonen Learning
Go through the training set and for each training vector run a competition among the representatives. The winner c of each competition is updated according to
   w_c^{(t)} = w_c^{(t-1)} + θ (x^{(t)} - w_c^{(t-1)}).
The parameter 0 < θ ≤ 1 defines the change rate (it decreases from 1 towards 0).
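A sketch of this plain competitive (winner-takes-all) learning in NumPy; the linearly decreasing rate schedule and the initialization are illustrative choices:

    import numpy as np

    def kohonen_learning(X, h, theta=0.5, epochs=20, seed=0):
        """Only the winning representative is moved towards the presented pattern."""
        rng = np.random.default_rng(seed)
        W = X[rng.choice(len(X), size=h, replace=False)].astype(float)
        for e in range(epochs):
            rate = theta * (1.0 - e / epochs)                  # change rate decreases towards 0
            for x in X[rng.permutation(len(X))]:
                c = np.argmin(np.linalg.norm(W - x, axis=1))   # competition: the winner
                W[c] += rate * (x - W[c])                      # update only the winner
        return W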
Kohonen Self-Organizing Map
:: Self-organizing map (SOM) is trained using unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map. The map seeks to preserve the topological properties of the input space.
:: Topological structure
like in the self-organizing network, plus the output units are arranged into a topological structure (a 1D or 2D lattice).
The topological structure defines for each unit c a neighborhood N_s(c) of size s as the set of neurons whose distance to neuron c is less than or equal to s:
   N_s(c) = {j; d(j, c) ≤ s}.
:: Active mode
The neighborhood information is not considered. The output unit that is closest to the input vector is activated (y_winner = 1). The other units are inactive (y_loser = 0).
Kohonen Self-Organizing Map: Adaptation
Takes into consideration the topological structure of the neurons, so that the winner neuron is updated along with all its neighbors. Neurons that are neighbors in the network should not be far apart in the input space either.
The size of the neighborhood is not constant. At the beginning of the learning phase, s is set to a rather big value (for example half of the network size) and decreases towards zero. So, at the final stage of the process, only the winner neuron is considered for the update.
The weights of the representatives are updated according to
   w_{ji}^{(t)} = w_{ji}^{(t-1)} + θ (x_i^{(t)} - w_{ji}^{(t-1)}) for j ∈ N_s(c), and w_{ji}^{(t)} = w_{ji}^{(t-1)} otherwise,
where c is the winner neuron. This can be re-written as
   w_{ji}^{(t)} = w_{ji}^{(t-1)} + h_c(j) (x_i^{(t)} - w_{ji}^{(t-1)})
if we define the function h_c(j) as
   h_c(j) = θ for j ∈ N_s(c), and h_c(j) = 0 otherwise.
Kohonen Self-Organizing Map: Adaptation cont.
Usually, h_c(j) is defined so that the transition between zero and non-zero values is continuous. Typically, a Gaussian function of the form
   h_c(j) = h_0 \exp\left(-\frac{d(j, c)^2}{\sigma^2}\right)
is used, with its center in c, width σ ∈ R, and a parameter h_0 ∈ R that defines the maximal shift of the units. Both h_0 and σ decrease in time.
  a more time-consuming approach.
:: Hints for running the learning algorithm
The representatives should be initialized so that they are maximally different.
The number of iterations should be at least 500·h (typically 10^4 to 10^5).
We distinguish two phases
1. coarse-learning - a short stage, up to 1000 iterations; θ drops from 0.9 to 0.01, s drops from a value comparable to the size of the network to 1. The units get globally distributed.
2. fine-learning - both θ and s decrease to 0.
Several proofs of convergence have been given for one-dimensional Kohonen networks in one-dimensional domains. There is no general proof of convergence for multidimensional networks.
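A sketch of the SOM adaptation on a 2D lattice with the Gaussian neighborhood; the linear decay schedules for h_0 and σ and the initialization are illustrative assumptions, not the exact schedule from the slides:

    import numpy as np

    def train_som(X, rows, cols, iters=10000, h0=0.9, seed=0):
        """Kohonen SOM: update the winner and its lattice neighbors with h_c(j)."""
        rng = np.random.default_rng(seed)
        W = rng.uniform(X.min(axis=0), X.max(axis=0), size=(rows * cols, X.shape[1]))
        grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
        sigma0 = max(rows, cols) / 2.0                      # initial neighborhood width
        for t in range(iters):
            frac = t / iters
            h_t = h0 * (1.0 - frac)                         # maximal shift decreases in time
            sigma = max(sigma0 * (1.0 - frac), 1e-2)        # width decreases in time
            x = X[rng.integers(len(X))]
            c = np.argmin(np.linalg.norm(W - x, axis=1))    # winner unit
            d2 = np.sum((grid - grid[c]) ** 2, axis=1)      # squared lattice distance d(j,c)^2
            h_c = h_t * np.exp(-d2 / sigma ** 2)            # Gaussian neighborhood h_c(j)
            W += h_c[:, None] * (x - W)                     # move the winner and its neighbors
        return W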
Kohonen Self-Organizing Map: Example 1
:: Mapping a square with a two-dimensional lattice
© R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996.
The four diagrams display the state of the network after 100, 1000, 5000, and 10000 iterations. In the second diagram, several iterations have been overlapped to give a feeling of the iteration process. Since in this experiment the dimension of the input domain and of the network are the same, the learning process reaches a very satisfactory result.
Kohonen Self-Organizing Map: Example 2
:: Planar network with a knot
© R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996.
An example of a network that has reached a state which is very difficult to correct. A knot has appeared during the training process and, if the plasticity of the network has reached a low level, the knot will not be undone by further training, as the overlapped iterations in the diagram on the right show.
Example: Building 3D Models by Means of Self-Organization (1)
Using one 2D lattice.
Using two 2D lattices.
© J. Koutník, Computational Intelligence Group, CTU Prague.
Example: Building 3D Models by Means of Self-Organization (2)
© J. Koutník, R. Mázl and M. Kulich: Building of 3D Environment Models for Mobile Robotics Using Self-Organization. In Proceedings of PPSN 2006.
1. Data Acquisition
Experimental data were gathered by two laser range-finders orthogonally mounted on a mobile robot.
Example: Initial Clustering
2. The data (≈10^5 vectors) were clustered using the K-means algorithm.
Example: Building Sub-maps
3. Each cluster is covered by a rectangular mesh constructed by the Kohonen SOM algorithm.
Example: Joining Phase
4. All meshes are joined together using a nearest-neighbor search algorithm with an adaptive threshold, which depends on the mean distances between nodes in the meshes being joined.
Example: Re-Optimization
5. The SOM algorithm is executed again on the complex non-planar mesh to re-optimize the joints.
References
1. Šíma, J., Neruda, R.: Teoretické otázky neuronových sítí. Praha: MATFYZPRESS, 1996.
2. Rojas, R.: Neural Networks - A Systematic Introduction. Springer-Verlag, Berlin, New York, 1996. (on-line: http://page.mi.fu-berlin.de/rojas/neural/index.html)