Archive-name: ai-faq/neural-nets/part7
Last-modified: 2000-12-18
URL: ftp://ftp.sas.com/pub/neural/FAQ7.html
Maintainer: saswss@unx.sas.com (Warren S. Sarle)
This is part 7 (of 7) of a monthly posting to the Usenet newsgroup comp.ai.neural-nets. See part 1 of this posting for full information on what it is all about.
------------------------------------------------------------------------
More information on NN chips can be obtained from the Electronic Engineers Toolbox web page. Go to http://www.eg3.com/ebox.htm, type "neural" in the quick search box, click on "chip co's" and then on "search".
Further WWW pointers to NN Hardware:
Here is a short list of companies:
HNC Inc. 5930 Cornerstone Court West San Diego, CA 92121-3728 619-546-8877 Phone 619-452-6524 Fax
HNC markets:
10260 Campus Point Drive MS 71, San Diego CA 92121 (619) 546 6148 Fax: (619) 546 6736
30 Skyline Drive Lake Mary FL 32746-6201 (407) 333-4379
MicroDevices makes MD1220 - 'Neural Bit Slice'. Each of the products mentioned so far has very different usages. Although this sounds similar to Intel's product, the architectures are not.
2250 Mission College Blvd Santa Clara, Ca 95052-8125 Attn ETANN, Mail Stop SC9-40 (408) 765-9235
Intel was making an experimental chip (which is no longer produced): 80170NW - Electrically trainable Analog Neural Network (ETANN). It has 64 'neurons' on it, almost fully internally connected, and the chip can be put in a hierarchical architecture to do 2 billion interconnects per second. Support software by
California Scientific Software 10141 Evening Star Dr #6 Grass Valley, CA 95945-9051 (916) 477-7481
Their product is called 'BrainMaker'.
7a Lavant Street Peterfield Hampshire GU32 2EL United Kingdom Tel: +44 730 60256
1400 NW Compton Drive Suite 340 Beaverton, OR 97006 U. S. A. Tel: 503-690-1236; FAX: 503-690-1249
P.O. Box 14 Marion, OH 43301-0014 Voice (740) 387-5074 Fax: (740) 382-4533 Internet: jwrogers@on-ramp.net http://www.neurodynamx.com
InfoTech Software Engineering purchased the software and trademarks from NeuroDynamX, Inc. and, using the NeuroDynamX tradename, continues to publish the DynaMind, DynaMind Developer Pro and iDynaMind software packages.
Via S. Maria Maddalena, 38100 Trento, Italy Tel: +39 0461 260 552 Fax: +39 0461 260 617 Email: info@neuricam.com
NC3001 TOTEM - Digital Processor for Neural Networks
------------------------------------------------------------------------
Jay Scott, Machine Learning in Games: http://forum.swarthmore.edu/~jay/learn-game/index.html
METAGAME Game-Playing Workbench: ftp://ftp.cl.cam.ac.uk/users/bdp/METAGAME
R.S. Sutton, "Learning to predict by the methods of temporal differences", Machine Learning 3, p. 9-44 (1988).
David E. Moriarty and Risto Miikkulainen (1994). "Evolving Neural Networks to Focus Minimax Search," In Proceedings of Twelfth National Conference on Artificial Intelligence (AAAI-94, Seattle, WA), 1371-1377. Cambridge, MA: MIT Press, http://www.cs.utexas.edu/users/nn/pages/publications/neuro-evolution.html
Games World '99 at http://gamesworld99.free.fr/menuframe.htm
G. Tesauro and T.J. Sejnowski (1989), "A Parallel Network that learns to play Backgammon," Artificial Intelligence, vol 39, pp. 357-390.
G. Tesauro and T.J. Sejnowski (1990), "Neurogammon: A Neural Network Backgammon Program," IJCNN Proceedings, vol 3, pp. 33-39, 1990.
G. Tesauro (1995), "Temporal Difference Learning and TD-Gammon," Communications of the ACM, 38, 58-68, http://www.research.ibm.com/massive/tdl.html
Pollack, J.P. and Blair, A.D. (1997), "Co-Evolution in the Successful Learning of Backgammon Strategy," Brandeis University Computer Science Technical Report CS-97-193, http://www.demo.cs.brandeis.edu/papers/long.html#hcgam97
METAGAME: ftp://ftp.cl.cam.ac.uk/users/bdp/bridge.ps.Z
He Yo, Zhen Xianjun, Ye Yizheng, Li Zhongrong (19??), "Knowledge acquisition and reasoning based on neural networks - the research of a bridge bidding system," INNC '90, Paris, vol 1, pp. 416-423.
M. Kohle and F. Schonbauer (19??), "Experience gained with a neural network that learns to play bridge," Proc. of the 5th Austrian Artificial Intelligence meeting, pp. 224-229.
Mark Lynch (1997), "NeuroDraughts: an application of temporal difference learning to draughts," http://www.ai.univie.ac.at/~juffi/lig/Papers/lynch-thesis.ps.gz Software available at http://forum.swarthmore.edu/~jay/learn-game/archive/NeuroDraughts-1.00.zip
K. Chellapilla and D. B. Fogel, "Co-Evolving Checkers Playing Programs using Only Win, Lose, or Draw," SPIE's AeroSense'99: Applications and Science of Computational Intelligence II, Apr. 5-9, 1999, Orlando, Florida, USA, http://vision.ucsd.edu/~kchellap/Publications.html
David Fogel (1999), Evolutionary Computation: Toward a New Philosophy of Machine Intelligence (2nd edition), IEEE, ISBN: 078035379X
Not NNs, but classic papers:
A.L. Samuel (1959), "Some studies in machine learning using the game of checkers," IBM journal of Research and Development, vol 3, nr. 3, pp. 210-229.
A.L. Samuel (1967), "Some studies in machine learning using the game of checkers 2 - recent progress," IBM journal of Research and Development, vol 11, nr. 6, pp. 601-616.
Sebastian Thrun, NeuroChess: http://forum.swarthmore.edu/~jay/learn-game/systems/neurochess.html
Luke Pellen, Octavius: http://home.seol.net.au/luke/octavius/
H. Chen, P. Buntin, L. She, S. Sutjahjo, C. Sommer, D. Neely (1994), "Expert Prediction, Symbolic Learning, and Neural Networks: An Experiment on Greyhound Racing," IEEE Expert, December 1994, 21-27, http://ai.bpa.arizona.edu/papers/dog93/dog93.html
Kuonen Diego, "Statistical Models for Knock-out Soccer Tournaments", http://dmawww.epfl.ch/~kuonen/CALCIO/ (not neural nets, but relevant)
David Stoutamire (19??), "Machine Learning, Game Play, and Go," Center for Automation and Intelligent Systems Research TR 91-128, Case Western Reserve University. http://www.stoutamire.com/david/publications.html
David Stoutamire (1991), Machine Learning Applied to Go, M.S. thesis, Case Western Reserve University, ftp://ftp.cl.cam.ac.uk/users/bdp/go.ps.Z
Norman Richards, David Moriarty, and Risto Miikkulainen (1998), "Evolving Neural Networks to Play Go," Applied Intelligence, 8, 85-96, http://www.cs.utexas.edu/users/nn/pages/publications/neuro-evolution.html
Markus Enzenberger (1996), "The Integration of A Priori Knowledge into a Go Playing Neural Network," http://www.cgl.ucsf.edu/go/Programs/neurogo-html/neurogo.html
Schraudolph, N., Dayan, P., Sejnowski, T. (1994), "Temporal Difference Learning of Position Evaluation in the Game of Go," In: Neural Information Processing Systems 6, Morgan Kaufmann 1994, ftp://bsdserver.ucsf.edu/Go/comp/td-go.ps.Z
Freisleben, B., "Teaching a Neural Network to Play GO-MOKU," in I. Aleksander and J. Taylor, eds, Artificial Neural Networks 2, Proc. of ICANN-92, Brighton UK, vol. 2, pp. 1659-1662, Elsevier Science Publishers, 1992
Katz, W.T. and Pham, S.P. "Experience-Based Learning Experiments using Go-moku", Proc. of the 1991 IEEE International Conference on Systems, Man, and Cybernetics, 2: 1405-1410, October 1991.
E.M.Condon, B.L.Golden, E.A.Wasil (1999), "Predicting the success of nations at the Summer Olympics using neural networks", Computers & Operations Research, 26, 1243-1265.
http://www.engin.umd.umich.edu/~watta/MM/pong/pong5.html
David E. Moriarty and Risto Miikkulainen (1995). Discovering Complex Othello Strategies through Evolutionary Neural Networks. Connection Science, 7, 195-209, http://www.cs.utexas.edu/users/nn/pages/publications/neuro-evolution.html
Richard S. Sutton and Andrew G. Barto (1998), Reinforcement Learning: An Introduction The MIT Press, ISBN: 0262193981
Abstract
This paper describes a four-channel real-time system for the detection
and measurement of sheep rumination and mastication time periods by the
analysis of jaw sounds transmitted through the skull. The system is
implemented using an 80486 personal computer, a proprietary data
acquisition card (PC-126) and a custom made variable gain preamplifier
and bandpass filter module. Chewing sounds are transduced and
transmitted to the system using radio microphones attached to the top of
the sheep heads. The system's main functions are to detect and estimate
rumination and mastication time periods, to estimate the number of chews
during the rumination and mastication periods, and to provide estimates
of the number of boli in the rumination sequences and the number of
chews per bolus. The individual chews are identified using a special
energy threshold detector. The rumination and mastication time periods
are determined by neural network classifier using a combination of time
and frequency domain features extracted from successive 10 second
acoustic signal blocks.
------------------------------------------------------------------------
For unsupervised learning, conventional statistical methods for missing data are often appropriate (Little and Rubin, 1987; Schafer, 1997). There is a concise introduction to these methods in the University of Texas statistics FAQ at http://www.utexas.edu/cc/faqs/stat/general/gen25.html.
For supervised learning, the considerations are somewhat different, as discussed by Sarle (1998). The statistical literature on missing data deals almost exclusively with training rather than prediction (e.g., Little, 1992). For example, if you have only a small proportion of cases with missing data, you can simply throw those cases out for purposes of training; if you want to make predictions for cases with missing inputs, you don't have the option of throwing those cases out! In theory, Bayesian methods take care of everything, but a full Bayesian analysis is practical only with special models (such as multivariate normal distributions) or small sample sizes. The neural net literature contains a few good papers that cover prediction with missing inputs (e.g., Ghahramani and Jordan, 1997; Tresp, Neuneier, and Ahmad 1995), but much research remains to be done.
References:
Donner, A. (1982), "The relative effectiveness of procedures commonly used in multiple regression analysis for dealing with missing values," American Statistician, 36, 378-381.
Ghahramani, Z. and Jordan, M.I. (1994), "Supervised learning from incomplete data via an EM approach," in Cowan, J.D., Tesauro, G., and Alspector, J. (eds.) Advances in Neural Information Processing Systems 6, San Mateo, CA: Morgan Kaufman, pp. 120-127.
Ghahramani, Z. and Jordan, M.I. (1997), "Mixture models for Learning from incomplete data," in Greiner, R., Petsche, T., and Hanson, S.J. (eds.) Computational Learning Theory and Natural Learning Systems, Volume IV: Making Learning Systems Practical, Cambridge, MA: The MIT Press, pp. 67-85.
Jones, M.P. (1996), "Indicator and stratification methods for missing explanatory variables in multiple linear regression," J. of the American Statistical Association, 91, 222-230.
Little, R.J.A. (1992), "Regression with missing X's: A review," J. of the American Statistical Association, 87, 1227-1237.
Little, R.J.A. and Rubin, D.B. (1987), Statistical Analysis with Missing Data, NY: Wiley.
McLachlan, G.J. (1992) Discriminant Analysis and Statistical Pattern Recognition, Wiley.
Sarle, W.S. (1998), "Prediction with Missing Inputs," in Wang, P.P. (ed.), JCIS '98 Proceedings, Vol II, Research Triangle Park, NC, 399-402, ftp://ftp.sas.com/pub/neural/JCIS98.ps.
Schafer, J.L. (1997), Analysis of Incomplete Multivariate Data, London: Chapman & Hall, ISBN 0 412 04061 1.
Tresp, V., Ahmad, S. and Neuneier, R., (1994), "Training neural networks with deficient data", in Cowan, J.D., Tesauro, G., and Alspector, J. (eds.) Advances in Neural Information Processing Systems 6, San Mateo, CA: Morgan Kaufman, pp. 128-135.
Tresp, V., Neuneier, R., and Ahmad, S. (1995), "Efficient methods for dealing with missing data in supervised learning", in Tesauro, G., Touretzky, D.S., and Leen, T.K. (eds.) Advances in Neural Information Processing Systems 7, Cambridge, MA: The MIT Press, pp. 689-696.
------------------------------------------------------------------------
The most common violation of the independence assumption occurs when cases are observed in a certain order relating to time or space. That is, case (X_i,Y_i) corresponds to time T_i, with T_1 < T_2 < ... < T_N. It is assumed that the current target Y_i may depend not only on X_i but also on the cases (X_j,Y_j), j < i, in the recent past. If the T_i are equally spaced, the simplest way to deal with this dependence is to include additional inputs (called lagged variables, shift registers, or a tapped delay line) in the network. Thus, for target Y_i, the inputs may include X_i, Y_{i-1}, X_{i-1}, Y_{i-2}, X_{i-2}, etc. (In some situations, X_i would not be known at the time you are trying to forecast Y_i and would therefore be excluded from the inputs.) Then you can train an ordinary feedforward network with these targets and lagged variables. The use of lagged variables has been extensively studied in the statistical and econometric literature (Judge, Griffiths, Hill, L\"utkepohl and Lee, 1985). A network in which the only inputs are lagged target values is called an "autoregressive model." The input space that includes all of the lagged variables is called the "embedding space."
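As an illustration of lagged variables, here is a minimal sketch in Python with NumPy. The function name make_lagged and the toy series are assumptions made for this example, not part of any particular package. Each row holds the current X followed by pairs of lagged Y and X values, and could be fed to an ordinary feedforward network.

```python
import numpy as np

def make_lagged(x, y, n_lags):
    """Build one input row per target Y_i:
    [X_i, Y_{i-1}, X_{i-1}, ..., Y_{i-n_lags}, X_{i-n_lags}].
    The first n_lags cases are dropped because their lags do not exist."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rows, targets = [], []
    for i in range(n_lags, len(y)):
        row = [x[i]]                      # current input X_i
        for k in range(1, n_lags + 1):
            row.extend([y[i - k], x[i - k]])  # lag-k target and input
        rows.append(row)
        targets.append(y[i])
    return np.array(rows), np.array(targets)

# Toy, equally spaced series (hypothetical data)
x = np.arange(10.0)            # X_i = 0, 1, ..., 9
y = 100.0 + np.arange(10.0)    # Y_i = 100, 101, ..., 109
inputs, targets = make_lagged(x, y, n_lags=2)
```

If X_i is not known at forecast time, drop the first column. Using only the lagged Y columns gives the inputs for an autoregressive model.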
If the T_i are not equally spaced, everything gets much more complicated. One approach is to use a smoothing technique to interpolate points at equally spaced intervals, and then use the interpolated values for training instead of the original data.
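The resampling step can be sketched as follows. Plain linear interpolation (np.interp) is used here only as the simplest stand-in; for noisy data a genuine smoother such as a smoothing spline would be more appropriate. The observation times, values, and grid spacing are made up for the example.

```python
import numpy as np

# Unequally spaced observation times and values (hypothetical data)
t = np.array([0.0, 1.0, 2.5, 3.0, 5.0])
y = np.array([0.0, 2.0, 5.0, 6.0, 10.0])

# Equally spaced grid covering the same time range
step = 1.0
t_grid = np.arange(t[0], t[-1] + step / 2, step)

# Interpolated values to be used for training in place of the originals
y_grid = np.interp(t_grid, t, y)
```

The interpolated series (t_grid, y_grid) can then be turned into lagged inputs in the usual way. Keep in mind that interpolation introduces dependence between neighboring points that was not in the original data.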
Use of lagged variables increases the number of decisions that must be made during training, since you must consider which lags to include in the network, as well as which input variables, how many hidden units, etc. Neural network researchers have therefore attempted to use partially recurrent networks instead of feedforward networks with lags (Weigend and Gershenfeld, 1994). Recurrent networks store information about past values in the network itself. There are many different kinds of recurrent architectures (Hertz, Krogh, and Palmer 1991; Mozer, 1994; Horne and Giles, 1995; Kremer, 199?). For example, in time-delay neural networks (Lang, Waibel, and Hinton 1990), the outputs for predicting target Y_{i-1} are used as inputs when processing target Y_i. Jordan networks (Jordan, 1986) are similar to time-delay neural networks except that the feedback is an exponential smooth of the sequence of output values. In Elman networks (Elman, 1990), the hidden unit activations that occur when processing target Y_{i-1} are used as inputs when processing target Y_i.
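The Elman idea can be sketched as a forward pass in which the previous hidden activations are copied into a "context" vector and fed back as extra inputs. This is only an illustration under stated assumptions: the weights are random and untrained, there are no bias terms, and the names W_in, W_ctx, w_out, and elman_forward are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4

# Random, untrained weights (illustration only)
W_in = rng.normal(scale=0.5, size=(n_hidden, n_in))       # input -> hidden
W_ctx = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context -> hidden
w_out = rng.normal(scale=0.5, size=n_hidden)              # hidden -> output

def elman_forward(xs):
    """Process a sequence one case at a time, feeding the previous
    hidden activations back in as additional inputs."""
    context = np.zeros(n_hidden)   # no history before the first case
    outputs = []
    for x in xs:
        hidden = np.tanh(W_in @ np.atleast_1d(x) + W_ctx @ context)
        outputs.append(float(w_out @ hidden))
        context = hidden           # remembered for the next case
    return np.array(outputs)

# Same inputs at steps 2 and 3, different input at step 1
out_a = elman_forward([1.0, 0.0, 0.0])
out_b = elman_forward([0.0, 0.0, 0.0])
```

The two sequences differ only in the first case, yet the later outputs differ as well, because the network itself stores information about past values.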
However, there are some problems that cannot be dealt with via recurrent networks alone. For example, many time series exhibit trend, meaning that the target values tend to go up over time, or that the target values tend to go down over time. For example, stock prices and many other financial variables usually go up. If today's price is higher than all previous prices, and you try to forecast tomorrow's price using today's price as a lagged input, you are extrapolating, and extrapolating is unreliable. The simplest methods for handling trend are:
There are several different ways to compute forecasts. For simplicity, let's assume you have a simple time series, Y_1, ..., Y_99, you want to forecast future values Y_f for f > 99, and you decide to use three lagged values as inputs. The possibilities include:
If a time series is a random walk, a well-trained network will predict Y_i by simply outputting Y_{i-1}. If you make a plot showing both the target values and the outputs, the two curves will almost coincide, except for being offset by one time step. People often mistakenly interpret such a plot to indicate good forecasting accuracy, whereas in fact the network is virtually useless. In such situations, it is more enlightening to plot multi-step forecasts or N-step-ahead forecasts.
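A small simulation makes the point. The sketch below uses the naive forecast Y_{i-1} in place of a trained network (an assumption, justified by the statement above that a well-trained network would do the same thing), and compares how the forecast looks on the levels with how it looks on the changes.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(2000))   # simulated random walk

# One-step forecast: the prediction for Y_i is simply Y_{i-1}
naive = y[:-1]
actual = y[1:]
level_corr = np.corrcoef(naive, actual)[0, 1]

# The same series viewed as changes Y_i - Y_{i-1}: the previous
# change carries essentially no information about the next one
changes = np.diff(y)
change_corr = np.corrcoef(changes[:-1], changes[1:])[0, 1]
```

The correlation between forecasts and targets on the levels is close to 1, which is what makes the plot look impressive, while the correlation between successive changes is close to 0, which shows that nothing useful is being predicted.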
For general information on time-series forecasting, see the following URLs:
Elman, J.L. (1990), "Finding structure in time," Cognitive Science, 14, 179-211.
Hertz, J., Krogh, A., and Palmer, R. (1991). Introduction to the Theory of Neural Computation. Addison-Wesley: Redwood City, California.
Horne, B. G. and Giles, C. L. (1995), "An experimental comparison of recurrent neural networks," In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems 7, pp. 697-704. The MIT Press.
Jordan, M. I. (1986), "Attractor dynamics and parallelism in a connectionist sequential machine," In Proceedings of the Eighth Annual conference of the Cognitive Science Society, pages 531-546. Lawrence Erlbaum.
Judge, G.G., Griffiths, W.E., Hill, R.C., L\"utkepohl, H., and Lee, T.-C. (1985), The Theory and Practice of Econometrics, NY: John Wiley & Sons.
Kremer, S.C. (199?), "Spatio-temporal Connectionist Networks: A Taxonomy and Review," http://hebb.cis.uoguelph.ca/~skremer/Teaching/27642/dynamic2/review.html.
Lang, K. J., Waibel, A. H., and Hinton, G. (1990), "A time-delay neural network architecture for isolated word recognition," Neural Networks, 3, 23-44.
Masters, T. (1993). Practical Neural Network Recipes in C++, San Diego: Academic Press.
Moody, J. (1998), "Forecasting the economy with neural nets: A survey of challenges and solutions," in Orr, G,B., and Mueller, K-R, eds., Neural Networks: Tricks of the Trade, Berlin: Springer.
Mozer, M.C. (1994), "Neural net architectures for temporal sequence processing," in Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, MA: Addison-Wesley, 243-264, http://www.cs.colorado.edu/~mozer/papers/timeseries.html.
Weigend, A.S. and Gershenfeld, N.A., eds. (1994) Time Series Prediction: Forecasting the Future and Understanding the Past, Reading, MA: Addison-Wesley.
------------------------------------------------------------------------
For example, in robotics (DeMers and Kreutz-Delgado, 1996, 1997), X might describe the positions of the joints in a robot's arm, while Y would describe the location of the robot's hand. There are simple formulas to compute the location of the hand given the positions of the joints, called the "forward kinematics" problem. But there is no simple formula for the "inverse kinematics" problem to compute positions of the joints that yield a given location for the hand. Furthermore, if the arm has several joints, there will usually be many different positions of the joints that yield the same location of the hand, so the forward kinematics function is many-to-one and has no unique inverse. Picking any X such that Y = f(X) is OK if the only aim is to position the hand at Y. However if the aim is to generate a series of points to move the hand through an arc this may be insufficient. In this case the series of Xs need to be in the same "branch" of the function space. Care must be taken to avoid solutions that yield inefficient or impossible movements of the arm.
As another example, consider an industrial process in which X represents settings of control variables imposed by an operator, and Y represents measurements of the product of the industrial process. The function Y = f(X) can be learned by a NN using conventional training methods. But the goal of the analysis may be to find control settings X that yield a product with specified measurements Y, in which case an inverse of f(X) is required. In industrial applications, financial considerations are important, so not just any setting X that yields the desired result Y may be acceptable. Perhaps a function can be specified that gives the cost of X resulting from energy consumption, raw materials, etc., in which case you would want to find the X that minimizes the cost function while satisfying the equation Y = f(X).
The obvious way to try to learn an inverse function is to generate a set of training data from a given forward function, but designate Y as the input and X as the output when training the network. Using a least-squares error function, this approach will fail if f() is many-to-one. The problem is that for an input Y, the net will not learn any single X such that Y = f(X), but will instead learn the arithmetic mean of all the Xs in the training set that satisfy the equation (Bishop, 1995, pp. 207-208). One solution to this difficulty is to construct a network that learns a mixture approximation to the conditional distribution of X given Y (Bishop, 1995, pp. 212-221). However, the mixture method will not work well in general for an X vector that is more than one-dimensional, such as Y = X_1^2 + X_2^2, since the number of mixture components required may increase exponentially with the dimensionality of X. And you are still left with the problem of extracting a single output vector from the mixture distribution, which is nontrivial if the mixture components overlap considerably. Another solution is to use a highly robust error function, such as a redescending M-estimator, that learns a single mode of the conditional distribution instead of learning the mean (Huber, 1981; Rohwer and van der Rest 1996). Additional regularization terms or constraints may be required to persuade the network to choose appropriately among several modes, and there may be severe problems with local optima.
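The failure of least squares on a many-to-one function can be seen without training a network at all. The sketch below assumes the forward function Y = X^2 with X uniform on [-1, 1], and estimates the conditional mean of X in a narrow bin of Y values, since the conditional mean is what a sufficiently flexible least-squares fit would approach. The bin width and sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 20000)
y = x ** 2                      # many-to-one forward function

# Cases whose Y is close to 0.64; the true inverses are X = -0.8 and X = +0.8
near = np.abs(y - 0.64) < 0.02
cond_mean = x[near].mean()      # what a least-squares fit would output

# How badly the "learned" X fails to reproduce the requested Y
residual = abs(cond_mean ** 2 - 0.64)
```

The conditional mean comes out near zero, halfway between the two valid solutions, so plugging it back into the forward function gives a Y nowhere near 0.64.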
Another approach is to train a network to learn the forward mapping f() and then numerically invert the function. Finding X such that Y = f(X) is simply a matter of solving a nonlinear system of equations, for which many algorithms can be found in the numerical analysis literature (Dennis and Schnabel 1983). One way to solve nonlinear equations is to turn the problem into an optimization problem by minimizing sum(Y_i-f(X_i))^2. This method fits in nicely with the usual gradient-descent methods for training NNs (Kindermann and Linden 1990). Since the nonlinear equations will generally have multiple solutions, there may be severe problems with local optima, especially if some solutions are considered more desirable than others. You can deal with multiple solutions by inventing some objective function that measures the goodness of different solutions, and optimizing this objective function under the nonlinear constraint Y = f(X) using any of numerous algorithms for nonlinear programming (NLP; see Bertsekas, 1995, and other references under "What are conjugate gradients, Levenberg-Marquardt, etc.?"). The power and flexibility of the nonlinear programming approach are offset by possibly high computational demands.
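Inversion by minimizing the squared error can be sketched as follows. Here f is a simple known function, X_1^2 + X_2^2, standing in for a trained forward network (an assumption made to keep the example self-contained). The function name invert, the learning rate, the step count, and the finite-difference step h are likewise illustrative choices. The gradient is taken by central differences so the same code works with any forward function.

```python
import numpy as np

def f(x):
    # Stand-in for a trained forward network (assumption)
    return x[0] ** 2 + x[1] ** 2

def invert(f, y_target, x_start, lr=0.05, steps=500, h=1e-5):
    """Search for X with f(X) = y_target by gradient descent on
    the squared error (y_target - f(X))^2."""
    x = np.array(x_start, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = h
            # Central-difference derivative of the squared error
            grad[j] = ((y_target - f(x + e)) ** 2
                       - (y_target - f(x - e)) ** 2) / (2 * h)
        x -= lr * grad
    return x

# Two starting points lead to two different, equally valid inverses
x_a = invert(f, 2.0, [0.5, 1.0])
x_b = invert(f, 2.0, [1.0, -0.5])
```

Both results satisfy f(X) = 2, but they are different points, which illustrates why the starting point (or an additional objective function) determines which of the multiple solutions you get.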
If the forward mapping f() is obtained by training a network, there will generally be some error in the network's outputs. The magnitude of this error can be difficult to estimate. The process of inverting a network can propagate this error, so the results should be checked carefully for validity and numerical stability. Some training methods can produce not just a point output but also a prediction interval (Bishop, 1995; White, 1992). You can take advantage of prediction intervals when inverting a network by using NLP methods. For example, you could try to find an X that minimizes the width of the prediction interval under the constraint that the equation Y = f(X) is satisfied. Or instead of requiring Y = f(X) be satisfied exactly, you could try to find an X such that the prediction interval is contained within some specified interval while minimizing some cost function.
For more mathematics concerning the inverse-function problem, as well as some interesting methods involving self-organizing maps, see DeMers and Kreutz-Delgado (1996, 1997). For NNs that are relatively easy to invert, see the Adaptive Logic Networks described in the software sections of the FAQ.
References:
Bertsekas, D. P. (1995), Nonlinear Programming, Belmont, MA: Athena Scientific.
Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.
DeMers, D., and Kreutz-Delgado, K. (1996), "Canonical Parameterization of Excess motor degrees of freedom with self organizing maps", IEEE Trans Neural Networks, 7, 43-55.
DeMers, D., and Kreutz-Delgado, K. (1997), "Inverse kinematics of dextrous manipulators," in Omidvar, O., and van der Smagt, P., (eds.) Neural Systems for Robotics, San Diego: Academic Press, pp. 75-116.
Dennis, J.E. and Schnabel, R.B. (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall
Huber, P.J. (1981), Robust Statistics, NY: Wiley.
Kindermann, J., and Linden, A. (1990), "Inversion of Neural Networks by Gradient Descent," Parallel Computing, 14, 277-286, ftp://icsi.Berkeley.EDU/pub/ai/linden/KindermannLinden.IEEE92.ps.Z
Rohwer, R., and van der Rest, J.C. (1996), "Minimum description length, regularization, and multimodal data," Neural Computation, 8, 595-609.
White, H. (1992), "Nonparametric Estimation of Conditional Quantiles Using Neural Networks," in Page, C. and Le Page, R. (eds.), Proceedings of the 23rd Symposium on the Interface: Computing Science and Statistics, Alexandria, VA: American Statistical Association, pp. 190-199.
------------------------------------------------------------------------
Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press, section 8.7.
Masters, T. (1994), Signal and Image Processing with Neural Networks: A C++ Sourcebook, NY: Wiley.
Soucek, B., and The IRIS Group (1992), Fast Learning and Invariant Object Recognition, NY: Wiley.
Squire, D. (1997), Model-Based Neural Networks for Invariant Pattern Recognition, http://cuiwww.unige.ch/~squire/publications.html
Laurenz Wiskott, bibliography on "Unsupervised Learning of Invariances in Neural Systems" http://www.cnl.salk.edu/~wiskott/Bibliographies/LearningInvariances.html
------------------------------------------------------------------------
Other references:
Hastie, T., and Simard, P.Y. (1998), "Metrics and models for handwritten character recognition," Statistical Science, 13, 54-65.
Jackel, L.D. et al., (1994) "Comparison of Classifier Methods: A Case Study in Handwritten Digit Recognition", 1994 International Conference on Pattern Recognition, Jerusalem
LeCun, Y., Jackel, L.D., Bottou, L., Brunot, A., Cortes, C., Denker, J.S., Drucker, H., Guyon, I., Muller, U.A., Sackinger, E., Simard, P., and Vapnik, V. (1995), "Comparison of learning algorithms for handwritten digit recognition," in F. Fogelman and P. Gallinari, eds., International Conference on Artificial Neural Networks, pages 53-60, Paris.
Orr, G.B., and Mueller, K.-R., eds. (1998), Neural Networks: Tricks of the Trade, Berlin: Springer, ISBN 3-540-65311-2.
------------------------------------------------------------------------
A GA is an optimization program that starts with a population of encoded procedures, (Creation of Life :-> ) mutates them stochastically, (Get cancer or so :-> ) and uses a selection process (Darwinism) to prefer the mutants with high fitness and perhaps a recombination process (Make babies :-> ) to combine properties of (preferably) the successful mutants.
Genetic algorithms are just a special case of the more general idea of "evolutionary computation". There is a newsgroup that is dedicated to the field of evolutionary computation called comp.ai.genetic. It has a detailed FAQ posting which, for instance, explains the terms "Genetic Algorithm", "Evolutionary Programming", "Evolution Strategy", "Classifier System", and "Genetic Programming". That FAQ also contains lots of pointers to relevant literature, software, other sources of information, et cetera et cetera. Please see the comp.ai.genetic FAQ for further information.
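The ingredients named above (population, mutation, selection, recombination) can be sketched in a few lines. This is a toy illustration only: the fitness function (count of 1-bits), the tournament selection, the one-point crossover, the elitism step, and all parameter values are assumptions chosen for brevity, not a recommended design.

```python
import random

def fitness(bits):
    # Toy fitness (assumption): number of 1-bits in the string
    return sum(bits)

def tournament(pop, rng):
    # Selection: pick two individuals at random, keep the fitter one
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def run_ga(n_bits=20, pop_size=30, generations=60, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # Initial population of random bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)
    for _ in range(generations):
        new_pop = [max(pop, key=fitness)]        # elitism: keep the best as-is
        while len(new_pop) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            cut = rng.randint(1, n_bits - 1)     # recombination point
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit with a small probability
            child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=fitness)
    return initial_best, fitness(best), best

initial_best, best_fitness, best = run_ga()
```

To apply the same loop to a neural network, the bit string would be replaced by an encoding of the weights or the architecture, and the fitness by some measure of network performance.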
URLs on genetic algorithms and NNs:
For general information on GAs, try the links at http://www.shef.ac.uk/~gaipp/galinks.html and http://www.cs.unibo.it/~gaioni
------------------------------------------------------------------------
Fuzzy logic is used where a system is difficult to model exactly (but an inexact model is available), is controlled by a human operator or expert, or where ambiguity or vagueness is common. A typical fuzzy system consists of a rule base, membership functions, and an inference procedure.
Most fuzzy logic discussion takes place in the newsgroup comp.ai.fuzzy (where there is a fuzzy logic FAQ) but there is also some work (and discussion) about combining fuzzy logic with neural network approaches in comp.ai.neural-nets.
Early work combining neural nets and fuzzy methods used competitive networks to generate rules for fuzzy systems (Kosko 1992). This approach is sort of a crude version of bidirectional counterpropagation (Hecht-Nielsen 1990) and suffers from the same deficiencies. More recent work (Brown and Harris 1994; Kosko 1997) has been based on the realization that a fuzzy system is a nonlinear mapping from an input space to an output space that can be parameterized in various ways and therefore can be adapted to data using the usual neural training methods (see "What is backprop?") or conventional numerical optimization algorithms (see "What are conjugate gradients, Levenberg-Marquardt, etc.?").
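To make the "nonlinear mapping" view concrete, here is a minimal sketch of a one-input fuzzy system with triangular membership functions and a weighted-average inference procedure. The rule base (temperature to fan speed), the membership function shapes, and the inference method are all hypothetical choices made for this example.

```python
def triangle(x, left, peak, right):
    """Triangular membership function: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical rule base: IF temperature is <fuzzy set> THEN fan speed is <value>
RULES = [
    ((-10.0, 0.0, 20.0), 0.0),     # cold -> fan off
    ((10.0, 20.0, 30.0), 50.0),    # warm -> medium speed
    ((20.0, 40.0, 50.0), 100.0),   # hot  -> full speed
]

def fan_speed(temperature):
    """Inference: average the rule outputs, weighted by how strongly
    each rule's membership function fires for this input."""
    weights = [triangle(temperature, *mf) for mf, _ in RULES]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(w * out for w, (_, out) in zip(weights, RULES)) / total

speeds = [fan_speed(t) for t in (15.0, 20.0, 25.0)]
```

The numbers in RULES (the corner points of the triangles and the output values) are the parameters of the mapping, and they are what would be adapted to data by the training methods mentioned above.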
A neural net can incorporate fuzziness in various ways:
Bezdek, J.C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, New York: Plenum Press.
Bezdek, J.C. & Pal, S.K., eds. (1992), Fuzzy Models for Pattern Recognition, New York: IEEE Press.
Brown, M., and Harris, C. (1994), Neurofuzzy Adaptive Modelling and Control, NY: Prentice Hall.
Carpenter, G.A. and Grossberg, S. (1996), "Learning, Categorization, Rule Formation, and Prediction by Fuzzy Neural Networks," in Chen, C.H. (1996), pp. 1.3-1.45.
Chen, C.H., ed. (1996) Fuzzy Logic and Neural Network Handbook, NY: McGraw-Hill, ISBN 0-07-011189-8.
Dierckx, P. (1995), Curve and Surface Fitting with Splines, Oxford: Clarendon Press.
Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley.
Klir, G.J. and Folger, T.A.(1988), Fuzzy Sets, Uncertainty, and Information, Englewood Cliffs, N.J.: Prentice-Hall.
Kosko, B.(1992), Neural Networks and Fuzzy Systems, Englewood Cliffs, N.J.: Prentice-Hall.
Kosko, B. (1997), Fuzzy Engineering, NY: Prentice Hall.
Lampinen, J and Selonen, A. (1996), "Using Background Knowledge for Regularization of Multilayer Perceptron Learning", Submitted to International Conference on Artificial Neural Networks, ICANN'96, Bochum, Germany.
Lippe, W.-M., Feuring, Th. and Mischke, L. (1995), "Supervised learning in fuzzy neural networks," Institutsbericht Angewandte Mathematik und Informatik, WWU Muenster, I-12, http://wwwmath.uni-muenster.de/~feuring/WWW_literatur/bericht12_95.ps.gz
Nauck, D., Klawonn, F., and Kruse, R. (1997), Foundations of Neuro-Fuzzy Systems, Chichester: Wiley, ISBN 0-471-97151-0.
van Rijckevorsal, J.L.A. (1988), "Fuzzy coding and B-splines," in van Rijckevorsal, J.L.A., and de Leeuw, J., eds., Component and Correspondence Analysis, Chichester: John Wiley & Sons, pp. 33-54.
------------------------------------------------------------------------
------------------------------------------------------------------------
Links to neurosci, psychology, linguistics lists are also provided.
ftp://una.hh.lib.umich.edu/inetdirsstacks/neurosci:cormbonario, gopher://una.hh.lib.umich.edu/00/inetdirsstacks/neurosci:cormbonario, http://http2.sils.umich.edu/Public/nirg/nirg1.html.
------------------------------------------------------------------------
That's all folks (End of the Neural Network FAQ).
Acknowledgements: Thanks to all the people who helped to get the stuff above into the posting. I cannot name them all, because I would make far too many errors then. :-> No? Not good? You want individual credit? OK, OK. I'll try to name them all. But: no guarantee.... THANKS FOR HELP TO: (in alphabetical order of email addresses, I hope)
The FAQ was created in June/July 1991 by Lutz Prechelt; he also maintained the FAQ until November 1995. Warren Sarle has maintained the FAQ since December 1995.
Bye
Warren & Lutz
Previous part is part 6.
Neural network FAQ / Warren S. Sarle, saswss@unx.sas.com