The Basics of Neural Network (NN) Technology

Contents

  1. Introduction
  2. Scope of Definition - Formal Description
  3. Neural Network Basics
    3.1 Neural Network Model
    3.2 Structure of a Neuron
  4. What can you do with an NN and what not?
  5. Who is concerned with NNs?
  6. How are layers counted?
  7. How are NNs related to statistical methods?
  8. Available Software
    8.1 Freeware and shareware packages for NN simulation
    8.2 Commercial software packages for NN simulation
  9. References

1. Introduction

First of all, when we are talking about a neural network, we should more properly say "artificial neural network" (ANN), because that is what we mean most of the time in computer science. Biological neural networks are much more complicated than the mathematical models we use for ANNs. But it is customary to be lazy and drop the "A" or the "artificial".

There is no universally accepted definition of an NN. But perhaps most people in the field would agree that an NN is a network of many simple processors ("units"), each possibly having a small amount of local memory. The units are connected by communication channels ("connections") which usually carry numeric (as opposed to symbolic) data, encoded by any of various means. The units operate only on their local data and on the inputs they receive via the connections. The restriction to local operations is often relaxed during training.

Some NNs are models of biological neural networks and some are not, but historically, much of the inspiration for the field of NNs came from the desire to produce artificial systems capable of sophisticated, perhaps "intelligent", computations similar to those that the human brain routinely performs, and thereby possibly to enhance our understanding of the human brain.

Most NNs have some sort of "training" rule whereby the weights of connections are adjusted on the basis of data. In other words, NNs "learn" from examples (as children learn to recognize dogs from examples of dogs) and exhibit some capability for generalization beyond the training data.

NNs normally have great potential for parallelism, since the computations of the components are largely independent of each other. Some people regard massive parallelism and high connectivity as defining characteristics of NNs, but such requirements rule out various simple models, such as simple linear regression (a minimal feedforward net with only two units plus bias), which are usefully regarded as special cases of NNs.
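
To make the linear-regression example concrete, here is a minimal sketch in Python (all names are illustrative, and plain gradient descent stands in for a generic NN training rule): a single connection weight and a bias, adjusted on the basis of data, recover an ordinary least-squares line.

    import numpy as np

    # Simple linear regression viewed as a minimal feedforward net:
    # one input unit, one output unit, one weight w, and a bias b.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, 100)
    y = 3.0 * x + 1.0 + rng.normal(0.0, 0.1, 100)   # noisy line

    w, b, lr = 0.0, 0.0, 0.1
    for _ in range(500):
        err = (w * x + b) - y            # prediction error for each case
        w -= lr * np.mean(err * x)       # gradient of mean squared error wrt w
        b -= lr * np.mean(err)           # gradient wrt b
    print(w, b)                          # close to the least-squares fit (3, 1)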

2. Scope of Definition - Formal Description

According to the DARPA Neural Network Study (1988, AFCEA International Press, p. 60):

... a neural network is a system composed of many simple processing elements operating in parallel whose function is determined by network structure, connection strengths, and the processing performed at computing elements or nodes.

According to Haykin, S. (1994), Neural Networks: A Comprehensive Foundation, NY: Macmillan, p. 2:

A neural network is a massively parallel distributed processor that has a natural propensity for storing experiential knowledge and making it available for use. It resembles the brain in two respects:

1. Knowledge is acquired by the network through a learning process.
2. Interneuron connection strengths known as synaptic weights are used to store the knowledge.

According to Nigrin, A. (1993), Neural Networks for Pattern Recognition, Cambridge, MA: The MIT Press, p. 11:

A neural network is a circuit composed of a very large number of simple processing elements that are neurally based. Each element operates only on local information. Furthermore each element operates asynchronously; thus there is no overall system clock.

According to Zurada, J.M. (1992), Introduction To Artificial Neural Systems, Boston: PWS Publishing Company, p. xv:

Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store, and utilize experiential knowledge.

3. Neural Network Basics

3.1 Neural Network Model
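
In the most common model, units are arranged in layers, each connection carries a numeric weight, and each unit computes a function of the weighted inputs it receives. A minimal sketch of a forward pass through a one-hidden-layer feedforward net in Python (the architecture, the tanh activation, and all names are illustrative assumptions):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        """Forward pass of a one-hidden-layer feedforward net."""
        h = np.tanh(W1 @ x + b1)     # hidden units: weighted sums + activation
        return W2 @ h + b2           # linear output units

    x = np.array([0.5, -1.0])                      # two inputs
    W1, b1 = np.ones((3, 2)) * 0.1, np.zeros(3)    # input -> hidden (3 units)
    W2, b2 = np.ones((1, 3)) * 0.1, np.zeros(1)    # hidden -> output (1 unit)
    print(forward(x, W1, b1, W2, b2))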

3.2 Structure of a Neuron
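
A single artificial neuron forms a weighted sum of its inputs, adds a bias, and passes the result through an activation function. A minimal sketch in Python (the logistic activation and all names are illustrative assumptions):

    import numpy as np

    def neuron(inputs, weights, bias):
        """One artificial neuron: a weighted sum of the inputs plus a bias,
        passed through a logistic (sigmoid) activation function."""
        net = np.dot(weights, inputs) + bias      # the "net input"
        return 1.0 / (1.0 + np.exp(-net))         # activation, in (0, 1)

    print(neuron(np.array([0.5, -0.2]), np.array([1.0, 2.0]), 0.1))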

4. What can you do with an NN and what not?

In principle, NNs can compute any computable function, i.e. they can do everything a normal digital computer can do.

In practice, NNs are especially useful for classification and function approximation/mapping problems which are tolerant of some imprecision, which have lots of training data available, but to which hard and fast rules (such as those that might be used in an expert system) cannot easily be applied. Almost any mapping between vector spaces can be approximated to arbitrary precision by feedforward NNs (which are the type most often used in practical applications) if you have enough data and enough computing resources.
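
As a toy illustration of such function approximation (every choice below, from the architecture to the learning rate, is an arbitrary assumption made for the example), a one-hidden-layer feedforward net trained by plain gradient descent on squared error can learn sin(x) from sampled data:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 64).reshape(1, -1)   # inputs, shape (1, n)
    y = np.sin(x)                                       # targets

    H = 10                                              # hidden units
    W1 = rng.normal(0.0, 0.5, (H, 1)); b1 = np.zeros((H, 1))
    W2 = rng.normal(0.0, 0.5, (1, H)); b2 = np.zeros((1, 1))
    lr = 0.05
    for _ in range(5000):
        h = np.tanh(W1 @ x + b1)                # hidden activations, (H, n)
        out = W2 @ h + b2                       # linear outputs, (1, n)
        d_out = (out - y) / x.shape[1]          # gradient of mean squared error
        d_h = (W2.T @ d_out) * (1.0 - h**2)     # backprop through tanh
        W2 -= lr * (d_out @ h.T); b2 -= lr * d_out.sum(axis=1, keepdims=True)
        W1 -= lr * (d_h @ x.T);   b1 -= lr * d_h.sum(axis=1, keepdims=True)
    print(np.abs(out - y).max())                # error shrinks as training proceeds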

NNs are, at least today, difficult to apply successfully to problems that concern manipulation of symbols and memory. And there are no methods for training NNs that can magically create information that is not contained in the training data.

5. Who is concerned with NNs?

Neural networks are interesting to quite a lot of very different people: computer scientists want to find out about the properties of non-symbolic information processing with neural nets; statisticians use neural nets as flexible, nonlinear regression and classification models; engineers exploit the capabilities of neural networks in many areas, such as signal processing and automatic control; cognitive scientists view neural networks as a possible apparatus for describing models of thinking; neurophysiologists use neural networks to describe and explore medium-level brain function; physicists use neural networks to model phenomena in statistical mechanics; and biologists use them to interpret nucleotide sequences.

6. How are layers counted?

This is a matter of considerable dispute: some people count the layers of units (with or without the input layer, which performs no computation), while others count the layers of trainable weights.

To avoid ambiguity, you should speak of a 2-hidden-layer network, not a 4-layer network (as some would call it) or 3-layer network (as others would call it). And if the connections follow any pattern other than fully connecting each layer to the next and to no others, you should carefully specify the connections.
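
A small sketch of the counting conventions in Python (the layer sizes are illustrative):

    import numpy as np

    # One input layer (4 units), two hidden layers (5 and 3 units), and one
    # output layer (2 units). Counting layers of units gives 4; excluding the
    # input layer gives 3; the unambiguous description is "2-hidden-layer net".
    weights = [np.zeros((5, 4)),   # input -> first hidden layer
               np.zeros((3, 5)),   # first hidden -> second hidden layer
               np.zeros((2, 3))]   # second hidden -> output layer
    print(len(weights))            # 3 layers of weights join 4 layers of units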

7. How are NNs related to statistical methods?

There is considerable overlap between the fields of neural networks and statistics. Statistics is concerned with data analysis. In neural network terminology, statistical inference means learning to generalize from noisy data. Some neural networks are not concerned with data analysis (e.g., those intended to model biological systems) and therefore have little to do with statistics. Some neural networks do not learn (e.g., Hopfield nets) and therefore have little to do with statistics. Some neural networks can learn successfully only from noise-free data (e.g., ART or the perceptron rule) and therefore would not be considered statistical methods. But most neural networks that can learn to generalize effectively from noisy data are similar or identical to statistical methods. For example, feedforward nets with no hidden layer are basically generalized linear models; probabilistic neural nets are essentially kernel discriminant analysis; Kohonen nets for adaptive vector quantization are very similar to k-means cluster analysis; and Hebbian learning is closely related to principal component analysis.

Some neural network areas that appear to have no close relatives in the existing statistical literature are Kohonen's self-organizing maps, reinforcement learning (although this is treated in the operations research literature on Markov decision processes), and stopped training (which is similar in purpose and effect to shrinkage estimation, although the method is quite different).

Feedforward nets are a subset of the class of nonlinear regression and discrimination models. Statisticians have studied the properties of this general class but had not considered the specific case of feedforward neural nets before such networks were popularized in the neural network field. Still, many results from the statistical theory of nonlinear models apply directly to feedforward nets, and the methods that are commonly used for fitting nonlinear models, such as various Levenberg-Marquardt and conjugate gradient algorithms, can be used to train feedforward nets.
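
As a sketch of that last point (the tiny architecture, the data, and all names are assumptions made for the example), scipy's Levenberg-Marquardt least-squares routine can fit a one-hidden-layer net exactly as it would fit any other nonlinear regression model:

    import numpy as np
    from scipy.optimize import least_squares

    # Fit a tiny one-hidden-layer feedforward net by Levenberg-Marquardt,
    # treating it as an ordinary nonlinear regression model.
    rng = np.random.default_rng(0)
    x = np.linspace(-2.0, 2.0, 50)
    y = np.tanh(2.0 * x) + rng.normal(0.0, 0.05, 50)   # noisy target data

    H = 4                                              # hidden units
    def residuals(p):
        W1, b1 = p[:H], p[H:2*H]                       # input -> hidden
        W2, b2 = p[2*H:3*H], p[3*H]                    # hidden -> output
        out = np.tanh(np.outer(x, W1) + b1) @ W2 + b2
        return out - y                                 # one residual per case

    p0 = rng.normal(0.0, 0.5, 3 * H + 1)               # random initial weights
    fit = least_squares(residuals, p0, method="lm")    # Levenberg-Marquardt
    print(np.sqrt(np.mean(fit.fun ** 2)))              # RMS error of fitted net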

While neural nets are often defined in terms of their algorithms or implementations, statistical methods are usually defined in terms of their results. The arithmetic mean, for example, can be computed by a (very simple) backprop net, by applying the usual formula SUM(x_i)/n, or by various other methods. What you get is still an arithmetic mean regardless of how you compute it. So a statistician would regard standard backprop, Quickprop, and Levenberg-Marquardt as different algorithms for fitting the same kind of statistical model, such as a feedforward net. On the other hand, different training criteria, such as least squares and cross entropy, are viewed by statisticians as fundamentally different estimation methods with different statistical properties.
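
A minimal sketch of the arithmetic-mean example in Python (the data and step size are arbitrary): a "net" consisting of a single bias parameter, trained by gradient descent on squared error, converges to the same SUM(x_i)/n that the usual formula gives.

    import numpy as np

    # The arithmetic mean computed two ways: by the usual formula, and as the
    # output of a trivial "net" (a single bias parameter) trained by gradient
    # descent on squared error. Both give SUM(x_i)/n.
    x = np.array([2.0, 4.0, 9.0])
    b, lr = 0.0, 0.1
    for _ in range(200):
        b -= lr * np.mean(b - x)    # gradient of mean squared error wrt b
    print(b, np.sum(x) / len(x))    # both print 5.0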

It is sometimes claimed that neural networks, unlike statistical models, require no distributional assumptions. In fact, neural networks involve exactly the same sort of distributional assumptions as statistical models, but statisticians study the consequences and importance of these assumptions while most neural networkers ignore them. For example, least-squares training methods are widely used by statisticians and neural networkers. Statisticians realize that least-squares training involves implicit distributional assumptions in that least-squares estimates have certain optimality properties for noise that is normally distributed with equal variance for all training cases and that is independent between different cases. These optimality properties are consequences of the fact that least-squares estimation is maximum likelihood under those conditions. Similarly, cross-entropy is maximum likelihood for noise with a Bernoulli distribution. If you study the distributional assumptions, then you can recognize and deal with violations of the assumptions. For example, if you have normally distributed noise but some training cases have greater noise variance than others, then you may be able to use weighted least squares instead of ordinary least squares to obtain more efficient estimates.
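
A minimal sketch of that weighted-least-squares remedy in Python (the data-generating model and the weights are illustrative assumptions):

    import numpy as np

    # Ordinary vs. weighted least squares under heteroscedastic noise:
    # cases with higher noise variance get weight 1/variance.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 200)
    sigma = np.where(x > 0.5, 1.0, 0.1)          # unequal noise variances
    y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

    X = np.column_stack([x, np.ones_like(x)])    # design matrix with intercept
    ols = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least squares
    w = 1.0 / sigma**2                           # WLS weights
    wls = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
    print(ols, wls)                              # WLS is the more efficient fit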

Hundreds, perhaps thousands of people have run comparisons of neural nets with "traditional statistics" (whatever that means). Most such studies involve one or two data sets, and are of little use to anyone else unless they happen to be analyzing the same kind of data.

8. Available Software

8.1 Freeware and shareware packages for NN simulation

GENESIS

GENESIS 2.0 (GEneral NEural SImulation System) is a general-purpose simulation platform developed to support the simulation of neural systems ranging from complex models of single neurons to simulations of large networks made up of more abstract neuronal components. Most current GENESIS applications involve realistic simulations of biological neural systems. Although the software can also model more abstract networks, other simulators are more suitable for backpropagation and similar connectionist modeling. GENESIS runs on most Unix platforms and has a graphical front end, XODUS; a parallel version for networks of workstations, symmetric multiprocessors, and MPPs is also available.

DartNet

DartNet is a Macintosh-based backpropagation simulator, developed at Dartmouth by Jamshed Bharucha and Sean Nolan as a pedagogical tool. It makes use of the Mac's graphical interface, and provides a number of tools for building, editing, training, testing and examining networks.

PDP++

The PDP++ software is a new neural-network simulation system written in C++. It represents the next generation of the PDP software released with the McClelland and Rumelhart "Explorations in Parallel Distributed Processing" handbook (MIT Press, 1987). It is easy enough for novice users, but very powerful and flexible for research use. The current version, 1.0, runs on Unix with X Windows.

Features: Full GUI (InterViews), realtime network viewer, data viewer, extendable object-oriented design, CSS scripting language with source-level debugger, GUI macro recording. Algorithms: Feedforward and several recurrent BP, Boltzmann machine, Hopfield, Mean-field, Interactive activation and competition, continuous stochastic networks.

WinNN

WinNN is a shareware neural network (NN) package for Windows 3.1.

WinNN combines a very user-friendly interface with a powerful computational engine. It is intended as a tool for both beginners and more advanced neural network users, and it provides an alternative to more expensive and harder-to-use packages. WinNN can implement feedforward multilayer NNs and uses a modified fast backpropagation algorithm for training. It offers extensive online help, various neuron functions, and on-the-fly testing of network performance and generalization. All training parameters can be easily modified while WinNN is training. Results can be saved to disk or copied to the clipboard, and plotting of the outputs and weight distributions is supported.

8.2 Commercial software packages for NN simulation

SAS Neural Network Application

Operating systems: Windows 3.1, OS/2, HP-UX, Solaris, AIX

The SAS Neural Network Application trains a variety of neural nets and includes a graphical user interface, on-site training and customization. Features include multilayer perceptrons, radial basis functions, statistical versions of counterpropagation and learning vector quantization, a variety of built-in activation and error functions, multiple hidden layers, direct input-output connections, missing value handling, categorical variables, standardization of inputs and targets, and multiple preliminary optimizations from random initial values to avoid local minima. Training is done by state-of-the-art numerical optimization algorithms instead of tedious backprop.

NeuralWorks

Basic capabilities:

NeuroShell2/NeuroWindows

NeuroShell 2 combines powerful neural network architectures, an icon-driven Windows user interface, and sophisticated utilities for MS-Windows machines. Its internal format is a spreadsheet, and users can specify that NeuroShell 2 use their own spreadsheet when editing. It includes both Beginner's and Advanced systems, a Runtime capability, and a choice of 15 Backpropagation, Kohonen, PNN, and GRNN architectures. It also includes Rules, Symbol Translate, Graphics, and File Import/Export modules (including MetaStock from Equis International) and NET-PERFECT to prevent overtraining. Options available: Market Technical Indicator Option ($295), Market Technical Indicator Option with Optimizer ($590), and Race Handicapping Option ($149). NeuroShell price: $495.

NeuroWindows is a programmer's tool in a Dynamic Link Library (DLL) that can create as many as 128 interactive nets in an application, each with 32 slabs in a single network and 32K neurons in a slab. It includes the Backpropagation, Kohonen, PNN, and GRNN paradigms and can mix supervised and unsupervised nets. The DLL may be called from Visual Basic, Visual C, Access Basic, C, Pascal, and VBA/Excel 5. NeuroWindows price: $369.

GeneHunter is a genetic algorithm product with a Dynamic Link Library of genetic algorithm functions that may be called from programming languages such as Visual Basic or C. For non-programmers, GeneHunter also includes an Excel add-in that allows the user to solve an optimization problem from an Excel spreadsheet.

Qnet for Windows Version 2.0

Vesta Services announces Qnet for Windows Version 2.0. Qnet is an advanced neural network modeling system that is ideal for developing and implementing neural network solutions under Windows. Qnet Version 2 is a powerful, 32-bit neural network development system for Windows NT, Windows 95, and Windows 3.1/Win32s. In addition to its development features, Qnet automates access to and use of Qnet neural networks under Windows.

Qnet neural networks have been successfully deployed to provide solutions in finance, investing, marketing, science, engineering, medicine, manufacturing, visual recognition... Qnet's 32-bit architecture and high-speed training engine tackle problems of large scope and size. Qnet also makes accessing this advanced technology easy: its neural network setup dialogs guide users through the design process, and simple copy/paste procedures can be used to transfer training data from other applications directly to Qnet. Complete, interactive analysis is available during training: graphs monitor all key training information, statistical checks measure model quality, and automated testing is available for training optimization.

To implement trained neural networks, Qnet offers a variety of choices. Qnet's built-in recall mode can process new cases through trained neural networks, and a utility is included to automate access and retrieval of solutions from other Windows applications. All popular Windows spreadsheet and database applications can be set up to retrieve Qnet solutions with the click of a button. Application developers are provided with DLL access to Qnet neural networks, and for complete portability, ANSI C libraries are included to allow access from virtually any platform.

Qnet for Windows is being offered at an introductory price of $199. It is available immediately and may be purchased directly from Vesta Services. Vesta Services may be reached at (voice) (708) 446-1655; (FAX) (708) 446-1674; (e-mail) VestaServ@aol.com; (mail) 1001 Green Bay Rd, #196, Winnetka, IL 60093

9. References

  1. Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994) "A study of the classification capabilities of neural networks using unsupervised learning: A comparison with k-means clustering", Psychometrika, 59, 509-525.
  2. Bishop, C.M. (1995), Neural Networks for Pattern Recognition, Oxford: Oxford University Press.
  3. Chatfield, C. (1993), "Neural networks: Forecasting breakthrough or passing fad?", International Journal of Forecasting, 9, 1-3.
  4. Cheng, B. and Titterington, D.M. (1994), "Neural Networks: A Review from a Statistical Perspective", Statistical Science, 9, 2-54.
  5. Cherkassky, V., Friedman, J.H., and Wechsler, H., eds. (1994), From Statistics to Neural Networks: Theory and Pattern Recognition Applications, Berlin: Springer-Verlag.
  6. Geman, S., Bienenstock, E., and Doursat, R. (1992), "Neural Networks and the Bias/Variance Dilemma", Neural Computation, 4, 1-58.
  7. Kuan, C.-M. and White, H. (1994), "Artificial Neural Networks: An Econometric Perspective", Econometric Reviews, 13, 1-91.
  8. Kushner, H. and Clark, D. (1978), Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag.
  9. Michie, D., Spiegelhalter, D.J., and Taylor, C.C. (1994), Machine Learning, Neural and Statistical Classification, Ellis Horwood.
  10. Ripley, B.D. (1993), "Statistical Aspects of Neural Networks", in O.E. Barndorff-Nielsen, J.L. Jensen, and W.S. Kendall, eds., Networks and Chaos: Statistical and Probabilistic Aspects, Chapman & Hall. ISBN 0 412 46530 2.
  11. Ripley, B.D. (1994), "Neural Networks and Related Methods for Classification", Journal of the Royal Statistical Society, Series B, 56, 409-456.
  12. Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge: Cambridge University Press.
  13. Sarle, W.S. (1994), "Neural Networks and Statistical Models", Proceedings of the Nineteenth Annual SAS Users Group International Conference, Cary, NC: SAS Institute, pp. 1538-1550. (ftp://ftp.sas.com/pub/neural/neural1.ps)
  14. White, H. (1989), "Learning in Artificial Neural Networks: A Statistical Perspective", Neural Computation, 1, 425-464.
  15. White, H. (1989), "Some Asymptotic Results for Learning in Single Hidden Layer Feedforward Network Models", Journal of the American Statistical Association, 84, 1008-1013.
  16. White, H. (1992), Artificial Neural Networks: Approximation and Learning Theory, Blackwell.