README for Backprop 0.9.4
-------------------------

Thank you for downloading Backprop! I hope that you find it to be a useful program.
This README file describes what Backprop is, and how to use it. It also includes a
summary of changes since the last version, information on what I plan to do in the
future, and how to contact me to request features or report bugs.

Updates
-------

The latest update of Backprop will always be available at the following URL:

http://www.users.cts.com/crash/s/slogan/backprop.html

Distribution Rights
-------------------

Backprop is Copyright 2003 Syd Logan. You have permission to redistribute at will, for 
any legal purpose, provided that you include this README, unmodified, along with this 
software.

Bug Reports and Suggestions
---------------------------

Please send bug reports to me, Syd Logan, at slogan@cts.com. Your input will help to 
make Backprop a better, more stable program.

When reporting a bug, please include the following information:

-- Version of Backprop. The version is shown in the dialog that is displayed when you
select "About Backprop..." in the "Help" menu.

-- Operating system (e.g., Windows 95, 98, NT, 2000, XP).

-- Steps to reproduce the problem. Attach a copy of the data files you are using (the
architecture file and training exemplars for problems encountered when training, 
or the weight file and execution data for runtime problems) so that I can duplicate the 
problem.

Thanks. If I can't understand the problem, I probably can't help, so the more information,
the better.

What Is Backprop?
-----------------

Backprop is a multi-layer neural network simulator that is based upon the popular 
backpropagation learning algorithm. The goal of this simulator is to provide users 
with a friendly and easy to use environment for experimenting with backpropagation
networks. To achieve this, I put a lot of effort into making the user interface 
give as much visual feedback as possible, especially during network training, as 
well as giving the user easy to use interfaces for changing the attributes of the 
network, such as learning rates, momentum, and so forth. You can zoom in on the 
network graphically to see weight values in more detail, or zoom out to make larger, 
more complicated network architectures visible. You can speed up, or slow down, the 
rate at which error graphics and network state are updated during training. It is 
features like these that I hope will make Backprop your first choice for 
experimenting with backpropagation neural networks.

More details on the use of backpropagation are provided later in this document (see
"How to Use Backprop", below).

Backprop is written entirely in C++, and uses the Microsoft Foundation Classes (MFC) 
for its user interface. It should run without any problems on any Windows platform
starting with Windows 95.
 
How to Use Backprop
------------------- 

If you are new to neural networks, or to backpropagation in particular, you should spend 
some time reading about it before using backprop. There are numerous books, journals, and
web sites that contain information about backpropagation neural networks, and their uses.
The following should be enough to get you started, however.

A neural network is a program that can be trained to perform a task, usually pattern 
recognition, classification, or function approximation. For example, you might train
a neural network to classify an input as belonging to a certain class, or to recognize
a series of pen strokes read on an input device as a letter of the alphabet. In order to 
train the neural network to perform its intended task, you must do the following:

-- Come up with a neural network architecture. A neural network consists of a set of 
layers, each containing a number of nodes. The number of layers in backpropagation
nets is usually 3 or larger. The first layer is called the input layer, and it has one
node for each input. The last layer is called the output layer, and it has one node for
each output. The remaining layers are called hidden layers, and the number of nodes in
these layers is harder to specify. 

As an example, consider a neural network that is designed to classify patterns based on
the following input data:

Has Fins     Has Gills    Is a Fish
-----------------------------------
Yes          No           No
No           No           No
Yes          Yes          Yes

The first row of the table represents the fact that an animal that has fins, but not gills,
is not a fish. In converting this data to use with a neural network, we can simply replace
Yes with the value 1 and No with the value 0 (or perhaps -1), and come up with the following:

Has Fins     Has Gills    Is a Fish
-----------------------------------
1            0            0
0            0            0
1            1            1

Some of you may know that solving this particular problem with a backpropagation network is
overkill, as it can be solved with simpler paradigms, such as the perceptron. However, it is
an easy to understand problem, and for those of you who are new to neural nets, simple is
better at this point. Backpropagation is usually used to solve much harder problems, so don't 
let the simple nature of this example lead you to think backpropagation is only useful for 
solving toy problems. That is most certainly not the case. 

A neural network with two input nodes, one corresponding to Has Fins and one corresponding
to Has Gills, and one output node that corresponds to Is a Fish, can be used to solve the
above problem. Setting the input layer node one to 1 and node two to 0, in a properly trained
network, will result in the output node firing 0. The output node should also, in a properly 
trained network, fire 0 if nodes one and two in the input layer are set to the value 0. The 
number of hidden layers, and the number of nodes in each of the hidden layers, is more difficult 
to specify. Many claim that coming up with the hidden layer architecture is more of an "art" 
than a "science". I won't argue that. One of the nice things about backprop is you can easily 
add or remove hidden layers, or change the number of nodes in the hidden layers, and see the 
effects it has on training.

-- Once you have an architecture for the neural network in hand, you need to train the 
neural network. This is done by presenting the neural network with examples that it can
use to learn the problem you want it to solve. These examples, also known as exemplars,
are repeatedly shown to the network until the network learns them, or some maximum number
of tries has been performed. It can, and often does, take tens of thousands of presentations
of a set of exemplars before a network becomes trained. How long it takes is a function of
the network architecture, the initial state of the network, nuances of the training 
algorithm, and the set of exemplars. Changing one or more of these is all it sometimes takes
for a network that won't train to turn into a network that will. One of the design goals of
backprop is to give you the tools you need to visualize how changes in the exemplar set, 
training algorithm, or architecture affect the ability of the neural network to train
successfully.

Once the network is trained, you can then use it to solve problems. This is done by presenting
data to the input layer nodes, and observing the values that result in the output layer.
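
The train-then-execute cycle described above can be sketched in a few lines of Python. This
is a toy illustration of the backpropagation algorithm the simulator implements, not
Backprop's actual source; the layer sizes, learning rate, epoch count, and initial weight
range are arbitrary choices made for this example:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class ToyBackprop:
    """Minimal multi-layer perceptron trained with plain backpropagation."""
    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        # weights[l][j][i]: weight from node i in layer l to node j in layer l+1;
        # each node gets one extra weight for a constant bias input
        self.weights = [[[rng.uniform(-1.0, 1.0) for _ in range(sizes[l] + 1)]
                         for _ in range(sizes[l + 1])]
                        for l in range(len(sizes) - 1)]

    def forward(self, inputs):
        acts = [list(inputs)]
        for layer in self.weights:
            prev = acts[-1] + [1.0]            # append the bias input
            acts.append([sigmoid(sum(w * x for w, x in zip(node, prev)))
                         for node in layer])
        return acts                             # activations of every layer

    def train(self, exemplars, lr=0.5, epochs=5000):
        for _ in range(epochs):
            for inputs, targets in exemplars:
                acts = self.forward(inputs)
                # output-layer delta: (target - output) * sigmoid'(net)
                deltas = [[(t - o) * o * (1.0 - o)
                           for t, o in zip(targets, acts[-1])]]
                # propagate deltas back through the hidden layers
                for l in range(len(self.weights) - 1, 0, -1):
                    layer_deltas = []
                    for i, o in enumerate(acts[l]):
                        err = sum(d * self.weights[l][j][i]
                                  for j, d in enumerate(deltas[0]))
                        layer_deltas.append(err * o * (1.0 - o))
                    deltas.insert(0, layer_deltas)
                # adjust every weight toward a smaller error
                for l, layer in enumerate(self.weights):
                    prev = acts[l] + [1.0]
                    for j, node in enumerate(layer):
                        for i, x in enumerate(prev):
                            node[i] += lr * deltas[l][j] * x

# the "is a fish" exemplars from the table above: (has fins, has gills) -> is a fish
exemplars = [([1, 0], [0]), ([0, 0], [0]), ([1, 1], [1])]
net = ToyBackprop([2, 3, 1])
net.train(exemplars)
```

After training, presenting (1, 1) to the input layer fires an output near 1, and the other
exemplars fire outputs near 0, which is exactly the "observe the values that result in the
output layer" step described above.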

Launching Backprop
------------------

To launch backprop, simply double click on the backprop icon. 

Loading a Network Architecture File
-----------------------------------

The first thing that you must do after launching backprop is to load an architecture file that 
describes the architecture of the network. The architecture file is a file that you create
in a text editor (like notepad). The architecture file is written using XML. Here is a simple
example of an architecture file that describes a network suitable for solving the "has fins,
has gills, is fish" problem above.

<network name="Fish Network">
  <layer size="2"/>
  <layer size="3"/>
  <layer size="1"/>
</network>

The architecture file consists of two tags, the <network> tag, and the <layer> tag. The <network>
tag defines the overall network architecture, which consists of layers. In this case, the network
has three layers. The first layer has a size attribute of 2, the second layer has a size attribute
of 3, and the third layer has a size attribute of 1. The size attribute defines how many neurons
are in the layer; therefore, this network has 2 neurons, or nodes, in the first layer, 3 in the
second layer, and 1 in the third layer. Also, the first layer is always the input layer, the last
layer is always the output layer, and the layers between are hidden layers. Thus, we have a 
network that contains 3 layers, accepts 2 inputs, fires a single output, and has a hidden layer
that contains 3 nodes. The first node in the input layer will accept as input the "has fins"
attribute, the second node in the input layer will accept the "has gills" attribute, and the 
output of the single neuron in the output layer will fire a value which represents the "is fish"
attribute. The goal of the network training, described below, will be to train the network so
that it fires the correct output response ("is fish") when presented different values for "has
fins" and "has gills" at the input layer neurons.
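
Because the architecture file is plain XML, it can be read with any XML parser. Backprop
itself uses expat internally (see "Licenses", below); the following is an illustrative
reader using only Python's standard library, not the program's own code:

```python
import xml.etree.ElementTree as ET

# the "Fish Network" architecture file from the example above
arch = '''<network name="Fish Network">
  <layer size="2"/>
  <layer size="3"/>
  <layer size="1"/>
</network>'''

root = ET.fromstring(arch)
sizes = [int(layer.get("size")) for layer in root.findall("layer")]
print(root.get("name"), sizes)   # prints: Fish Network [2, 3, 1]
```

The first entry in the list is the input layer size, the last is the output layer size, and
everything in between describes the hidden layers.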

More details on the XML format Backprop uses to describe network architectures are provided 
later in this document (see "Network Architecture Language", below).

By convention, architecture files are stored on disk in files with a ".net" suffix, for example,
"mynet.net" is a backprop architecture file. 

To load an architecture file, select Open... from the File menu. All the files in the current
directory with a suffix of ".net" will be displayed. Click on the file and hit OK. Backprop
will load the architecture file and display a graphical representation of the network. Note that
lines connect each node in the input layer to the nodes in the first hidden layer, each node in
the first hidden layer to the second hidden layer, and so forth. 

Training the Network
--------------------

The next step is to train the neural network to solve a problem. This is done by selecting 
"Train..." from the Network menu. A dialog will display, asking you to specify
an exemplar file. Type in the path of the exemplar file, or click the "Browse" button to 
navigate the file system in search of one. By convention, exemplar files are given the same name 
as the architecture file, but have a ".exm" suffix. For example, "mynet.exm" would be the exemplar 
file for the network defined in the architecture file named "mynet.net".

The exemplar file contains a count of the number of exemplars stored in the file, followed
by the exemplars themselves. An exemplar consists of two lines, one containing a value for
each of the input nodes, and the other containing a value for each of the output layer nodes.
An exemplar file corresponding to the "Has Fins, Has Gills, Is a Fish" problem described above 
might look like this:

3

1            0            
0

0            0            
0

1            1            
1
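
A file in this layout can be read by skipping the blank lines and pairing up the remaining
input/output rows. The following Python sketch is an illustrative parser for the format
shown above, not the program's own:

```python
# read the .exm layout: a count, then one input line and one output line per exemplar
text = """3

1 0
0

0 0
0

1 1
1
"""
rows = [line.split() for line in text.splitlines() if line.strip()]
count = int(rows[0][0])
exemplars = [([float(v) for v in rows[1 + 2 * i]],
              [float(v) for v in rows[2 + 2 * i]])
             for i in range(count)]
print(exemplars[2])   # prints: ([1.0, 1.0], [1.0])
```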

You also must specify a weight file, which by convention is the name of the architecture file 
with a ".wgt" suffix, for example, "mynet.wgt". The weights file represents the state of a
learned neural network. As I mentioned above, each node in the input layer is connected to each
node in the first hidden layer, and so forth. Each of these connections has a corresponding
weight associated. Training a neural network amounts to changing these weights in such a 
way that the network correctly learns the problem it is being trained for. If the neural
network successfully trains, the weights will be saved to this file when training completes.

Once you have specified the exemplar and weight files, click OK to start training. A dialog
will display that shows the number of iterations executed, and a graph of the cumulative
error. A properly training network will, over time, exhibit a decrease in the cumulative
error, but at times the error may even increase as the network searches for a solution. 

The architecture window will display the firing values of each node in the network during its
training, using a 255-level grayscale colormap. The range of values mapped to this grayscale 
colormap is [0.0, 1.0] by default, with 0.0 displaying as black, 1.0 displaying as white, and 
0.5 displaying as a middle gray. Values above the range display as green, and values below the
range display as red. You can change the range by selecting Options... from the View menu, and
changing the values in the Neuron Output Range text fields. For example, to set the lower value
to -1, type -1 in the "Low" text field, and click OK. You can change this or any other data in 
View->Options... dialog during a training session, and it will take effect immediately upon
clicking OK. You can also display a color for each weight in the network by selecting the
"View Node Outputs and Weights" radio button. The colormap corresponding to this display will
be shown at the top of the architecture window. You can widen or narrow the range of the 
colormap at any time during a training session by modifying the Low and High text edit fields
in the Weight Output Range portion of the View->Options... dialog.

The View->Options... dialog also allows you to slow down or speed up the user interface during
training. You can make the updates of the error graph more frequent in order to get more detail,
but doing so will cause the graph to scroll faster. Or, you can change how often the network
state (weights and firing values) graphics are updated. To increase either rate, specify smaller
numbers (50 will update ten times faster than 500). The faster you update either graphic, the
longer it will take for your network to train because it takes time to redraw the screen each
time an update occurs. The size and topology of the network will also have an effect on the time
that it takes to update the graphics, so my best advice is to load a network, and experiment with
different settings until you find one that works for you. Note that any changes that you make will
take effect immediately, in real-time, so you can experiment with the update rates without having 
to restart the training.

The Edit->Training Settings... dialog can be used to change parameters of the training algorithm
used by backprop. It is outside the scope of this document to give detailed descriptions of each
parameter, but here are some hints and observations:

-- Use "Maximum training iterations" to control how many training iterations are executed before
backprop gives up. The default value is probably too high for most cases; I would recommend a 
lower value, and perhaps changing other parameters or the network architecture, before giving
the network a long time to converge on a solution.

-- Per-exemplar threshold defines how large the output error must be before the network adjusts
the weights. For example, if the threshold is 0.5 and the error is 0.6, then the error is greater
than the threshold and the weights in the network will be adjusted in an attempt to improve the
accuracy of the network. If the error were 0.4, then the network would not be adjusted for this
exemplar. If all of the errors for the exemplars are below the threshold, the network has learned
the exemplars and training successfully halts.
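
The threshold rule described above can be stated in miniature as follows (the error values
here are made up for illustration):

```python
# weights are adjusted only for exemplars whose error exceeds the threshold,
# and training halts once every exemplar's error is below it
threshold = 0.5
errors = [0.6, 0.4, 0.2]          # one output error per exemplar (made-up values)
needs_update = [e > threshold for e in errors]
done = all(e < threshold for e in errors)
print(needs_update, done)         # prints: [True, False, False] False
```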

-- Momentum and learning rate are parameters that affect the backpropagation training
algorithm. Momentum causes the network to consider earlier behavior of the network in computing
new weight values. Learning rate affects how rapidly weights are adjusted, and may or may not
affect the ability of the network to train successfully. Usually, you will want to set the 
learning rate high and the momentum low, but this is only a starting point. By turning on and off
these options, and changing the values, you can experiment with what works best for your network
architecture and training data.
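
For reference, the standard momentum form of the backpropagation weight update looks like
this (shown for a single weight; Backprop's exact formula may differ in detail):

```python
# delta_w(t) = learning_rate * gradient + momentum * delta_w(t-1)
# the momentum term blends in the previous update, so earlier behavior of the
# network carries over into the new weight value
def update_weight(weight, gradient, prev_delta, learning_rate=0.3, momentum=0.9):
    delta = learning_rate * gradient + momentum * prev_delta
    return weight + delta, delta

w, d = update_weight(0.5, 0.1, 0.02)
print(round(w, 3), round(d, 3))   # prints: 0.548 0.048
```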

-- Bias adds a trainable input to each hidden and output node in the network. The value of this
input is always 1, and the weight on this input is always adjusted during training. In some cases,
a bias is needed in order for the network to converge, but this is not always true. Again, refer 
to the literature for more guidance on the uses of bias, and experiment with backprop to see what 
effect it has.

-- Backprop also allows you to select from two activation functions. The first, and the default,
is sigmoid. This is by far the most popular activation function, and results in an output that is
in the range of 0.0 to 1.0. If your exemplars include outputs in the range -1.0 to 1.0, then 
hyperbolic tangent may be a better choice, since it fires in the range of -1 to 1. 
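
The two activation functions have the standard definitions, sketched here in Python (the
function names match the "sigmoid" and "htan" values used in the architecture file format
described below):

```python
import math

def sigmoid(x):   # output in the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def htan(x):      # output in the range (-1, 1)
    return math.tanh(x)

print(sigmoid(0.0), htan(0.0))   # prints: 0.5 0.0
```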

Finally, backprop training starts by initializing the weights to random values. By default, this
range is 0.0 to 1.0, but you can change the range to, say, -1.0 to 1.0 by using the Initial Weight 
Range settings in the Edit->Training Settings... dialog.

Executing the Network
---------------------

Once you have a trained network, you can use it to solve problems. Select Execute... from the
Network menu. If you just finished training the network, the weight file will be prefilled for
you. If you wish to use another weight file, type in its path or use the Browse... button to 
find it. You also need to specify an input file that contains the data you want the network to
process. These files are named with a ".run" suffix, for example, "mynet1.run". The file is 
very simple, just a single line of text with input values. For example,

0 1

will cause the network to set the value of the first input layer node to 0, and the second
input layer node to 1. Clicking OK in the Execute... dialog will cause the network to process
the specified file, and the graphical architecture window will display the results. Note that
the input nodes will display the values read from the input file, and the output nodes will
display an answer that should be correct for the data that was processed. If not, you might
consider adding the data to the exemplar file, and retraining the network. Then, the network,
assuming it trains, should have no problem processing the input.

Network Architecture Language
-----------------------------

The following is an example network description file for a 3 layer network that can be trained
to categorize seven-segment LED inputs. A seven-segment LED is depicted in the following figure:

   ----1----
   |       |
  2|     3 | 
   |       |
   |---4---|
   |       |
  5|     6 |
   |       |
   ----7----

A seven-segment LED can be used to display the numbers 0 - 9, and many letters of the alphabet as
well. Seven-segment LEDs were introduced in the 1970s, when electronic calculators and digital
watches first hit the market. Numbers and letters are formed by lighting the individual segments.
For example, you would light segments 1, 3, and 6 in order to display the number '7', like this:


   ----1----
           |
         3 | 
           |
           |
           |
         6 |
           |
  
The number 4 is displayed by the device when segments 2, 3, 4, and 6 are lit:


   |       |
  2|     3 | 
   |       |
   |---4---|
           |
         6 |
           |

A neural network with 7 inputs (each input corresponding to a segment in the seven-segment LED) and 
10 outputs (each output representing a number in the range [0, 9]) can be described as follows:

<network 
	name="Seven Segment"
	usemomentum="true" 
	momentum="0.1">
	<layer size="7"/>
	<layer size="5" activation="htan"/>
	<layer size="10"/>
</network>
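
A hypothetical exemplar encoding for this network would use a 7-value input (one per segment,
1 = lit) and a 10-value output (one per digit). The segment vectors below follow the numbering
in the figure above; the encoding itself is an illustration, not a format the program requires:

```python
# input patterns for the digits shown in the figures above
SEGMENTS = {
    7: [1, 0, 1, 0, 0, 1, 0],   # segments 1, 3, and 6 lit
    4: [0, 1, 1, 1, 0, 1, 0],   # segments 2, 3, 4, and 6 lit
}

def target(digit):
    """One output node per digit; only the matching node should fire 1."""
    out = [0] * 10
    out[digit] = 1
    return out
```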

Let's take a closer look at this. All networks are described by the "network" tag (a tag is
an XML construct that has a name and is surrounded by '<' and '>' characters). The network
tag can have several attributes; for example, we can specify whether the network uses momentum 
during training with the "usemomentum" attribute, which, as shown in the example above, is set 
to the value "true".

Nested inside of the network tag in the above example are three "layer" tags. Because there
are three layer tags, the network has 3 layers. Attributes of the layer tag specify the number
of nodes in each layer, and the type of activation each node in the layer fires.

The following is a quick reference for the tags supported in this version of Backprop.

Tag: <network>

Purpose: specifies the definition of a network and its attributes

Attributes:

Name              Type         Purpose                              Example
-------------------------------------------------------------------------------------------

name              text         symbolic name for the network        name="xor"
usebias           boolean      enable or disable bias               usebias="false"
usemomentum       boolean      enable or disable momentum           usemomentum="true"
uselearningrate   boolean      enable or disable learning rate      uselearningrate="false"
threshold         float        set the update threshold             threshold="0.1"
momentum          float        set the momentum term                momentum="0.9"
learningrate      float        set the learning rate                learningrate="0.3"
ranlow            float        lower bound of weight random number  ranlow="-1.0"
                               range
ranhigh           float        upper bound of weight random number  ranhigh="1.0"
                               range 

Tag: <layer>

Purpose: specifies a layer and its attributes

Attributes:

Name              Type         Purpose                              Example
-------------------------------------------------------------------------------------------

size              integer      the number of nodes in the layer     size="7"
activation        text         the type of activation fired by 
                               nodes in this layer. Possible 
                               values are "sigmoid" and "htan"      activation="htan"
                         
Books About Backpropagation 
---------------------------

These are a few books I've found useful in understanding backpropagation, and neural nets 
in general.

Author                     Title                                    Publisher     

James A. Anderson          Introduction to Neural Networks          MIT Press
Reed, Marks                Neural Smithing                          MIT Press
Rumelhart, McClelland      Parallel Distributed Processing, Vol 1   MIT Press

Known Problems
--------------

None as of 0.9.4. 

Please e-mail requests and bug reports to me at slogan@cts.com.

Planned Enhancements
--------------------

-- A toolbar. I'm looking for a talented graphics artist who can do the artwork, so if you 
know of someone who can volunteer his or her time, please send me e-mail at slogan@cts.com.

-- Eventually, Cocoa (MacOS X) and Gtk+ (Linux) versions. 

Modification History
--------------------

8/17/2003 0.9.4

-- Fixed scrollbar issues introduced in 0.9.3


8/5/2003 0.9.3

-- Added double buffered graphics to eliminate flicker seen in earlier releases.

6/27/2003 0.9.2

-- Added checks for overflow when computing activation functions. If overflow occurs, training 
will abort, and the user should change the architecture of the net, or training parameters, to 
avoid the problem.

-- Added XML support for the network architecture file. A part of this change was to allow for
per-layer specification of the activation function. For example, the user can specify the use
of hyperbolic tangent activations in, say, hidden layer 2. Also, the user can now specify the
training attributes (learning rate, etc.) directly in the network architecture file.

6/11/2003 0.9.1
 
-- Added support for controlling the update frequency of the graphical representation of the 
network, and the training error strip chart.

-- Set Use Bias to false as default.

6/10/2003 Initial version 0.9 released.

Licenses
--------

The following corresponds to my use of expat in versions 0.9.3 and later:

Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd
                               and Clark Cooper
Copyright (c) 2001, 2002 Expat maintainers.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be included
in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

