Rpnn

This is an idiomatic C++ implementation of a Resilient backpropagation (Rprop) Neural Network with an easy and convenient user interface; no dependencies - a fully self-contained implementation (STL only), no memory/resource leaks, multithreaded, no deadlocks.

Enhancement requests and/or questions are more than welcome: [email protected]

Resilient backprop is known to be the fastest-learning NN in the family of backprops, featuring a number of advantages over the standard backprop mechanism:

  • the learning rule is no longer proportional to the size of the gradient - only the sign of the computed gradient matters
    • programmatically this means there is no need to plug in a derivative of the logistic function (plugging in only the logistic will do)
    • it's not prone to the vanishing-gradient problem that the standard backprop suffers from
  • the configuration of the rprop is simple and not as complex and sensitive as the standard backprop's
  • this implementation provides auto normalization of outputs and optionally of the inputs (see Inputs normalization below for why inputs may require it)
  • the framework is fully and easily SERDES'able
  • the framework also supports multi-class classification (support of the Softmax logistic at the output perceptron)
  • the framework features a detection mechanism for local minimum traps and bounces its weights out of the trap - this ensures a high probability of convergence
  • the framework provides a weight bouncer class capable of finding a better (deeper) minimum in the absence of a global one (by running concurrent instances and searching for the deepest local minimum)

Given the right configuration (topology, parameters) and enough resources (CPU cores, time),
`Rpnn` guarantees finding the best or a near-best solution

Content:

  1. cli toy
  2. Study Examples
  3. C++ class user interface

cli toy

This package provides a simple unix cli tool which allows running Rpnn from the shell and toying with your data. This cli tool is probably the easiest way to introduce yourself to a classical NN (no programming skills required); however, it could also be used in all the areas where backprop is applicable (classification, regression, approximation, prediction, etc.):

bash $ rpn -h
usage: rpn [-adhu] [-G N,M] [-P param] [-S separators] [-b threads] [-c cost_func] [-e target_err]
           [-f file_name] [-g N,M] [-l transfer] [-m factor] [-n min,max] [-o transfer] [-p N,M]
           [-r file_name] [-s seed] [-t perceptrons] [epochs]

Resilient Propagation Neural network (https://github.com/ldn-softdev/Rpnn)
Version 1.05 (built on Jan 13 2021), developed by Dmitry Lyssenko ([email protected])

optional arguments:
 -a             plug in a uniform bouncer (alternative to randomizer)
 -d             turn on debugs (multiple calls increase verbosity)
 -h             help screen
 -u             round up outputs to integer values
 -G N,M         recursively interconnect neurons N to M
 -P param       modify generic parameters (PARAM=x,y,..)
 -S separators  value separators (REGEX) [default: \s,;=]
 -b threads     best local minimum search (0: #threads equals #cores)
 -c cost_func   cost function [default: Sse]
 -e target_err  convergence target error [default: 0.001]
 -f file_name   file to dump Rpnn brain to [default: rpn.bin]
 -g N,M         grow synapse from neuron N to neuron M
 -l transfer    effectors logistic function [default: Sigmoid]
 -m factor      local minimum trap detection (0: disable) [default: 2]
 -n min,max     input normalization (min=max to disable) [default: -1,+1]
 -o transfer    output neurons logistic function [default: Sigmoid]
 -p N,M         prune synapse at neuron N to neuron M
 -r file_name   file to reinstate Rpnn brain from [default: rpn.bin]
 -s seed        seed for randomizer (0: auto) [default: 0]
 -t perceptrons full mesh topology (enumerated perceptrons) [default: 1,1]

standalone arguments:
  epochs        epochs to run convergence [default: 100000]

 - parameters N,M are zero based, the index 0 refers to a reserved neuron "the one"
 - factor for option -m is multiple of the total count of synapses (weights)

 available cost functions:
 	o Sse
	o Xntropy

 available logistic functions:
 	o Sigmoid
	o Tanh
	o Tanhfast
	o Relu
	o Softplus
	o Softmax

 generic Rpnn parameters (alterable with -P):
 	o BLM_RDCE [5]
	o DW_FACTOR [1.1618]
	o LMD_PTRN [0.001]
	o MAX_STEP [1000]
	o MIN_STEP [1e-06]
	o NRM_MAX [1]
	o NRM_MIN [-1]

 for further details refer to https://github.com/ldn-softdev/Rpnn
bash $ 

Manual installation

  1. Download, or clone the repo
  2. compile using C++14 or later:
    • MacOS:
    bash $ c++ -o rpn -Wall -Wextra -std=c++14 -Ofast rpn.cpp
    bash $
    • Linux (relocatable image):
    bash $ c++ -o rpn -Wall -std=gnu++14 -Ofast -static -Wl,--whole-archive -lrt -pthread -lpthread -Wl,--no-whole-archive rpn.cpp
    bash $

rpn operations

rpn operates in two modes:

  1. learning mode
  2. trained mode

Learning mode

In the learning mode rpn learns from the provided input-target samples and, once a solution is found (rpn successfully converges), it dumps its trained brains into a file (the default file is rpn.bin; option -f sets the output filename)

Training patterns are read line by line, each line containing input and target values, so that the number of input values corresponds to the number of receptors in the configured topology, and the number of target values corresponds to the number of output neurons. The values on the input line should be separated either with blank space, or optionally with , or ;, or = symbols (separators can be specified using the -S option [REGEX notation])

note: separators carry no semantic significance, so they can be interchanged/duplicated freely

For example, say your topology has 2 receptors and 1 output neuron; then any of the following input lines are fine:

0 0 0
0, 1,  1
1,0 =1
1==1,,0

The last line though might be confusing, as it still encodes two inputs (1, 1) and a single output (0), so apply your discretion when using value separators.

If rpn does not find a solution (fails to converge), then it does not dump its brains into the file (then you should adjust the parameters, e.g.: increase epochs, alter the target error, change the topology, etc.) - that applies only when searching for a global minimum; when BLM search is engaged, the NN always converges with some degree of success.

Trained mode

To start rpn in the trained mode, give it the parameter -r followed by the name of the file where the rpn brains are stored (default is rpn.bin). In the trained mode rpn accepts input lines the same way as in the Learning mode, except that the values on each line are input patterns only (if any other values are present on the same line, they are ignored)

when option -r is given, all other options are ignored (except -u)

Option -u instructs rpn to round all outputs (if real numbers are used) to their integer part (in the trained mode, of course); if the outputs are symbolic enumerations, it shows the real converted numbers instead

rpn options and parameters

rpn is extensively debuggable, though using a debug depth higher than 3 (-ddd) is not advisable as it will cause huge dumps on the console

rpn has the following default parameters when none are given:

bash $ rpn -d
.configure(), receptors: 1
.configure(), effectors: 1
.configure(), output neurons: 1
.configure(), target error: 0.001
.configure(), normalize inputs: true [-1 to +1]
.configure(), LM trail size: 4
.configure(), cost function: cf_Sse
.configure(), generic parameter BLM_RDCE: 5
.configure(), generic parameter DW_FACTOR: 1.1618
.configure(), generic parameter LMD_PTRN: 0.001
.configure(), generic parameter MAX_STEP: 1000
.configure(), generic parameter MIN_STEP: 1e-06
.configure(), generic parameter NRM_MAX: 1
.configure(), generic parameter NRM_MIN: -1
.configure(), blm (threads) engaged: no
.configure(), bouncer: native
.configure(), randomizer seed: timer (1609875073812804)
.configure(), epochs to run: 100000
.read_patterns_(), start reading training patterns (1 inputs + 1 outputs)...

^Caborted due to user interrupt received: SIGINT (2)
bash $ 

Default parameters

Topology
.configure(), receptors: 1
.configure(), effectors: 1
.configure(), output neurons: 1
  • Number of receptors: 1 (receptor is a neuron w/o synapses, facing user inputs)
  • Number of effectors: 1 (effector is a non-receptor neuron)
  • Number of output neurons: 1 (output neuron is also an effector)
    thus such default topology is expressed as an option -t 1,1 (there are only 2 neurons in such topology)

well, there's one more hidden neuron ("the one") which is always implicitly present and is interconnected to all others

option -t sets up the topology, interconnecting adjacent layers (perceptrons) in a full mesh

Target error
.configure(), target error: 0.001

- option -e allows setting the target error for convergence

the network considers convergence achieved when its global error (i.e., the error across all the output neurons) drops below the target error

Some tasks might not even have a global minimum solution (e.g.: approximations/regressions, or even classifications with a weak correlation), thus manually adjusting the target error (towards the higher end) might be required.

Though adjusting the target error manually can be tedious and inefficient, rpn provides an automatic way of searching for the deepest local minimum in the absence of a global one (see option -b)

Inputs normalization
.configure(), normalize inputs: true [-1 to +1]

- Input normalization is on by default and can be turned off with option -n 0,0, or -n 1,1 (any combination where the min and max parameters are the same). Given that the logistic function is often of a bounded type (e.g.: sigmoid, tanh, etc.), faster convergence occurs when the input's max and min values are mapped around the logistic's zero point. The default input normalization values are -n -1,+1.

Also, Rpnn limits the delta weight step's min and max values to 1.e-6 and 1.e+3 respectively (though these default parameters can be altered with the -P option):

#define RPNN_MIN_STEP   1.e-6
#define RPNN_MAX_STEP   1.e+3

Thus, very small or very large input values simply won't converge without normalization - the input normalization ensures the respective resolution precision.
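
For illustration, here's a minimal sketch of the kind of min-max scaling that input normalization performs, mapping a channel's observed min/max onto [NRM_MIN, NRM_MAX] (a generic textbook formula, not Rpnn's actual code):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // map one input channel onto [nrm_min, nrm_max] (defaults mirror -n -1,+1)
    std::vector<double> normalize_channel(std::vector<double> v,
                                          double nrm_min = -1., double nrm_max = +1.) {
     auto mm = std::minmax_element(v.begin(), v.end());
     double lo = *mm.first, range = *mm.second - *mm.first;
     for(auto &x: v)
      x = range == 0.? nrm_min:                              // degenerate (constant) channel
           nrm_min + (x - lo) * (nrm_max - nrm_min) / range;
     return v;
    }

    int main(void) {
     for(double x: normalize_channel({1e-5, 2e-5, 2e-5, 1e-5}))
      std::cout << x << " ";                                 // prints: -1 1 1 -1
     std::cout << std::endl;
    }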

For example, this converges fine with the normalization on (default):

bash $ <<<"
1e-5, 1e-5 = 10
1e-5, 2e-5 = 20
2e-5, 1e-5 = 20
2e-5, 2e-5 = 10
" rpn -t2,2,1
Rpnn has converged at epoch 40 with error: 0.000512515
bash $ 
bash $ <<<"2e-5, 2e-5 " rpn -ur rpn.bin
10
bash $ 

But with the normalization turned off, it'll fail to find a solution:

bash $ <<<"
1e-5, 1e-5 = 10
1e-5, 2e-5 = 20
2e-5, 1e-5 = 20
2e-5, 2e-5 = 10
" rpn -t2,2,1 -n0,0
Rpnn could not converge for 100000 epochs (err: 1.00001) - not saving
bash $ 

btw, output normalization is always on and there is no way to turn it off: the output logistic functions are required to be of a bounded type (in this tool's implementation, not in the Rpnn class), thus output normalization allows output parameters of any scale

Local Minimum traps detection
.configure(), LM trail size: 4

- the framework provides a way to detect whether, during the convergence, it ends up in a local minimum valley, so that it can re-initialize all its weights and bounce itself out of the local minimum trap.
That mechanism is facilitated by tracking the error trail of each epoch's global error. The size of such a trail is typically proportional to the total number of weights in a given topology, with the default factor of -m 2 (i.e., times 2). Though it does not always work optimally and sometimes requires adjustment to a shorter factor (to speed up convergence) or a longer one (to ensure a more reliable LM detection).

The mechanism poses a dilemma though: LM trap detection drastically improves the chances of a successful convergence, but the trail size slows down the convergence itself (the bigger the trail size, the slower the training runs) - finding the right balance is a subject of some research for a given problem.

Setting the trail size to zero (-m0) disables LM detection (and also renders BLM search ineffective)
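
Conceptually, the detection keeps a FIFO trail of recent global errors and declares a trap when the error keeps revisiting (within the LMD_PTRN tolerance) values it has already seen, instead of descending. Below is a minimal illustration of that idea only - it is not Rpnn's actual detection logic, and the capacity/tolerance handling is an assumption:

    #include <cmath>
    #include <deque>

    // simplified idea of an LM trap detector: the error trail keeps looping over
    // nearly identical values instead of descending
    class lmTrail {
        std::deque<double> trail_;
        size_t capacity_;                                    // akin to the trail size (-m factor * synapses)
        double tolerance_;                                   // akin to LMD_PTRN
     public:
        lmTrail(size_t capacity, double tolerance = 0.001):
         capacity_{capacity}, tolerance_{tolerance} {}

        bool trapped(double global_error) {
         for(double e: trail_)                               // was this error seen before (within tolerance)?
          if(std::fabs(e - global_error) <= tolerance_ * std::fabs(e))
           return true;                                      // looks like looping inside a valley: bounce weights
         trail_.push_back(global_error);
         if(trail_.size() > capacity_)
          trail_.pop_front();
         return false;
        }
    };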

Cost (error) function
.configure(), cost function: cf_Sse

The default cost function used to evaluate convergence (across all the output neurons) is the Sum of Squared Errors (Sse). The other available cost function is Cross Entropy (Xntropy)

Typically, Cross Entropy is used together with the Softmax logistic function of the output neurons.
- to alter the cost function, use -c Xntropy
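
For reference, here are the common textbook forms of those two cost functions, per pair of target t and output o (the exact scaling used inside Rpnn may differ):

    #include <cmath>

    // per-output error terms in their common textbook forms
    double sse(double target, double output)                 // Sum of Squared Errors term
     { double d = target - output; return d * d; }

    double xntropy(double target, double output)             // Cross-Entropy term (paired with Softmax outputs)
     { return -target * std::log(output); }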

Generic parameters
.configure(), generic parameter BLM_RDCE: 5
.configure(), generic parameter DW_FACTOR: 1.1618
.configure(), generic parameter LMD_PTRN: 0.001
.configure(), generic parameter MAX_STEP: 1000
.configure(), generic parameter MIN_STEP: 1e-06
.configure(), generic parameter NRM_MAX: 1
.configure(), generic parameter NRM_MIN: -1

The Rpnn framework has a few generic parameters, which normally do not require much fiddling. However, any of those can also be modified using the -P option.
Say you want to try another input normalization range, e.g.: -50, +50. Then either way will do it:

  • -P NRM_MAX=50 -P NRM_MIN:-50 - the separator between the parameter name and the value can be either : or =,
  • -P NRM_MAX:50,-50 - if multiple values are given, they are applied to the respective parameters in the displayed order, starting from the given one (the order is the same as on the -h help screen)

For example, it's possible to alter all the values in one go, like this:
rpn -P BLM_RDCE:'15, 1.5, 0.01, 1e+5, 1e-10, 20, -20' ...

note: quotes are used because of the spaces separating parameters (i.e., to dodge shell interpolation)

Description:

  • BLM_RDCE - the reduce factor for the best local minimum search mode - varies from 1 (exclusively) to any higher number - the higher the number, the harder Rpnn will try finding the best (deepest) LM (i.e., more attempts will be made). The factor is exponential though, so numbers above 10 might already severely impact resolution time (it also depends on many other factors - the size of the topology, the size of LMD_PTRN, etc.)
  • DW_FACTOR - a momentum factor to increase a synapse's delta weight when the gradient sign does not change (see the sketch after this list). This factor also has an exponential effect, and setting it too high may result in Rpnn overshooting minimums too frequently. Setting it lower than or too close to 1 does not make sense either - slow momentum results in slower convergence and makes Rpnn suffer from the vanishing-gradient problem
  • LMD_PTRN - a percentage factor for the Local Minimum trap detection mechanism - how accurately the mechanism will try to recognize error-looping behavior. The smaller the value, the more accurate the detection; a higher value provides coarser loop detection (which might result in earlier detection, but also in false positives)
  • MAX_STEP
  • MIN_STEP - these two provide the upper and lower caps for the delta weight in synapses
  • NRM_MAX
  • NRM_MIN - these two provide the max and min normalization boundaries for input pattern values. Setting them to the same (any) value disables input normalization (which is not advisable)
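
To visualize how DW_FACTOR, MIN_STEP and MAX_STEP interplay, here's a minimal sketch of the classic sign-based Rprop delta-weight update (a generic textbook form, not Rpnn's exact code; in particular, using the same factor for both acceleration and deceleration is an assumption):

    #include <algorithm>

    // classic Rprop-style step adaptation for a single synapse:
    // only the sign of the gradient matters, never its magnitude
    struct synapse { double weight{0.}, delta_weight{0.1}, prior_gradient{0.}; };

    void rprop_update(synapse &s, double gradient,
                      double dw_factor = 1.1618,             // akin to DW_FACTOR
                      double min_step = 1.e-6,               // akin to MIN_STEP
                      double max_step = 1.e+3) {             // akin to MAX_STEP
     if(gradient * s.prior_gradient > 0.)                    // same sign: accelerate the step
      s.delta_weight = std::min(s.delta_weight * dw_factor, max_step);
     else if(gradient * s.prior_gradient < 0.)               // sign flipped: minimum overshot, slow down
      s.delta_weight = std::max(s.delta_weight / dw_factor, min_step);

     if(gradient > 0.) s.weight -= s.delta_weight;           // move against the gradient's sign
     else if(gradient < 0.) s.weight += s.delta_weight;
     s.prior_gradient = gradient;
    }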

Searching best local minimum
.configure(), blm (threads) engaged: no

By default rpn tries to find a global minimum (a convergence point where the global error is close to zero - below the target error), and if it fails to find one, it won't save its brains at the end. However, most of the tasks where backprops may apply do not have a global minimum at all. There the solution sounds like this: find the deepest possible local minimum.

The approach Rpnn takes in such a case is running and tracking multiple convergences until a local minimum is detected, and then picking the convergence result with the smallest error (the deepest found LM). It's implemented by running multiple instances of the configured NN concurrently (each instance will also be run multiple times).

To enable this mode (a.k.a. BLM search), give option -b followed by the number of threads. If 0 is given (-b0), the number of threads will correspond to the maximum number of supported hardware threads (#cores times #threads per core).

The target error (-e) in such a case serves as a twofold exit criterion:

  • if the NN is able to converge below the target error (i.e., a global minimum is found)
  • if the delta between the adjusted internal goal error and the best minimum found so far drops below the target error

the goal error is adjusted every time any Rpnn instance detects an LM whose error is worse than the best minimum found so far - the goal error is then reduced by the BLM_RDCE factor.

Because BLM search is also suitable for finding the global minimum, it does not hurt to always run rpn with the -b option.

Alternative weight bouncer
.configure(), bouncer: native

By default rpn uses a simple randomizer to update its weights before starting a new convergence (and when bouncing itself out of an LM trap).
For reference, another bouncer is provided: it first builds a limited set of uniform weight distributions (across all the weights) and then uses them up (in a random order) until all are exhausted - that method is more deterministic than random weight bouncing in the sense that it tries only certain prescribed sets of weight distributions.

To plug in the alternative weight update function, give the -a option.

Seed for randomizer
.configure(), randomizer seed: timer (1609875073812804)

The seed for randomization (weight initialization) is taken from the timer, though for some debugging (or research) purposes it might be required to run multiple convergences with the same seed, which can be done using option -s <seed>

note: setting the same seed won't provide deterministic behavior in BLM mode (option -b), due to the inability to control the concurrent convergences

Epochs to run
.configure(), epochs to run: 100000

One pass over all the given input patterns, learning from them (by back-propagating the resulting error and adjusting the weights towards a closer match), is called an epoch. Because the error plane is always smoothly differentiable, the learning rule inevitably leads towards a minimum, either local or global; however, it certainly requires an unknown number of such iterations (epochs) to reach one.
Rpnn reaches minimums quite quickly and then (if LMD is enabled) tries to bounce itself out of the found LM and descend into another one.
The number of epochs caps the number of such iterations. The maximum number of epochs is given as the only standalone argument to rpn (if omitted, the default of 100000 is used).

The above mostly applies when BLM search is not engaged; otherwise, the number of attempts is rather limited by the number of LMs found (which is a combination of 2 factors: BLM_RDCE and the target error). Still, setting the epoch number to a very shallow value is not advisable, as it may result in a premature end of convergence even before reaching a local or global minimum.

Configuring NN Topology

The NN topology can be verified with the -dd debug depth:

bash $ <<<"0 0 " rpn -dd
.configure(), receptors: 1
.configure(), effectors: 1
.configure(), output neurons: 1
.configure(), target error: 0.001
.configure(), normalize inputs: true [-1 to +1]
.configure(), LM trail size: 4
.configure(), cost function: cf_Sse
.configure(), generic parameter BLM_RDCE: 5
.configure(), generic parameter DW_FACTOR: 1.1618
.configure(), generic parameter LMD_PTRN: 0.001
.configure(), generic parameter MAX_STEP: 1000
.configure(), generic parameter MIN_STEP: 1e-06
.configure(), generic parameter NRM_MAX: 1
.configure(), generic parameter NRM_MIN: -1
.configure(), blm (threads) engaged: no
.configure(), bouncer: native
.configure(), randomizer seed: timer (1610037935476627)
.configure(), epochs to run: 100000
.read_patterns_(), start reading training patterns (1 inputs + 1 outputs)...
.read_patterns_(), read 1 pattern(s)
.resolve(), training patterns read and loaded, starting convergence...
..resolve(), class 'Rpnn'...
   Rpnn::addr(): 0x7ffee61b1040
   Rpnn::min_step(): 1e-06
   Rpnn::max_step(): 1000
   Rpnn::dw_factor(): 1.1618
   Rpnn::target_error_: 0.001
   Rpnn::cost_func(): "Sse"
   Rpnn::wbp_: 0x7ffee61b1120
   Rpnn::epoch_: 0
   Rpnn::terminate_: false
   Rpnn::effectors_start_idx(): 2
   Rpnn::output_neurons_start_idx(): 2
   Rpnn::neurons()[0]: class 'rpnnNeuron'...
      neurons()[0]::addr(): 0x7fefc2c07900
      neurons()[0]::host_nn_ptr(): 0x7ffee61b1040
      neurons()[0]::is_receptor(): true
      neurons()[0]::transfer_func(): "Sigmoid"
      neurons()[0]::out(): 1
      neurons()[0]::delta(): 0
      neurons()[0]::bp_err(): 0
      neurons()[0]::synapses(): []
      neurons()[0]::inputs_ptr(): nullptr
      neurons()[0]::sum_: 0
   Rpnn::neurons()[1]: class 'rpnnNeuron'...
      neurons()[1]::addr(): 0x7fefc2c07640
      neurons()[1]::host_nn_ptr(): 0x7ffee61b1040
      neurons()[1]::is_receptor(): true
      neurons()[1]::transfer_func(): "Sigmoid"
      neurons()[1]::out(): 1
      neurons()[1]::delta(): 0
      neurons()[1]::bp_err(): 0
      neurons()[1]::synapses(): []
      neurons()[1]::inputs_ptr(): 0x7fefc2c079d0
      neurons()[1]::sum_: 0
   Rpnn::neurons()[2]: class 'rpnnNeuron'...
      neurons()[2]::addr(): 0x7fefc2c076c0
      neurons()[2]::host_nn_ptr(): 0x7ffee61b1040
      neurons()[2]::is_receptor(): false
      neurons()[2]::transfer_func(): "Sigmoid"
      neurons()[2]::out(): 1
      neurons()[2]::delta(): 0
      neurons()[2]::bp_err(): 0
      neurons()[2]::synapses()[0]: rpnnSynapse.. host_nn_ptr():0x7ffee61b1040, linked_neuron_ptr():0x7fefc2c07900, weight():2.20194e-314, delta_weight():2.20197e-314, gradient():2.20197e-314, prior_gradient():0
      neurons()[2]::synapses()[1]: rpnnSynapse.. host_nn_ptr():0x7ffee61b1040, linked_neuron_ptr():0x7fefc2c07640, weight():2.20197e-314, delta_weight():2.20197e-314, gradient():2.20197e-314, prior_gradient():0
      neurons()[2]::inputs_ptr(): nullptr
      neurons()[2]::sum_: 0
   Rpnn::input_sets_[0][0]: -1
   Rpnn::input_normalization()[0]: Norm.. found_min():0, found_max():0, base():-1, range():2
   Rpnn::target_sets_[0][0]: 0
   Rpnn::target_normalization()[0]: Norm.. found_min():0, found_max():0, base():0, range():1
   Rpnn::output_errors_[0]: 0
   Rpnn::lm_detector(): fifoDeque.. capacity():4, fifo():[]
Rpnn has converged at epoch 2 with error: 0.000525686
.resolve(), dumped rpn brains into file: rpn.bin
bash $ 

Neuron synapses provide linkage to other neurons via linked_neuron_ptr(), so that a topology can be traced down.
In every topology there's one hidden neuron (a.k.a. "the one") - that neuron is required for the NN convergence and every effector is linked to it - "the one" is always shown first in the above output
All the other neurons come from the user's configuration, e.g.: the neuron with the address 0x7fefc2c07640 is a receptor (is_receptor(): true); the logistic of a receptor is irrelevant, as receptors only facilitate access to the input (pattern) values.

Sigmoid is the default transfer function for all the neurons, though all effectors (and output neurons separately) can be set up with other logistics:

  • Tanh - could be used in hidden and output neurons
  • Tanhfast - could be used in hidden and output neurons
  • Relu - could be used only in hidden neurons
  • Softplus - could be used only in hidden neurons
  • Softmax - could be used in output neurons only.

The logistic function of all effectors can be set up using -l followed by the name of the function
The output neurons' logistic alone can be set up using the -o option followed by the name of the function

Setting hidden effectors to a non-bounded logistic (Relu, Softplus) requires understanding the implications. On one hand, it may result in a very fast convergence (if the weight initialization is favorable, or the multi-dimensional plane of f(x) = error(weights) is favorable for the given task):

bash $ <<<"
0 0 = 0
1 0 = 1
0 1 = 1
1 1 = 0
" rpn -t2,2,1 -l Relu -o Sigmoid
Rpnn has converged at epoch 12 with error: 0.000974536

However, convergence of hidden neurons on Relu may (and most likely will) kick the weights far away from the global region, resulting in frequent wandering around local minimums:

bash $ <<<"
0 0 = 0
1 0 = 1
0 1 = 1
1 1 = 0
" rpn -t2,2,1 -l Relu -o Sigmoid
Rpnn has converged at epoch 97363 with error: 0.000343371
bash $ 
Growing and pruning synapses

If full-mesh connectivity between neuron layers is not enough and you want to add more connections (typically recursive links), then it's possible to do so via options -g, -G:

  • -g N,M adds a single synapse at neuron N to neuron M's output (i.e., a connection from M to N considering the forward signal flow)
  • -G N,M ensures that all recursive synapses between neurons N and M are added

the base of the values N and M is the same as in the debug output - everything is zero-based, and the first neuron (with the index 0) is always reserved

option -p N,M prunes a single synapse at neuron N for (the address of) neuron M
a variant of option -p N prunes all the synapses from neuron N

Other supported options

  • -f <file> - lets you dump the trained rpn brains (upon a successful convergence) into the file of your choice (the default output file is rpn.bin)
  • -r <file> - starts rpn in the trained mode - reads and reinstates the brain state entirely from the file, ready to read & run the input data
  • -S <separators> - allows specifying a list of separators used for the input lines. Default are \s,;= (note the REGEX notation)

Study Examples

Let's review a few academic and real-world examples

Hello World!

The "Hello World!" problem of NNs is training the XOR function - it's the simplest problem that requires a multi-perceptron to converge (why that is so is out of scope of this tutorial, but you can easily google The XOR Problem in Neural Networks).

The topology for rpn can be given using the -t option followed by comma-separated perceptron sizes. E.g., to train rpn for the XOR function, the following topology is required:

    input1 -> (R)---(H)
                 \ /   \
                  X     (O) -> output
                 / \   /
    input2 -> (R)---(H)

That topology is made of 3 layers:

  • 1st layer is made of 2 receptors (R) - first layer is always a layer of receptors
  • 2nd layer is made of 2 hidden neurons (H)
  • and finally the 3rd layer is made of a single output neuron (O)

Thus, it could be expressed as rpn -t 2,2,1 (note: no spaces between numbers). rpn provides a full-mesh synapse connectivity between all adjacent layers.

And here we're good to run our first data sample:

bash $ <<<"
0, 0 = 1
1, 0 = 0
0, 1 = 0
1, 1 = 1
" rpn -t2,2,1
Rpnn has converged at epoch 17 with error: 0.000299919
bash $ 

Now file rpn.bin contains the brain dump of the trained pattern and could be reused on the input data:

bash $ <<<"
0 0
1 1
0 1
" rpn -u -r rpn.bin
1
1
0
bash $

As you might have noticed, rpn was trained for the NOT XOR function instead

That shows that the network has learned the training material properly.

Enumerated patterns

rpn can accept not only numerical input/output patterns but also symbolic ones (per each input/output channel individually). When symbolic input is detected, all symbolic tokens are tracked and enumerated, i.e., internally translated into respective ordinal numerical values.

Because enumeration always starts from the first seen token in each channel (receptor / output neuron) separately (individually), such inputs can be used only when the channel values (input/output) are independent of each other.

Say we want to train the NN for the XOR problem, encoding signal 0 as low and 1 as high. If we do it like this, it still works:

bash $ <<<"
high, low  = high
low,  high = high
low,  low  = low
high, high = low
" rpn -t2,2,1
Rpnn has converged at epoch 21 with error: 0.000462312
bash $ 
bash $ <<<"
high, low  = high
low,  high = high
low,  low  = low
high, high = low
" rpn -r rpn.bin
high
high
low
low
bash $ 

However, the signal mapping semantics here won't be what one would expect. In fact, the above example corresponds to training this pattern:

0, 0 = 0
1, 1 = 0
1, 0 = 1
0, 1 = 1

I.e., in the 1st input channel the tokens high,low correspond to signals 0,1 respectively; in the 2nd input channel low,high correspond to 0,1; and in the output channel high,low correspond to 0,1. This can be observed by showing the real converged numbers instead of the tokens:

bash $ <<<"
high, low  = high
low,  high = high
low,  low  = low
high, high = low
" rpn -r rpn.bin -u
0.00837045
0.0114564
0.970546
0.999975
bash $ 

Multi-class

The above example illustrates binary classification, though it's not the only possible type of classification - sometimes tasks require multiple classes. E.g., the same solution could be expressed as 3 classes:

a) set class1 when the inputs are all zeros (0,0)
b) set class2 when the inputs vary (1,0 or 0,1)
c) set class3 when the inputs are all ones (1,1)

This type of classification requires setting the logistic of all 3 output neurons (one output neuron per class) to the Softmax activation function (the default is Sigmoid for all neurons) and the cost function to Cross-Entropy (the default is Sum of Squared Errors - Sse):

bash $ <<<"
0,0 = 1 0 0
1,0 = 0 1 0
0,1 = 0 1 0
1,1 = 0 0 1
" rpn -t2,2,3 -o Softmax -c Xntropy
Rpnn has converged at epoch 22 with error: 0.000758367
bash $ 

Now, the trained network will display all 3 classes (output neurons):

bash $ <<<"
0 0
1 1
0 1
" rpn -ur rpn.bin
1 0 0
0 0 1
0 1 0
bash $ 

Classification as probability

However, as mentioned before, it's quite rare for a problem's solution to have a global minimum. Even for classification types of problems, the real-world input data may contain duplicate and even conflicting entries.
Let's consider this input set - NOT XOR with noisy inputs:

bash $ <<<"
0, 0 = 1
1, 0 = 0
0, 1 = 0
1, 1 = 1
1, 1 = 0
1, 1 = 1
" rpn -t2,2,1
Rpnn could not converge for 100000 epochs (err: 0.636509) - not saving
bash $ 

The last three lines in the training data set represent the issue: with 66% probability the pattern 1, 1 produces the signal 1, while with 33% probability it produces the signal 0.

So, it's time to engage BLM search:

bash $ <<<"
0, 0 = 1
1, 0 = 0
0, 1 = 0
1, 1 = 1
1, 1 = 0
1, 1 = 1
" rpn -t2,2,1 -b0
Rpnn found best local minimum, combined total epochs 2659 with error: 0.333333
bash $ 

Let's see how it learned the problem:

bash $ <<<"
0, 0
1, 0
1, 1
" rpn -ur rpn.bin
1
0
1
bash $ 

- as you can see, it resolved the problem by choosing the most probable outcome for the conflicting pattern, but what's the actual number for the problematic pattern?

bash $ <<<'1, 1' rpn -r rpn.bin
0.666659
bash $ 

- it found an LM where the error corresponds to the probability of the signal's occurrence in the input. The pattern 1, 1 results in a 0.666659 (~66.7%) probability of the output signal occurring in this case!

A couple of classification examples from the internet

A quick search on the internet leads to the UCI ML Repository for real-world data sets. Let's take a couple of samples from there:

Iris classification

Here's the page for Iris flower classification; the entire data set iris.data contains 150 input patterns, each made of 4 input parameters and 1 output class. The data (shuffled) look like this:

bash $ cat iris.data | sort -R
7.1,3.0,5.9,2.1,Iris-virginica
5.8,2.7,3.9,1.2,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
4.9,3.1,1.5,0.1,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
...

The output class has only 3 values:

  • Iris Setosa
  • Iris Versicolour
  • Iris Virginica

The input values are some botanical property measurements of the flowers. But do we really need to know that info about the input/output data? Not really. What we really need to know is the total size of the input set (150), the number of input channels (4) and the number of output channels (1).

Normally, NNs are trained to generalize rather than memorize the data. Thus, building the training set requires a careful selection of representative data out of the input set. Then train the NN on the training set and verify that it works on the rest of the data (i.e., the data that it has never seen before).

Let's skip that careful selection and just randomly pick half of the set (hoping it will be representative enough):

bash $ <iris.data sort -R | head -n75 >iris_train.data
bash $ 

Now, let's train the NN for iris problem (let's start with just 2 neurons in the hidden layer):

bash $ <iris_train.data rpn -f iris.bin -t4,2,1 -b0 
Rpnn found best local minimum, combined total epochs 15331 with error: 0.122279
bash $ 

With the resulting error, it makes only 3 mistakes (out of 150), which is a mere 2% error rate - not bad at all! By far not all real-world problems have such a good correlation between inputs and outputs

bash $ <iris.data rpn -riris.bin -d
.run(), reinstated rpn brains from file: iris.bin
.run(), cnv_.size(): 5
.run(), receptors_count: 4
...
.read_patterns_(), .read_patterns_(), read input values: 5.9 3.2 4.8 1.8 Iris-versicolor
Iris-virginica
...
.read_patterns_(), .read_patterns_(), read input values: 6.7 3.0 5.0 1.7 Iris-versicolor
Iris-virginica
...
.read_patterns_(), .read_patterns_(), read input values: 6.0 2.7 5.1 1.6 Iris-versicolor
Iris-virginica

There's also an easier visualization than looking at the debug output - by transforming the outputs into JSON values and then using jtc to apply a series of transformations and display the discrepancies between the input data and the produced results, like this:

bash $ <iris.data rpn -r iris.bin -d 2>&1 | tail -n+4 | sed -E 's/^.* ([^ ]+)$/\1/; s/.*/"&"/' |\  
jtc -J / -jw'[::2]<V>v' -T0 -w'[1::2]' -T'[{{V}},{{}}]' / -jw'><i:' / -rw'[:]<I>k[0]<V>f[-1][1]<V>s<>F[-1]' -T'{"{I}":{{}}}'
{ "70": [ "Iris-versicolor", "Iris-virginica" ] }
{ "77": [ "Iris-versicolor", "Iris-virginica" ] }
{ "83": [ "Iris-versicolor", "Iris-virginica" ] }
bash $ 

There might be a temptation to achieve an even better result by throwing more neurons into the topology, but then there's a risk of overtraining the network, where it starts memorizing the data instead of generalizing.
Typically the behavior of an overtrained NN is attributed to the following:

  • once trained (on the training set), it converges with a suspiciously low error - as if a global minimum were found
  • when verified on the training data - it indeed produces zero mistakes
  • but once verified on the entire data set (i.e., on data the NN hasn't seen before) it starts making unexpectedly many mistakes

- then it could indeed be that the NN is overtrained, or that the selected training set is unrepresentative (e.g.: it could be made of data showing strong input-output correlations while weak-correlation data were left out)

Important: the computational power of the NN is driven by the total count of synapses and not so much by the count of neurons!

Car Evaluation

page: Car evaluation requires classification training
data set file: car.data
description file: car.names

data look like this:

bash $ <car.data sort -R
med,low,2,4,big,high,vgood
low,med,3,4,small,high,good
low,low,2,more,small,low,unacc
vhigh,low,4,more,big,med,acc
high,high,3,more,big,low,unacc
med,med,3,more,med,high,vgood
...

There are no continuous input values (only discrete, enumerated inputs). The last column represents the output class (made of 4 values: unacc, acc, good, vgood). There is a total of 1728 entries:

bash $ <car.data wc -l
    1728
bash $ 

Thus, the training data preparation (which is generally the most crucial part of a successful NN training) here can also be trivial. The description provides the following class distribution:

   class      N          N[%]
   -----------------------------
   unacc     1210     (70.023 %) 
   acc        384     (22.222 %) 
   good        69     ( 3.993 %) 
   vgood       65     ( 3.762 %) 

Thus, for the training set, we can select all entries for the good, vgood patterns and around 120 (random) patterns from each of the unacc, acc classes:

bash $ for i in unacc acc good vgood; do <car.data grep ",$i" | sort -R | head -n120; done | sort -R >car_test.data
bash $ <car_test.data wc -l
     374
bash $ 

After playing a bit with variations of the topology and other parameters, this pans out as a quite promising solution:

bash $ time <car_test.data rpn -t6,6,4,2,1 -b0 -e1e-30 -PBLM_RDCE:30 -f car.bin -lSoftplus -oSigmoid
Rpnn found best local minimum, combined total epochs 683236 with error: 3.19595

real	1m22.389s
user	10m9.294s
sys	0m1.718s
bash $
bash $ <car.data sort -R | rpn -r car.bin -d
.run(), reinstated rpn brains from file: car.bin
.run(), cnv_.size(): 7
.run(), receptors_count: 6
.read_patterns_(), .read_patterns_(), read input values: high high 4 more big high acc
acc
.read_patterns_(), .read_patterns_(), read input values: high med 4 2 small high unacc
unacc
.read_patterns_(), .read_patterns_(), read input values: med low 4 4 small high good
good
.read_patterns_(), .read_patterns_(), read input values: high vhigh 2 more med med unacc
unacc
...
bash $
bash $ <car.data rpn -r car.bin -d 2>&1 | tail -n+4 | sed -E 's/^.* ([^ ]+)$/\1/; s/.*/"&"/' |\
jtc -J / -jw'[::2]<V>v' -T0 -w'[1::2]' -T'[{{V}},{{}}]' / -jw'><i:' / -rw'[:]<I>k[0]<V>f[-1][1]<V>s<>F[-1]' -T'{"{I}":{{}}}' | wc -l
     149
bash $ 

- The resulting error rates:

  • on the training set: 28 / 374 = 7.487%
  • on the entire data set: 149 / 1728 = 8.623%

So, this is another example of quite correlatable data with a good solution

C++ class user interface

The C++ class interface is rather simple. Begin by including the header file:

#include "rpnn.hpp"

That header file contains the following classes:

  • class rpnnSynapse: facilitates a synapse (neuron linkage) as well as the (connection) weight
  • class rpnnNeuron: facilitates a neuron - holds the connecting synapses, the logistic function and a link to the input patterns (for receptors only)
  • class rpnnBouncer: the default class facilitating NN weight assignments (via randomization) - allows plugging in weight update methods
  • class Rpnn: the user-facing class facilitating the Resilient backPropagation Neural Network
  • class uniformBouncer: an example of a pluggable functor class (for reference) providing a uniform weight distribution
  • class blmFinder: a child class of rpnnBouncer facilitating a search for a better LM via spawning concurrent (multithreaded) copies of the Rpnn host

Rpnn (and the other classes' user-facing methods) supports fluent notation

Essential SYNOPSIS:

here's an example of how to create and train Rpnn for the XOR, OR, and AND problems (i.e., 3 output channels) with statically hardcoded data:

    // ...
    // input patterns:
    std::vector<std::vector<double>>
    	input_ptrn = {{0,1,0,1},		// 1st channel (to the 1st receptor)
                      {0,0,1,1}};		// 2nd channel (to the 2nd receptor)
    // target patterns:
    std::vector<std::vector<double>>
    	target_ptrn = {{0,1,1,1},		// OR output channel
                       {0,1,1,0},		// XOR output channel
                       {0,0,0,1}};		// AND output channel

    // configuring and training Rpnn
    Rpnn nn;
    nn.full_mesh(2, 2, 3)			// begin with defining topology
      .normalize()				// optional, but if used, must be called before load_patterns(..) 
      .load_patterns(input_ptrn, target_ptrn)	// load up in/out patterns, required
      .lm_detection(nn.synapse_count() * 3)	// engage LM trap detection (optional, but desirable)
      .target_error(0.00001)			// in this example it's optional
      .converge(10000);				// find solution

    std::cout << "epochs took to converge: " << nn.epoch() << std::endl; 

    // Offload NN brains into the file
    std::ofstream file("oxa.bin", std::ios::binary);	// open file
    file << std::noskipws << Blob(nn);		// dump (serialize) NN to file

Now the counterpart - reading the (trained) Rpnn brains from the file and activating it with user data:

    // ...
    Blob b(std::istream_iterator<char>(std::ifstream{"oxa.bin", std::ios::binary}>>std::noskipws),
           std::istream_iterator<char>{});	// read serialized NN from file into blob

    Rpnn nn(b);					// Create Rpnn obj & de-serialize blob into NN
    // the two above declarations could be combined into one: Rpnn nn(Blob(...));

    // activate various channels
    std::cout << "1 AND 0 = " << nn.activate({1, 0}).out(2) << std::endl; 
    std::cout << "1 XOR 1 = " << nn.activate({1, 1}).out(1) << std::endl;
    std::cout << "0 OR 1 = "  << nn.activate({0, 1}).out()  << std::endl;

Possible output:

epochs took to converge: 26
1 AND 0 = 1.62056e-06
1 XOR 1 = 1.19259e-05
0 OR 1 = 1

Topology methods:

    Rpnn nn;		// The copy constructor exists, but rather performs a cloning operation;
			// the move constructor is deleted (but that might be easily lifted in the future)
			// there's one other form of the constructor:
			// 	Rpnn(Blob & blob);  - it restores Rpnn state from the blob

    nn.full_mesh(..);	// it's best to begin with creating a skeleton of topology:
			// full_mesh method exists in two variants:
			// 1. variadic form - accepts topology as enumerated perceptrons, e.g.:
			//   	full_mesh(5,3,2,3); - 5 receptors, 3 neurons in 1st hidden layer, 2 neurons in 2nd
			//                            hidden layer, 3 output neurons
			// 2. accepts a templated STL trivial container (std::vector, std::list, std::deque, etc)
			//	std::vector<int> my_topology{5,3,2,3};
			//	full_mesh(my_topology);

If by chance a full-mesh topology is not good enough, then it's possible to modify it by growing and pruning synapses:
- class rpnnNeuron provides access to the methods allowing linking, growing and pruning synapses. However, the neurons themselves have to be accessed first.

There are 3 types of neurons which typically a user would need an access to:

  1. receptors: these neurons don't have synapses and facilitate input patterns access
  2. effectors: these are non-receptors - i.e., neurons with synapses
  3. output_neurons: these are the effectors in the last (output) layer

Structurally, all the neurons are held in a sequential container (std::list) and can be accessed using the following iterators:

                         neurons().begin()      effectors_itr()
                               |                       |
                               v                       v
 std::list<rpnnNeuron>:      ("1"),  (R1)  ...  (Rn)  (E1) ... (En-m), (O1) .. (Om)
                                      ^                                 ^
                                      |                                 |
                                receptors_itr()                output_neurons_itr()

Thus:
- All neurons are accessible via neurons() container and its iterators neurons().begin() -> neurons().end()
- all receptors are accessible via receptors_itr() -> effectors_itr() iterators
- all effectors (output neurons are the effectors too) are accessible via effectors_itr() -> neurons().end()
- all the output neurons are accessible via output_neurons_itr() -> neurons().end()

The first neuron is always a reserved neuron (a.k.a. "the one") - it's a specially reserved empty receptor; all the effectors should have synapses towards this neuron, and it's better not to mess with it (as that would damage the NN's ability to function properly); the effectors' linkage to "the one" is maintained by the class itself and does not require any overhead from the user
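
For illustration, here's a minimal sketch of walking the neurons via those iterators (assuming the iterator accessors shown in the diagram above, and that neurons() exposes the underlying std::list):

    #include <iostream>
    #include <iterator>
    #include "rpnn.hpp"

    int main(void) {
     Rpnn nn;
     nn.full_mesh(2, 2, 1);       // 2 receptors, 2 hidden effectors, 1 output neuron (+ "the one")

     // receptors occupy [receptors_itr() .. effectors_itr())
     auto receptors = std::distance(nn.receptors_itr(), nn.effectors_itr());
     // effectors (output neurons included) occupy [effectors_itr() .. neurons().end())
     auto effectors = std::distance(nn.effectors_itr(), nn.neurons().end());
     // output neurons occupy [output_neurons_itr() .. neurons().end())
     auto outputs = std::distance(nn.output_neurons_itr(), nn.neurons().end());

     std::cout << receptors << " receptors, "                // expected: 2
               << effectors << " effectors, "                // expected: 3
               << outputs << " output neurons" << std::endl; // expected: 1
    }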

So, now that any neuron can be accessed (via iterators), the following neuron methods exist to manage their synapses:

  • linkup_synapse(rpnnNeuron &): link up a synapse to a neuron by its reference (address)
  • grow_synapses(idx1, ..): link up synapse(s) to neurons by their index in the std::list container (variadic arguments)
  • prune_synapses(idx1, ..): prune synapse(s) to neurons by their (neurons') indices (variadic arguments)

E.g., to link the first effector to the first receptor, either way will do:

    nn.effectors_itr()->linkup_synapse(*nn.receptors_itr());
    // or
    nn.effectors_itr()->grow_synapses(1);

Loading data patterns:

Loading data patterns is required at the learning phase and might also be used at the post-training phase:

    load_patterns(..);

This method can be used to load up input patterns alone, or both input and target patterns together. Before the training, it's used to load up both input and target data.

The method works with the std::vector<std::vector<double>> container type, where the outer container holds as many channels (inner containers) as there are configured receptors. The inner containers hold the series of data for their own receptors - the size of all the inner containers must be the same (otherwise an exception will be thrown: Rpnn::stdException::receptors_misconfigured)

  1. When the input data container is passed as an r-value reference, it's moved into the internal class storage, e.g.:
        nn.load_patterns({{0, 0},{1, 1}});	// input patterns moved into nn's internal storage
    
    Otherwise (if an l-value is passed), the internal storage won't be used; instead, the receptors will link to the respective channels:
    std::vector<std::vector<double>>
    	input_ptrn = {{0,1,0,1}, {0,0,1,1}};
    nn.load_patterns(input_ptrn);	// two receptors will be linked to respective containers of input_ptrn
    
  2. if normalization of the input data is engaged (via a normalize() call), then
    a) the method must be called prior to calling load_patterns(..) (otherwise an exception will be thrown: Rpnn::stdException::norm_engaged_after_inputs_loaded)
    b) no matter how you pass the input data - it'll be copied into the internal storage and normalized
    std::vector<std::vector<double>>
    	input_ptrn = {{0,1,0,1}, {0,0,1,1}};
    nn.normalize(-5, 5)			// optionally normalize() can accept min/max bounds
      .load_patterns(input_ptrn);	// input patterns copied into nn's internal storage and normalized
    
  3. Passing target patterns in any manner will always cause copying and normalization of the target data: it's the class requirement to have the output data always normalized, therefore they are always copied.
    std::vector<std::vector<double>>
    	input_ptrn = {{0,1,0,1}, {0,0,1,1}};
    nn.load_patterns(input_ptrn,	// inputs will be linked to the respective receptors
                     {{0,1,1,1}});	// targets will be copied & normalized into nn's inner storage
    

    The normalization bounds of the targets are determined automatically from the output neurons' transfer function (where it converges at +/- infinity)

Other configuration methods (a combined usage sketch follows this list):

  • Rpnn::target_error(double err) - assigns the target error for convergence when searching for a global minimum (GM); when a GM does not exist, this value (together with the generic parameter BLM_RDCE) defines how long/hard the NN will try searching for the best LM - the lower the value, the longer it will try. The default value is 0.01

  • Rpnn::lm_detection(size_t size) - engages LM trap detection; the argument defines the global error trail size. Typically the trail size must be greater than the number of weights in the NN (synapse_count()), thus it's better to call it with some factor, e.g.:

    nn.lm_detection(nn.synapse_count() * 3);
    

    Too short a trail size won't be able to detect an LM effectively, while too long a one will slow down convergence - thus, finding the right balance for a given type of problem might require some research. The default trail size is 0 (i.e., LM trap detection is disabled)

  • Rpnn::stop_on_nan(bool) - instructs the NN to stop convergence if global_error() (the error calculated by the cost function, averaged over all output neurons) turns NaN. That typically happens when certain configurations (combinations of parameters) are incompatible. By default it's on.

  • Rpnn::cost_function(double (*)(double, double)) - allows plugging in an error function (C-style), which accepts 2 doubles and returns the error between them. The class provides 2 cost functions: Sse - sum of squared errors, and Xntropy - cross entropy. The default is Sse; Xntropy would typically be used with Softmax activation at the output neurons.

  • rpnnNeuron::transfer_func(double (*)(double, rpnnNeuron *)) - a neuron's method which allows plugging in a logistic function (C-style). The pluggable function accepts two parameters:
    - double - the implemented function's operand
    - rpnnNeuron * - a pointer to the calling neuron, or nullptr (nullptr is passed only when the function is tested for its bounds). By default each neuron is equipped with the Sigmoid transfer function.

  • normalize(double min, double max) - enables/disables inputs normalization. min == max disables normalization. Defaults are {-1, +1}

  • gpm(const std::string &param, double val) - lets you set Rpnn's generic parameters; the counterpart gpm(void) call allows reading the container holding the parameters (which is a std::map<std::string, double>)

  • bouncer(rpnnBouncer &) - allows plugging an override of the default weight updater

  • converge(size_t epochs) - engages Rpnn to search for a solution with given data and configuration.
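
A minimal sketch combining the above configuration calls in fluent notation (the values are arbitrary and only for illustration; it assumes input_ptrn / target_ptrn prepared as in the SYNOPSIS above, and that gpm() and stop_on_nan() chain like the rest):

    // a minimal configuration sketch in fluent notation (values are arbitrary)
    Rpnn nn;
    nn.full_mesh(2, 2, 3)                       // topology comes first
      .normalize(-1, +1)                        // optional; must precede load_patterns(..)
      .stop_on_nan(true)
      .gpm("DW_FACTOR", 1.2)                    // tweak a generic parameter
      .load_patterns(input_ptrn, target_ptrn)   // as prepared in the SYNOPSIS above
      .lm_detection(nn.synapse_count() * 3)     // trail size as a factor of the weights count
      .target_error(0.001)
      .converge(50000);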

Weight updater

The weight updater is designed as a class (rather than a method) to allow overriding the default behavior (which is random weight assignment). The class allows a twofold override of the default behavior:

  1. By creating and plugging in a child class, which can override the following virtual methods:
    • reset(void): this method is called by Rpnn::converge(..) only once, when the NN is still untrained
    • bounce(void): this method is called by Rpnn::converge(..) right after reset(), and also once LM trap detection is engaged and an LM trap is detected - i.e., the method is called each time the NN's weights need to be bounced.
      Here's a synopsis of such an override:
     Rpnn nn;
     blmFinder blm;			// child class of rpnnBouncer overriding the weight bouncing policy 
     // ...
      nn.bouncer(blm);			// plug the override
     // ...
    
  2. By plugging your own std::function callable target into the rpnnBouncer. Here's a synopsis of such an example:
    Rpnn nn;
    uniformBouncer ub;			// functor facilitating own weight update policy
    // ...
    nn.bouncer().weight_updater(ub); 	// now ub.operator() will be called instead,
                                        // whenever nn needs to bounce weights
    // ...
    

Both methods could be used together at the same time.

Post training activation

Once the NN is trained to a desirable effect, it becomes possible to activate it with new (unseen) input data. This can be done in two ways:

  1. load the input patterns and then activate the NN pointing it to the pattern, e.g.:

    std::vector<std::vector<std::string>>
        input_ptrns = {{"A", "B"},{"B", "C"},{"D", "E"}};
    
    nn.load_patterns(input_ptrns);
    
    std::cout << "activating input ptrn #1: " << nn.activate(1).out() << std::endl;
    

    - in that example the (trained) NN was activated with the input pattern {"B", "C"}

  2. The input pattern can be passed directly to the activate() method:

    std::cout << "activating NN with pattern "D, E": " << nn.activate({"D", "E"}).out() << std::endl;
    

    - in this example, the activate() method loads the pattern values directly into the respective receptors and then activates the NN with the loaded data

Access to the NN activation result(s) may occur in a couple of ways:

  1. using Rpnn::out(size_t n = 0) - this method, illustrated in the above examples, reads the activation result from the first output neuron. In case there are multiple output neurons, the method accepts the index of the output neuron, e.g.: nn.out(1);

  2. a neuron can be accessed first (via an iterator, or using the Rpnn::neuron(size_t) access method), and then the activation result can be read using the rpnnNeuron::out() method:

    std::cout "activation result of the fist output neuron: " << nn.output_neurons_itr()->out() << std::endl;
    

SERDES interface

The Rpnn class is fully SERDES'able - it's easily serializable into a binary blob and deserializable (from the blob) into a target Rpnn object.
Naturally, serialization is required when Rpnn is trained and ready to be dumped into a file (or an SQL database):

    std::ofstream file("rpnn.bin", std::ios::binary);		//open a file in a binary mode
 
    file << std::noskipws << Blob(nn);		// Blob class is part of the Rpnn header file

The reverse operation (i.e., restoring the NN from the serialized blob) is equally easy:

    Blob b(std::istream_iterator<char>(std::ifstream{"rpnn.bin", std::ios::binary}>>std::noskipws),
           std::istream_iterator<char>{});	// read blob from the file

    Rpnn nn(b);					// deserialize - restore NN state from the blob
    // alternatively this could be done sequentially:
    //    Rpnn nn;
    //    b.restore(nn);

For a detailed description of the Blob class, refer to the header Blob.hpp
For further details of the Rpnn class, refer to the header rpnn.hpp

Enhancement requests and/or questions are more than welcome: [email protected]