From an explanation I wrote here in 2012: "The simplest ANNs have just an input layer and an output layer with a defined threshold value. Basically, a single output y ∈ {−1, 1} is a function (or, iterated function) of n 2-valued inputs (x₁, x₂, …, xₙ) of the "neuron", each with a weight wᵢ ∈ {−1, 1}. The output y is a thresholded sum of the weighted inputs: if the sum reaches the threshold, the neuron "fires", and if not, it doesn't.
Let w represent a vector of n weights, x an input vector with n elements, and θ the threshold value. Then we have

y = 1 if wᵀx ≥ θ
y = −1 if wᵀx < θ.
In reality, we'd have y(t+1) (with wᵀx acting as the transformation applied at step t), because we're dealing with an iterated function, but the gist is still the same. Schematically:
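A minimal sketch of such a unit in Python (the name threshold_unit, the particular weights, and the threshold value 2 are illustrative, not from the original):

```python
import numpy as np

def threshold_unit(w, x, theta):
    """Signum threshold neuron: fires (+1) iff w.x >= theta, else outputs -1."""
    return 1 if np.dot(w, x) >= theta else -1

# Weights in {-1, 1} as above; the threshold value 2 is an arbitrary example.
w = np.array([1, 1])
print(threshold_unit(w, np.array([1, 0]), theta=2))  # -1: weighted sum 1 < 2
print(threshold_unit(w, np.array([1, 1]), theta=2))  # +1: weighted sum 2 >= 2
```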
You can't do much with this. But you can still do a lot even with a signum threshold function by adding other elements. A "simple" method which vastly increases the power of the network schema above is the addition of another threshold function with an adaptive parameter of some sort. Instead of ending at a simple weighted sum, the linear combination y (the output) becomes part of a larger summation function. This linear combiner not only takes the output as input, but is also a composite function of the input vector and some adaptation function. For example:
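One concrete adaptation rule of this kind is Widrow and Hoff's LMS update, in which the error between the combiner's output and a target drives the weight change; the sketch below assumes that rule, with a made-up learning rate and training pairs:

```python
import numpy as np

def lms_step(w, x, target, lr=0.1):
    """One Widrow-Hoff (LMS) step: the combiner's raw output is compared
    to a target, and the error drives the weight adaptation."""
    y = np.dot(w, x)            # linear combination, before any thresholding
    error = target - y
    return w + lr * error * x   # adapt weights using the input and the error

# Illustrative training pairs; repeated passes shrink the error toward zero.
w = np.zeros(2)
pairs = [(np.array([1.0, 1.0]), 1.0), (np.array([1.0, -1.0]), -1.0)]
for _ in range(20):
    for x, t in pairs:
        w = lms_step(w, x, t)
print(w)  # approaches [0., 1.], which maps both inputs to their targets
```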
Adding hidden layers, multiple outputs, etc., further increases the power and complexity of the network, all without changing the binary threshold.
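To see why hidden layers matter: a single signum unit famously cannot compute XOR, but two threshold units feeding a third can. A hand-wired sketch, using a 0/1 step variant of the unit above with weights and thresholds chosen for illustration:

```python
def step(s, theta):
    """0/1 variant of the threshold unit: fires iff s >= theta."""
    return 1 if s >= theta else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2, 1)    # hidden unit: OR
    h2 = step(-x1 - x2, -1)  # hidden unit: NAND
    return step(h1 + h2, 2)  # output unit: AND of the hidden units

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # prints the XOR truth table
```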
However, such networks are still limited by the binary nature of the threshold. Using an interval, rather than a 2-valued output, vastly improves the adaptation process and consequently the power of the neural net. The adaptation mechanism described earlier is limited in that one of its arguments, y (the output), can only provide two values. Thus, no matter how complicated your adaptive algorithm is, the central mechanism changing the state of the network is a binary function. Replacing this with some sort of nonlinear function not only maps the output onto some interval in ℝ (usually [−1, 1] or [0, 1]), allowing a dynamic threshold, but also greatly improves the network's capacity to adapt:
Here a nonlinear threshold function is "updated" using nonlinear adaptation functions. However, there is still a threshold. You can store the threshold values within the weight matrix (treating each threshold as a bias weight attached to a constant input), since the initial output is the product of the weights and inputs. If the resulting value reaches the threshold, the activation function will adjust the network accordingly."
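To make that concrete, here is one possible version of such a unit: tanh as the nonlinearity, mapping the output onto (−1, 1), with the threshold stored as a bias weight paired with a constant input. The choice of tanh and all of the numbers are my assumptions for illustration:

```python
import numpy as np

def activate(w, x):
    """Nonlinear unit with the threshold folded into the weight vector:
    w[0] is a bias weight paired with a constant input of 1, so the
    threshold is stored alongside the other weights."""
    x_aug = np.concatenate(([1.0], x))   # prepend the constant bias input
    return np.tanh(np.dot(w, x_aug))     # graded output in (-1, 1)

w = np.array([-0.5, 1.0, 1.0])  # bias (negated threshold) plus two weights
print(activate(w, np.array([1.0, 0.0])))  # ~0.46, not just +/-1
```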