Somebody has to have made this before. But before I get to describing it, I should tell you that it works.
Those of you who know how a multi-layer neural network works know that there are layers of "neurons" connected by synapses. It starts with an input layer, passes through some hidden layers, and ends at an output layer considered the result.
Some of you may also know that the values of neurons inside the network are sometimes useful for tasks like self-categorization. That is, if we wanted to determine how good an employee is based on certain behavior, some hidden layer might have a neuron that encodes how likely the neural net thinks they are black. That could inform the layers past it and ultimately give us a useful result. That's what it means for something to be deep learning: each layer develops an "understanding" of features that are increasingly relevant to our output, based on consistent patterns noticed in the input data.
Here is the problem with the current standard, and there are two ways to describe it. If a hidden feature discovered in one layer of a very deep network is obviously relevant to the output, the network essentially has to develop a pass-through for that value from layer to layer, without other training data pushing synapses in ways that disturb that pass-through. Similarly, if an input were immediately relevant to our output without much "deep understanding" being needed, a pass-through would have to form through the whole network. That wastes neurons, and it wastes synapses, because something like 90% of the synapses going into our chain of pass-through neurons would need to be zero to reduce the influence of neurons outside the chain. The backpropagation algorithm will drive our weights in any direction but toward zero. Essentially we are asking the neural network to perform the hardest task to produce the simplest result if we want any depth.
The consequence is that networks either have to be very shallow or take a very long time to train. So the degree of deep learning available is limited unless you are Google and have unlimited computation to throw at training.
Mine, on the other hand, gives every layer all prior neurons, including the original input neurons, as its inputs. The output layer considers all other neurons, including all "hidden features" and the inputs directly. If they are useful to the output, they will be used.
When I first thought of this I considered making the layers rather narrow, like four or five neurons. Some consideration has made me realize that taking it to the extreme is the best way to go. The narrowest layer is one neuron. This gets us as deep as possible without as much cost. Depth in a standard network has a cost because of the pass-through issue, but since we don't have that problem we should go for depth as quickly as possible. To the extent that the problem doesn't require depth, each neuron in an extremely deep network is simultaneously shallow with respect to the inputs and outputs, if that is more appropriate.
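Here is a minimal sketch of the forward pass I'm describing, assuming one neuron per layer, a flat weight array, and the log-style activation described further down. The names (forward, signedLog) and the loop are my own illustration, not the actual code:

```js
// Sketch: every neuron sees the inputs plus every prior neuron's output.
// Assumptions (mine): one neuron per layer, weights stored flat in the order
// [bias, input weights, prior-neuron weights] per neuron, signed-log activation.
function signedLog(x) {
  return Math.sign(x) * Math.log(1 + Math.abs(x));
}

function forward(inputs, weights, numNeurons) {
  const values = [...inputs]; // everything later neurons can draw on
  let w = 0;                  // running index into the flat weight array
  for (let n = 0; n < numNeurons; n++) {
    let sum = weights[w++];   // bias
    for (const v of values) sum += weights[w++] * v; // inputs + all prior neurons
    values.push(signedLog(sum));
  }
  return values[values.length - 1]; // the last neuron is the output
}
```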
In terms of computation cost, being narrow isn't so bad. The whole model can be expressed in a single line of purely mathematical code; it's just rather long. What I mean to say is there is no looping involved. Because I'm using nodejs there are no parallel utilities for math (matrix operations) anyway, so in that context there is zero cost in losing matrix operations.
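To make the "one long expression, no loops" point concrete, here is the two-input, three-neuron case from the example below unrolled by hand. The weight names, sample values, and the exact activation f are my own placeholders:

```js
// Unrolled 2-input, 3-neuron network (one neuron per layer), no loops.
const f = (x) => Math.sign(x) * Math.log(1 + Math.abs(x)); // assumed activation
const [x1, x2] = [1.5, -2.0];                              // example inputs
const [w0, w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11] =
  [0.1, -0.3, 0.7, 0.2, 0.5, -0.1, 0.4, 0.0, 0.6, -0.2, 0.3, 0.8]; // example weights

const h1  = f(w0 + w1 * x1 + w2 * x2);                       // sees only the inputs
const h2  = f(w3 + w4 * x1 + w5 * x2 + w6 * h1);             // sees inputs + h1
const out = f(w7 + w8 * x1 + w9 * x2 + w10 * h1 + w11 * h2); // sees everything
console.log(out);
```

Substituting h1 and h2 into the last line gives the single, loop-free expression.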
You really don't need as many neurons doing it this way.
Let's take a simple problem with two inputs and one output. The first neuron will have one weight for each input plus a bias. That's three. The next neuron will take a total of four weights: both inputs, plus the prior neuron, plus a bias. The next one will take five.
That's a total of twelve weights. That's actually a lot of flexibility to do a regression with, especially considering each one has more direct capacity to affect our output and we aren't wasting weights trying not to disturb a chain of neurons because we need a pass-through. With the kind of unique activation function being used you can model almost anything with that. Keep in mind it is very easy to scale that up to sixty or even thousands.
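For anyone who wants to check the arithmetic, here is the weight count in general: neuron i gets a bias, one weight per input, and one weight per prior neuron. The function name and formula are my own derivation from the description above, not code from the project:

```js
// Total weights for n inputs and k single-neuron layers:
// sum over i of (1 + n + i) = k * (1 + n) + k * (k - 1) / 2
function weightCount(numInputs, numNeurons) {
  let total = 0;
  for (let i = 0; i < numNeurons; i++) {
    total += 1 + numInputs + i; // bias + inputs + prior neurons
  }
  return total;
}

console.log(weightCount(2, 3));  // 12, the example above
console.log(weightCount(2, 60)); // 1950, still modest at sixty neurons
```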
Each neuron simply grabs the most useful and most immediately inferable information out of the inputs and prior neurons. Noticing one useful thing about some data and taking note of it is not a lot of work. And that's what humans do: we first pick up on one simple detail about the data, note it, and use it going forward. Each neuron just figures out the next most important thing that can be solved at that depth. When a human looks at data, it is not uncommon that they can only hold about seven facts about it at once, and that's if they are performing well.
Standard neural nets only really detect about 7 or so hidden features. They just take a shotgun approach to it.
The other unique aspect is the different kind of activation function. It just uses a reversible natural log. This makes it unbiased between negative and positive numbers and doesn't restrict you to thinking about data as ranges from 0 to 1 the way a logistic function does.
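The post doesn't give the exact formula, but one common way to get a symmetric, unbounded, invertible log-like activation is sign(x) · ln(1 + |x|). Treat this as my guess at what "reversible natural log" means, not the project's actual code:

```js
// Assumed form of the "reversible natural log" activation (my guess):
// symmetric in sign, unbounded, and exactly invertible.
const signedLog = (x) => Math.sign(x) * Math.log(1 + Math.abs(x));
const signedLogInverse = (y) => Math.sign(y) * (Math.exp(Math.abs(y)) - 1);

console.log(signedLog(5));                   // ~1.79
console.log(signedLog(-5));                  // ~-1.79, no bias toward positive
console.log(signedLogInverse(signedLog(5))); // 5, hence "reversible"
```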
Anyway, I'm looking for problems to throw at it. I don't think it's the most effective at taking advantage of hardware, but I think the limits on the depth of problem it can solve are greater. I also think it will require less specialized knowledge to use, because the activation function has broader abilities, and with standard neural nets a lot of intuition has to go into recognizing that certain topologies can't solve certain problems, which I think will be less of an issue here. Also, because of standard neural networks' love-hate relationship with depth, a lot of thought has to go into how data is represented as an input so as to lower the depth of the problem. Maybe that will be less of a problem here and therefore less of a hassle to deal with.