This time I tried a few variations. The first one I produced made output like this:
one (pic8.co)
two (pic8.co)
three (pic8.co)
four (pic8.co)
The second one, with the only change being different random initial weights, produced this:
https://pic8.co/sh/uDQ04b.png https://pic8.co/sh/Y49JZH.png
For the third one I set the initial weights to zero. It fits the data less well, but is smoother.
https://pic8.co/sh/GEnNAX.png https://pic8.co/sh/T8Drjq.png
If you want to graph it, I have some data:
one (jssocial.pw)
two (jssocial.pw)
three (jssocial.pw)
I'm getting a large amount of variance in how well it fits depending on the initial weights and/or the way the training data is shuffled. But clearly starting the weights at zero to remove that variance isn't the answer, because it had 10x the error on its second attempt and 100x the error on the first. The one you see there is the second run. I guess that means the training order is a source of variance too.
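If it helps make that concrete, here's a rough sketch (Python/NumPy, toy data and made-up network size, not the code that produced the plots above) of separating the two sources by seeding the weight initialization and the shuffle order independently:

```python
# Rough sketch (toy data, made-up network size): seed the weight init and the
# shuffle order independently so each source of variance can be tested on its own.
import numpy as np

def train(init_seed, shuffle_seed, epochs=500, lr=0.1, hidden=8):
    # toy 1-D regression data standing in for the real data linked above
    X = np.linspace(-1, 1, 20).reshape(-1, 1)
    y = np.sin(3 * X)

    init_rng = np.random.default_rng(init_seed)        # controls initial weights
    shuffle_rng = np.random.default_rng(shuffle_seed)  # controls training order

    W1 = init_rng.normal(0, 0.5, (1, hidden)); b1 = np.zeros(hidden)
    W2 = init_rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)

    for _ in range(epochs):
        for i in shuffle_rng.permutation(len(X)):      # reshuffle every epoch
            x, t = X[i:i+1], y[i:i+1]
            a = np.tanh(x @ W1 + b1)                   # forward pass
            out = a @ W2 + b2
            d_out = out - t                            # squared-error gradient
            d_a = (d_out @ W2.T) * (1 - a ** 2)
            W2 -= lr * a.T @ d_out; b2 -= lr * d_out.sum(0)
            W1 -= lr * x.T @ d_a;   b1 -= lr * d_a.sum(0)

    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

# Hold one seed fixed and vary the other to see where the variance comes from.
print([train(init_seed=s, shuffle_seed=0) for s in range(3)])   # init varies
print([train(init_seed=0, shuffle_seed=s) for s in range(3)])   # order varies
```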
Edit: I discovered a horrible error.
This is AND (pic8.co)
data (jssocial.pw)
This is XOR (pic8.co)
data (jssocial.pw)
Playing with -1 and 1 as outputs (pic8.co)
data (jssocial.pw)
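For reference, this is roughly the kind of setup I mean for the -1/1 version: a tanh output so the network can actually reach -1 and 1. This is a hypothetical minimal sketch (made-up layer size, learning rate, and epoch count), not the exact network behind the plots:

```python
# Minimal sketch: XOR (and AND) with targets encoded as -1/1 and a tanh output.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[-1.], [1.], [1.], [-1.]])      # XOR, encoded as -1/1
# y = np.array([[-1.], [-1.], [-1.], [1.]])   # AND, same encoding

W1 = rng.normal(0, 0.5, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(5000):
    a = np.tanh(X @ W1 + b1)          # hidden layer
    out = np.tanh(a @ W2 + b2)        # tanh output covers the -1..1 range
    d_out = (out - y) * (1 - out**2)  # squared-error gradient through tanh
    d_a = (d_out @ W2.T) * (1 - a**2)
    W2 -= lr * a.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_a;   b1 -= lr * d_a.sum(0)

print(np.round(np.tanh(np.tanh(X @ W1 + b1) @ W2 + b2), 2))
```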
Some really curious stuff. I'm playing with a per-weight learning rate based on whether each weight's delta oscillates or keeps going in the same direction from one training step to the next (rough sketch after the two plots below).
XOR when only reducing the per-weight learning rate (pic8.co)
XOR when the per-weight learning rate can also expand (pic8.co)
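The rule I'm describing is basically sign-checking per weight (in the spirit of delta-bar-delta/RPROP). A rough sketch of it, with made-up shrink/grow factors and bounds:

```python
# Rough sketch of the per-weight rate rule: shrink a weight's rate when its
# gradient flips sign between steps (oscillation), optionally grow it when the
# sign stays the same. The 0.5/1.2 factors and the bounds are made up.
import numpy as np

class PerWeightRate:
    def __init__(self, shape, base_lr=0.1, allow_expand=False):
        self.lr = np.full(shape, base_lr)   # one learning rate per weight
        self.prev_grad = np.zeros(shape)    # zero => first step is a plain SGD step
        self.allow_expand = allow_expand

    def step(self, W, grad):
        flipped   = self.prev_grad * grad < 0      # delta oscillated
        same_sign = self.prev_grad * grad > 0      # delta kept its direction
        self.lr[flipped] *= 0.5                    # reduce on oscillation
        if self.allow_expand:
            self.lr[same_sign] *= 1.2              # expand on steady direction
        self.lr = np.clip(self.lr, 1e-6, 1.0)      # keep the rates sane
        self.prev_grad = grad.copy()
        W -= self.lr * grad                        # per-weight update
        return W
```

The reduce-only plot corresponds to allow_expand=False and the expandable one to allow_expand=True; each weight matrix (and bias vector) would get its own PerWeightRate instance.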
The last one is weird. The whole thing is smoother (meaning less overfitting, which makes sense because doing this is similar to pruning), but then there is one dot by itself at (0, 0), which looks like something you'd see with severe overfitting. Not good, but interesting.