Well, you could use an EA to take a stab at finding better minima :)
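To make that concrete, here is a minimal sketch of the kind of thing I mean: a bare-bones (1+1) evolution strategy that mutates a single parameter vector with Gaussian noise and keeps the child whenever it is no worse. Everything below (the function name, step size, iteration count, and the toy loss) is my own illustrative choice, not code from any particular library:

```python
import numpy as np

def one_plus_one_es(loss, w0, sigma=0.5, iters=2000, seed=0):
    """Minimal (1+1) evolution strategy: keep a single parent, mutate it
    with Gaussian noise, and accept the child only if it is no worse."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float)
    best = loss(w)
    for _ in range(iters):
        child = w + sigma * rng.standard_normal(w.shape)
        e = loss(child)
        if e <= best:
            w, best = child, e
    return w, best

# Toy usage on a non-convex 1-D loss with several local minima.
loss = lambda w: float(np.sin(3 * w[0]) + 0.1 * w[0] ** 2)
print(one_plus_one_es(loss, w0=[4.0]))
```

Because the mutation can jump over barriers between basins, this kind of search has at least a chance of escaping a poor local minimum, which plain gradient descent does not.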
And correct me if I'm wrong, but isn't the cost function of a feed-forward neural network that uses a sigmoid activation function convex with respect to the parameters being trained, i.e. isn't gradient descent guaranteed to find the global minimum when a small enough step size is used?
Mostly, no. Hidden units introduce non-convexity into the cost. How about a simple counter-example?
Take a simple classifier network with one input, one hidden unit, one output, and no biases. To make things even simpler, tie the two weights, i.e. make the first weight equal to the second. Mathematically, the output of the network can then be written z = f(w * f(w * x)), where f() is the sigmoid.
Next, consider a dataset with two items: [(x_1, y_1), (x_2, y_2)], where x_i is the input and y_i is the class label, 0 or 1. Take as values: [(0.9, 1), (0.1, 0)]. The cost function (the negative log-likelihood in this case) is:

E(w) = -[ y_1 * log(z_1) + (1 - y_1) * log(1 - z_1) ] - [ y_2 * log(z_2) + (1 - y_2) * log(1 - z_2) ],   with z_i = f(w * f(w * x_i)).

Plot E(w) over a range of w and you will see that it is not convex: it has more than one local minimum, so a small enough step size only guarantees convergence to whichever local minimum you started near.
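If you want to check that numerically rather than by plotting, here is a quick NumPy sketch (the names and the grid range are my own choices) that evaluates E(w) on a grid of w values and reports every strict local minimum it finds; the scan should turn up more than one, which already rules out convexity:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cost(w, data=((0.9, 1), (0.1, 0))):
    """Negative log-likelihood of the tied-weight network z = f(w * f(w * x))."""
    e = 0.0
    for x, y in data:
        z = sigmoid(w * sigmoid(w * x))
        e -= y * np.log(z) + (1 - y) * np.log(1 - z)
    return e

# Scan the single weight over a grid and report the strict local minima.
ws = np.linspace(-20, 5, 2001)
es = np.array([cost(w) for w in ws])
is_local_min = (es[1:-1] < es[:-2]) & (es[1:-1] < es[2:])
print("local minima near w =", ws[1:-1][is_local_min])
print("costs at those points:", es[1:-1][is_local_min])
```

Gradient descent started in the basin of the shallower minimum will converge there no matter how small the step size, which is exactly why the guarantee asked about above does not hold.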