Phys. Rev. E 52, 2878–2886 (1995)

On-line versus off-line learning in the linear perceptron: A comparative study

Received 17 March 1995; published in the issue dated September 1995

The spherical perceptron with $N$ inputs and a linear output does not present optimal generalization if trained by minimization of the standard quadratic cost function $E = \frac{1}{2} \sum_{\mu=1}^{\alpha N} (b_\mu - h_\mu)^2$, where $b_\mu$ and $h_\mu$ are the outputs from the rule (teacher) and hypothesis (student) networks for the example $\mu$ and there are $\alpha N$ examples. We derive an optimal algorithm for on-line learning of examples which outperforms the iterative (off-line) standard algorithm for $\alpha$ up to 0.71. The on-line optimized algorithm suggests a class of cost functions for off-line learning, which we then proceed to study using the replica method. The optimized cost function within that class has the suggestive form $E = \alpha N \left[ \Gamma \, \frac{1}{\alpha N} \sum_{\mu=1}^{\alpha N} \left[ -\ln P(b_\mu \mid h_\mu) \right] - \Gamma \ln Z \right]$, where $Z$ is a normalization constant, $P(b_\mu \mid h_\mu)$ is the conditional probability of the output data $b_\mu$ given the hypothesis output $h_\mu$, and $\Gamma$ is a learning parameter analogous to a temperature which decreases in a well defined manner along the learning process.

© 1995 The American Physical Society

URL:
http://link.aps.org/doi/10.1103/PhysRevE.52.2878
DOI:
10.1103/PhysRevE.52.2878
PACS:
87.10.+e, 02.50.-r, 05.90.+m
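
As an illustration of the standard quadratic cost function named in the abstract, the following is a minimal sketch of a linear teacher and student trained off-line by plain batch gradient descent. It is not the optimized on-line algorithm or the optimized cost function of the paper. The names `field`, `quadratic_cost`, and `gradient_step`, the 1/sqrt(N) scaling of the outputs, the Gaussian inputs, the learning rate, and the omission of the spherical constraint on the weights are all assumptions made for the example.

```python
import math
import random


def field(weights, x):
    """Linear output: dot product of weights and input, scaled by 1/sqrt(N) (assumed scaling)."""
    return sum(w * xi for w, xi in zip(weights, x)) / math.sqrt(len(x))


def quadratic_cost(student, teacher, examples):
    """E = 1/2 * sum over examples mu of (b_mu - h_mu)^2."""
    return 0.5 * sum((field(teacher, x) - field(student, x)) ** 2 for x in examples)


def gradient_step(student, teacher, examples, lr):
    """One off-line (batch) gradient-descent step on the quadratic cost."""
    n = len(student)
    grad = [0.0] * n
    for x in examples:
        err = field(teacher, x) - field(student, x)  # b_mu - h_mu
        for i in range(n):
            grad[i] -= err * x[i] / math.sqrt(n)  # dE/dJ_i contribution
    return [w - lr * g for w, g in zip(student, grad)]


random.seed(0)
N, alpha = 20, 0.5          # hypothetical sizes, chosen only for the example
P = int(alpha * N)          # number of examples, alpha * N
teacher = [random.gauss(0, 1) for _ in range(N)]
student = [random.gauss(0, 1) for _ in range(N)]
examples = [[random.gauss(0, 1) for _ in range(N)] for _ in range(P)]

e_before = quadratic_cost(student, teacher, examples)
for _ in range(200):
    student = gradient_step(student, teacher, examples, lr=0.1)
e_after = quadratic_cost(student, teacher, examples)
print(e_before, e_after)
```

The cost vanishes when the student reproduces the teacher's outputs on every example, and the batch steps lower the training cost. A low training cost says nothing by itself about generalization, which is the quantity the abstract is concerned with.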
