NotesFAQContact Us
Search Tips
Back to results
ERIC Number: ED567498
Record Type: Non-Journal
Publication Date: 2014
Pages: 130
Abstractor: As Provided
ISBN: 978-1-3038-1582-9
Newton Methods for Large Scale Problems in Machine Learning
Hansen, Samantha Leigh
ProQuest LLC, Ph.D. Dissertation, Northwestern University
The focus of this thesis is on practical ways of designing optimization algorithms for minimizing large-scale nonlinear functions with applications in machine learning. Chapter 1 introduces the overarching ideas in the thesis. Chapters 2 and 3 are geared towards supervised machine learning applications that involve minimizing a sum of loss functions over a dataset, but respectively apply to the general scenarios of either minimizing a stochastic convex function or a convex function with an L1 regularizer. Chapter 4 discusses efficient implementations of projected Newton methods for nonnegative tensor factorization. Chapter 2 outlines a new stochastic quasi-Newton algorithm that incorporates second order information through the L-BFGS approximation of the Hessian. The method's novel element comes from using subsampled Hessian-vector products and averaging to define the L-BFGS curvature pairs. Numerical results on a speech and text classification problem demonstrate the effectiveness of this new algorithm over stochastic gradient descent. Chapter 3 presents a new active set method for minimizing a convex function with an L1 regularizer. The algorithm follows a two phase approach: an active-set prediction phase that employs first-order and second-order information, and a subspace phase that performs a Newton-like step using sub-sampled Newton-CG. The novelty of the algorithm comes from using an iterative shrinkage step in the active-set phase and a projected piece-wise linear line search in the subspace phase. The new algorithm is compared against a state-of-the-art orthant-wise limited memory algorithm on a speech classification problem. The fourth chapter concerns nonnegative tensor/matrix factorization with a Kullback-Leibler objective. All presented algorithms start from an alternating block Gauss-Seidel framework and formulate each block subproblem as a sum of independent row functions that only depend on a subset of variables. Minimization of the block problem is executed by the independent minimization of each row function, which is a strictly convex function with nonnegativity constraints. The conclusion is that applying two-metric gradient projection techniques with exact or approximate Hessian information to each of the independent row functions is much more effective than applying the same algorithms directly to the block subproblem. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page:]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site:
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A