A Nonmonotone ADMM-Based Diagonal Quasi-Newton Update with Application to the Compressive Sensing Problem

Abstract. Considering a minimization problem based on the Byrd-Nocedal measure function together with the secant equation, a diagonal quasi-Newton updating formula is suggested. To find the optimal elements of the updating matrix, the well-known alternating direction method of multipliers (ADMM) is employed. Moreover, a convergence analysis is conducted based on a modified nonmonotone Armijo line search incorporating a simulated annealing strategy. Finally, the performance of the method is numerically tested on a set of CUTEr functions and on a smooth transcendental approximation of the compressive sensing problem. In both sets of experiments, the proposed method proves competitive.


Introduction
Quasi-Newton (QN) algorithms are well-established tools for solving unconstrained optimization problems, so improving their efficiency has attracted the interest of optimization scholars. In particular, QN methods are efficient because, under standard assumptions, they generate descent search directions and possess global as well as local superlinear convergence properties [37].
In QN methods, starting from a positive definite initial matrix, successive approximations of the (inverse) Hessian are updated so as to satisfy the secant (QN) equation [37]. To keep the generated matrix approximations well-conditioned, scaled QN updates have been developed based on eigenvalue analyses [33,34]. Moreover, to adapt the methods to large-scale problems, memoryless QN techniques have been proposed with significantly reduced computational cost [8].
Recently, QN methods have received considerable attention in practical applications such as image processing, time series prediction, neural network training, document categorization, demand management in water distribution networks, machine learning, robotics, solving systems of nonlinear equations, curve fitting by B-splines, matrix approximation in the Frobenius norm, computation of the matrix geometric mean, and estimation of unitary symmetric eigenvalues of complex tensors; for more details, see [11] and the references therein. The methods have also been successfully combined with classical optimization tools such as conjugate gradient methods [12] as well as metaheuristic algorithms [32].
Real-world applications of QN algorithms have motivated researchers to improve the efficiency of the methods. Such attempts can mainly be categorized into modifying the secant equation to approximate the curvature more accurately [36] or to achieve convergence without convexity assumptions [30], improving the scaling scheme [10], and structuring the updating formulas for special problems such as nonlinear least squares models [1].
As a fundamental improvement devised to address big data models, researchers have sought diagonal matrices with positive diagonal elements to approximate the Hessian. Such efforts were initiated by Zhu et al. [40], who considered various versions of the secant equation. Hassan et al. [27] developed another diagonal matrix based on the Barzilai-Borwein approach [14]. Leong et al. [29] advanced the issue by giving some effective diagonal QN matrices that preserve positive definiteness. In another attempt, Andrei [6] proposed a diagonal QN update by minimizing the measure function of Byrd and Nocedal [17], based on forward and central finite differences [7]. A family of diagonal QN updates was suggested in [3] in accordance with the DFP (Davidon-Fletcher-Powell) and BFGS (Broyden-Fletcher-Goldfarb-Shanno) updating formulas. A tridiagonal Hessian approximation has also been developed in [9].
Here, in an attempt to make progress in the scope of diagonal Hessian approximations, we mainly focus on the following issues, in the given order:
1. Diagonal QN updating formulas are becoming pervasive owing to their ties to big data minimization problems. In this vein, Andrei [6] presented a minimization subproblem that takes the weak secant condition into account to find the elements of the Hessian matrix approximation. However, the weak secant equation may be far from the original secant equation. So, in direct response to this issue, a least squares approach is employed here to find an appropriate minimization model for the diagonal approximation of the Hessian matrix.
2. To solve the resulting subproblem and thereby find the optimal elements of the Hessian approximation efficiently, an ADMM approach is applied.
3. To establish the global convergence of the proposed QN method without convexity assumptions and, moreover, to improve the performance of the line search strategy, an improved version of the nonmonotone line search of [39] is developed by incorporating a simulated annealing technique.
4. We evaluate the performance of the proposed method on the CUTEr test functions and on the compressive sensing problem, for which a smooth approximation model is also derived.
Section 2 is devoted to deriving a diagonal QN updating formula by proposing an efficient minimization problem and solving it with the ADMM technique. The global convergence analysis, based on a modification of the nonmonotone line search of [39] incorporating the simulated annealing (SA) strategy, is carried out in Section 3. Numerical experiments are reported in Section 4.

A diagonal ADMM-based quasi-Newton update
QN iterations for finding the solution of the unconstrained optimization problem

min_{x∈R^n} f(x),  (2.1)

are generally generated by

x_{k+1} = x_k + s_k, k = 0, 1, ...,  (2.2)

initiated from some x_0 ∈ R^n, with s_k = α_k d_k, where d_k is a descent direction and the step size α_k > 0 is determined by a line search along d_k [35]. In QN methods, the search direction is given by

d_k = −B_k^{−1} g_k,  (2.3)

where g_k = ∇f(x_k) and B_k is an approximation of the Hessian ∇²f(x_k) which satisfies the secant (QN) equation B_{k+1} s_k = y_k, with y_k = g_{k+1} − g_k. Especially, in a diagonal estimation of the Hessian, appropriate diagonal components B_{ki}, i = 1, ..., n, need to be found to obtain B_k as

B_k = diag(B_{k1}, ..., B_{kn}).

Associated with the measurement of the well-conditioning of a positive definite matrix A ∈ R^{n×n}, Byrd and Nocedal [17] suggested the following function:

ψ(A) = tr(A) − ln(det(A)),  (2.4)

which has been essentially employed in the convergence analysis of QN algorithms. Taking (2.4) into account, Andrei [6] dealt with the following minimization problem:

min_{B∈D} ψ(B) subject to s_k^T B s_k = s_k^T y_k,  (2.5)

in which D is the set of all diagonal n × n matrices with positive diagonal elements. The constraint of (2.5) is called the weak secant equation [20], being a scalar (relaxed) version of the standard secant equation B_{k+1} s_k = y_k. The model (2.5) yields the following formula for the elements of B_{k+1}:

B_{k+1,i} = 1 / (1 − λ s_{ki}²), i = 1, ..., n,

where λ is the Lagrange multiplier, computed in [6] from an extended conjugacy condition [19] depending on the Dai-Liao parameter t > 0 [19]. In another attempt, Babaie-Kafaki et al. [11] suggested the following penalized version of (2.5):

min_{B∈D} ψ(B) + (1/2) ∥B s_k − y_k∥²,  (2.6)

where ∥·∥ stands for the ℓ2 norm. For this model, after some algebra we obtain

B_{k+1,i} = ( (s_{ki} y_{ki} − 1) + √( (s_{ki} y_{ki} − 1)² + 4 s_{ki}² ) ) / (2 s_{ki}²), i = 1, ..., n,

which are positive and well-defined for s_{ki} ≠ 0. As is well known, ADMM has proven to be a powerful approach in a wide range of practical fields, particularly machine learning and signal processing models [38]. Stimulated by the merits of the ADMM approach, we intend to broaden the model (2.6) for finding an appropriate diagonal approximation of the Hessian of the cost function, in the sense of

min_{B,C∈D} ψ(B) + (1/2) ∥C s_k − y_k∥² subject to B = C,  (2.7)

for which the augmented Lagrangian function [35] can be rendered as

L_{μ_k}(B, C, τ) = ψ(B) + (1/2) ∥C s_k − y_k∥² + τ^T (C − B) + (μ_k/2) ∥B − C∥²,

where τ ∈ R^n is the vector of Lagrange multipliers (dual variables), B and C (by a slight abuse of notation) also denote the vectors which contain the diagonal elements of B and C, and μ_k > 0 is a penalty parameter. The ADMM-based technique approaches the solution of (2.7) by executing the following process in each iteration:

τ_{k+1} = τ_k + μ_k (C_k − B_k),
B_{k+1} = argmin_B L_{μ_k}(B, C_k, τ_{k+1}),  (2.8)
C_{k+1} = argmin_C L_{μ_k}(B_{k+1}, C, τ_{k+1}),

together with

μ_{k+1} = ρ μ_k,  (2.9)

where ρ ≥ 1 is a constant.
To give some details on the solution process of (2.8), consider the following componentwise form of the augmented Lagrangian of (2.7):

L(B_i, C_i, τ_i) = B_i − ln B_i + (1/2)(C_i s_{ki} − y_{ki})² + τ_i (C_i − B_i) + (μ_k/2)(B_i − C_i)², i = 1, ..., n.

Therefore, the solutions of the subproblems in (2.8) can be obtained exactly by solving the equations ∂L/∂B_i = 0 and ∂L/∂C_i = 0, for i = 1, ..., n, yielding

B_{k+1,i} = ( (μ_k C_{k,i} + τ_{k+1,i} − 1) + √( (μ_k C_{k,i} + τ_{k+1,i} − 1)² + 4μ_k ) ) / (2μ_k),  (2.10)

and

C_{k+1,i} = ( s_{ki} y_{ki} − τ_{k+1,i} + μ_k B_{k+1,i} ) / ( s_{ki}² + μ_k ).

Note that for arbitrary constants a and b ≠ 0, we have a < √(a² + b²). Thus, if we let a = μ_k C_{k,i} + τ_{k+1,i} − 1 and b = √(4μ_k), then from (2.10) we have B_{k+1,i} > 0, which ensures that the matrix B_{k+1} is positive definite and, consequently, that the search direction (2.3) is a descent direction.
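To illustrate the update, the following minimal Python sketch performs one pass of (2.8)-(2.9) with the closed-form solutions (2.10) above. It is our own illustration of the scheme as reconstructed here; the update order and the variable names (tau, mu, rho) are assumptions, not the authors' code.

import numpy as np

def dqn_admm_update(B, C, tau, mu, s, y, rho=10.0):
    """One ADMM pass (2.8)-(2.9) for the diagonal QN model (2.7).
    B, C, tau: 1-D arrays holding the diagonals of B_k and C_k and the dual
    variables tau_k; s and y are the step and gradient-difference vectors."""
    tau = tau + mu * (C - B)                           # dual update for the constraint B = C
    a = mu * C + tau - 1.0                             # B-subproblem: positive root of
    B = (a + np.sqrt(a**2 + 4.0 * mu)) / (2.0 * mu)    #   mu*B^2 - a*B - 1 = 0, i.e. (2.10)
    C = (s * y - tau + mu * B) / (s**2 + mu)           # C-subproblem (linear in C)
    mu = rho * mu                                      # penalty growth (2.9), rho >= 1
    return B, C, tau, mu

Since a < √(a² + 4μ_k) for any a, the computed diagonal stays componentwise positive, matching the positivity argument above.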

A nonmonotone line search technique with the simulated annealing strategy
Imposing a monotone reduction on the successive function values in classical iterative schemes for solving (2.1) may cause a loss of efficiency when an iterate is confined close to the bottom of a narrow curved valley of the cost function. In fact, a monotone scheme is forced to creep along the valley's floor, taking too short steps or even undesired zigzagging trajectories [39]. As a remedy, scholars have put considerable effort into developing nonmonotone schemes which guarantee global convergence. One of the pioneering nonmonotone schemes was proposed by Zhang and Hager [39], which specifies the smallest integer h ≥ 0 fulfilling

f(x_k + σ^h a d_k) ≤ F_k + γ σ^h a g_k^T d_k,  (3.1)

where γ ∈ (0, 1) is the Armijo constant. Here, the parameter η_k ∈ [η_min, η_max], with the constants 0 ≤ η_min ≤ η_max ≤ 1, handles the nonmonotonicity measure through

q_{k+1} = η_k q_k + 1, q_0 = 1,  (3.2)

and

F_{k+1} = ( η_k q_k F_k + f(x_{k+1}) ) / q_{k+1}, F_0 = f(x_0).  (3.3)

In this backtracking scheme, the step size α_k is the largest member of {σ^j a}_{j≥0} with a > 0 and σ ∈ (0, 1).

As a well-known classic metaheuristic algorithm, SA is a probabilistic technique for approximating the global optimum of an optimization model [15], being efficient practically and well-developed theoretically. It originates from the effective control of the temperature in the annealing procedure of a substance. With a view to improving the performance of the nonmonotone scheme, here we incorporate the SA algorithm into the line search strategy (3.1). As a matter of fact, taking inspiration from SA, the line search accepts the iterate x_{k+1} with some probability, specified by the temperature parameter, even when it does not satisfy the condition (3.1). The probability of accepting such iterates is gradually decreased by lowering the temperature during execution, to enhance the exact exploration capabilities of the algorithm near the optimal solution. In the detailed description of the nonmonotone Armijo line search with the SA technique, the acceptance test employs a randomized factor r_k ∈ (0, 1], defined in terms of an integer ϑ ≥ 1, together with a temperature parameter T_k > 0. Hence, the incorporative line search can be presented as follows:

f(x_k + α_k d_k) ≤ F_k + γ α_k g_k^T d_k − T_k ln r_k,  (3.4)

in which T_k = θ^k T_0 with θ ∈ (0, 1) and T_0 the initial temperature, and also, F_0 = f(x_0) and the parameter q_k is defined by (3.2). Now, we are in a position to spell out our algorithm.
Algorithm 1 (DQNADMM)
Step 0: Choose x_0 ∈ R^n, the line search parameters a > 0 and γ, σ, θ ∈ (0, 1), the initial temperature T_0 > 0, and ρ ≥ 1, μ_0 > 0; set B_0 = I and k = 0.
Step 1: If the stopping criterion holds, then stop.
Step 2: Update the QN matrix B_k by (2.10).
Step 3: Compute the search direction d_k by (2.3).
Step 4: Compute the step size α_k by the backtracking line search satisfying (3.4), and set x_{k+1} = x_k + α_k d_k.
Step 5: Set k := k + 1 and go to Step 1.
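To make the overall scheme concrete, here is a compact, self-contained Python sketch of Algorithm 1 with the SA-relaxed nonmonotone backtracking (3.4). It is an illustrative reconstruction under stated assumptions: r_k is drawn uniformly from (0, 1], η_k is frozen at a constant value in [η_min, η_max], μ_k is capped for numerical stability, and the role of the integer ϑ is omitted since its exact definition is not reproduced above.

import numpy as np

def dqnadmm(f, grad, x0, a=1.0, sigma=0.85, gamma=1e-4, theta=0.9,
            eta=0.85, rho=10.0, mu0=1.0, tol=1e-5, max_iter=10000):
    """Sketch of Algorithm 1: diagonal ADMM-based QN method with the
    nonmonotone SA line search (3.4)."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    n = x.size
    B = np.ones(n)                          # diagonal Hessian approximation, B_0 = I
    C, tau, mu = B.copy(), np.zeros(n), mu0
    F, q = f(x), 1.0                        # nonmonotone averages F_k, q_k of (3.2)-(3.3)
    T0 = np.linalg.norm(g)                  # initial temperature
    for k in range(max_iter):
        if np.linalg.norm(g) < tol * (1.0 + abs(f(x))):
            break
        d = -g / B                                     # search direction (2.3)
        Tk = theta**k * T0                             # decreasing temperature T_k
        r = np.random.uniform(1e-16, 1.0)              # randomized factor r_k in (0, 1]
        alpha = a
        while f(x + alpha * d) > F + gamma * alpha * (g @ d) - Tk * np.log(r):
            alpha *= sigma                             # backtrack until (3.4) holds
        s = alpha * d
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        tau = tau + mu * (C - B)                       # ADMM pass (2.8)
        aa = mu * C + tau - 1.0
        B = (aa + np.sqrt(aa**2 + 4.0 * mu)) / (2.0 * mu)   # closed form (2.10)
        C = (s * y - tau + mu * B) / (s**2 + mu)
        mu = min(rho * mu, 1e12)                       # (2.9), capped for stability
        q_new = eta * q + 1.0                          # (3.2)
        F = (eta * q * F + f(x_new)) / q_new           # (3.3)
        q = q_new
        x, g = x_new, g_new
    return x

Note that B_0 = I makes the first direction a steepest descent step, after which the ADMM pass shapes the diagonal curvature estimate.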
To establish the global convergence of the iterative method (2.2) with the backtracking line search satisfying (3.4), hereafter we suppose that the following assumption holds.

Assumption 1.
(i) The cost function f is coercive.
(ii) The gradient of f is Lipschitz continuous; that is, there exists a positive constant L such that ∥g(x) − g(y)∥ ≤ L∥x − y∥, ∀x, y ∈ R^n.
(iii) The search direction d_k satisfies the following conditions for some positive constants c_1 and c_2:

g_k^T d_k ≤ −c_1 ∥g_k∥²,  (3.5)

and

∥d_k∥ ≤ c_2 ∥g_k∥.  (3.6)

Lemma 1. The backtracking line search strategy based on (3.4) is well-defined.
Proof. Firstly, we define A_{k+1} : R_+ → R, given by

A_{k+1}(t) = ( t F_k + f(x_{k+1}) ) / ( t + 1 ), for all t ≥ 0.

In particular, by taking t = η_k q_k into account, (3.2) and (3.3) give F_{k+1} = A_{k+1}(η_k q_k), from which, as in [39], one can deduce that F_k ≥ f(x_k), for all k ≥ 0. To the contrary, assume that (3.4) fails for all j ≥ 0. Then, thanks to T_k ln r_k ≤ 0,

f(x_k + σ^j a d_k) > F_k + γ σ^j a g_k^T d_k − T_k ln r_k ≥ f(x_k) + γ σ^j a g_k^T d_k.

Hence, by the mean value theorem, there exists ϖ_j ∈ (0, 1) so that

g(x_k + ϖ_j σ^j a d_k)^T d_k > γ g_k^T d_k.  (3.7)

Now, when j tends to infinity in (3.7), we get (1 − γ) g_k^T d_k ≥ 0, which violates (3.5). In the following, we find a constant α > 0 such that α_k ≥ α, for all k ≥ 0.
Exploiting the Cauchy-Schwarz inequality, the mean value theorem and the Lipschitz continuity of the gradient, for any α > 0 we have

f(x_k + α d_k) ≤ f(x_k) + α g_k^T d_k + L α² ∥d_k∥².  (3.8)

On the other hand, if α_k < a, then the trial step size α_k/σ violates (3.4), and so, again thanks to T_k ln r_k ≤ 0 and F_k ≥ f(x_k),

f(x_k + (α_k/σ) d_k) > f(x_k) + γ (α_k/σ) g_k^T d_k.

By setting α = α_k/σ in (3.8) and applying (3.5) and (3.6), we get

(1 − γ)(−g_k^T d_k) < L (α_k/σ) ∥d_k∥²,

and consequently,

α_k > σ (1 − γ) (−g_k^T d_k) / ( L ∥d_k∥² ) ≥ σ (1 − γ) c_1 / ( L c_2² ).

Therefore, α_k ≥ α := min{ a, σ (1 − γ) c_1 / ( L c_2² ) } > 0, which completes the proof. ⊓⊔

In light of the convergence analysis conducted in [22,39], the global convergence of Algorithm 1 can now be established.

Numerical experiments
To support the theoretical results, here we compare the performance of the DQNADMM algorithm with two well-known QN methods, namely, the modified memoryless BFGS (MMLBFGS) method and the modified limited memory BFGS (MLMBFGS) method, both devised based on the modified secant condition proposed in [30]. It can be observed that MMLBFGS and MLMBFGS generate descent directions regardless of the line search. Furthermore, we consider the diagonal QN methods proposed in [6], [11] and [26] in our comparisons, here respectively named DQNBN1, DQNBN2 and DQNMSG. For DQNADMM, we set ρ = 10 and μ_0 = 1 in (2.9). The scaling parameter of MMLBFGS and MLMBFGS was tuned by υ_k = min{10^10, max{10^−10, ∥s_k∥/∥y_k∥}}, as suggested in [33]. Information on the hardware and software used in the implementations is provided in [2]. Besides, the line search was performed employing the nonmonotone backtracking Armijo condition (3.4) with a = 1, γ = 10^−4, σ = 0.85, T_0 = ∥g_0∥ and ϑ = 2. The algorithms were terminated at an iterate satisfying k > 10000 or ∥g_k∥ < 10^−5 (1 + |f_k|). To compare the efficiency of the algorithms, the CPU time (CPUT) and the total number of function and gradient evaluations (TNF), introduced in [25], were assessed using the Dolan-Moré (DM) technique [21], following the notation of [2]. The test problem data, including 147 functions of the CUTEr library [23], is provided in Table 1.
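As a reminder of how the DM technique summarizes such comparisons: for each solver s and problem p, the performance ratio r_{p,s} = t_{p,s} / min_s t_{p,s} is formed from the measured cost t_{p,s} (CPUT or TNF), and one plots the proportion of problems on which a solver's ratio does not exceed a threshold τ. The following short sketch (our own illustration, not the scripts used for the paper) computes such profiles:

import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles.
    T: (n_problems, n_solvers) array of positive costs (CPUT or TNF),
    with np.inf marking a failure; taus: thresholds tau >= 1.
    Returns rho[t, s] = fraction of problems with r_{p,s} <= taus[t]."""
    ratios = T / T.min(axis=1, keepdims=True)   # performance ratios r_{p,s}
    return np.array([(ratios <= tau).mean(axis=0) for tau in taus])

Curves lying higher, especially near τ = 1, indicate the more efficient and robust solver; Figures 1 and 2 are read this way.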
The results of the comparisons are illustrated in Figure 1, from which DQNADMM decidedly outperforms the others with respect to running time, while, taking TNF into account, DQNADMM and DQNMSG are competitive and both are preferable to the other algorithms.
To assess the efficiency of the given nonmonotone line search technique, we investigated the performance of DQNADMM together with the nonmonotone line search technique (3.4) and the nonmonotone approaches proposed in [39] and [24]; the corresponding methods are here respectively called DQNADMM-NMi, for i = 1, 2, 3. The results are shown in Figure 2. According to Figure 2(a), DQNADMM-NM1 is preferable to the others with respect to TNF. Nevertheless, from Figure 2(b), it can be concluded that performing extra [...]

We also render an application of the presented method to the compressive sensing (CS) problem as a practical issue. CS is a rapidly growing field that has attracted considerable attention in a broad range of scientific areas. CS addresses a framework for simultaneous sensing and compression of finite-dimensional vectors. To be more specific, CS recovers the sparse signal by solving an underdetermined linear system in the following unified formulation:

min_{x∈R^n} (1/2) ∥ψx − y∥² + ς ∥x∥_1,  (4.1)

with the sampling matrix ψ ∈ R^{m×n} (m ≪ n) and the measurement vector y ∈ R^m [16], where ∥·∥_1 represents the ℓ1 norm and the regularization parameter ς > 0 keeps the balance between the sparsity of the solution and fidelity to the measurements, which are coupled with additive white Gaussian noise. Smoothing strategies, as a remedy to combat the nonsmoothness of the ℓ1 regularizer, are one class of methods for solving problem (4.1) [28,31]. Smooth approximations of optimization problems have been scrutinized for decades, including complementarity problems, variational inequalities, second-order cone complementarity problems, semidefinite programming, semi-infinite programming, optimal control, eigenvalue optimization, penalty methods and mathematical programs with equilibrium constraints; see [18] and the references therein, for instance. The smoothing techniques are advantageous thanks to the rich theory and powerful methods provided for continuously differentiable cost functions, and to the guarantee of finding a local minimizer or stationary point.
Recently, Bagul [13] proposed the following smooth transcendental approximation of |x|:

φ(x) = x tanh(x/ν),  (4.2)

where the small constant ν > 0 is called the smoothing parameter. As established in Theorem 1 of [13], ||x| − φ(x)| < ν. Hence, the CS cost function can be approximated by

f̃(x) = (1/2) ∥ψx − y∥² + ς Σ_{i=1}^{n} φ(x_i),

which is (smooth and) solvable via the given algorithm. We generated the random test instances by choosing the signal dimension n = 2^13 and the sampling matrix ψ as the Hadamard matrix, utilizing the approach of [4]. The initial point was set to 0 ∈ R^n. We also set ς = max(0.001 ∥ψ^T y∥_∞, 2^−8) in (4.1) and ν = 0.001 in (4.2). To make a meaningful comparison of the performance of the methods, we scrutinized the outputs with respect to the relative error (RelErr) [4]. Results are shown in Figure 3: the original and the noisy signals are depicted in subfigures (a) and (b), respectively, while the reconstructed signals (marked by red circles) obtained by DQNADMM, MMLBFGS, MLMBFGS, DQNBN1, DQNBN2 and DQNMSG are plotted over the original signals (marked by blue endpoints) in subfigures (c)-(h), respectively. The figure reveals that DQNADMM, DQNBN2 and MLMBFGS work better than the other techniques.
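As an implementation aid for the smoothed model above, the following minimal Python sketch evaluates the approximate CS cost and its gradient; the derivative φ′(t) = tanh(t/ν) + (t/ν)(1 − tanh²(t/ν)) is elementary calculus, while the parameter names and the dense matrix ψ are illustrative assumptions.

import numpy as np

def smoothed_cs_objective(x, psi, y, varsigma, nu=1e-3):
    """Smoothed CS cost f(x) = (1/2)||psi @ x - y||^2 + varsigma * sum(phi(x_i)),
    with phi(t) = t * tanh(t / nu) approximating |t| as in (4.2)."""
    r = psi @ x - y
    t = np.tanh(x / nu)
    f = 0.5 * (r @ r) + varsigma * np.sum(x * t)
    # phi'(t) = tanh(t/nu) + (t/nu) * (1 - tanh(t/nu)^2)
    g = psi.T @ r + varsigma * (t + (x / nu) * (1.0 - t**2))
    return f, g

Since ||t| − φ(t)| < ν holds componentwise [13], the smoothed regularizer deviates from ς∥x∥_1 by at most ςnν; with ν = 0.001 the surrogate remains faithful while the objective is continuously differentiable.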

Conclusions
To take advantage of the significant merits of diagonal quasi-Newton updates for handling big data optimization models, inspired by [6], we have proposed a minimization problem founded upon the popular Byrd-Nocedal measure function as well as the secant equation. The proposed minimization problem has been solved utilizing the alternating direction method of multipliers. Besides, we have addressed the convergence of the algorithm using the classic nonmonotone line search of [39] combined with the simulated annealing algorithm. To evaluate the effect of our theoretical arguments, we performed computational tests on a set of problems from the CUTEr library. The results showed that the given method is computationally promising.
A special numerical experiment has also been performed on the well-known compressive sensing problem, for which a smooth transcendental function has been employed to approximate the ℓ1 regularizer term. The outputs showed that our algorithm is capable of delivering progress in sparse signal recovery.

Table 1. Test problems data