Nonsmooth Optimization (NSO)
Introduction to Nonsmooth Optimization
Nonsmooth optimization (NSO) refers to the general problem of minimizing (or maximizing) functions that are typically not differentiable at their minimizers (maximizers). Since the classical theory of optimization presumes certain differentiability and strong regularity assumptions on the functions to be optimized, it cannot be applied directly. However, due to the complexity of the real world, functions involved in practical applications are often nonsmooth; that is, they are not necessarily differentiable. In what follows, we briefly introduce the basic concepts of nonsmooth analysis and optimization. For more details we refer to [1,2,3,4,5,6,7] and references therein.
Let us consider the NSO problem of the form
\[
\min_{x \in S} \; f(x),
\]
where the objective function \( f: \mathbb{R}^n \to \mathbb{R} \) is supposed to be locally Lipschitz continuous on the feasible set \( S \subseteq \mathbb{R}^n \).
Note that no differentiability or convexity assumptions are made.
The simplest example of a nonsmooth function is the absolute-value function \( f(x) = |x| \) on the reals, which is not differentiable at \( x = 0 \).
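Another common source of nonsmoothness, used here only as an illustration, is the pointwise maximum of finitely many smooth functions,
\[
f(x) = \max_{1 \le i \le m} f_i(x),
\]
which is typically nondifferentiable at points where the maximum is attained by more than one \( f_i \), even if every \( f_i \) is smooth.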
NSO problems arise in many fields of applications, for example in
- image denoising,
- optimal control,
- neural network training,
- data mining,
- economics and
- computational chemistry and physics.
Moreover, using certain important methodologies for solving difficult smooth (continuously differentiable) problems leads directly to the need to solve nonsmooth problems, which are either smaller in dimension or simpler in structure. This is the case, for instance, in
- decompositions,
- dual formulations and
- exact penalty functions.
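As an illustration of the last item, consider a smooth constrained problem \( \min\{ f(x) : g(x) \le 0 \} \) with smooth \( f \) and \( g \). A standard exact penalty reformulation replaces it by the unconstrained problem
\[
\min_{x \in \mathbb{R}^n} \; f(x) + c \, \max\{0, g(x)\},
\]
where \( c > 0 \) is a penalty parameter. The penalty term \( \max\{0, g(x)\} \) is nonsmooth wherever \( g(x) = 0 \), so the reformulated problem is an NSO problem even though the original data are smooth.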
Finally, there exist so-called stiff problems that are analytically smooth but numerically nonsmooth. This means that the gradient varies too rapidly and, thus, these problems behave like nonsmooth problems.
There are several approaches to solve NSO problems (see, e.g., [2]). The direct application of smooth gradient-based methods to nonsmooth problems is a simple approach but it may lead to a failure in convergence, in optimality conditions, or in gradient approximation [1]. All these difficulties arise from the fact that the objective function fails to have a derivative for some values of the variables. The following figure demonstrates the difficulties that are caused by nonsmoothness.
[Figure: Difficulties caused by nonsmoothness — a smooth problem compared with a nonsmooth problem.]
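To make the convergence failure concrete, here is a minimal sketch (our own illustration, not taken from the figure above) in which fixed-step steepest descent is applied to the nonsmooth function \( f(x) = |x| \); the iterates end up oscillating around the minimizer \( x = 0 \) instead of converging to it.

```python
# Minimal illustration (assumed example): fixed-step steepest descent on f(x) = |x|.
# Away from x = 0 the gradient is sign(x), so the method is well defined almost everywhere,
# yet with a constant step the iterates eventually oscillate around the minimizer.

def f(x):
    return abs(x)

def grad(x):
    return 1.0 if x > 0 else -1.0  # gradient of |x| for x != 0

x, step = 1.0, 0.3
for k in range(12):
    x = x - step * grad(x)
    print(f"iter {k:2d}: x = {x:+.2f}, f(x) = {f(x):.2f}")
# The iterates settle into the cycle ..., -0.2, +0.1, -0.2, +0.1, ... and f stops decreasing.
```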
On the other hand, derivative-free methods may offer another approach, but standard derivative-free methods such as genetic algorithms or Powell's method may be unreliable and become inefficient as the dimension of the problem increases. Moreover, the convergence of such methods has been proved only for smooth functions. In addition, different kinds of smoothing and regularization techniques may give satisfactory results in some cases but are, in general, not as efficient as the direct nonsmooth approach [4]. Thus, special tools for solving NSO problems are needed.
Methods for solving NSO problems include subgradient methods (see e.g. [6]), bundle methods (see e.g. [4]), and gradient sampling methods (see e.g. [1]). All of them are based on the assumption that only the objective function value and one arbitrary subgradient (generalized gradient [2]) at each point are available.
The basic idea behind the subgradient methods is to generalize smooth methods by replacing the gradient with an arbitrary subgradient. Due to this simple structure, they are widely used NSO methods, although they may suffer from some serious drawbacks (this is true especially with the simplest versions of subgradient methods) [3]. An extensive overview of various subgradient methods can be found in [6].
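As a concrete illustration, here is a minimal sketch (our own example, with an assumed step-size rule and test function) of the basic subgradient iteration \( x_{k+1} = x_k - t_k \xi_k \), where \( \xi_k \in \partial f(x_k) \) is an arbitrary subgradient and \( t_k \) is a diminishing step size; since the method is not a descent method, the best value found so far is recorded.

```python
import numpy as np

def subgradient_method(f, subgrad, x0, steps=500):
    """Basic subgradient method with diminishing steps t_k = 1/sqrt(k+1).

    f       -- objective function
    subgrad -- oracle returning one arbitrary subgradient of f at a point
    x0      -- starting point
    """
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for k in range(steps):
        g = subgrad(x)
        x = x - (1.0 / np.sqrt(k + 1)) * g      # x_{k+1} = x_k - t_k * xi_k
        if f(x) < f_best:                       # keep the best point seen so far
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

# Example (assumed test problem): f(x) = |x_1| + 2*|x_2|, minimized at the origin.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])
subgrad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])  # one valid subgradient
print(subgradient_method(f, subgrad, x0=[3.0, -2.0]))
```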
At the moment, bundle methods are regarded as the most effective and reliable methods for NSO. They are based on the subdifferential theory developed by Rockafellar [5] and Clarke [2], where the classical differential theory is generalized for convex and locally Lipschitz continuous functions, respectively. The basic idea of bundle methods is to approximate the subdifferential (that is, the set of subgradients) of the objective function by gathering subgradients from previous iterations into a bundle. In this way, more information about the local behavior of the function is obtained than what an individual arbitrary subgradient can yield (cf. subgradient methods).
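In one common (proximal) variant, sketched here with assumed notation, the bundle \( \{(y_j, \xi_j)\}_{j \in J_k} \) of past trial points and their subgradients defines a piecewise-linear cutting-plane model of \( f \), and the next trial point minimizes this model plus a stabilizing quadratic term:
\[
\hat f_k(x) = \max_{j \in J_k} \left\{ f(y_j) + \xi_j^{T}(x - y_j) \right\},
\qquad
x_{k+1} = \arg\min_{x \in \mathbb{R}^n} \left\{ \hat f_k(x) + \frac{1}{2 t_k}\, \| x - x_k \|^2 \right\},
\]
where \( x_k \) is the current stability center and \( t_k > 0 \) controls how far the method trusts the model.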
The newest approach is to use gradient sampling algorithms developed by Burke, Lewis, and Overton. The gradient sampling method is a method for minimizing an objective function that is locally Lipschitz continuous and smooth on an open dense subset of \( \mathbb{R}^n \). The objective may be nonsmooth and/or nonconvex. Gradient sampling methods may be considered a stabilized steepest descent algorithm. The central idea behind these techniques is to approximate the subdifferential of the objective function through random sampling of gradients near the current iteration point. The ongoing progress in the development of gradient sampling algorithms suggests that they have the potential to rival bundle methods in terms of both theory and practical performance.
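The following is a much simplified sketch of this idea (our own illustration: the sampling set, line search, and tolerances are assumptions, and sampling is done in a box rather than a ball). Gradients are sampled around the current point, the minimum-norm element of their convex hull is used as a search direction, and a backtracking line search is applied.

```python
import numpy as np
from scipy.optimize import minimize

def min_norm_in_hull(G):
    """Minimum-norm vector in the convex hull of the rows of G (solved as a small QP)."""
    m = G.shape[0]
    obj = lambda lam: (lam @ G) @ (lam @ G)
    cons = ({'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0},)
    res = minimize(obj, np.full(m, 1.0 / m), bounds=[(0.0, 1.0)] * m,
                   constraints=cons, method='SLSQP')
    return res.x @ G

def gradient_sampling(f, grad, x0, eps=0.1, n_samples=None, max_iter=100):
    """Simplified gradient sampling sketch; grad is assumed valid almost everywhere."""
    x = np.asarray(x0, dtype=float)
    n_samples = n_samples or 2 * x.size
    for _ in range(max_iter):
        # sample gradients near x (f is differentiable almost everywhere by Rademacher's theorem)
        pts = x + eps * np.random.uniform(-1.0, 1.0, size=(n_samples, x.size))
        G = np.vstack([grad(x)] + [grad(p) for p in pts])
        g = min_norm_in_hull(G)                 # approximate shortest subgradient
        if np.linalg.norm(g) <= 1e-8:
            break                               # approximately stationary at sampling radius eps
        d = -g / np.linalg.norm(g)              # stabilized descent direction
        step = 1.0
        while step > 1e-12 and f(x + step * d) > f(x) - 1e-4 * step * np.linalg.norm(g):
            step *= 0.5                         # backtracking line search
        x = x + step * d
    return x

# Example (assumed test problem): f(x) = |x_1| + 2*|x_2|.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])
grad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
print(gradient_sampling(f, grad, x0=[3.0, -2.0]))
```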
Note that NSO techniques can be successfully applied to smooth problems but not vice versa [3] and, thus, we can say that NSO deals with a broader class of problems than smooth optimization. Although using a smooth method may be desirable when all the functions involved are known to be smooth, it is often hard to confirm the smoothness in practical applications (e.g. if function values are calculated via simulation). Moreover, as already mentioned, the problem may be analytically smooth but still behave numerically nonsmoothly, in which case an NSO method is needed.
Some NSO software and NSO software links can be found here. For more details on various NSO methods see [2].
Nonsmooth Analysis
The theory of nonsmooth analysis is based on convex analysis. Thus, we start by giving some definitions and results for convex (not necessarily differentiable) functions. We define the subgradient and the subdifferential of a convex function as they are defined in [5]. Then we generalize these results to nonconvex locally Lipschitz continuous functions.
Convex Analysis
Definition. The subdifferential of a convex function \( f: \mathbb{R}^n \to \mathbb{R} \) at \( x \in \mathbb{R}^n \) is the set \( \partial_c f(x) \) of vectors \( \xi \in \mathbb{R}^n \) such that
\[
\partial_c f(x) = \left\{ \xi \in \mathbb{R}^n : f(y) \ge f(x) + \xi^{T}(y - x) \ \text{for all } y \in \mathbb{R}^n \right\}.
\]
Each vector \( \xi \in \partial_c f(x) \) is called a subgradient of \( f \) at \( x \).
Example — Absolute-value function
The function \( f(x) = |x| \) is clearly convex, and it is differentiable whenever \( x \ne 0 \). By the definition of the subdifferential,
\[
\partial_c f(0) = \{ \xi \in \mathbb{R} : |y| \ge \xi\, y \ \text{for all } y \in \mathbb{R} \} = [-1, 1].
\]
Theorem. Let \( f: \mathbb{R}^n \to \mathbb{R} \) be a convex function. Then the classical directional derivative
\[
f'(x; d) = \lim_{t \downarrow 0} \frac{f(x + t d) - f(x)}{t}
\]
exists in every direction \( d \in \mathbb{R}^n \) and it satisfies
\[
f'(x; d) = \inf_{t > 0} \frac{f(x + t d) - f(x)}{t}.
\]
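For instance, for the absolute-value function \( f(x) = |x| \) at \( x = 0 \), the directional derivative is
\[
f'(0; d) = \lim_{t \downarrow 0} \frac{|t d| - 0}{t} = |d| \quad \text{for all } d \in \mathbb{R}.
\]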
The next theorem shows the relationship between the subdifferential and the directional derivative. It turns out that knowing the subdifferential \( \partial_c f(x) \) is equivalent to knowing the directional derivative \( f'(x; d) \) in every direction \( d \in \mathbb{R}^n \).
Theorem. Let \( f: \mathbb{R}^n \to \mathbb{R} \) be a convex function. Then for all \( x \in \mathbb{R}^n \),
\[
\partial_c f(x) = \{ \xi \in \mathbb{R}^n : f'(x; d) \ge \xi^{T} d \ \text{for all } d \in \mathbb{R}^n \}
\qquad \text{and} \qquad
f'(x; d) = \max\{ \xi^{T} d : \xi \in \partial_c f(x) \} \ \text{for all } d \in \mathbb{R}^n.
\]
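Continuing the absolute-value example, with \( \partial_c f(0) = [-1, 1] \) the theorem gives
\[
\max\{ \xi\, d : \xi \in [-1, 1] \} = |d| = f'(0; d),
\]
which agrees with the direct computation above.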
Nonconvex Analysis
Since classical directional derivatives do not necessarily exist for locally Lipschitz continuous functions, we first define a generalized directional derivative. We then generalize the subdifferential for nonconvex locally Lipschitz continuous functions.
Definition (Clarke). Let \( f: \mathbb{R}^n \to \mathbb{R} \) be locally Lipschitz continuous at \( x \in \mathbb{R}^n \). The generalized directional derivative of \( f \) at \( x \) in the direction \( d \in \mathbb{R}^n \) is defined by
\[
f^{\circ}(x; d) = \limsup_{\substack{y \to x \\ t \downarrow 0}} \frac{f(y + t d) - f(y)}{t}.
\]
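As a simple nonconvex (in fact concave) illustration, consider \( f(x) = -|x| \) at \( x = 0 \). Choosing \( y = -t d \) in the difference quotient gives \( \big(f(0) - f(-t d)\big)/t = |d| \), and the Lipschitz constant \( 1 \) shows that no larger value is possible; hence
\[
f^{\circ}(0; d) = |d|,
\]
whereas the classical directional derivative is \( f'(0; d) = -|d| \). This shows that \( f^{\circ}(x; d) \) and \( f'(x; d) \) may differ for nonconvex functions.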
Note that this generalized directional derivative always exists for locally Lipschitz continuous functions and, as a function of \( d \), it is sublinear.
Therefore, we can now define the subdifferential for nonconvex locally Lipschitz continuous functions analogously to the second theorem of the previous subsection, with the directional derivative replaced by the generalized directional derivative.
Definition (Clarke). Let \( f: \mathbb{R}^n \to \mathbb{R} \) be locally Lipschitz continuous at a point \( x \in \mathbb{R}^n \). Then the subdifferential of \( f \) at \( x \) is the set \( \partial f(x) \) of vectors \( \xi \in \mathbb{R}^n \) such that
\[
\partial f(x) = \{ \xi \in \mathbb{R}^n : f^{\circ}(x; d) \ge \xi^{T} d \ \text{for all } d \in \mathbb{R}^n \}.
\]
Each vector \( \xi \in \partial f(x) \) is called a subgradient of \( f \) at \( x \).
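For the function \( f(x) = -|x| \) considered above, the definition together with \( f^{\circ}(0; d) = |d| \) yields
\[
\partial f(0) = \{ \xi \in \mathbb{R} : |d| \ge \xi\, d \ \text{for all } d \in \mathbb{R} \} = [-1, 1],
\]
even though \( f \) attains its maximum, not its minimum, at \( 0 \).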
Theorem (Rademacher). Let \( U \subseteq \mathbb{R}^n \) be an open set. A function \( f: U \to \mathbb{R} \) that is locally Lipschitz continuous on \( U \) is differentiable almost everywhere on \( U \).
Theorem. Let \( f: \mathbb{R}^n \to \mathbb{R} \) be locally Lipschitz continuous at a point \( x \in \mathbb{R}^n \). Then
\[
\partial f(x) = \operatorname{conv}\{ \xi \in \mathbb{R}^n : \text{there exists a sequence } (x_i) \text{ with } x_i \to x,\ f \text{ differentiable at each } x_i,\ \text{and } \nabla f(x_i) \to \xi \},
\]
where \( \operatorname{conv} A \) denotes the convex hull of a set \( A \).
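For the absolute-value function \( f(x) = |x| \), the gradient near \( 0 \) takes only the values \( \pm 1 \), so this characterization gives
\[
\partial f(0) = \operatorname{conv}\{-1, +1\} = [-1, 1],
\]
in agreement with the subdifferential computed from the convex definition.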
The next list summarizes some properties of the subdifferential both in convex and nonconvex cases:
- The subdifferential \( \partial_c f(x) \) of a convex function \( f \) is a nonempty, convex, and compact set such that \( \partial_c f(x) \subseteq \bar B(0; K) \), where \( \bar B(0; K) \) denotes the closed ball with center \( 0 \) and radius \( K \), and \( K > 0 \) is the Lipschitz constant of \( f \) at \( x \).
- The subdifferential \( \partial f(x) \) of a locally Lipschitz continuous function \( f \) is a nonempty, convex, and compact set such that \( \partial f(x) \subseteq \bar B(0; K) \), where \( K > 0 \) is the Lipschitz constant of \( f \) at \( x \). Moreover, \( f^{\circ}(x; d) = \max\{ \xi^{T} d : \xi \in \partial f(x) \} \) for all \( d \in \mathbb{R}^n \).
- The subdifferential for locally Lipschitz continuous functions is a generalization of the subdifferential for convex functions: if \( f: \mathbb{R}^n \to \mathbb{R} \) is a convex function, then \( \partial f(x) = \partial_c f(x) \) and \( f^{\circ}(x; d) = f'(x; d) \) for all \( x, d \in \mathbb{R}^n \).
- The subdifferential for locally Lipschitz continuous functions is a generalization of the classical derivative: if \( f \) is both locally Lipschitz continuous and differentiable at \( x \), then \( \nabla f(x) \in \partial f(x) \). If, in addition, \( f \) is continuously differentiable at \( x \), then \( \partial f(x) = \{ \nabla f(x) \} \).
With the last point in mind, we can finally write down the whole subdifferential of the absolute-value function \( f(x) = |x| \):
\[
\partial f(x) =
\begin{cases}
\{-1\}, & x < 0, \\
[-1, 1], & x = 0, \\
\{+1\}, & x > 0.
\end{cases}
\]
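Since the methods discussed earlier assume that only one arbitrary subgradient is available at each point, a subgradient oracle for this function can simply return one element of each of these sets; a minimal sketch:

```python
def subgradient_abs(x):
    """Return one arbitrary subgradient of f(x) = |x| at x.

    Any value in [-1, 1] is a valid choice at x = 0; here 0.0 is returned.
    """
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0
```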
Nonsmooth Optimization: Theory
Finally, we present some results that connect the theories of nonsmooth analysis and optimization. The necessary conditions for a locally Lipschitz continuous function to attain its local minimum in an unconstrained case are given in the next theorem. For convex functions these conditions are also sufficient and the minimum is global.
Theorem. Let \( f: \mathbb{R}^n \to \mathbb{R} \) be locally Lipschitz continuous at \( x \in \mathbb{R}^n \). If \( f \) attains its local minimum at \( x \), then
\[
0 \in \partial f(x).
\]
Theorem. If \( f: \mathbb{R}^n \to \mathbb{R} \) is a convex function, then the following conditions are equivalent:
- \( f \) attains its global minimum at \( x \),
- \( 0 \in \partial f(x) \).
Definition. A point \( x \in \mathbb{R}^n \) satisfying \( 0 \in \partial f(x) \) is called a critical or stationary point of \( f \).
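Returning to the running examples: for \( f(x) = |x| \) we have \( 0 \in \partial f(0) = [-1, 1] \), so \( x = 0 \) is a stationary point, and since \( f \) is convex it is the global minimizer. For \( f(x) = -|x| \) we also have \( 0 \in \partial f(0) = [-1, 1] \), although \( 0 \) is a maximizer; this illustrates that, in the nonconvex case, the condition \( 0 \in \partial f(x) \) is only necessary, not sufficient, for a local minimum.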
References
- A. Bagirov, N. Karmitsa, and M.M. Mäkelä: "Introduction to Nonsmooth Optimization: Theory, Practice and Software", Springer, 2014.
- A. Bagirov, M. Gaudioso, N. Karmitsa, M.M. Mäkelä, and S. Taheri (Eds.): "Numerical Nonsmooth Optimization: State-of-the-Art Algorithms", Springer, 2020.
- F.H. Clarke: "Optimization and Nonsmooth Analysis", Wiley-Interscience, New York, 1983.
- C. Lemarechal: "Nondifferentiable Optimization", in Optimization (G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd, Eds.), pp. 529-572, Elsevier North-Holland, New York, 1989.
- M.M. Mäkelä and P. Neittaanmäki: "Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control", World Scientific Publishing Co., Singapore, 1992.
- R.T. Rockafellar: "Convex Analysis", Princeton University Press, Princeton, New Jersey, 1970.
- N.Z. Shor: "Minimization Methods for Non-Differentiable Functions", Springer-Verlag, Berlin, 1985.