Nonsmooth Optimization (NSO)
Introduction to Nonsmooth Optimization
Nonsmooth optimization (NSO) refers to the general problem of minimizing (or maximizing) functions that are typically not differentiable at their minimizers (maximizers). Since the classical theory of optimization presumes certain differentiability and strong regularity assumptions upon the functions to be optimized, it can not be directly utilized. However, due to the complexity of the real world, functions involved in practical applications are often nonsmooth. That is, they are not necessarily differentiable. In what follows, we briefly introduce the basic concepts of nonsmooth analysis and optimization. For more details we refer to [1,2,3,4,5,6] and references therein.
Let us consider the NSO problem of the form
where the objective function is supposed to be locally Lipschitz continuous on the feasible set . Note that no differentiability or convexity assumptions are made.
The simplest example of nonsmooth function is the absolutevalue function on reals.
NSO problems arise in many fields of applications, for example in
 image denoising,
 optimal control,
 neural network training,
 data mining,
 economics and
 computational chemistry and physics.
Moreover, using certain important methodologies for solving difficult smooth (continuously differentiable) problems leads directly to the need to solve nonsmooth problems, which are either smaller in dimension or simpler in structure. This is the case, for instance in
 decompositions,
 dual formulations and
 exact penalty functions.
Finally, there exist so called stiff problems that are analytically smooth but numerically nonsmooth. This means that the gradient varies too rapidly and, thus, these problems behave like nonsmooth problems.
There are several approaches to solve NSO problems. The direct application of smooth gradientbased methods to nonsmooth problems is a simple approach but it may lead to a failure in convergence, in optimality conditions, or in gradient approximation [1]. All these difficulties arise from the fact that the objective function fails to have a derivative for some values of the variables. The following figure demonstrates the difficulties that are caused by nonsmoothness.
Difficulties caused by nonsmoothness
Smooth problem  Nonsmooth problem 








On the other hand, using some derivative free method may be another approach but standard derivative free methods like genetic algorithms or Powell's method may be unreliable and become inefficient as the dimension of the problem increases. Moreover, the convergence of such methods has been proved only for smooth functions. In addition, different kind of smoothing and regularization techniques may give satisfactory results in some cases but are not, in general, as efficient as the direct nonsmooth approach [4]. Thus, special tools for solving NSO problems are needed.
Methods for solving NSO problems include subgradient methods (see e.g. [6]), bundle methods (see e.g. [4]), and gradient sampling methods (see e.g. [1]). All of them are based on the assumption that only the objective function value and one arbitrary subgradient (generalized gradient [2]) at each point are available.
The basic idea behind the subgradient methods is to generalize smooth methods by replacing the gradient with an arbitrary subgradient. Due to this simple structure, they are widely used NSO methods, although they may suffer from some serious drawbacks (this is true especially with the simplest versions of subgradient methods) [3]. An extensive overview of various subgradient methods can be found in [6].
At the moment, bundle methods are regarded as the most effective and reliable methods for NSO. They are based on the subdifferential theory developed by Rockafellar [5] and Clarke [2], where the classical differential theory is generalized for convex and locally Lipschitz continuous functions, respectively. The basic idea of bundle methods is to approximate the subdifferential (that is, the set of subgradients) of the objective function by gathering subgradients from previous iterations into a bundle. In this way, more information about the local behavior of the function is obtained than what an individual arbitrary subgradient can yield (cf. subgradient methods).
The newest approach is to use gradient sampling algorithms developed by Burke, Lewis and Overton. The gradient sampling method is a method for minimizing an objective function that is locally Lipschitz continuous and smooth in an open dense subset of . The objective may be nonsmooth and/or nonconvex. Gradient sampling methods may be considered as a stabilized steepest descent algorithm. The central idea behind these techniques is to approximate the subdifferential of the objective function through random sampling of gradients near the current iteration point. The ongoing progress in the development of gradient sampling algorithms suggests that they have potential to rival bundle methods in the terms of theoretical might and practical performance.
Note that NSO techniques can be successfully applied to smooth problems but not vice versa [3] and, thus, we can say that NSO deals with a broader class of problems than smooth optimization. Although using a smooth method may be desirable when all the functions involved are known to be smooth, it is often hard to confirm the smoothness in practical applications (e.g. if function values are calculated via simulation). Moreover, as already mentioned, the problem may be analytically smooth but still behave numerically nonsmoothly, in which case an NSO method is needed.
Some NSO software and NSO software links can be found here. For more details on different NSO methods see [1] part III and the references therein.
Nonsmooth Analysis
The theory of nonsmooth analysis is based on convex analysis. Thus, we start by giving some definitions and results for convex (not necessarily differentiable) functions. We define the subgradient and the subdifferential of a convex function as they are defined in [5]. Then we generalize these results to nonconvex locally Lipschitz continuous functions.
Convex Analysis
Definition. The subdifferential of a convex function at is the set of vectors such that
Each vector is called a subgradient of at
Example — Absolutevalue function
Function is clearly convex and differentiable when By the definition of subdifferential
Theorem. Let be a convex function. Then the classical directional derivative exists in every direction and it satisfies
The next theorem shows the relationship between the subdifferential and the directional derivative. It turns out that knowing is equivalent to knowing .
Theorem. Let be a convex function. Then for all
Nonconvex Analysis
Since classical directional derivatives do not necessarily exist for locally Lipschitz continuous functions, we first define a generalized directional derivative. We then generalize the subdifferential for nonconvex locally Lipschitz continuous functions.
Definition (Clarke). Let be a locally Lipschitz continuous function at The generalized directional derivative of at in the direction is defined by
Note that this generalized directional derivative always exists for locally Lipschitz continuous functions and, as a function of , it is sublinear. Therefore, we can now define the subdifferential for nonconvex locally Lipschitz continuous functions as analogous to the previous theorem (2.) with the directional derivative replaced by the generalized directional derivative.
Definition (Clarke). Let be a locally Lipschitz continuous function at a point Then the subdifferential of at is the set of vectors such that
Each vector is called a subgradient of at .
Theorem (Rademacher). Let be an open set. A function that is locally Lipschitz continuous on is differentiable almost everywhere on .
Theorem. Let be a locally Lipschitz continuous function at a point Then
where denotes the convex hull of set
The next list summarizes some properties of the subdifferential both in convex and nonconvex cases:
 The subdifferential for a convex function is a nonempty, convex, and compact set such that where is an open ball with center and radius and is the Lipschitz constant of at
 The subdifferential for a for locally Lipschitz continuous function is a nonempty, convex, and compact set such that where is the Lipschitz constant of at Moreover, for all
 The subdifferential for locally Lipschitz continuous functions is a generalization of the subdifferential for convex functions: If is a convex function, then for all and .
 The subdifferential for locally Lipschitz continuous functions is a generalization of the classical derivative: If is both locally Lipschitz continuous and differentiable at then . If, in addition, is continuously differentiable at then .
With the last point in mind, we can finally finish the whole subdifferential of the absolutevalue function.
Nonsmooth Optimization: Theory
Finally, we present some results that connect the theories of nonsmooth analysis and optimization. The necessary conditions for a locally Lipschitz continuous function to attain its local minimum in an unconstrained case are given in the next theorem. For convex functions these conditions are also sufficient and the minimum is global.
Theorem Let be a locally Lipschitz continuous function at If attains its local minimal value at then
Theorem If is a convex function, then the following conditions are equivalent:
Definition A point satisfying is called a critical or a stationary point of
References
 A. Bagirov, N. Karmitsa, M.M. Mäkelä "Introduction to Nonsmooth Optimization: Theory, Practice and Software." Springer, 2014.
 F.H. Clarke: "Optimization and Nonsmooth Analysis", WileyInterscience, New York, 1983.
 C. Lemarechal: "Nondifferentiable Optimization", in Optimization (G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd, Eds.), p. 529572, Elsevier NorthHolland, Inc., New York, 1989.
 M.M. Mäkelä and P. Neittaanmäki: "Nonsmooth Optimization: Analysis and Algorithms with Applications to Optimal Control", World Scientific Publishing Co., Singapore, 1992.
 R.T. Rockafellar: "Convex Analysis", Princeton University Press, Princeton, New Jersey, 1970.
 N.Z. Shor: "Minimization Methods for NonDifferentiable Functions", SpringerVerlag, Berlin, 1985.