Question:
I am reading this nice document about the subgradient method, which defines the subgradient method iteration as follows.
$$x_{k+1} = x_k - \alpha_k g^k$$
for a $g$ such that
$$f(y) \geq f(x) + g^T (y - x)$$
If $f$ is differentiable, then $g$ is its gradient. This seems to suggest that for any valid $g$, we are ensured to increase (not strictly) the value of $f$. However, the same document states the following:
> The subgradient method is not a descent method; the function value can (and often does) increase.
and proposes to keep track of $f$ at each iteration, in order to remember the best value found so far.
This seems to contradict the first statement about the choice of $g$: if we are choosing a $g$ such that $f$ increases, how can the next $f$ not be the best so far?
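For reference, here is a minimal sketch of the iteration with the best-value bookkeeping the document proposes (my own Python, not from the document; `f`, `subgrad`, and the diminishing step size $1/k$ are placeholder choices):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, steps=100):
    """x_{k+1} = x_k - alpha_k * g_k, tracking the best value seen."""
    x, x_best, f_best = x0, x0, f(x0)
    for k in range(1, steps + 1):
        g = subgrad(x)            # any valid subgradient at x
        x = x - (1.0 / k) * g     # diminishing step size alpha_k = 1/k
        if f(x) < f_best:         # f(x) is NOT guaranteed to improve,
            x_best, f_best = x, f(x)  # hence the explicit bookkeeping
    return x_best, f_best

# Toy run: f(x) = |x|, subgradient sign(x) (any value in [-1, 1] works at 0)
print(subgradient_method(abs, np.sign, x0=1.0))
```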
Answer:
I believe your confusion comes from misreading the direction of the inequality. If $g^k$ is a subgradient at $x^k$, then taking $y = x^{k+1}$ and $x = x^k$, the subgradient inequality gives:
\begin{align*} f(x^{k+1}) &= f(x^k - \alpha g^k) \\ &\geq f(x^k) + (g^k)^T(-\alpha g^k) \\ &= f(x^k) - \alpha \|g^k\|_2^2 \end{align*}
This means that $f(x^{k+1})$ can be any value above $f(x^k) - \alpha \|g^k\|_2^2$, and in particular it can be larger than $f(x^k)$. So the subgradient inequality does not ensure that the method is a descent method; it only bounds the next function value from below.
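For a concrete instance (my own toy example, not from the document): take $f(x) = |x|$, $x^k = 0.4$, $g^k = 1$ and $\alpha = 1$. Then $x^{k+1} = x^k - \alpha g^k = -0.6$, so $f(x^{k+1}) = 0.6 > 0.4 = f(x^k)$: a perfectly valid subgradient step that increases the function value.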
In contrast to the subgradient method, when $f$ is differentiable and $\nabla f$ is Lipschitz continuous with constant $L$, you have the Descent Lemma:
$$f(y) \leq f(x) + \nabla f(x)^T (y - x) + \frac{L}{2} \|y - x\|_2^2$$
The Descent Lemma is the property that ensures descent, and it does not necessarily hold when you replace the gradient with a subgradient.
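To see how the lemma yields descent (a standard computation I am adding for completeness, not part of the quoted document): take the gradient step $y = x - \alpha \nabla f(x)$ with $0 < \alpha \leq 1/L$. The lemma gives

\begin{align*} f(y) &\leq f(x) - \alpha \|\nabla f(x)\|_2^2 + \frac{L \alpha^2}{2} \|\nabla f(x)\|_2^2 \\ &= f(x) - \alpha \left(1 - \frac{L \alpha}{2}\right) \|\nabla f(x)\|_2^2 \leq f(x) - \frac{\alpha}{2} \|\nabla f(x)\|_2^2, \end{align*}

so a gradient step with a small enough step size can never increase $f$. No analogous bound is available when $g$ is merely a subgradient.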