Journal of Xidian University ›› 2024, Vol. 51 ›› Issue (3): 147-157. doi: 10.19665/j.issn1001-2400.20240301

• Computer Science and Technology & Artificial Intelligence •

A Relatively Accelerated SGD Algorithm for Solving a Class of Non-smooth Convex Optimization Problems

ZHANG Wenjuan1, FENG Xiangchu2, XIAO Feng3, HUANG Shujuan3, LI Huan1

  1. School of Sciences, Xi'an Technological University, Xi'an 710021, China
    2. School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
    3. School of Computer Science and Engineering, Xi'an Technological University, Xi'an 710021, China
  • Received: 2023-01-03 Online: 2024-03-26 Published: 2024-03-26
  • Corresponding author: XIAO Feng (1976-), male, professor. E-mail: 544070146@mail.xidian.edu.cn
  • About the authors: ZHANG Wenjuan (1980-), female, associate professor. E-mail: zhangwenjuan@xatu.edu.cn
    FENG Xiangchu (1962-), male, professor. E-mail: xcfeng@xidian.edu.cn
    HUANG Shujuan (1974-), female, associate professor. E-mail: 349242386@qq.com
    LI Huan (1998-), female, M.S. candidate at Xi'an Technological University. E-mail: 1186858891@qq.com
  • Supported by:
    the Natural Science Basic Research Program of Shaanxi Province (2021-JM440); the National Natural Science Foundation of China (62171361); the Key Research and Development Program of Shaanxi Province (2022GY-119)



Abstract:

Owing to their simple computation and low per-iteration cost, first-order optimization algorithms are widely used in machine learning, big data science, computer vision, and related fields. However, a crucial and standard assumption for almost all first-order methods is that the gradient of the objective function is globally Lipschitz continuous, which is not satisfied by many practical problems. By introducing stochasticity and acceleration into the classical gradient descent (GD) algorithm, a relatively accelerated stochastic gradient descent (RASGD) algorithm is developed. The objective function is not required to have a Lipschitz continuous gradient; instead, by generalizing the Euclidean distance to a Bregman distance, the Lipschitz-gradient condition is weakened to a milder relative smoothness condition. The convergence of the RASGD algorithm depends on the uniform triangle scaling exponent (UTSE). To avoid the cost of tuning the optimal UTSE parameter, an adaptively relatively accelerated stochastic gradient descent (ARASGD) algorithm is further proposed, which selects the UTSE parameter adaptively. Theoretical convergence analysis shows that the objective function values of the iterates converge to the optimal value. Numerical experiments on the Poisson inverse problem, and on a minimization problem in which the operator norm of the Hessian of the objective grows polynomially with the norm of the variable, show that the ARASGD and RASGD algorithms outperform the relative stochastic gradient descent (RSGD) algorithm in convergence performance.
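For readers less familiar with the framework referred to in the abstract, the displays below recall the standard definitions of the Bregman distance and of relative smoothness, together with a plain (deterministic, non-accelerated) Bregman gradient step. This is background material under those standard definitions, not the RASGD or ARASGD scheme proposed in the paper.

D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle    (Bregman distance generated by a convex kernel h)

A function f is L-smooth relative to h if

f(x) \le f(y) + \langle \nabla f(y),\, x - y \rangle + L\, D_h(x, y)    for all x, y,

which reduces to the usual Lipschitz-gradient condition when h(x) = \tfrac{1}{2}\|x\|^2. The corresponding plain Bregman gradient step with constant L is

x_{k+1} = \arg\min_x \left\{ \langle \nabla f(x_k),\, x \rangle + L\, D_h(x, x_k) \right\}.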

Key words: convex optimization, nonsmooth optimization, relatively smooth, stochastic programming, gradient method, accelerated stochastic gradient descent
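To illustrate how a relative (Bregman) stochastic gradient step can be carried out without a Lipschitz continuous gradient, the sketch below applies a plain, non-accelerated mirror-descent-style update with the Burg entropy kernel h(x) = -sum_j log(x_j) to a Poisson-type objective f(x) = sum_i (a_i^T x - b_i log(a_i^T x)). It is a minimal sketch under those assumptions, not the RASGD or ARASGD algorithm of the paper; the function and variable names are hypothetical.

import numpy as np

def relative_sgd_poisson(A, b, x0, step=0.1, epochs=50, batch=32, rng=None):
    """Plain (non-accelerated) relative/Bregman SGD sketch for
    f(x) = sum_i (a_i^T x - b_i * log(a_i^T x)) over x > 0,
    with the Burg entropy kernel h(x) = -sum_j log(x_j).
    Illustrative only; not the RASGD/ARASGD schemes of the paper."""
    rng = np.random.default_rng() if rng is None else rng
    m = A.shape[0]
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        idx = rng.choice(m, size=batch, replace=False)
        Ax = A[idx] @ x                                      # minibatch forward model, entries > 0
        g = (m / batch) * (A[idx].T @ (1.0 - b[idx] / Ax))   # unbiased stochastic gradient of f
        # Mirror (Bregman) step with grad h(x) = -1/x:
        #   grad h(x_next) = grad h(x) - step * g  <=>  1/x_next = 1/x + step * g  (elementwise)
        denom = 1.0 / x + step * g
        denom = np.maximum(denom, 1e-12)                     # crude safeguard to keep x_next > 0
        x = 1.0 / denom
    return x

# usage sketch on synthetic Poisson-type data (all names here are hypothetical)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.uniform(0.1, 1.0, size=(500, 20))
    x_true = rng.uniform(0.5, 2.0, size=20)
    b = rng.poisson(A @ x_true).astype(float)
    x_hat = relative_sgd_poisson(A, b, x0=np.ones(20))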

CLC number: O24