A self-attention sequential model for long-term prediction of video streams

doi:10.19665/j.issn1001-2400.20240202

Abstract

Video traffic prediction is a key technology for allocating transmission bandwidth accurately and improving the quality of service of Internet applications. However, the inherently high rate variability of video traffic, together with its long-term and short-term dependence, makes fast, accurate, long-term prediction difficult. Specifically, (1) existing models that capture sequence dependencies have high complexity, and (2) prediction models become invalid quickly. To address the accurate prediction of video streams, a self-attention sequential model with group-of-pictures (GOP) frame structure feature embedding is proposed. Self-attention sequential models are strong at modeling nonlinear relationships in discrete data. Based on the characteristics of video frames and the findings of a correlation analysis, this paper applies a time series self-attention model to the long-term prediction of video traffic for the first time. Existing time series self-attention models cannot effectively represent the categorical features of video frames, so an embedding layer based on the GOP frame structure is introduced to embed the frame structure information into the time series and improve the accuracy of the model. The results show that, compared with existing models such as the long short-term memory network and the convolutional neural network, the proposed model has a fast inference speed and reduces the prediction error, measured by the mean absolute error, by at least about 32%.

Key words: forecasting, time series analysis, network management, video streaming
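
The abstract describes embedding the GOP frame structure into the time series so that the model can tell frame categories apart. The following is a minimal sketch of that idea, not the authors' implementation: each frame size is projected to a vector and a per-frame-type vector is added to it. The GOP pattern "IBBPBB", the dimension `d_model = 8`, the random initial values, and all function and class names are assumptions made for illustration; in a trained model the lookup table and projection weights would be learned.

```python
import random

# Hypothetical mapping from frame type to a row of the embedding table.
FRAME_TYPE_INDEX = {"I": 0, "P": 1, "B": 2}


def gop_frame_types(gop_pattern, num_frames):
    """Repeat one GOP pattern (e.g. "IBBPBB") to label num_frames frames."""
    return [gop_pattern[i % len(gop_pattern)] for i in range(num_frames)]


class FrameTypeEmbedding:
    """Lookup table holding one d_model-dimensional vector per frame type."""

    def __init__(self, d_model, seed=0):
        rng = random.Random(seed)
        # Random values stand in for parameters that training would learn.
        self.table = [
            [rng.gauss(0.0, 0.02) for _ in range(d_model)]
            for _ in FRAME_TYPE_INDEX
        ]

    def __call__(self, frame_types):
        return [self.table[FRAME_TYPE_INDEX[t]] for t in frame_types]


def embed_frames(frame_sizes, frame_types, value_weights, type_embedding):
    """Add each frame's type vector to a linear projection of its size."""
    type_vectors = type_embedding(frame_types)
    return [
        [size * w + e for w, e in zip(value_weights, vec)]
        for size, vec in zip(frame_sizes, type_vectors)
    ]


# Toy usage: six normalized frame sizes covering one assumed GOP.
d_model = 8
frame_sizes = [0.9, 0.1, 0.1, 0.4, 0.1, 0.1]
frame_types = gop_frame_types("IBBPBB", len(frame_sizes))
type_embedding = FrameTypeEmbedding(d_model)
value_weights = [0.1] * d_model
tokens = embed_frames(frame_sizes, frame_types, value_weights, type_embedding)
print(frame_types)
print(len(tokens), len(tokens[0]))
```

Summing the type vector with the value projection keeps the input width unchanged, so a sketch like this could sit in front of an existing encoder without altering its shape requirements.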

CLC number:

TN915.03

GE Yunfeng, LI Hongyan, SHI Keyi. A self-attention sequential model for long-term prediction of video streams[J]. Journal of Xidian University, 2024, 51(3): 88-102.

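Video	Mean correlation coefficient between P frames and B frames	Mean correlation coefficient among B frames
NBC News	0.5720	0.7283
The Silence of the Lambs	0.7135	0.7002
Star Wars	0.6132	0.7461
Tokyo Olympics	0.6060	0.6061
Blue Planet	0.0562	0.4802
Elephants Dream	0.0337	0.6276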
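
The table above reports mean correlation coefficients between P and B frames and among B frames. The sketch below shows one way such averages could be computed; it is an assumption, since this excerpt does not state the exact procedure. It splits a frame-size trace into one series per GOP position, computes the Pearson coefficient for every P-B pair and every distinct B-B pair of positions, and averages each group. The GOP pattern, the toy trace, and the function names are hypothetical.

```python
import math
from itertools import combinations, product


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def position_series(frame_sizes, gop_pattern):
    """Return one frame-size series per position inside the GOP."""
    gop_len = len(gop_pattern)
    num_gops = len(frame_sizes) // gop_len
    return [
        [frame_sizes[k * gop_len + pos] for k in range(num_gops)]
        for pos in range(gop_len)
    ]


def mean_pb_and_bb_correlation(frame_sizes, gop_pattern):
    """Average Pearson coefficients over P-B pairs and over distinct B-B pairs."""
    series = position_series(frame_sizes, gop_pattern)
    p_pos = [i for i, t in enumerate(gop_pattern) if t == "P"]
    b_pos = [i for i, t in enumerate(gop_pattern) if t == "B"]
    pb = [pearson(series[p], series[b]) for p, b in product(p_pos, b_pos)]
    bb = [pearson(series[a], series[b]) for a, b in combinations(b_pos, 2)]
    return sum(pb) / len(pb), sum(bb) / len(bb)


# Toy trace: four GOPs of the assumed pattern "IBBPBB" (sizes are made up).
toy_trace = [
    100, 10, 12, 40, 11, 9,
    110, 12, 13, 46, 12, 12,
    105, 11, 15, 43, 14, 10,
    120, 15, 16, 52, 15, 14,
]
mean_pb, mean_bb = mean_pb_and_bb_correlation(toy_trace, "IBBPBB")
print(round(mean_pb, 4), round(mean_bb, 4))
```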

Parameter	Value
Encoder layers	2
Decoder layers	2
Dropout	0.05
Learning rate	10^-4
Optimizer	Adam
Model projection dimension	512
Activation function	GELU

Encoder layers	Decoder layers	Inference time/s	Mean squared error
2	2	11.91	0.0539
4	4	13.47	0.0545
6	6	13.57	0.0558
8	8	20.72	0.0576

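Model	Time complexity O(·)	Number of model parameters	Longest computation path O(·)
Self-attention sequential model	n×n×m+n×m×d	21,528,583	1
Linear regression model	n×m×d	77,896	1
Convolutional neural network	(k+1)×n×m×d	84,040	n/k
Long short-term memory network	n×m×(m+d)	4,284,488	n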

Video stream	TOL	SOL	NBC	SW	BP	ED
Linear regression model	8.27	6.25	5.21	9.07	9.16	26.66
Long short-term memory network	2.31	0.23	1.02	0.40	1.55	5.68
Convolutional neural network	2.64	0.55	1.34	0.73	1.84	6.11
Autoformer	3.32	0.12	0.07	0.05	1.26	0.06
Autoformer-GOP	1.67	0.01	0.06	0.04	0.98	0.05
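
To read the results table above as relative improvements, the short sketch below computes, for each video stream, the percentage by which the Autoformer-GOP row is lower than each other row. The values are copied from the table. The error metric behind the table is not named in this excerpt, so the percentages are only a worked arithmetic example and may not correspond to the mean absolute error figure quoted in the abstract. The function and variable names are illustrative.

```python
# Values copied from the results table above (columns TOL through ED).
VIDEOS = ["TOL", "SOL", "NBC", "SW", "BP", "ED"]
ERRORS = {
    "Linear regression": [8.27, 6.25, 5.21, 9.07, 9.16, 26.66],
    "LSTM": [2.31, 0.23, 1.02, 0.40, 1.55, 5.68],
    "CNN": [2.64, 0.55, 1.34, 0.73, 1.84, 6.11],
    "Autoformer": [3.32, 0.12, 0.07, 0.05, 1.26, 0.06],
    "Autoformer-GOP": [1.67, 0.01, 0.06, 0.04, 0.98, 0.05],
}


def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def reduction_table(errors, proposed_name):
    """Per-video relative reduction of the proposed row against every other row."""
    proposed = errors[proposed_name]
    return {
        name: [relative_reduction(b, p) for b, p in zip(values, proposed)]
        for name, values in errors.items()
        if name != proposed_name
    }


reductions = reduction_table(ERRORS, "Autoformer-GOP")
print("baseline".ljust(18), " ".join(v.rjust(5) for v in VIDEOS))
for name, row in reductions.items():
    print(name.ljust(18), " ".join(f"{v:5.1f}" for v in row))
```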