西安电子科技大学学报 ›› 2024, Vol. 51 ›› Issue (2): 13-27.doi: 10.19665/j.issn1001-2400.20231206

• 信息与通信工程 • 上一篇    下一篇

通信计算联合优化的图分割工作流部署方法

马英红1(), 吝李婉1(), 焦毅2(), 李秦尧3()   

  1. 1.西安电子科技大学 通信工程学院,陕西 西安 710071
    2.润建股份有限公司,陕西 西安 710075
    3.中国航天空气动力技术研究院 创新与应用中心,北京 100074
  • 收稿日期:2022-11-29 出版日期:2024-04-20 发布日期:2024-01-04
  • 作者简介:马英红(1981—),女,副教授,E-mail:yhma@xidian.edu.cn;
    吝李婉(1995—),女,西安电子科技大学硕士研究生,E-mail:lwlin@stu.xidian.edu.cn;
    焦 毅(1980—),男,讲师,E-mail:jiaoyi80@aliyun.com;
    李秦尧(1993—),男,助理工程师,E-mail:rishingyou@163.com
  • 基金资助:
    “十四五”国防基础科研计划(JCKY2020203XXXX);中央高校基本科研业务费专项资金(JB210106)

Workflow deployment method based on graph segmentation with communication and computation jointly optimized

MA Yinghong1(), LIN Liwan1(), JIAO Yi2(), LI Qinyao3()   

  1. 1. School of Telecommunications Engineering,Xidian University,Xi’an 710071,China
    2. Runjian Stock,Xi’an 710075,China
    3. Center of Innovation and Application,China Academy of Aerospace Aerodynamics,Beijing 100074,China
  • Received:2022-11-29 Online:2024-04-20 Published:2024-01-04

摘要:

为提高计算效率,将复杂的大规模任务分解为简单任务并建模为工作流,交由并行分布式计算集群来完成,已成为云中心处理持续增长的计算和网络任务的重要手段。然而,分布式计算的任务间数据传输所带来的通信带宽占用却容易造成云中心的网络拥塞。如何兼顾计算效率和通信开销,科学地部署工作流意义重大。两类典型的工作流部署算法为基于列表的部署算法和基于分簇的部署算法。然而,前者致力于提高计算效率,未关注工作流中任务之间的通信开销,大规模工作流的部署易带来较重的网络负荷;后者关注通信开销的最小化,但牺牲了工作流中任务的并行计算效率,导致工作流完成时间较长。文中从图论的角度出发,充分挖掘工作流中各任务之间的依赖性和并行性,通过对经典图分割算法进行改进,实现了工作流任务分区过程中通信开销最小化和计算并行性最大化之间的平衡。仿真结果表明,在不同的工作流规模下,所提算法的通信开销比列表部署算法平均减少约35%~50%,工作流完成时间比分簇部署算法平均降低约50%~65%,且对于具有不同通信计算比的工作流均具有良好的稳定性。

关键词: 云计算, 数据中心, 工作流, 任务部署, 图论

Abstract:

For the purpose of improving computing efficiency,it becomes an important way for cloud data centers to deal with the continuous growth of computing and network tasks by decomposes complex large-scale tasks into simple tasks and modeling them into workflows,which are then completed by parallel distributed computing clusters.However,the communication bandwidth consumption caused by inter-task transmission can easily cause network congestion in data center.It is of great significance to deploy workflow scientifically,taking into account both computing efficiency and communication overhead.There are two typical types of workflow deployment algorithms:list-based workflow deployment algorithm and cluster-based workflow deployment algorithm.However,the former focuses on improving the computing efficiency while does not pay attention to the inter-task communication cost,so the deployment of large-scale workflow is easy to bring heavy network load.The latter focuses on minimizing the communication cost,but sacrifices the parallel computing efficiency of the tasks in the workflow,which results in a long workflow completion time.This work fully explores the dependency and parallelism between tasks in workflow,from the perspective of graph theory.By improving the classic graph segmentation algorithm,community discovery algorithm,the balance between minimizing communication cost and maximizing computation parallelism was achieved in the process of workflow task partitioning.Simulation results show that,under different workflow scales,the proposed algorithm reduces the communication cost by 35%~50%,compared with the typical list-based deployment algorithm,and the workflow completion time by 50%~65%,compared with the typical cluster-based deployment algorithm.Moreover,its performance has good stability for workflows with different communication-calculation ratios.

Key words: cloud computing, data center, workflow, task deployment, graph theory

中图分类号: 

  • TP393
Baidu
map