
Contact
- (852) 3411 5998
- chxw@hkbu.edu.hk
Biography
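Deep learning is the key technology behind many modern artificial intelligence applications, and the research of Professor Xiaowen Chu (褚晓文) aims to improve the practical performance of deep learning systems. As early as 2016, his research team developed the earliest open-source benchmark suite for deep learning platforms, used to evaluate, compare, and analyze the performance of different platforms. This pioneering work drew considerable attention from the deep learning community, including the CNTK team, the MXNet team, NVIDIA, Intel, Tencent, and Inspur. Since 2018, Professor Chu's team has further proposed many innovative methods to reduce the time needed to train AI models on large-scale GPU clusters.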
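Professor Chu is one of the pioneers of GPU computing. His research group has designed and implemented many GPU-based parallel algorithms that significantly reduce the running time of real-world applications, including DNA sequence alignment, protein identification, data mining, data security, linear algebra operations, and deep learning. His team has not only published many highly cited papers but also developed a series of open-source software packages. For example, G-BLASTN, the group's work on DNA sequence alignment, was published in Bioinformatics, a top journal in the field, and the software has been downloaded more than a thousand times from SourceForge and GitHub. Professor Chu's recent paper “Dissecting GPU Memory Hierarchy through Microbenchmarking” proposes a novel fine-grained P-Chase algorithm that reveals many previously unknown characteristics of GPU memory systems. It is the first paper to successfully analyze the cache characteristics of NVIDIA's Kepler and Maxwell GPUs, and the method has been widely used in academia to: (1) analyze the memory systems of the latest GPUs (such as Volta and Turing); (2) accurately predict the performance of GPU applications, such as parallel convolution in deep learning; and (3) design cache-friendly optimized algorithms for matrix and convolution operations.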
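Professor Chu has published more than 170 research papers in parallel and distributed computing, deep learning systems, cloud computing, wireless networks, and related areas. Many of his works have appeared in internationally renowned journals, including IEEE Transactions on Parallel and Distributed Systems, IEEE Journal on Selected Areas in Communications, IEEE Transactions on Mobile Computing, IEEE/ACM Transactions on Networking, IEEE Transactions on Smart Grid, IEEE Transactions on Computers, and Bioinformatics, as well as in the proceedings of internationally renowned conferences, including IEEE INFOCOM, ACM PPoPP, IEEE ICDCS, IEEE IPDPS, IJCAI, IEEE ICRA, and ACM e-Energy. His research has had a broad impact in both academia and industry. He has received multiple external research grants and donations from the Hong Kong Research Grants Council, the Innovation and Technology Fund, and industry, totaling more than HK$18 million.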
Achievements
- Best Paper Award, 4th IEEE International Conference on Big Data Intelligence and Computing (2018)
- Best Paper Award, 1st International Conference on Big Data Computing and Communications (2015)
- Best Paper Award, 10th IEEE International Conference on Computer and Information Technology (2010)
Publications
- Zeng, R., S. Zhang, J. Wang & X. W. Chu. “FMore: An Incentive Scheme of Multi-dimensional Auction for Federated Learning in MEC.” IEEE ICDCS (2020). [Link]
- Shi, S., Q. Wang, X. W. Chu, B. Li, Y. Qin, R. Liu & X. Zhao. “Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs.” IEEE INFOCOM (2020). [Link]
- Yan, D., W. Wang & X. W. Chu. “Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply.” IEEE International Parallel and Distributed Processing Symposium (2020). [Link]
- Shi, S., Z. Tang, Q. Wang, K. Zhao & X. W. Chu. “Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees.” The 24th European Conference on Artificial Intelligence (2020). [Link]
- Yan, D., W. Wang & X. W. Chu. “Optimizing Batched Winograd Convolution on GPUs.” ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (2020). [Link]
- Shi, S., K. Zhao, Q. Wang, Z. Tang & X. W. Chu. “A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification.” The 28th International Joint Conference on Artificial Intelligence (2019). [Link]
- Shi, S., Q. Wang, K. Zhao, Z. Tang, Y. Wang, X. Huang & X. W. Chu. “A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks.” The 39th IEEE International Conference on Distributed Computing Systems (2019). [Link]
- Tang, Z., Y. Wang, Q. Wang & X. W. Chu. “The Impact of GPU DVFS on the Energy and Performance of Deep Learning: an Empirical Study.” The 10th ACM International Conference on Future Energy Systems (e-Energy) (2019). [Link]
- Shi, S., X.-W. Chu & B. Li. “MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms.” IEEE INFOCOM (2019). [Link]
- Jia, X., S. Song, S. Shi, W. He, Y. Wang, H. Rong, F. Zhou, L. Xie, Z. Guo, Y. Yang, L. Yu, T. Chen, G. Hu & X. W. Chu. “Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes.” NeurIPS 2018 Workshop on Systems for ML and Open Source Software (2018). [Link]
- Zhao, H., H. Liu, Y.-W. Leung & X. W. Chu. “Self-Adaptive Collective Motion of Swarm Robots.” IEEE Transactions on Automation Science and Engineering 15.4 (2018): 1533-1545. [Link]
- Liu, C., Q. Wang, X.-W. Chu & Y. W. Leung. “G-CRS: GPU Accelerated Cauchy Reed-Solomon Coding.” IEEE Transactions on Parallel and Distributed Systems 29.7 (2018): 1482-1498. [Link]
- Zhang, F., H. Liu, Y. W. Leung, X.-W. Chu & B. Jin. “CBS: Community-based Bus System as Routing Backbone for Vehicular Ad Hoc Networks.” IEEE Transactions on Mobile Computing 16.8 (2017): 2132-2146. [Link]
- Wang, Q., P. Xu, Y. Zhang & X. W. Chu. “EPPMiner: An Extended Benchmark Suite for Energy, Power and Performance Characterization of Heterogeneous Architecture.” The 8th ACM International Conference on Future Energy Systems (e-Energy) (2017). [Link]
- Chau, V., X. W. Chu, H. Liu & Y.-W. Leung. “Energy Efficient Job Scheduling with DVFS for CPU-GPU Heterogeneous Systems.” The 8th ACM International Conference on Future Energy Systems (e-Energy) (2017). [Link]
- Mei, X., X. W. Chu, H. Liu, Y.-W. Leung & Z. Li. “Energy Efficient Real-time Task Scheduling on CPU-GPU Hybrid Clusters.” IEEE INFOCOM (2017). [Link]
- Mei, X. & X. W. Chu. “Dissecting GPU Memory Hierarchy through Microbenchmarking.” IEEE Transactions on Parallel and Distributed Systems 28.1 (2017): 72-86. [Link]
- Lam, Albert Y. S., Y. W. Leung & X. W. Chu. “Autonomous Vehicle Public Transportation System: Scheduling and Admission Control.” IEEE Transactions on Intelligent Transportation Systems 17.5 (2016): 1210-1226. [Link]
- Yu, L., H. Liu, Y. W. Leung, X.-W. Chu & Z. Lin. “Multiple Radios for Fast Rendezvous in Cognitive Radio Networks.” IEEE Transactions on Mobile Computing 14.9 (2015): 1917-1931. [Link]
- Zhang, F., H. Liu, Y. W. Leung, X.-W. Chu & B. Jin. “Community-based Bus System as Routing Backbone for Vehicular Ad Hoc Networks.” IEEE ICDCS (2015). [Link]
- Zhao, J., X. W. Chu, H. Liu, Y. W. Leung & Z. Li. “Online Procurement Auctions for Resource Pooling in Client-Assisted Cloud Storage Systems.” IEEE INFOCOM (2015). [Link]
- Lam, Albert Y. S., Y. W. Leung & X. W. Chu. “Electric Vehicle Charging Station Placement: Formulation, Complexity, and Solutions.” IEEE Transactions on Smart Grid 5.6 (2014): 2846–2856. [Link]
- Zhao, K. & X. W. Chu. “G-BLASTN: Accelerating Nucleotide Alignment by Graphics Processors.” Bioinformatics 30.10 (2014): 1384-1391. [Link]
- Liu, H., X. W. Chu, Y. W. Leung & R. Du. “Minimum-Cost Sensor Placement for Required Lifetime in Wireless Sensor-Target Surveillance Networks.” IEEE Transactions on Parallel and Distributed Systems 24.9 (2013): 1783-1796. [Link]
- Lin, Z., H. Liu, X. W. Chu & Y. W. Leung. “Enhanced Jump-Stay Rendezvous Algorithm for Cognitive Radio Networks.” IEEE Communications Letters 17.9 (2013): 1742-1745. [Link]
- Lin, Z., H. Liu, X. W. Chu, Y. W. Leung & I. Stojmenovic. “Constructing Connected-Dominating-Set with Maximum Lifetime in Cognitive Radio Networks.” IEEE Transactions on Computers (2013). [Link]
- Liu, H., Z. Lin, X. W. Chu & Y. W. Leung. “Jump-Stay Rendezvous Algorithm for Cognitive Radio Networks.” IEEE Transactions on Parallel and Distributed Systems 23.10 (2012): 1867-1881. [Link]
- Li, Z. & X.-W. Chu. “On Achieving Group-Strategyproof Multicast.” IEEE Transactions on Parallel and Distributed Systems 23.5 (2012): 913-923. [Link]
- Liu, C. M., T. Wong, E. Wu, R. Luo, S. M. Yiu, Y. Li, B. Wang, C. Yu, X. W. Chu, K. Zhao, R. Li & T. W. Lam. “SOAP3: Ultra-fast GPU-based parallel alignment tool for short reads.” Bioinformatics 28.6 (2012): 878-879. [Link]
- Liu, H., X. W. Chu, Y. W. Leung, X. Jia & P. Wan. “General Maximal Lifetime Sensor-Target Surveillance Problem and Its Solution.” IEEE Transactions on Parallel and Distributed Systems 22.10 (2011): 1757-1765. [Link]
- Lin, Z., H. Liu, X. W. Chu & Y. W. Leung. “Jump-Stay Based Channel-Hopping Algorithm with Guaranteed Rendezvous for Cognitive Radio Networks.” IEEE INFOCOM (2011). [Link]
- Liu, H., X. W. Chu, Y. W. Leung & R. Du. “Simple Movement Control Algorithm for Bi-Connectivity in Robotic Sensor Networks.” IEEE Journal on Selected Areas in Communications 28.7 (2010): 994-1005. [Link]
- Chu, X. W. & Y. Jiang. “Random Linear Network Coding for Peer-to-Peer Applications.” IEEE Network 24.4 (2010): 35-39. [Link]
- Chu, X. W., K. Zhao, Z. Li & A. Mahanti. “Auction-Based On-Demand P2P Min-Cost Media Streaming with Network Coding.” IEEE Transactions on Parallel and Distributed Systems 20.12 (2009): 1816-1829. [Link]
- Li, X. Y., Y. Wu, H. Chen, X. W. Chu, Y. Wu & Y. Qi. “Reliable and Energy Efficient Routing for Static Wireless Ad Hoc Networks with Unreliable Links.” IEEE Transactions on Parallel and Distributed Systems 20.10 (2009): 1408-1421. [Link]
- Liu, J. C., J. Xu & X. W. Chu. “Fine-Grained Scalable Video Caching for Heterogeneous Clients.” IEEE Transactions on Multimedia 8.5 (2006): 1011-1020. [Link]
- Chu, X. W. & B. Li. “Dynamic Routing and Wavelength Assignment in the Presence of Wavelength Conversion for All-Optical Networks.” IEEE/ACM Transactions on Networking 13.3 (2005): 704-715. [Link]
- Chu, X. W., J. Liu, B. Li & Z. Zhang. “Analytical Model of Sparse-Partial Wavelength Conversion in Wavelength-Routed WDM Networks.” IEEE Communications Letters 9.1 (2005): 69-71. [Link]
- Chu, X. W., J. Liu & Z. Zhang. “Analysis of Sparse-Partial Wavelength Conversion in Wavelength-Routed WDM Networks.” IEEE INFOCOM (2004). [Link]
- Liu, J., X. W. Chu & J. Xu. “Proxy Cache Management for Fine-Grained Scalable Video Streaming.” IEEE INFOCOM (2004). [Link]
- Sohraby, K., Z. Zhang, X. W. Chu & B. Li. “Resource Management in an Integrated Optical Network.” IEEE Journal on Selected Areas in Communications 21.7 (2003): 1052-1062. [Link]
- Li, B., X. W. Chu & K. Sohraby. “Routing and Wavelength Assignment vs. Wavelength Converter Placement in All-Optical Networks.” IEEE Communications Magazine 41.8 (2003): S22-S28. [Link: https://ieeexplore.ieee.org/document/1222717]
- Chu, X. W., B. Li & I. Chlamtac. “Wavelength Converter Placement under Different RWA Algorithms in Wavelength-Routed All-Optical Networks.” IEEE Transactions on Communications 51.4 (2003): 607-617. [Link]
- Chu, X. W., B. Li & Z. Zhang. “A Dynamic RWA Algorithm in a Wavelength-Routed All-Optical Network with Wavelength Converters.” IEEE INFOCOM (2003): 1795-1804. [Link]