Thursday, December 7, 2017

           "The restless world outside and the turmoil within cannot extinguish the fervent heart you once had."


Dr. Ang Li is a senior computer scientist in the Physical and Computational Sciences Directorate (PCSD) of Pacific Northwest National Laboratory (PNNL) and, through a dual appointment, an Associate Professor in the ECE department of the University of Washington (UW). He received his bachelor's degree from the CS department of Zhejiang University, China, in 2010, and two PhD degrees, from the Electrical and Computer Engineering (ECE) department of the National University of Singapore (NUS), Singapore, and the Electrical Engineering (EE) department of Eindhoven University of Technology (TU/e), The Netherlands, in 2016. He joined PNNL in Nov 2016 and has held a dual appointment with UW ECE since Oct 2023. His research has focused on software-hardware co-design for scalable heterogeneous HPC, particularly GPUs, since 2009. His research covers full-stack design from the circuit level up to architecture, system, library, and applications. He has published in major HPC conferences and journals including SC, ICS, PPoPP, IPDPS, HPDC, ASPLOS, MICRO, HPCA, ICPP, CGO, IISWC, EuroPar, TPDS, TC, and ICPE. His lead-author work was nominated for the best paper award at SC-15, SC-17, IISWC-18 and SC-20. He received the European HiPEAC paper award and PNNL's PCSD Outstanding Performance award. He has served as an organizing committee or review committee member for major HPC conferences including PPoPP, SC, ASPLOS, PACT, ISCA, and IPDPS. He previously worked in industry as an HPC application developer, where he led the evaluation, development, and optimization of several industrial HPC applications. He also worked as a research intern at the INRIA-Lab at Paris-Sud University, France, and at the Chinese University of Hong Kong.

His research interests include:
  • Software-Hardware Co-design for HPC accelerators, particularly GPUs, and domain-specific accelerators
  • Performance Modeling and Evaluation for HPC Architecture and Applications
  • Scalable Quantum Circuit Simulation, Transformation and Verification
  • Binarized Neural Network

Service (PC/ERC)

  • 2025: HPCA, IPDPS, ISCA, ICLR, PLDI, ICS
  • 2024: HPCA, SC, CC, IPDPS, ISCA, ICS, QCE, MICRO
  • 2023: IPDPS, ISCA, MLSys, CC, SC, ICPP, QCE, SBAC-PAD
  • 2022: ASPLOS, PPoPP, ISCA, MICRO, SC, ISC, SBAC-PAD, HiPC, CC, ICRC
  • 2021: SC, PPoPP, ISCA, MICRO, IPDPS, Cluster, LCTES, TPDS-SS, SBAC-PAD, HiPC, HPCC
  • 2020: PPoPP, ISCA, SBAC-PAD, HPCC, TPDS-SS, SC-MLHPC
  • 2019: PPoPP, PACT, NPC, RTSS-AE
  • 2018: PPoPP-AE, NPC, ASPLOS-SRC
  • Journal Review: TPDS, TC, TOPC, CSUR, DB, JPDC, JSA, CAL, TACO, TNNLS, JCSC, TCAS-I/II, MICPRO, FGCS, TODAES, COSE, CVIU, Nature-NPJQI, SPE, TECS, IEEE Micro, Nature-ScientificReport, TQC, TQE, SNCS, Quantum

Mentoring (PhD & Postdoc):

  • 2024: Anastashia Jebraeilli (GSU), Sean Garner (UW), Nidhi Munikote (USC), Matthew Burns (Rochester), Yue Shi (UW), Meng Wang (UBC), Muqing Zheng (PNNL), Chenxu Liu (PNNL), Xinyi Li (PNNL), Zirui Mao (PNNL), Peiyi Li (NCSU)
  • 2023: Yue Shi (UW), Meng Wang (UBC), Muqing Zheng (Lehigh), Zirui Mao (PNNL), Habeebat Elwalily (Middlesex County College), Chenxu Liu (PNNL), Xinyi Li (Utah), Fei Hua (Rutgers)
  • 2022: Xinyi Li (Utah), Muqing Zheng (Lehigh), Anbang Wu (UCSB), Fei Hua (Rutgers), Zheng Wang (UCSB), Anqi Guo (Boston U), Chunshu Wu (Boston U), Kendric Hood (Kent State), Hongwu Peng (UConn), Yuke Wang (UCSB), Mao Lin (UC Merced), Bo Fang (PNNL)
  • 2021: Muqing Zheng (Lehigh), Luke Zhang (Rice), Anqi Guo (Boston U), Fei Hua (Rutgers), Xinyi Li (Utah), Tong Geng (PNNL), Cheng Tan (PNNL), Chenhao Xie (PNNL)
  • 2020: Komail Dharsee (Rochester U), Lele Ma (W&M), Jou-An Chen (NCSU)
  • 2019: Linghao Song (Duke), Tong Geng (Boston U), Erika Leal (UTA)
  • 2018: Tong Geng (Boston U), Pengfei Zou (Clemson)

Publications



2025:
  • [ASPLOS] "QECC-Synth: A Layout Synthesizer for Quantum Error Correction Codes on Sparse Architecture", Keyi Yin, Hezi Zhang, Xiang Fang, Yunong Shi, Travis Humble, Ang Li, and Yufei Ding. Rotterdam, Netherlands, Mar 30-Apr 3, 2025 (Accepted)
  • [PPoPP] "ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training", Yuhang Liang, Bo Fang, Xinyi Li, Jie Ren, Ang Li and Jieyang Chen. Las Vegas, NV, Mar 1-5, 2025 (Accepted)
  • [PPoPP] "Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering", Jou-An Chen, Hsin-Hsuan Sung, Ang Li, and Xipeng Shen. Las Vegas, NV, Mar 1-5, 2025 (Accepted)
2024:
  • [arXiv] "Architectures for Heterogeneous Quantum Error Correction Codes", Samuel Stein, Shifan Xu, Andrew Cross, Theodore Yoder, Ali Javadi-Abhari, Chenxu Liu, Kun Liu, Victor Zhou, Charles Guinn, Yufei Ding, Yongshan Ding and Ang Li [arXiv]
  • [NPJQI] "Unleashed from Constrained Optimization: Quantum Computing for Quantum Chemistry Employing Generator Coordinate Method", Muqing Zheng, Bo Peng, Ang Li, Xiu Yang, and Karol Kowalski. npj Quantum Information, Nature (Accepted) [arXiv]
  • [arXiv] "GALIC: Hybrid Multi-Qubitwise Pauli Grouping for Quantum Computing Measurement", Matthew Burns, Chenxu Liu, Samuel Stein, Bo Peng, Karol Kowalski and Ang Li [arXiv]
  • [arXiv] "A GPU Accelerated Mixed-Precision Finite Difference Informed Random Walker (FDiRW) Solver for Strongly Inhomogeneous Diffusion Problems", Zirui Mao, Shenyang Hu and Ang Li [arXiv]
  • [arXiv] "Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models", Chuan Liu, Chunshu Wu, Mingkai Che, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, and Tong Geng [arXiv]
  • [arXiv] "On Scaling Up 3D Gaussian Splatting Training", Hexu Zhao, Haoyang Weng, Daohan Lu, Ang Li, Jinyang Li, Aurojit Panda, Saining Xie [arXiv]
  • [arXiv] "Inertial Confinement Fusion Forecasting via LLMs", Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng and Dongfang Liu [arXiv]
  • [AISY] "Accurate and Data-Efficient Micro-XRD Phase Identification Using Multi-Task Learning: Application to Hydrothermal Fluids", Yanfei Li, Juejing Liu, Xiaodong Zhao, Wenjun Liu, Tong Geng, Ang Li and Xin Zhang. Advanced Intelligent Systems. Wiley. DOI:10.1002/aisy.202400204. [arXiv][Link]
  • [MICRO] "FlexiSCD: Flexible Surface Code Deformer for Dynamic Defects", Keyi Yin, Xiang Fang, Travis Humble, Ang Li, Yunong Shi, and Yufei Ding. IEEE/ACM International Symposium on Microarchitecture. Austin, TX, Nov 2-6, 2024. [arXiv]
  • [MICRO] "Bridging the Gap between LLMs and LNS with Dynamic Data Format and Architecture Codesign", Pouya Haghi, Chunshu Wu, Zahra Azad, Yanfei Li, Andrew Gui, Yunchen Hao, Ang Li and Tong Geng. IEEE/ACM International Symposium on Microarchitecture. Austin, TX, Nov 2-6, 2024. 
  • [QCE] "Benchmarking Optimizers for Qumode State Preparation with Variational Algorithms", Shuwen Kan, Miguel Palma, Zefan Du, Samuel Stein, Chenxu Liu, Juntao Chen, Ang Li and Ying Mao. IEEE International Conference on Quantum Computing and Engineering. Montreal, Canada. Sep 15-20, 2024. [arXiv]
  • [QCE] "Scalable Circuit Cutting and Scheduling in a Resource-constrained and Distributed Quantum System", Shuwen Kan, Zefan Du, Miguel Palma, Samuel Stein, Chenxu Liu, Wenqi Wei, Juntao Chen, Ang Li and Ying Mao. IEEE International Conference on Quantum Computing and Engineering. Montreal, Canada. Sep 15-20, 2024. [arXiv]
  • [TQC] "ARQUIN: Architectures for Multinode Superconducting Quantum Computers", James Ang, Gabriella Carini, Yanzhu Chen, Isaac Chuang, Michael Austin DeMarco, Sophia E. Economou, Alec Eickbusch, Andrei Faraon, Kai-Mei Fu, Steven M. Girvin, Michael Hatridge, Andrew Houck, Paul Hilaire, Kevin Krsulich, Ang Li, Chenxu Liu, Yuan Liu, Margaret Martonosi, David C. McKay, James Misewich, Mark Ritter, Robert J. Schoelkopf, Samuel A. Stein, Sara Sussman, Hong X. Tang, Wei Tang, Teague Tomesh, Norm M. Tubman, Chen Wang, Nathan Wiebe, Yong-Xin Yao, Dillon C. Yost, and Yiyu Zhou, ACM Transactions on Quantum Computing. DOI:10.1145/3674151 [arXiv][Link]
  • [Cluster] "Understanding Mixed Precision GEMM with MPGemmFI: Insights into Fault Resilience", Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, and Ang Li. IEEE International Conference on Cluster Computing, IEEE. Kobe, Japan. Sep 24-27, 2024. [arXiv]
  • [FGCS] "Quantum-centric Supercomputing for Materials Science: A Perspective on Challenges and Future Directions", Many Authors. Future Generation Computer Systems, Elsevier. DOI:10.1016/j.future.2024.04.060 [Link][arXiv]
  • [ATC] "OPER: Optimality-Guided Embedding Table Parallelization for Large-scale Recommendation Model", Zheng Wang, Yuke Wang, Boyuan Feng, Guyue Huang, Dheevatsa Mudigere, Bharath Muthiah, Ang Li, Yufei Ding, 2024 USENIX Annual Technical Conference [Link].
  • [IEEE Access] "How much can we gain from Tensor Kernel Fusion on GPUs", Wei Sun, Ang Li, Sander Stuijk and Henk Corporaal, IEEE Access. DOI:10.1109/ACCESS.2024.3411473. [Link]
  • [arXiv] "Design of an Entanglement Purification Protocol Selection Module", Yue Shi, Chenxu Liu, Samuel Stein, Meng Wang, Muqing Zheng, and Ang Li [arXiv].
  • [arXiv] "TANQ-Sim: Tensorcore Accelerated Noisy Quantum System Simulation via QIR on Perlmutter HPC", Ang Li, Chenxu Liu, Samuel Stein, In-Saeng Suh, Muqing Zheng, Meng Wang, Yue Shi, Bo Fang, Martin Roetteler, and Travis Humble. [arXiv]. 
  • [arXiv] "An Early Investigation of the HHL Quantum Linear Solver for Scientific Applications", Muqing Zheng, Chenxu Liu, Samuel Stein, Xiangyu Li, Johannes Mulmenstadt, Yousu Chen, and Ang Li. [arXiv]
  • [ICS-24] "SmartFuse: Reconfigurable Smart Switches to Accelerate Fused Collectives in HPC Applications", Pouya Haghi, Cheng Tan, Anqi Guo, Chunshu Wu, Dongfang Liu, Ang Li, Anthony Skjellum, Tong Geng and Martin Herbordt, International Conference on Supercomputing, Kyoto, Japan. June 4-7, 2024. DOI:10.1145/3650200.3656616
  • [ISCA-24] "DS-GL: Advancing Graph Learning via Harnessing the Power of Nature within Dynamic Systems", Ruibing Song, Chunshu Wu, Chuan Liu, Ang Li, Michael Huang and Tong Geng, International Symposium on Computer Architecture, Buenos Aires, Argentina. Jun 29-Jul 3, 2024. DOI:10.1109/ISCA59077.2024.00014
  • [EPJA] "Deep Quantum Circuit Simulations of Low-energy Nuclear States", Ang Li, Alessandro Baroni, Ionel Stetcu, and Travis S. Humble. The European Physical Journal A. DOI:10.1140/epja/s10050-024-01286-7. [arXiv][Link]
  • [arXiv] "AQM: A Refresh of the Abstract Qubit Model for Quantum Codesign", Chenxu Liu, Samuel Stein, Muqing Zheng, James Ang, and Ang Li, 2024. [arXiv]
  • [TC] "FPGA-Accelerated Range-Limited Molecular Dynamics", Chunshu Wu, Chen Yang, Sahan Bandara, Tong Geng, Anqi Guo, Pouya Haghi, Ang Li and Martin Herbordt, IEEE Transactions on Computers, 2024. DOI:10.1109/TC.2024.3375613
  • [ICPE-24] "Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs", Hongwu Peng, Caiwen Ding, Tong Geng, Sutanay Choudhury, Kevin Barker and Ang Li, 15th ACM/SPEC International Conference on Performance Engineering, South Kensington, London, UK. May 7-11, 2024. DOI:10.1145/3629527.3651428 [arXiv]
  • [PES-24] "Early Exploration of a Flexible Framework for Efficient Quantum Linear Solvers in Power Systems", Muqing Zheng, Yousu Chen, Xiu Yang, and Ang Li, IEEE Power and Energy Society General Meeting, Seattle, WA, USA. July 21-25, 2024. [arXiv]
  • [CCGrid-24] "A Testing-Guided Approach to Characterize NVIDIA and AMD Matrix Accelerator Numerics", Xinyi Li, Ang Li, Bo Fang, Ignacio Laguna, Ganesh Gopalakrishnan, 24th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Philadelphia, PA, USA. May 6-9, 2024. DOI:10.1109/CCGrid59990.2024.00014
  • [PSCC-24] "Toward Intelligent Emergency Control for Large-scale Power Systems: Convergence of Learning, Physics, Computing and Control", Qiuhua Huang, Renke Huang, Tianzhixi Yin, Sohom Datta, Xueqing Sun, Jason Hou, Jie Tan, Wenhao Yu, Yuan Liu, Xinya Li, Bruce Palmer, Ang Li, Xinda Ke, Marianna Vaiman, Song Wang, Yousu Chen, the 23rd Power Systems Computation Conference, Paris-Saclay, France. June 4-7, 2024.
  • [JCIM] "Acceleration of Graph Neural Network-based Prediction Models in Chemistry via Co-design Optimization on Intelligence Processing Units", Hatem Helal, Jesun Firoz, Jenna Bilbrey, Henry Sprueill, Kristina Herman, Mario Krell, Tom Murray, Manuel Roldan, Mike Kraus, Ang Li, Payel Das, Sotiris Xantheas, Sutanay Choudhury, Journal of Chemical Information and Modeling. DOI:10.1021/acs.jcim.3c01312 [arXiv][link]
  • [TQE] "A Quantum-Classical Collaborative Training Architecture Based on Quantum State Fidelity", Ryan L'Abbate, Anthony D'Onofrio Jr., Samuel Stein, Samuel Yen-Chi Chen, Ang Li, Pin-Yu Chen, Juntao Chen, Ying Mao, IEEE Transactions on Quantum Engineering. DOI:10.1109/TQE.2024.3367234 [link]
  • [EABE] "A GPU Accelerated Mixed-precision Smoothed Particle Hydrodynamics Framework with Cell-based Relative Coordinates", Zirui Mao, Xinyi Li, Shenyang Hu, Ganesh Gopalakrishnan, Ang Li, Engineering Analysis with Boundary Elements, Elsevier. DOI:10.1016/j.enganabound.2024.01.020
  • [ICLR-24] "NP-GL: Extending Power of Nature from Binary to Real-World Graph Learning", Chunshu Wu, Ruibing Song, Chuan Liu, Yunan Yang, Ang Li, Michael Huang, Tong Geng, 12th International Conference on Learning Representations, Vienna, Austria. May 7-12, 2024.
  • [ICASSP-24] "Quapprox: A Framework for Benchmarking the Approximability of Variational Quantum Circuit", Jinyang Li, Ang Li and Weiwen Jiang, 2024 IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, South Korea. April 14-19, 2024. DOI:10.1109/ICASSP48485.2024.10447919
  • [ASPLOS-24] "Red-QAOA: Efficient Variational Optimization through Circuit Reduction", Meng Wang, Bo Fang, Ang Li and Prashant Nair, ACM International Conference on Architectural Support for Programming Languages and Operating Systems, San Diego, CA, USA. April 27-May 1, 2024. DOI:10.1145/3620665.3640363
  • [ASPLOS-24] "RAP: Resource-aware Automated GPU Sharing for Multi-GPU Recommendation Model Training and Input Preprocessing", Zheng Wang, Yuke Wang, Jiaqi Deng, Ang Li and Yufei Ding, ACM International Conference on Architectural Support for Programming Languages and Operating Systems, San Diego, CA, USA. April 27-May 1, 2024. DOI:10.1145/3620665.3640406
2023:
  • [arXiv] "Co-Designed Superconducting Architecture for Lattice Surgery of Surface Codes with Quantum Interface Routing Card", Charles Guinn, Samuel Stein, Esin Tureci, Guus Avis, Chenxu Liu, Stefan Krastanov, Andrew A. Houck, Ang Li. [arXiv]
  • [arXiv] "Multi-mode Cavity Centric Architectures for Quantum Simulation", Samuel Stein, Fei Hua, Chenxu Liu, Charles Guinn, James Ang, Eddy Zhang, Srivatsan Chakram, Yufei Ding and Ang Li [arXiv]
  • [arXiv] "Quantum Memory: A Missing Piece in Quantum Computing Units", Chenxu Liu, Meng Wang, Samuel Stein, Yufei Ding and Ang Li [arXiv]
  • [arXiv] "QASMTrans: A QASM based Quantum Transpiler Framework for NISQ Devices", Fei Hua, Meng Wang, Gushu Li, Bo Peng, Chenxu Liu, Muqing Zheng, Samuel Stein, Yufei Ding, Eddy Z. Zhang, Travis S. Humble, Ang Li [arXiv]
  • [MICRO-23] "Microarchitectures for Heterogeneous Superconducting Quantum Computers", Samuel Stein, Sara Sussman, Teague Tomesh, Charles Guinn, Esin Tureci, Sophia Fuhui Lin, Wei Tang, James Ang, Srivatsan Chakram, Ang Li, Margaret Martonosi, Fred T. Chong, Andrew A. Houck, Isaac L. Chuang, Michael Austin DeMarco, 56th IEEE/ACM International Symposium on Microarchitecture, Toronto, Canada. Oct 28-Nov 1, 2023. DOI:10.1145/3613424.3614300 [arXiv][Link]
  • [MICRO-23] "QuComm: Optimizing Collective Communication for Distributed Quantum Computing", Anbang Wu, Yufei Ding and Ang Li, 56th IEEE/ACM International Symposium on Microarchitecture, Toronto, Canada. Oct 28-Nov 1, 2023. DOI:10.1145/3613424.3614253 [arXiv][Link]
  • [BigData-23] "Distributed Quantum Learning with co-Management in a Multi-tenant Quantum System", Anthony D'Onofrio Jr., Amir Hossain, Lesther Santana, Naseem Machlovi, Samuel Stein, Jinwei Liu, Ang Li and Ying Mao, Sorrento, Italy. Dec 15-18, 2023. DOI: 10.1109/BigData59044.2023.10386676
  • [JPCC] "Machine Learning Automated Analysis of Enormous Synchrotron X-Ray Diffraction Datasets", Xiaodong Zhao, Yixuan Luo, Juejing Liu, Wenjun Liu, Kevin Rosso, Xiaofeng Guo, Tong Geng, Ang Li and Xin Zhang, The Journal of Physical Chemistry, Part C. DOI:10.1021/acs.jpcc.3c03572 [arXiv][Link]
  • [ICCV-23] "AutoReP: Automatic ReLU Replacement for Fast Private Network Inference", Hongwu Peng, Shaoyi Huang, Tong Zhou, Yukui Luo, Chenghong Wang, Zigeng Wang, Jiahui Zhao, Xi Xie, Ang Li, Tong Geng, Kaleel Mahmood, Wujie Wen, Xiaolin Xu, Caiwen Ding, International Conference on Computer Vision, Paris, France. Oct 2-6, 2023.
  • [QCE-23] "A Novel Spatial-Temporal Variational Quantum Circuit to Enable Deep Learning on NISQ Devices", Jinyang Li, Zhepeng Wang, Zhirui Hu, Prasanna Date, Ang Li and Weiwen Jiang, IEEE International Conference on Quantum Computing and Engineering, Bellevue, WA, USA. Sep 17-22, 2023. DOI:10.1109/QCE57702.2023.00038
  • [SC-23] "FASDA: An FPGA-Aided, Scalable and Distributed Accelerator for Range-Limited Molecular Dynamics", Chunshu Wu, Tong Geng, Anqi Guo, Sahan Bandara, Pouya Haghi, Chuan Liu, Ang Li and Martin Herbordt, The 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 12-17, 2023. DOI:10.1145/3581784.3607100.
  • [arXiv] "Enabling Full-Stack Quantum Computing with Changeable Error-Corrected Qubits", Anbang Wu, Keyi Yin, Andrew Cross, Ang Li and Yufei Ding [arXiv]
  • [PRR] "Quantum Algorithms for Generator Coordinate Methods", Muqing Zheng, Bo Peng, Nathan Wiebe, Ang Li, Xiu Yang, and Karol Kowalski, Physical Review Research, DOI:10.1103/PhysRevResearch.5.023200 [arXiv][Link]
  • [ICS-23] "BitGNN: Unleashing the Performance Potential of Binary Graph Neural Networks on GPUs", Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Sutanay Choudhury, Ang Li, International Conference on Supercomputing, Orlando, FL, USA. June 21-23, 2023. DOI:10.1145/3577193.3593725
  • [ICS-23] "Software-Hardware Co-design of Heterogeneous SmartNIC System for Recommendation Models Inference and Training", Anqi Guo, Yuchen Hao, Chunshu Wu, Pouya Haghi, Zhenyu Pan, Min Si, Dingwen Tao, Ang Li, Martin Herbordt, Tong Geng, International Conference on Supercomputing, Orlando, FL, USA. June 21-23, 2023. DOI:10.1145/3577193.3593724
  • [ICS-23] "FLASH: FPGA-Accelerated Smart Switches with GCN Case Study", Pouya Haghi, William Krska, Cheng Tan, Tong Geng, Po Hao Chen, Connor Greenwood, Anqi Guo, Thomas Hines, Chunshu Wu, Ang Li, Anthony Skjellum, Martin Herbordt, International Conference on Supercomputing, Orlando, FL, USA. June 21-23, 2023. DOI:10.1145/3577193.3593739
  • [OSDI-23] "Accelerating Graph Neural Networks with Fine-grained Intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms", Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Kevin Barker, Ang Li and Yufei Ding, 17th USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, USA. July 10-12, 2023, [arXiv]
  • [HPDC-23] "Practical GPU Floating-Point Exception Detection, Diagnosis and Repair", Xinyi Li, Ignacio Laguna, Bo Fang, Katarzyna Swirydowicz, Ang Li and Ganesh Gopalakrishnan, ACM International Symposium on High-Performance Parallel and Distributed Computing, Orlando, FL, USA. June 16-23, 2023, DOI:10.1145/3588195.3592991
  • [ISCA-23] "Q-BEEP: Quantum Bayesian Error Mitigation Employing Poisson Modeling over the Hamming Spectrum for Quantum Error Mitigation", Samuel Stein, Nathan Wiebe, Yufei Ding, James Ang and Ang Li, International Symposium on Computer Architecture, Orlando, FL, USA. Jun 17-23, 2023, DOI:10.1145/3579371.3589043 [arXiv]
  • [JPDC] "Accelerating Matrix-Centric Graph Processing on GPU through Bit-Level Optimizations", Jou-An Chen, Hsin-Hsuan Sung, Xipeng Shen, Nathan Tallent, Kevin Barker, and Ang Li, Journal of Parallel and Distributed Computing, DOI:10.1016/j.jpdc.2023.02.013 [Link]. 
  • [DAC-23] "Ising-CF: A Pathbreaking Collaborative Filtering Method Through Efficient Ising Machine Learning", Zhuo Liu, Yunan Yang, Zhenyu Pan, Anshujit Sharma, Amit Hasan, Caiwen Ding, Ang Li, Michael Huang, and Tong Geng, Design Automation Conference, San Francisco, CA, USA, July 9-13, 2023. DOI:10.1109/DAC56929.2023.10247860
  • [DAC-23] "ML-CGRA: An Integrated Compilation Framework to Enable Efficient Machine Learning Acceleration on CGRAs", Yixuan Luo, Cheng Tan, Nicolas Bohm Agostini, Ang Li, Antonino Tumeo, Nirav Dave and Tong Geng, Design Automation Conference, San Francisco, CA, USA, July 9-13, 2023. DOI:10.1109/DAC56929.2023.10247873
  • [HPCA-23] "A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis", Yanhao Chen, Yuwei Jin, Fei Hua, Ari Hayes, Ang Li, Yunong Shi, and Eddy Z. Zhang, 29th IEEE International Symposium on High-Performance Computer Architecture, Montreal, QC, Canada, Feb 25-Mar 1, 2023. DOI:10.1109/HPCA56546.2023.10070990
  • [AAAI-23] "Ising-Traffic: An Ising-based Framework for Traffic Congestion Prediction with Uncertainty", Zhenyu Pan, Anshujit Sharma, Jerry Yao-Chieh Hu, Zhuo Liu, Ang Li, Han Liu, Michael Huang, and Tong Geng, Thirty-Seventh AAAI Conference on Artificial Intelligence, Washington DC, USA, Feb 7-14, 2023. DOI:10.1609/aaai.v37i8.26121
2022:
  • [arXiv] "GMI-DRL: Empowering Multi-GPU Deep Reinforcement Learning with GPU Spatial Multiplexing", Yuke Wang, Boyuan Feng, Zheng Wang, Tong Geng, Ang Li and Yufei Ding [arXiv]
  • [arXiv] "MSREP: A Fast yet Light Sparse Matrix Framework for Multi-GPU Systems", Jieyang Chen, Chenhao Xie, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin Barker, Mark Raugas, Ang Li [arXiv]
  • [TPDS] "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors", Wei Sun, Ang Li, Tong Geng, Sander Stuijk, Henk Corporaal, IEEE Transactions on Parallel and Distributed Systems, DOI:10.1109/TPDS.2022.3217824 [arXiv][Link]
  • [SEC] "QuCNN: A Quantum Convolutional Neural Network with Entanglement based Backpropagation", Samuel Stein, Ying Mao, James Ang, and Ang Li, IEEE/ACM 7th Symposium on Edge Computing (SEC), Seattle, WA, USA, Dec 5-8, 2022, DOI:10.1109/SEC54971.2022.00054 [arXiv]
  • [iEnergy] "Power System Computing: Then, Now, and the Future", Yousu Chen, Zhenyu Huang, Shuangshuang Jin, Ang Li, IEEE iEnergy Journal, DOI:10.23919/IEN.2022.0037 [Link]
  • [TQC] "A Bayesian Approach for Characterizing and Mitigating Gate and Measurement Errors", Muqing Zheng, Ang Li, Tamás Terlaky, Xiu Yang, ACM Transactions on Quantum Computing, DOI:10.1145/3563397 [Link][arXiv]
  • [arXiv] "A Synergistic Compilation Workflow for Tracking Crosstalk in Quantum Machines", Fei Hua, Yuwei Jin, Ang Li, Yanhao Chen, Chi Zhang, Ari Hayes, Hang Gao, Eddy Z. Zhang [arXiv]
  • [TQC] "QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation", Ang Li, Samuel Stein, Sriram Krishnamoorthy and James Ang, ACM Transactions on Quantum Computing, DOI:10.1145/3550488 [Link][arXiv][Github].
  • [TCC] "Elastic Resource Management for Deep Learning Applications in a Container Cluster", Ying Mao, Vaishali Sharma, Wenjia Zheng, Qiang Guan, Long Cheng, and Ang Li, IEEE Transactions on Cloud Computing, DOI:10.1109/TCC.2022.3194128 [Link]
  • [Cluster-22] "Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning", Bo Fang, M. Yusuf Ozkaya, Ang Li, Umit Catalyurek, Sriram Krishnamoorthy, IEEE Cluster, Heidelberg, Germany, Sep 6-9, 2022. DOI:10.1109/CLUSTER51413.2022.00041 [arXiv].
            Best Paper Award!
  • [FPL-22] "A Framework for Neural Network Inference on FPGA-Centric SmartNICs", Anqi Guo, Tong Geng, Yongan Zhang, Pouya Haghi, Chunshu Wu, Cheng Tan, Yingyan Lin, Ang Li and Martin Herbordt, International Conference on Field Programmable Logic and Applications, Belfast, UK, Aug 29-Sep 2, 2022. DOI:10.1109/FPL57034.2022.00071
  • [FPL-22] "H-GCN: A Graph Convolutional Network Accelerator on Xilinx Versal AI Engines", Chengming Zhang, Tong Geng, Anqi Guo, Martin Herbordt, Ang Li, Dingwen Tao, International Conference on Field Programmable Logic and Applications, Belfast, UK, Aug 29-Sep 2, 2022. DOI:10.1109/FPL57034.2022.00040
  • [arXiv] "Searching Similarity Measure for Binarized Neural Networks", Yanfei Li, Ang Li, and Huimin Yu [arXiv]
  • [ICS-22] "ASAP - Automatic Synthesis of Area-Efficient and Precision-Aware CGRA", Cheng Tan, Thierry Tambe, Jeff Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li, International Conference on Supercomputing. Jun 27-30, 2022. DOI:10.1145/3524059.3532359 [pdf]
  • [ICS-22] "Accelerating Parallel I/O Via Hardware-Algorithm Co-Designed Adaptive Lossy Compression", Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li and Dingwen Tao, International Conference on Supercomputing. Jun 27-30, 2022. DOI:10.1145/3524059.3532362
  • [ISCA-22] "EQC: Ensembled Quantum Computing for Variational Quantum Algorithms", Samuel Stein, Nathan Wiebe, Yufei Ding, Bo Peng, Karol Kowalski, Nathan Baker, James Ang, and Ang Li, International Symposium on Computer Architecture, New York, NY, USA. Jun 11-15, 2022. DOI:10.1145/3470496.3527434 [arXiv][pdf].
            Nominated for Best Paper Award!
  • [DAC-22] "A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining", Hongwu Peng, Shaoyi Huang, Shiyang Chen, Bingbing Li, Tong Geng, Ang Li, Weiwen Jiang, Wujie Wen, Jinbo Bi, Hang Liu and Caiwen Ding, Design Automation Conference. DOI:10.1145/3489517.3530585
  • [TPWRS] "Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning", Renke Huang, Yujiao Chen, Tianzhixi Yin, Qiuhua Huang, Jie Tan, Wenhao Yu, Xinya Li, Ang Li, Yan Du, IEEE Transactions on Power Systems. DOI:10.1109/TPWRS.2022.3155117 [Link][arXiv]
  • [MLSys-22] "QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity", Samuel Stein, Betis Baheri, Daniel Chen, Ying Mao, Qiang Guan, Shuai Xu, Caiwen Ding, and Ang Li, Fifth Conference on Machine Learning and Systems, Santa Clara, CA, USA. Aug 29-Sep 1, 2022. [pdf][Link]
  • [MLSys-22] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling", Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, and Yingyan Lin, Fifth Conference on Machine Learning and Systems, Santa Clara, CA, USA. Aug 29-Sep 1, 2022. [Link]
  • [IPDPS-22] "Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU", Jou-An Chen, Hsin-Hsuan Sung, Nathan Tallent, Kevin Barker, Xipeng Shen and Ang Li, 36th IEEE International Parallel & Distributed Processing Symposium, Lyon, France. May 30-June 3, 2022. DOI:10.1109/IPDPS53621.2022.00056 [arXiv]
  • [TST] "GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm", Yanfei Li, Tong Geng, Ang Li, and Huimin Yu, Journal of Tsinghua Science and Technology. DOI:10.26599/TST.2021.9010084 [arXiv]. 
  • [HPCA-22] "DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs", Cheng Tan, Nicolas Bohm Agostini, Tong Geng, Chenhao Xie, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo, 28th IEEE International Symposium on High-Performance Computer Architecture, Seoul, South Korea, April 2-6, 2022. DOI:10.1109/HPCA53966.2022.00030
  • [HPCA-22] "GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design", Haoran You, Tong Geng, Yongan Zhang, Ang Li and Yingyan Lin, 28th IEEE International Symposium on High-Performance Computer Architecture, Seoul, South Korea, April 2-6, 2022. DOI:10.1109/HPCA53966.2022.00041
2021:
  • [Correctness-21] "Guarding Numerics Amidst Rising Heterogeneity", Ganesh Gopalakrishnan, Ignacio Laguna, Ang Li, Pavel Panchekha, Cindy Rubio-Gonzalez and Zachary Tatlock, Fifth International Workshop on Software Correctness for HPC Applications. DOI:10.1109/Correctness54621.2021.00007
  • [ICCD-21] "DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications", Cheng Tan, Tong Geng, Chenhao Xie, Nicolas Bohm Agostini, Jiajia Li, Ang Li, Kevin Barker and Antonino Tumeo, The 39th IEEE International Conference on Computer Design, Virtual. Oct 24-27, 2021. DOI:10.1109/ICCD53106.2021.00018
           Best Paper Award!
  • [MICPRO] "BCNN: Binary Complex Neural Network", Yanfei Li, Tong Geng, Ang Li, and Huimin Yu, Microprocessors and Microsystems. DOI:10.1016/j.micpro.2021.104359 [Link]
  • [HPEC-21] "A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs", Tong Geng, Chunshu Wu, Cheng Tan, Chenhao Xie, Anqi Guo, Pouya Haghi, Sarah Yuan He, Jiajia Li, Martin Herbordt, Ang Li, IEEE High Performance Extreme Computing Conference, Sep 21-23, 2021. DOI:10.1109/HPEC49654.2021.9622877 
  • [QCE-21] "QuGAN: A Generative Adversarial Network Through Quantum States", Samuel A. Stein, Betis Baheri, Ray Marie Tischio, Ying Mao, Qiang Guan, Ang Li, Bo Fang, Shuai Xu, IEEE International Conference on Quantum Computing and Engineering, Oct 18-22, 2021. DOI:10.1109/QCE52317.2021.00023 [arXiv]
  • [MICRO-21] "I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization", Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin Herbordt, Yingyan Lin, and Ang Li, 54th IEEE/ACM International Symposium on Microarchitecture, Athens, Greece, Oct 16-20, 2021. DOI:10.1145/3466752.3480113 [pdf].
  • [ICCAD-21] "G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency", Yongan Zhang, Haoran You, Yonggan Fu, Tong Geng, Ang Li, Yingyan Lin, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021. DOI:10.1109/ICCAD51958.2021.9643549
  • [ICCAD-21] "Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search", Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021. DOI:10.1109/ICCAD51958.2021.9643528
  • [ICCAD-21] "FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery", Daniel Manu, Yi Sheng, Junhuan Yang, Jieren Deng, Tong Geng, Ang Li, Caiwen Ding, Weiwen Jiang, Lei Yang, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021. DOI:10.1109/ICCAD51958.2021.9643440
  • [SC-21] "SV-Sim: Scalable PGAS-based State Vector Simulation of Quantum Circuits", Ang Li, Bo Fang, Christopher Granade, Guen Prawiroatmodjo, Bettina Heim, Martin Roetteler and Sriram Krishnamoorthy, The 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA. Nov 14-19, 2021. DOI:10.1145/3458817.3476169 [pdf][slides]
  • [SC-21] "APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores", Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, and Yufei Ding, The 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA. Nov 14-19, 2021. DOI:10.1145/3458817.3476157 [arXiv]
  • [ASAP-21] "Binary Complex Neural Network Acceleration on FPGA",Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, Tong Geng, Ang Li, Wei Zhang, Minghu Song, Mimi Xie, Hang Liu, Caiwen Ding. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, Virtual. DOI:10.1109/ASAP52443.2021.00021
  • [ASAP-21] "OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays.",Cheng Tan, Nicolas Bohm Agostini, Jeff Zhang, Marco Minutoli, Vito Giovanni Castellana, Chenhao Xie, Tong Geng, Ang Li, Kevin J. Barker, Antonino Tumeo. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, Virtual. DOI:10.1109/ASAP52443.2021.00029 
  • [TPWRS] "Accelerated Derivative-free Deep Reinforcement Learning for Large-scale Grid Emergency Voltage Control", Renke Huang, Yujiao Chen, Tianzhixi Yin, Xinya Li, Ang Li, Jie Tan, Wenhao Yu, Yuan Liu, Qiuhua Huang, IEEE Transactions on Power Systems. DOI:10.1109/TPWRS.2021.3095179 [arXiv][IEEE]
  • [ICPP-21] "Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures", Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Song, Kevin Barker, Mark Raugas and Ang Li, International Conference on Parallel Processing, Aug 9-12, Chicago, IL, 2021. DOI:10.1145/3472456.3472478 [arXiv]
  • [TPDS] "ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing", Cheng Tan, Chenhao Xie, Andres Marquez, Antonino Tumeo, Kevin Barker, Ang Li, IEEE Transactions on Parallel and Distributed Systems. DOI:10.1109/TPDS.2021.3081074 [IEEE][arXiv]
  • [DATE-21] “AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators”, Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo, The 2021 Design, Automation & Test in Europe Conference, Grenoble, France. February 1-5, 2021. DOI:10.23919/DATE51398.2021.9473955 [pdf]
2020:
  • [TPDS] "Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs", Ang Li and Simon Su, IEEE Transactions on Parallel and Distributed Systems, Special Section on Parallel and Distributed Computing Techniques for AI/ML/DL. DOI:10.1109/TPDS.2020.3045828 [arXiv][GitHub][ppt][IEEE]
  • [IISWC-20] "A Sparse Tensor Benchmark Suite for CPUs and GPUs", Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, and Kevin Barker, 2020 IEEE International Symposium on Workload Characterization, Beijing, China, Oct 27-29, 2020. DOI:10.1109/IISWC50251.2020.00027 [arXiv][GitLab].
  • [ICCD-20] "OpenCGRA: An Open-Source Framework for Modeling, Testing, Evaluating CGRAs", Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo, The 38th IEEE International Conference on Computer Design, Hartford, Connecticut, USA. Oct 18-21, 2020. DOI:10.1109/ICCD50377.2020.00070 [pdf][GitHub]
  • [HPEC-20] "CQNN: a CGRA-based QNN Framework", Tong Geng, Chunshu Wu, Cheng Tan, Bo Fang, Ang Li, Martin Herbordt, 2020 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA. Sep 22-24, 2020. DOI:10.1109/HPEC43674.2020.9286194 [pdf]
  • [HPEC-20] "On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics", Jesun S Firoz, Ang Li, Jiajia Li, Kevin Barker, 2020 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA. Sep 22-24, 2020. DOI:10.1109/HPEC43674.2020.9286152 [pdf]
  • [TPDS] "O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference", Tong Geng, Ang Li, Tianqi Wang, Chunshu Wu, Yanfei Li, Runbin Shi, Wei Wu, and Martin Herbordt, IEEE Transactions on Parallel and Distributed Systems, 2020. DOI:10.1109/TPDS.2020.3013637 [Link]
  • [MICRO-20] "AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing", Tong Geng, Ang Li, Runbin Shi, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, and Martin Herbordt, 53rd IEEE/ACM International Symposium on Microarchitecture, Athens, Greece, Oct 17-21. DOI:10.1109/MICRO50266.2020.00079 [arXiv][pdf]
  • [SC-20] "Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters", Ang Li, Omer Subasi, Xiu Yang, and Sriram Krishnamoorthy, The 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA. Nov 15-20, 2020. DOI:10.1109/SC41405.2020.00017 [pdf] [GitHub]
                 Nominated for Best Paper Award!
  • [ICPP-20] "Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge, International Conference on Parallel Processing, Aug 17-20, Edmonton, AB, Canada, 2020. DOI:10.1145/3404397.3404435 [pdf][GitHub][ppt]
  • [TC] "FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters", Tianqi Wang, Tong Geng, Ang Li, Xi Jin, and Martin Herbordt, IEEE Transactions on Computers, Volume 69, Issue 8, pp. 1143-1158, May, 2020. DOI:10.1109/TC.2020.3000118 [arXiv][IEEE]
  • [ICS-20] "CSB-RNN: A Faster-than-Realtime RNN Acceleration Framework with Compressed Structured Blocks", Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Hayden So, Martin Herbordt, Ang Li, and Yanzhi Wang, The 31st International Conference on SuperComputing, Barcelona, Spain. June 29-July 2, 2020. DOI:10.1145/3392717.3392749 [pdf].
  • [CCGrid-20] "Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge, The 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia. May 11-14, 2020. DOI:10.1109/CCGrid49817.2020.00-37 [pdf].
2019:
  • [TPDS] "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect", Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan Tallent, and Kevin Barker, IEEE Transactions on Parallel and Distributed Systems, DOI:10.1109/TPDS.2019.2928289 [arXiv][Link][GitHub]
  • [SC-19] "BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets", Ang Li, Tong Geng, Tianqi Wang, Martin Herbordt, Shuaiwen Leon Song, Kevin Barker, The 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 17-22, 2019. DOI:10.1145/3295500.3356169 [pdf] [GitHub] [ppt].
  • [IISWC-19] "Fingerprinting Anomalous Computation with RNN for GPGPU-Based HPC Machines", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge. 2019 IEEE International Symposium on Workload Characterization, Orlando, FL, USA, Nov 3-Nov 5, 2019. DOI:10.1109/IISWC47752.2019.9042165 [pdf]
  • [Springer] "PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite", Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker, CCF Transactions on High Performance Computing. DOI:10.1007/s42514-019-00012-w [arXiv][Link]
  • [ASAP-19] "LP-BNN: Ultra-Low-Latency BNN Inference with Layer Parallelism", Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Shuaiwen Leon Song, Ang Li, and Martin Herbordt. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, New York, USA, Jul 15-17, 2019. DOI:10.1109/ASAP.2019.00-43 [pdf]
  • [ICS-19] "O3BNN: An Out-Of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning", Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, and Martin Herbordt. The 30th International Conference on SuperComputing, Phoenix, AZ, USA, Jun 26-28, 2019. DOI:10.1145/3330345.3330386 [pdf]
  • [HPCA-19] "PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World With Customized Memory Cube", Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, and Shuaiwen Leon Song. The 25th IEEE International Symposium on High-Performance Computer Architecture, Washington D.C., USA, Feb 16-20, 2019. DOI:10.1109/HPCA.2019.00013 [pdf]
2018:
  • [IISWC-18] "Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite", Ang Li, Shuaiwen Leon Song, Jieyang Chen, Xu Liu, Nathan Tallent, and Kevin Barker. 2018 IEEE International Symposium on Workload Characterization, Raleigh, NC, USA, Sep 30-Oct 2, 2018. DOI:10.1109/IISWC.2018.8573483 [pdf][Supplementary File][Github][ppt].
             Nominated for Best Paper Award!
  • [ICS-18] "Warp-Consolidation: A Novel Execution Model for GPUs", Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, and Shuaiwen Leon Song. The 29th International Conference on SuperComputing, Beijing, China, Jun 12-15, 2018. DOI:10.1145/3205289.3205294 [pdf][ppt].
  • [CGO-18] "CUDAAdvisor: LLVM-based Runtime Profiling for Modern GPUs", Du Shen, Ang Li, Shuaiwen Leon Song and Xu Liu, International Symposium on Code Generation and Optimization, Vienna, Austria. Feb 24-28, 2018. DOI:10.1145/3168831 [pdf][Github]
  • [PPoPP-18] "SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks", Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska, Principles and Practice of Parallel Programming, Vienna, Austria. Feb 24-28, 2018. DOI:10.1145/3178487.3178491 [pdf][Github]
2017:
  • [MICRO-17] "BVF: Enabling Significant On-Chip Power Savings via Bit-Value-Favor for Throughput Processors", Ang Li, Wenfeng Zhao and Shuaiwen Leon Song, The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Boston, MA, USA. Oct 14-18, 2017. DOI:10.1145/3123939.3123944 [pdf][slides]
  • [SC-17] "Exploring And Analyzing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels", Ang Li, Weifeng Liu, Xu Liu, Mads R.B. Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez and Shuaiwen Leon Song, The 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 12-17, 2017. DOI:10.1145/3126908.3126931 [pdf]
             Nominated for Best Paper Award!   
  • [CCPE] "Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides", Weifeng Liu, Ang Li, Jonathan Hogg, Iain Duff and Brian Vinter, Concurrency and Computation: Practice and Experience, Wiley. DOI:10.1002/cpe.4244 
  • [ASPLOS-17] "Locality-Aware CTA Clustering for Modern GPUs", Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar and Henk Corporaal, The 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Xi'an, China. Apr 8-12, 2017. DOI:10.1145/3093337.3037709 [pdf][ppt]
  • [ASICON-17] "Analysis and Design of Energy-Efficient Data-Dependent SRAM", Wenfeng Zhao, Ang Li, Yi Wang and Yajun Ha, IEEE 12th International Conference on ASIC,  Guiyang, China, Oct 25-28, 2017. DOI:10.1109/ASICON.2017.8252625 [pdf]
2016:
  • [PhD Thesis] GPU Performance Modeling and Optimization (Oct, 2016). ISBN:978-90-386-4155-3 [pdf][ppt]
  • [EuroPar-16] "A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves", Weifeng Liu, Ang Li, Jonathan Hogg, Iain Duff and Brian Vinter, The 22nd International European Conference on Parallel and Distributed Computing, Grenoble, France, Aug 22-26, 2016. DOI:10.1007/978-3-319-43659-3_45 [pdf][slides][GitHub]
  • [ICS-16] "SFU-Driven Transparent Approximation Acceleration on GPUs", Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar and Henk Corporaal, The 27th International Conference on Supercomputing, Istanbul, Turkey, June 1-3, 2016. DOI:10.1145/2925426.2926255 [pdf][ppt]
  • [IPDPS-16] "X: A Comprehensive Analytic Model for Parallel Machines", Ang Li, Shuaiwen Leon Song, Eric Brugel, Akash Kumar, Daniel Chavarria-Miranda and Henk Corporaal, The 30th IEEE International Parallel & Distributed Processing Symposium, Chicago, Illinois, USA, May 23-27, 2016. DOI:10.1109/IPDPS.2016.89  [pdf][ppt]
  • [DATE-16] “Critical Points Based Register-Concurrency Autotuning for GPUs”, Ang Li, Shuaiwen Leon Song, Akash Kumar, Eddy Z. Zhang, Daniel Chavarria and Henk Corporaal, The 2016 Design, Automation & Test in Europe Conference, Dresden, Germany. March 14-18, 2016. ISBN:978-3-9815-3707-9 [pdf][slides]
2015:
  • [SC-15] “Adaptive and Transparent Cache Bypassing on GPUs”, Ang Li, Gert-Jan Van Den Braak, Akash Kumar and Henk Corporaal, 2015 International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, Texas, USA. November 16-20, 2015. DOI:10.1145/2807591.2807606 [pdf][Supplementary File][ppt]
              Nominated for Best Paper Award and Best Student Paper Award!       
  • [DSD-15] “A Locality Aware Convolutional Neural Networks Accelerator”, Runbin Shi, Zheng Xu, Zhihao Sun, Maurice Peemen, Ang Li, Henk Corporaal, Di Wu, The 18th International Conference on Digital Systems Design, Funchal, Portugal. August 26-28, 2015. DOI:10.1109/DSD.2015.70 [pdf]*
  • [HPDC-15] “Transit: A Visual Analytical Model for Multithreaded Machine”, Ang Li, Akash Kumar, Y.C. Tay and Henk Corporaal, The 24th International Symposium on High-Performance Parallel and Distributed Computing, Portland, Oregon, USA. June 15-19, 2015. DOI:10.1145/2749246.2749265 [pdf][ppt]
  • [ICS-15] “Fine-Grained Synchronizations and Dataflow Programming on GPUs”, Ang Li, Gert-Jan Van Den Braak, Akash Kumar and Henk Corporaal, The 26th International Conference on Supercomputing, Newport Beach, California, USA. June 8-11, 2015. DOI:10.1145/2751205.2751232 [pdf][slides]
  • [MICPRO] "Correlation Ratio Based Volume Image Registration on GPUs", Ang Li, Akash Kumar, Yajun Ha and Henk Corporaal, Microprocessors and Microsystems Journal, vol. 39, no. 8, pp. 998-1011 (2015). DOI:10.1016/j.micpro.2015.04.002
  • [ASPDAC-15] “Accelerating non-volatile/hybrid processor cache design space exploration for application specific embedded systems”, Mohammad Shihabul Haque, Ang Li, Akash Kumar, Qingsong Wei, The 20th Asia and South Pacific Design Automation Conference, Chiba, Japan. January 19-22, 2015. DOI:10.1109/ASPDAC.2015.7059045 [pdf]
2014:
  • [DSD-14] “Accelerating Volume Image Registration through Correlation Ratio based Methods on GPUs”, Ang Li and Akash Kumar, the 17th International Conference on Digital Systems Design, Verona, Italy. August 27-29, 2014. DOI:10.1109/DSD.2014.29 [pdf]
  • [ISIC-14] “A Heterogeneous Platform with GPU and FPGA for Power Efficient High Performance Computing”, Qiang Wu, Yajun Ha, Akash Kumar, Shaobo Luo, Ang Li and Shihab Mohamed. The 14th International Symposium on Integrated Circuits, Singapore, December 10-12, 2014. DOI:10.1109/ISICIR.2014.7029447


