Thursday, December 7, 2017

           "The restless world outside and an unsettled mood cannot extinguish the burning heart you once had."


Dr. Ang Li has been a senior computer scientist in the high-performance computing (HPC) group of Pacific Northwest National Laboratory (PNNL) since November 2016. He received his bachelor's degree from the CS department of Zhejiang University, China, in 2010, and two PhD degrees in 2016: one from the Electrical and Computer Engineering (ECE) department of the National University of Singapore (NUS), Singapore, and one from the Electrical Engineering (EE) department of Eindhoven University of Technology (TU/e), The Netherlands. Since 2009, his research has focused on software-hardware co-design for scalable heterogeneous HPC, particularly GPUs. It covers full-stack design from the circuit level up to architecture, system, library, and applications. He has published in major HPC conferences and journals, including SC, ICS, PPoPP, IPDPS, HPDC, ASPLOS, MICRO, HPCA, ICPP, CGO, IISWC, EuroPar, TPDS, TC, and ICPE. His lead-author work was nominated for the best paper award at SC-15, SC-17, IISWC-18, and SC-20. He received the European HiPEAC paper award and PNNL's PCSD Outstanding Performance award. He has served as an organizing committee or review committee member for major HPC conferences, including PPoPP, SC, ASPLOS, PACT, ISCA, and IPDPS. He previously worked in industry as an HPC application developer, where he led the evaluation, development, and optimization of several industrial HPC applications. He also worked as a research intern at the INRIA-Lab at Paris-Sud University, France, and at the Chinese University of Hong Kong.

His research interests include:
  • Software-Hardware Co-design for HPC accelerators, particularly GPUs, and domain-specific accelerators
  • Performance Modeling and Evaluation for HPC Architecture and Applications
  • Scalable Quantum Circuit Simulation, Transformation and Verification
  • Binarized Neural Networks

Service (PC/ERC)

  • 2023: IPDPS, ISCA, MLSys, CC
  • 2022: ASPLOS, PPoPP, ISCA, MICRO, SC, ISC, SPAD-BAC, HiPC, CC, ICRC
  • 2021: SC, PPoPP, ISCA, MICRO, IPDPS, Cluster, LCTES, TPDS-SS, SPAD-BAC, HiPC, HPCC
  • 2020: PPoPP, ISCA, SPAD-BAC, HPCC, TPDS-SS, SC-MLHPC
  • 2019: PPoPP, PACT, NPC, RTSS-AE
  • 2018: PPoPP-AE, NPC, ASPLOS-SRC
  • Journal Review: TPDS, TC, TOPC, CSUR, DB, JPDC, JSA, CAL, TACO, TNNLS, JCSC, TCAS-I/II, MICPRO, FGCS, TODAES, COSE, CVIU, Nature-NPJQI, SPE, TECS, IEEE Micro
  • Others: TPDS Review Board 

Publications


2023:

  • [AAAI-23] "Ising-Traffic: An Ising-based Framework for Traffic Congestion Prediction with Uncertainty", Zhenyu Pan, Anshujit Sharma, Jerry Yao-Chieh Hu, Zhuo Liu, Ang Li, Han Liu, Michael Huang, Tong Geng, Thirty-Seventh AAAI Conference on Artificial Intelligence, Washington DC, USA, Feb 7-14, 2023 (Accepted).
  • [HPCA-23] "A Pulse Generation Framework with Augmented Program-aware Basis Gates and Criticality Analysis", Yanhao Chen, Yuwei Jin, Fei Hua, Ari Hayes, Ang Li, Yunong Shi, Eddy Z. Zhang, 29th IEEE International Symposium on High-Performance Computer Architecture, Montreal, QC, Canada, Feb 25-Mar 1, 2023 (Accepted).

2022:
  • [arXiv] "Extreme Acceleration of Graph Neural Network-based Prediction Models for Quantum Chemistry", Hatem Helal, Jesun Firoz, Jenna Bilbrey, Mario Michael Krell, Tom Murray, Ang Li, Sotiris Xantheas, Sutanay Choudhury [arXiv]
  • [TPDS] "Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numerical Behaviors", Wei Sun, Ang Li, Tong Geng, Sander Stuijk, Henk Corporaal, IEEE Transactions on Parallel and Distributed Systems [arXiv][Link]
  • [arXiv] "QuCNN: A Quantum Convolutional Neural Network with Entanglement based Backpropagation", Samuel Stein, Ying Mao, James Ang, and Ang Li [arXiv]
  • [iEnergy] "Power System Computing: Then, Now, and the Future", Yousu Chen, Zhenyu Huang, Shuangshuang Jin, Ang Li, IEEE iEnergy Journal [Link]
  • [TQC] "A Bayesian Approach for Characterizing and Mitigating Gate and Measurement Errors", Muqing Zheng, Ang Li, Tamás Terlaky, Xiu Yang, ACM Transactions on Quantum Computing [Link][arXiv]
  • [arXiv] "CollComm: Enabling Efficient Collective Quantum Communication Based on EPR Buffering", Anbang Wu, Yufei Ding, and Ang Li [arXiv]
  • [arXiv] "A Synergistic Compilation Workflow for Tracking Crosstalk in Quantum Machines", Fei Hua, Yuwei Jin, Ang Li, Yanhao Chen, Chi Zhang, Ari Hayes, Hang Gao, Eddy Z. Zhang [arXiv]
  • [arXiv] "Quantum Bayesian Error Mitigation Employing Poisson Modelling over the Hamming Spectrum for Quantum Error Mitigation", Samuel Stein, Nathan Wiebe, Yufei Ding, James Ang and Ang Li [arXiv]
  • [TQC] "QASMBench: A Low-level QASM Benchmark Suite for NISQ Evaluation and Simulation", Ang Li, Samuel Stein, Sriram Krishnamoorthy and James Ang, ACM Transactions on Quantum Computing [Link][arXiv][Github].
  • [TCC] "Elastic Resource Management for Deep Learning Applications in a Container Cluster", Ying Mao, Vaishali Sharma, Wenjia Zheng, Qiang Guan, Long Cheng, and Ang Li, IEEE Transactions on Cloud Computing [Link]
  • [Cluster-22] "Efficient Hierarchical State Vector Simulation of Quantum Circuits via Acyclic Graph Partitioning", Bo Fang, M. Yusuf Ozkaya, Ang Li, Umit Catalyurek, Sriram Krishnamoorthy, IEEE Cluster, Heidelberg, Germany, Sep 6-9, 2022 [arXiv].
            Best Paper Award!
  • [FPL-22] "A Framework for Neural Network Inference on FPGA-Centric SmartNICs", Anqi Guo, Tong Geng, Yongan Zhang, Pouya Haghi, Chunshu Wu, Cheng Tan, Yingyan Lin, Ang Li and Martin Herbordt, International Conference on Field Programmable Logic and Applications, Belfast, UK, Aug 29-Sep 2, 2022.
  • [FPL-22] "H-GCN: A Graph Convolutional Network Accelerator on Xilinx Versal AI Engines", Chengming Zhang, Tong Geng, Anqi Guo, Martin Herbordt, Ang Li, Dingwen Tao, International Conference on Field Programmable Logic and Applications, Belfast, UK, Aug 29-Sep 2, 2022.
  • [arXiv] "Searching Similarity Measure for Binarized Neural Networks", Yanfei Li, Ang Li, and Huimin Yu [arXiv]
  • [ICS-22] "ASAP - Automatic Synthesis of Area-Efficient and Precision-Aware CGRA", Cheng Tan, Thierry Tambe, Jeff Zhang, Bo Fang, Tong Geng, Gu-Yeon Wei, David Brooks, Antonino Tumeo, Ganesh Gopalakrishnan, Ang Li, International Conference on Supercomputing. Jun 27-30, 2022 [pdf]
  • [ICS-22] "Accelerating Parallel I/O Via Hardware-Algorithm Co-Designed Adaptive Lossy Compression", Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li and Dingwen Tao, International Conference on Supercomputing. Jun 27-30, 2022 (Accepted)
  • [ISCA-22] "EQC: Ensembled Quantum Computing for Variational Quantum Algorithms", Samuel Stein, Nathan Wiebe, Yufei Ding, Bo Peng, Karol Kowalski, Nathan Baker, James Ang, and Ang Li, International Symposium on Computer Architecture, New York, NY, USA. Jun 11-15, 2022 [arXiv][pdf].
            Nominated for Best Paper Award!
  • [DAC-22] "A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining", Hongwu Peng, Shaoyi Huang, Shiyang Chen, Bingbing Li, Tong Geng, Ang Li, Weiwen Jiang, Wujie Wen, Jinbo Bi, Hang Liu and Caiwen Ding, Design Automation Conference (Accepted)
  • [TPWRS] "Learning and Fast Adaptation for Grid Emergency Control via Deep Meta Reinforcement Learning", Renke Huang, Yujiao Chen, Tianzhixi Yin, Qiuhua Huang, Jie Tan, Wenhao Yu, Xinya Li, Ang Li, Yan Du, IEEE Transactions on Power Systems [Link][arXiv]
  • [MLSys-22] "QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity", Samuel Stein, Betis Baheri, Daniel Chen, Ying Mao, Qiang Guan, Shuai Xu, Caiwen Ding, and Ang Li, Fifth Conference on Machine Learning and Systems, Santa Clara, CA, USA. Aug 29-Sep 1, 2022 [pdf]
  • [MLSys-22] "BNS-GCN: Efficient Full-Graph Training of Graph Convolutional Networks with Partition-Parallelism and Random Boundary Node Sampling", Cheng Wan, Youjie Li, Ang Li, Nam Sung Kim, and Yingyan Lin, Fifth Conference on Machine Learning and Systems, Santa Clara, CA, USA. Aug 29-Sep 1, 2022
  • [IPDPS-22] "Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU", Jou-An Chen, Hsin-Hsuan Sung, Nathan Tallent, Kevin Barker, Xipeng Shen and Ang Li, 36th IEEE International Parallel & Distributed Processing Symposium, Lyon, France. May 30-June 3, 2022. [arXiv]
  • [TST] "GAAF: Searching Activation Functions for Binary Neural Networks through Genetic Algorithm", Yanfei Li, Tong Geng, Ang Li, and Huimin Yu, Journal of Tsinghua Science and Technology [arXiv]. 
  • [HPCA-22] "DRIPS: Dynamic Rebalancing of Pipelined Streaming Applications on CGRAs", Cheng Tan, Nicolas Bohm Agostini, Tong Geng, Chenhao Xie, Jiajia Li, Ang Li, Kevin Barker, Antonino Tumeo, 28th IEEE International Symposium on High-Performance Computer Architecture, Seoul, South Korea, April 2-6, 2022.
  • [HPCA-22] "GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design", Haoran You, Tong Geng, Yongan Zhang, Ang Li and Yingyan Lin, 28th IEEE International Symposium on High-Performance Computer Architecture, Seoul, South Korea, April 2-6, 2022.
2021:
  • [Correctness-21] "Guarding Numerics Amidst Rising Heterogeneity", Ganesh Gopalakrishnan, Ignacio Laguna, Ang Li, Pavel Panchekha, Cindy Rubio-Gonzalez and Zachary Tatlock, Fifth International Workshop on Software Correctness for HPC Applications.
  • [ICCD-21] "DynPaC: Coarse-Grained, Dynamic, and Partially Reconfigurable Array for Streaming Applications", Cheng Tan, Tong Geng, Chenhao Xie, Nicolas Bohm Agostini, Jiajia Li, Ang Li, Kevin Barker and Antonino Tumeo, The 39th IEEE International Conference on Computer Design, Virtual. Oct 24-27, 2021
           Best Paper Award!
  • [MICPRO] "BCNN: Binary Complex Neural Network", Yanfei Li, Tong Geng, Ang Li, and Huimin Yu, Microprocessors and Microsystems [Link]
  • [HPEC-21] "A Survey: Handling Irregularities in Neural Network Acceleration with FPGAs", Tong Geng, Chunshu Wu, Cheng Tan, Chenhao Xie, Anqi Guo, Pouya Haghi, Sarah Yuan He, Jiajia Li, Martin Herbordt, Ang Li, IEEE High Performance Extreme Computing Conference, Sep 21-23, 2021 (To Appear). 
  • [QCE-21] "QuGAN: A Generative Adversarial Network Through Quantum States", Samuel A. Stein, Betis Baheri, Ray Marie Tischio, Ying Mao, Qiang Guan, Ang Li, Bo Fang, Shuai Xu, IEEE International Conference on Quantum Computing and Engineering, Oct 18-22, 2021. [arXiv]
  • [MICRO-21] "I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization", Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin Herbordt, Yingyan Lin, and Ang Li, 54th IEEE/ACM International Symposium on Microarchitecture, Athens, Greece, Oct 16-20, 2021 [pdf].
  • [ICCAD-21] "G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency", Yongan Zhang, Haoran You, Yonggan Fu, Tong Geng, Ang Li, Yingyan Lin, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021 (To Appear)
  • [ICCAD-21] "Optimizing FPGA-based Accelerator Design for Large-Scale Molecular Similarity Search", Hongwu Peng, Shiyang Chen, Zhepeng Wang, Junhuan Yang, Scott Weitze, Tong Geng, Ang Li, Jinbo Bi, Minghu Song, Weiwen Jiang, Hang Liu, Caiwen Ding, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021 (To Appear)
  • [ICCAD-21] "FL-DISCO: Federated Generative Adversarial Network for Graph-based Molecule Drug Discovery", Daniel Manu, Yi Sheng, Junhuan Yang, Jieren Deng, Tong Geng, Ang Li, Caiwen Ding, Weiwen Jiang, Lei Yang, International Conference On Computer Aided Design, Munich, Germany, Nov 1-4, 2021 (To Appear)
  • [arXiv] "CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Design of Efficient and Adaptive Lossy Compression", Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao [arXiv]
  • [SC-21] "SV-Sim: Scalable PGAS-based State Vector Simulation of Quantum Circuits", Ang Li, Bo Fang, Christopher Granade, Guen Prawiroatmodjo, Bettina Heim, Martin Roetteler and Sriram Krishnamoorthy, The 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA. Nov 14-19, 2021 [pdf][slides]
  • [SC-21] "APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores", Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, and Yufei Ding, The 2021 International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA. Nov 14-19, 2021 (To Appear) [arXiv]
  • [ASAP-21] "Binary Complex Neural Network Acceleration on FPGA", Hongwu Peng, Shanglin Zhou, Scott Weitze, Jiaxin Li, Sahidul Islam, Tong Geng, Ang Li, Wei Zhang, Minghu Song, Mimi Xie, Hang Liu, Caiwen Ding. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, Virtual.
  • [ASAP-21] "OpenCGRA: Democratizing Coarse-Grained Reconfigurable Arrays", Cheng Tan, Nicolas Bohm Agostini, Jeff Zhang, Marco Minutoli, Vito Giovanni Castellana, Chenhao Xie, Tong Geng, Ang Li, Kevin J. Barker, Antonino Tumeo. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, Virtual.
  • [TPWRS] "Accelerated Derivative-free Deep Reinforcement Learning for Large-scale Grid Emergency Voltage Control", Renke Huang, Yujiao Chen, Tianzhixi Yin, Xinya Li, Ang Li, Jie Tan, Wenhao Yu, Yuan Liu, Qiuhua Huang, IEEE Transactions on Power Systems [arXiv][IEEE]
  • [ICPP-21] "Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures", Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Song, Kevin Barker, Mark Raugas and Ang Li, International Conference on Parallel Processing, Aug 9-12, Chicago, IL, 2021 (To Appear) [arXiv]
  • [TPDS] "ARENA: Asynchronous Reconfigurable Accelerator Ring to Enable Data-Centric Parallel Computing", Cheng Tan, Chenhao Xie, Andres Marquez, Antonino Tumeo, Kevin Barker, Ang Li, IEEE Transactions on Parallel and Distributed Systems [IEEE][arXiv]
  • [DATE-21] “AURORA: Automated Refinement of Coarse-Grained Reconfigurable Accelerators”, Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo, The 2021 Design, Automation & Test in Europe Conference, Grenoble, France. February 1-5, 2021. [pdf]
2020:
  • [arXiv] "A Hybrid System for Learning Classical Data in Quantum States", Samuel A. Stein, Betis Baheri, Ray Marie Tischio, Yiwen Chen, Ying Mao, Qiang Guan, Ang Li, Bo Fang [arXiv] (under review)
  • [TPDS] "Accelerating Binarized Neural Networks via Bit-Tensor-Cores in Turing GPUs", Ang Li and Simon Su, IEEE Transactions on Parallel and Distributed Systems, Special Section on Parallel and Distributed Computing Techniques for AI/ML/DL [arXiv][GitHub][ppt][IEEE]
  • [SC-20 workshop] "QASMBench: An OpenQASM Benchmark Suite for NISQ Evaluation and Simulation", Ang Li, Bo Fang, and Sriram Krishnamoorthy, First International Workshop on Quantum Computing Software (as part of SC-20) (To Appear)
  • [IISWC-20] "A Sparse Tensor Benchmark Suite for CPUs and GPUs", Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Catherine Olschanowsky, and Kevin Barker, 2020 IEEE International Symposium on Workload Characterization, Beijing, China, Oct 27-29, 2020 (To Appear) [arXiv][GitLab].
  • [ICCD-20] "OpenCGRA: An Open-Source Framework for Modeling, Testing, Evaluating CGRAs", Cheng Tan, Chenhao Xie, Ang Li, Kevin Barker, Antonino Tumeo, The 38th IEEE International Conference on Computer Design, Hartford, Connecticut, USA. Oct 18-21, 2020 [pdf][GitHub]
  • [HPEC-20] "CQNN: a CGRA-based QNN Framework", Tong Geng, Chunshu Wu, Cheng Tan, Bo Fang, Ang Li, Martin Herbordt, 2020 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA. Sep 22-24, 2020 [pdf]
  • [HPEC-20] "On the Feasibility of Using Reduced-Precision Tensor Core Operations for Graph Analytics", Jesun S Firoz, Ang Li, Jiajia Li, Kevin Barker, 2020 IEEE High Performance Extreme Computing Conference, Waltham, MA, USA. Sep 22-24, 2020 [pdf]
  • [TPDS] "O3BNN-R: An Out-Of-Order Architecture for High-Performance and Regularized BNN Inference", Tong Geng, Ang Li, Tianqi Wang, Chunshu Wu, Yanfei Li, Runbin Shi, Wei Wu, and Martin Herbordt, IEEE Transactions on Parallel and Distributed Systems, Volume 32, Issue 1, Aug 3, 2020 [Link]
  • [MICRO-20] "AWB-GCN: A Graph Convolutional Network Accelerator with Runtime Workload Rebalancing", Tong Geng, Ang Li, Runbin Shi, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, and Martin Herbordt, 53rd IEEE/ACM International Symposium on Microarchitecture, Athens, Greece, Oct 17-21. [arXiv][pdf]
  • [SC-20] "Density Matrix Quantum Circuit Simulation via the BSP Machine on Modern GPU Clusters", Ang Li, Omer Subasi, Xiu Yang, and Sriram Krishnamoorthy, The 2020 International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA. Nov 15-20, 2020 [pdf] [GitHub]
                 Nominated for Best Paper Award!
  • [ICPP-20] "Detecting Anomalous Computation with RNNs on GPU-Accelerated HPC Machines", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge, International Conference on Parallel Processing, Aug 17-20, Edmonton, AB, Canada, 2020 [pdf][GitHub][ppt]
  • [TC] "FPDeep: Scalable Acceleration of CNN Training on Deeply-Pipelined FPGA Clusters", Tianqi Wang, Tong Geng, Ang Li, Xi Jin, and Martin Herbordt, IEEE Transactions on Computers, Volume 69, Issue 8, pp. 1143-1158, May 2020 [arXiv][IEEE]
  • [ICS-20] "CSB-RNN: A Super Real-time RNN Framework with Compressed Structured Block", Runbin Shi, Peiyan Dong, Tong Geng, Yuhao Ding, Hayden So, Martin Herbordt, Ang Li, and Yanzhi Wang, The 31st International Conference on SuperComputing, Barcelona, Spain. June 29-July 2, 2020 [pdf].
  • [CCGrid-20] "Indicator-Directed Dynamic Power Management for Iterative Workloads on GPU-Accelerated Systems", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge, The 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, Melbourne, Australia. May 11-14, 2020 [pdf].
  • [PPoPP-20-Poster] "A Parallel Sparse Tensor Benchmark Suite on CPUs and GPUs", Jiajia Li, Mahesh Lakshminarasimhan, Xiaolong Wu, Ang Li, Cathie Olschanowsky, and Kevin Barker, The 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Diego, CA, USA. Feb 22-26, 2020. [GitLab]
2019:
  • [TPDS] "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect", Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan Tallent, and Kevin Barker, IEEE Transactions on Parallel and Distributed Systems, Volume-31, Issue-1 [arXiv][Link][GitHub]
  • [SC-19-Poster] "Fingerprinting Anomalous Computation with RNN for GPGPU-Based HPC Machines", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge, The 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 17-22, 2019.
                 ACM student research competition (SRC) 3rd place winner!
  • [SC-19] "BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets", Ang Li, Tong Geng, Tianqi Wang, Martin Herbordt, Shuaiwen Leon Song, Kevin Barker, The 2019 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 17-22, 2019 [pdf] [GitHub] [ppt].
  • [IISWC-19] "Fingerprinting Anomalous Computation with RNN for GPGPU-Based HPC Machines", Pengfei Zou, Ang Li, Kevin Barker, and Rong Ge. 2019 IEEE International Symposium on Workload Characterization, Orlando, FL, USA, Nov 3-Nov 5, 2019 [pdf]
  • [Springer] "PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite", Jiajia Li, Yuchen Ma, Xiaolong Wu, Ang Li, Kevin Barker, CCF Transactions on High Performance Computing [arXiv][Link]
  • [ASAP-19] "LP-BNN: Ultra-Low-Latency BNN Inference with Layer Parallelism", Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Shuaiwen Leon Song, Ang Li, and Martin Herbordt. The 30th IEEE International Conference on Application-specific Systems, Architectures, and Processors, New York, USA, Jul 15-17, 2019 [pdf]
  • [ICS-19] "O3BNN: An Out-Of-Order Architecture for High-Performance Binarized Neural Network Inference with Fine-Grained Pruning", Tong Geng, Tianqi Wang, Chunshu Wu, Chen Yang, Wei Wu, Ang Li, and Martin Herbordt. The 30th International Conference on SuperComputing, Phoenix, AZ, USA, Jun 26-28, 2019 [pdf]
  • [HPCA-19] "PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World With Customized Memory Cube", Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, and Shuaiwen Leon Song. The 25th IEEE International Symposium on High-Performance Computer Architecture, Washington D.C., USA, Feb 16-20, 2019 [pdf]
2018:
  • [SC-18-Poster] "Binarized ImageNet Inference in 29us", Tong Geng, Ang Li, Tianqi Wang, Shuaiwen Leon Song, Martin Herbordt, The 2018 International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA. Nov 11-16, 2018. 
  • [SC-18-Poster] "Energy Efficiency of Reconfigurable Caches on FPGAs", Tianqi Wang, Ang Li, Tong Geng, Martin Herbordt, The 2018 International Conference for High Performance Computing, Networking, Storage and Analysis, Dallas, TX, USA. Nov 11-16, 2018. 
  • [IISWC-18] "Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite", Ang Li, Shuaiwen Leon Song, Jieyang Chen, Xu Liu, Nathan Tallent, and Kevin Barker. 2018 IEEE International Symposium on Workload Characterization, Raleigh, NC, USA, Sep 30-Oct 2, 2018 [pdf][Supplementary File][Github][ppt].
             Nominated for Best Paper Award!
  • [ICS-18] "Warp-Consolidation: A Novel Execution Model for GPUs", Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, and Shuaiwen Leon Song. The 29th International Conference on SuperComputing, Beijing, China, Jun 12-15, 2018 [pdf][ppt].
  • [CGO-18] "CUDAAdvisor: LLVM-based Runtime Profiling for Modern GPUs", Du Shen, Ang Li, Shuaiwen Leon Song and Xu Liu, International Symposium on Code Generation and Optimization, Vienna, Austria. Feb 24-28, 2018. [pdf][Github]
  • [PPoPP-18] "SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks", Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska, Principles and Practice of Parallel Programming, Vienna, Austria. Feb 24-28, 2018. [pdf][Github]
2017:
  • [MICRO-17] "BVF: Enabling Significant On-Chip Power Savings via Bit-Value-Favor for Throughput Processors", Ang Li, Wenfeng Zhao and Shuaiwen Leon Song, The 50th Annual IEEE/ACM International Symposium on Microarchitecture, Boston, MA, USA. Oct 14-18, 2017. [pdf][slides]
  • [SC-17] "Exploring And Analyzing the Real Impact of Modern On-Package Memory on HPC Scientific Kernels", Ang Li, Weifeng Liu, Xu Liu, Mads R.B. Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez and Shuaiwen Leon Song, The 2017 International Conference for High Performance Computing, Networking, Storage and Analysis, Denver, CO, USA. Nov 12-17, 2017. [pdf]
             Nominated for Best Paper Award!   
  • [CCPE] "Fast Synchronization-Free Algorithms for Parallel Sparse Triangular Solves with Multiple Right-Hand Sides", Weifeng Liu, Ang Li, Jonathan Hogg, Iain Duff and Brian Vinter, Concurrency and Computation: Practice and Experience, Wiley. 
  • [ASPLOS-17] "Locality-Aware CTA Clustering for Modern GPUs", Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar and Henk Corporaal, The 22nd ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Xi'an, China. Apr 8-12, 2017. [pdf][ppt]
  • [ASICON-17] "Analysis and Design of Energy-Efficient Data-Dependent SRAM", Wenfeng Zhao, Ang Li, Yi Wang and Yajun Ha, IEEE 12th International Conference on ASIC, Guiyang, China, Oct 25-28, 2017. [pdf]
2016:
  • [PhD Thesis] GPU Performance Modeling and Optimization (Oct, 2016) [pdf][ppt]
  • [EuroPar-16] "A Synchronization-Free Algorithm for Parallel Sparse Triangular Solves", Weifeng Liu, Ang Li, Jonathan Hogg, Iain Duff and Brian Vinter, The 22nd International European Conference on Parallel and Distributed Computing, Grenoble, France, Aug 22-26, 2016. [pdf][slides][GitHub]
  • [ICS-16] "SFU-Driven Transparent Approximation Acceleration on GPUs", Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar and Henk Corporaal, The 27th International Conference on Supercomputing, Istanbul, Turkey, June 1-3, 2016. [pdf][ppt]
  • [IPDPS-16] "X: A Comprehensive Analytic Model for Parallel Machines", Ang Li, Shuaiwen Leon Song, Eric Brugel, Akash Kumar, Daniel Chavarria-Miranda and Henk Corporaal, The 30th IEEE International Parallel & Distributed Processing Symposium, Chicago, Illinois, USA, May 23-27, 2016. [pdf][ppt]
  • [DATE-16] “Critical Points Based Register-Concurrency Autotuning for GPUs”, Ang Li, Shuaiwen Leon Song, Akash Kumar, Eddy Z. Zhang, Daniel Chavarria and Henk Corporaal, The 2016 Design, Automation & Test in Europe Conference, Dresden, Germany. March 14-18, 2016. [pdf][slides]

2015:
  • [SC-15] “Adaptive and Transparent Cache Bypassing on GPUs”, Ang Li, Gert-Jan Van Den Braak, Akash Kumar and Henk Corporaal, 2015 International Conference for High Performance Computing, Networking, Storage and Analysis, Austin, Texas, USA. November 16-20, 2015. [pdf][Supplementary File][ppt]
              Nominated for Best Paper Award and Best Student Paper Award!       
  • [DSD-15] “A Locality Aware Convolutional Neural Networks Accelerator”, Runbin Shi, Zheng Xu, Zhihao Sun, Maurice Peemen, Ang Li, Henk Corporaal, Di Wu, the 18th International Conference on Digital Systems Design, Funchal, Portugal. August 26-28, 2015. [pdf]*
  • [HPDC-15] “Transit: A Visual Analytical Model for Multithreaded Machine”, Ang Li, Akash Kumar, Y.C. Tay and Henk Corporaal, the 24th International Symposium on High-Performance Parallel and Distributed Computing, Portland, Oregon, USA. June 15-19, 2015. [pdf][ppt]
  • [ICS-15] “Fine-Grained Synchronizations and Dataflow Programming on GPUs”, Ang Li, Gert-Jan Van Den Braak, Akash Kumar and Henk Corporaal, The 26th International Conference on Supercomputing, Newport Beach, California, USA. June 8-11, 2015. [pdf][slides]
  • [MICPRO] "Correlation Ratio Based Volume Image Registration on GPUs", Ang Li, Akash Kumar, Yajun Ha and Henk Corporaal, Microprocessors and Microsystems Journal, vol. 39, no. 8, pp. 998-1011 (2015).
  • [ASPDAC-15] “Accelerating non-volatile/hybrid processor cache design space exploration for application specific embedded systems”, Mohammad Shihabul Haque, Ang Li, Akash Kumar, Qingsong Wei, The 20th Asia and South Pacific Design Automation Conference, Chiba, Japan. January 19-22, 2015. [pdf]
2014:

  • [DSD-14] “Accelerating Volume Image Registration through Correlation Ratio based Methods on GPUs”, Ang Li and Akash Kumar, the 17th International Conference on Digital Systems Design, Verona, Italy. August 27-29, 2014. [pdf]
  • [ISIC-14] “A Heterogeneous Platform with GPU and FPGA for Power Efficient High Performance Computing”, Qiang Wu, Yajun Ha, Akash Kumar, Shaobo Luo, Ang Li and Shihab Mohamed. The 14th International Symposium on Integrated Circuit, Singapore, December 10-12, 2014.