Ruibo Fan

CV

Education

Ph.D. in Version Control Theory, GitHub University, 2018 (expected)
M.S. in Jekyll, GitHub University, 2014
B.S. in GitHub, GitHub University, 2012

Work experience

Spring 2024: Academic Pages Collaborator
- GitHub University
- Duties include: Updates and improvements to the template
- Supervisor: The Users
Fall 2015: Research Assistant
- GitHub University
- Duties included: Merging pull requests
- Supervisor: Professor Hub
Summer 2015: Research Assistant
- GitHub University
- Duties included: Tagging issues
- Supervisor: Professor Git

Skills

Skill 1
Skill 2
- Sub-skill 2.1
- Sub-skill 2.2
- Sub-skill 2.3
Skill 3

Publications

ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression

**Ruibo Fan**, Xiangrui Yu, Xinglin Pan, Zeyu Li, Weile Luo, Qiang Wang, Wei Wang, and Xiaowen Chu, "ZipServ: Fast and Memory-Efficient LLM Inference with Hardware-Aware Lossless Compression," in *Proceedings of the ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '26)*, Pittsburgh, PA, USA, March 2026.

ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities

Weile Luo, Yuxin Chen, Xiangrui Yu, Qiang Wang, **Ruibo Fan**, Haibo Liu, et al., "ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities," in *Proceedings of the 31st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP)*, 2026.

Dissecting the NVIDIA Hopper Architecture through Micro-benchmarking and Multiple Level Analysis

Weile Luo, **Ruibo Fan**, Zeyu Li, et al., "Dissecting the NVIDIA Hopper Architecture through Micro-benchmarking and Multiple Level Analysis," *ACM Transactions on Computer Systems (TOCS)*, under review.

Exploiting Low-Level Sparsity for Efficient Large Language Model Inference on GPUs with SpInfer

**Ruibo Fan**, et al., "Exploiting Low-Level Sparsity for Efficient Large Language Model Inference on GPUs with SpInfer," *ACM Transactions on Computer Systems (TOCS)*, invited, under review.

STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

Peng Dong, Lin Li, Yuke Zhong, Dazhen Du, **Ruibo Fan**, Yuxin Chen, et al., "STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs," in *Proceedings of the 13th International Conference on Learning Representations (ICLR)*, 2025.

SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs

**Ruibo Fan**, et al., "SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs," in *Proceedings of the 20th European Conference on Computer Systems (EuroSys)*, 2025.

Benchmarking and Dissecting the Nvidia Hopper GPU Architecture

Weile Luo, **Ruibo Fan**, Zeyu Li, et al., "Benchmarking and Dissecting the Nvidia Hopper GPU Architecture," in *Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS)*, 2024.

DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores

**Ruibo Fan**, Wei Wang, and Xiaowen Chu, "DTC-SpMM: Bridging the Gap in Accelerating General Sparse Matrix Multiplication with Tensor Cores," in *Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)*, 2024.

Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks

**Ruibo Fan**, Wei Wang, and Xiaowen Chu, "Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks," in *Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS)*, 2023.

Service and leadership

Currently signed in to 43 different Slack teams

Talks

Conference Proceeding talk 3 on Relevant Topic in Your Field

Talk 2 on Relevant Topic in Your Field

Tutorial 1 on Relevant Topic in Your Field

Talk 1 on Relevant Topic in Your Field

Teaching

Teaching Assistant – Mathematics for Data Science

Teaching Assistant – Introduction to Computer Science

Teaching Assistant – Parallel Computing II