Doctoral colloquium - Vladimír Macko (5.5.2025)
Monday 5.5.2025, at 13:10, in room I/9
Vladimír Macko:
Advances in Neural Network Pruning and Quantization: Theory, Practice, and Sparse Computation
Abstract:
As neural networks grow in depth and parameter count, the development of efficient compression techniques has become a central research focus in machine learning. Among these, pruning and quantization have emerged as prominent strategies to reduce model complexity and computational cost while preserving predictive accuracy. However, significant open questions remain regarding the theoretical foundations, algorithmic trade-offs, and practical implementation challenges of these methods.
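To make the two techniques concrete, the following is a minimal NumPy sketch of their textbook baselines: magnitude pruning (zeroing the smallest-magnitude weights) and symmetric per-tensor int8 quantization. This is an illustration only, not the speaker's method; the function names and the 50% sparsity setting are chosen for the example.

```python
# Illustrative sketch: magnitude pruning and symmetric int8 quantization
# applied to a single weight matrix (baseline variants, not the talk's method).
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)   # half the weights set to zero
q, scale = quantize_int8(w_sparse)            # 8-bit integer representation
print("sparsity:", np.mean(w_sparse == 0))
print("max quantization error:", np.abs(w_sparse - scale * q.astype(np.float32)).max())
```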
This talk surveys recent progress in neural network pruning and quantization from an academic perspective, with emphasis on algorithmic innovations, formal guarantees, and empirical performance across diverse model architectures and tasks. We further examine the computational implications of sparsity induced by pruning, particularly in the context of inference-time acceleration.
In addition, we present recent work on developing custom GPU kernels for sparse matrix multiplication, highlighting how low-level optimization can align with high-level model compression objectives. Through a combination of theoretical insights and experimental evaluation, we aim to shed light on the current state and future directions of efficient neural network research.
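For context on what such kernels compute, here is a sketch of sparse-times-dense matrix multiplication over the CSR (compressed sparse row) format, a common layout for pruned weight matrices. Real GPU kernels parallelize the row loop and tile for memory coalescing; this NumPy version, with hypothetical helper names, shows only the data layout and the arithmetic that skips zero entries.

```python
# Sketch of the computation a sparse matmul kernel implements:
# CSR-format sparse matrix times dense matrix (SpMM).
import numpy as np

def to_csr(a: np.ndarray):
    """Compress a dense matrix into CSR (values, column indices, row pointers)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matmul(values, col_idx, row_ptr, b: np.ndarray) -> np.ndarray:
    """Multiply a CSR sparse matrix by a dense matrix, visiting only nonzeros."""
    out = np.zeros((len(row_ptr) - 1, b.shape[1]), dtype=b.dtype)
    for i in range(len(row_ptr) - 1):               # one output row per sparse row
        for j in range(row_ptr[i], row_ptr[i + 1]): # nonzeros of row i
            out[i] += values[j] * b[col_idx[j]]
    return out

a = np.array([[0.0, 2.0, 0.0], [1.0, 0.0, 3.0]])
b = np.eye(3)
vals, cols, ptrs = to_csr(a)
assert np.allclose(csr_matmul(vals, cols, ptrs, b), a @ b)
```

The work per output row is proportional to that row's nonzero count rather than the full width, which is the source of the inference-time speedups the talk examines.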