Profiling Large Language Model Inference on Apple Silicon: A Quantization Perspective

This paper investigates Apple Silicon's unified memory architecture, which integrates CPU and GPU memory, and its implications …