
AI PCs and architectures

Jim Hsiao, DIGITIMES Research, Taipei


DIGITIMES Research observes that on-device inference of large AI models is determined not only by the computing performance of the xPU, but also by model compression and memory bandwidth, all of which affect the inference performance of AI PCs.
Abstract

As Microsoft, Meta, and others actively launch lightweight AI models, and notebook processor vendors introduce system architectures and designs that enhance AI computing performance, AI PCs launched in 2024 will be able to execute multiple generative AI tasks offline.

The original versions of large language models (LLM) or large vision models (LVM) cannot run on notebooks because of their huge size and enormous computing power requirements. Through compression technologies such as model pruning and knowledge distillation, AI models with tens of billions of parameters can be shrunk to one-tenth of their original size. Quantizing the parameters compresses the model by a further factor of four or eight, effectively fitting large models onto notebooks while retaining a certain level of accuracy, as the rough size calculation below illustrates.

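A minimal back-of-the-envelope sketch of how these compression steps compound. The parameter counts and bit widths are illustrative assumptions, not figures from the report.

```python
# Rough memory-footprint estimate for compressing a large model to fit
# in notebook memory. All figures below are illustrative assumptions.

def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage size in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

original_params = 70e9    # e.g. a 70B-parameter LLM in FP32 (assumed size)
distilled_params = 7e9    # ~1/10 after pruning / knowledge distillation

print(f"Original FP32:  {model_size_gb(original_params, 32):6.1f} GB")
print(f"Distilled FP32: {model_size_gb(distilled_params, 32):6.1f} GB")
print(f"Distilled INT8: {model_size_gb(distilled_params, 8):6.1f} GB")  # 4x smaller than FP32
print(f"Distilled INT4: {model_size_gb(distilled_params, 4):6.1f} GB")  # 8x smaller than FP32
```
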
General matrix multiplication (GEMM) and general matrix-vector multiplication (GEMV), the core operations in large-scale AI models, are compute bound and memory bound, respectively.

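A short sketch of why this distinction holds, comparing the arithmetic intensity (FLOPs per byte moved) of the two operations. The matrix dimensions and FP16 element size are assumptions chosen for illustration.

```python
# Arithmetic intensity comparison: GEMM reuses each loaded element many
# times (compute bound), while GEMV touches each weight only once
# (memory bound). Sizes and data type are illustrative assumptions.

BYTES = 2  # FP16 weights and activations (assumed)

def gemm_intensity(m: int, n: int, k: int) -> float:
    flops = 2 * m * n * k                          # multiply-accumulates
    bytes_moved = (m * k + k * n + m * n) * BYTES  # read A, B; write C
    return flops / bytes_moved

def gemv_intensity(m: int, n: int) -> float:
    flops = 2 * m * n
    bytes_moved = (m * n + n + m) * BYTES          # dominated by the weight matrix
    return flops / bytes_moved

print(f"GEMM 4096x4096x4096: {gemm_intensity(4096, 4096, 4096):8.1f} FLOPs/byte")
print(f"GEMV 4096x4096:      {gemv_intensity(4096, 4096):8.1f} FLOPs/byte")
```

With these assumed sizes the GEMM performs roughly a thousand operations per byte moved, while the GEMV performs about one, which is why GEMV-heavy workloads such as token-by-token LLM decoding are limited by memory bandwidth rather than xPU throughput.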