As Microsoft, Meta, and others actively launch lightweight AI models, and notebook processor vendors introduce system architectures and designs that enhance AI computing performance, AI PCs launching in 2024 will be able to run multiple generative AI tasks offline.
The original versions of large language models (LLMs) and large vision models (LVMs) cannot run on notebooks because of their sheer size and enormous compute requirements. Through compression technologies such as model pruning and knowledge distillation, AI models with tens of billions of parameters can be reduced to one-tenth of their original size. Quantizing the parameters compresses the model by a further factor of four or eight, making large models fit on notebooks while retaining acceptable accuracy.
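To make the savings concrete, here is a minimal sketch of the memory arithmetic behind quantization; the 7-billion-parameter model size and the precision levels are illustrative assumptions, not figures from this report:

```python
# Illustrative sketch (assumed example, not report data): memory needed
# to hold a model's weights at different numeric precisions.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Return the memory (GB) needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # assumed example: a 7B-parameter model

for label, bits in [("FP32", 32), ("FP16", 16), ("INT4", 4)]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB")

# FP32: 28.0 GB  -> far beyond typical notebook memory
# FP16: 14.0 GB
# INT4:  3.5 GB  -> 8x smaller than FP32, feasible on a notebook
```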
The general matrix multiplication (GEMM) and general matrix-vector multiplication (GEMV) algorithms, the most computationally important components of large-scale AI models, are compute bound and memory bound, respectively.
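The difference comes down to arithmetic intensity (FLOPs per byte moved): GEMM reuses every loaded matrix element across many multiply-accumulates, while GEMV touches each matrix element only once. A minimal sketch of that calculation, assuming square n x n operands in FP16 and each operand moved to or from memory exactly once:

```python
# Illustrative sketch (assumptions: square n x n operands, FP16 at
# 2 bytes/element, each operand moved to/from memory exactly once).

def gemm_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n**3                         # n^2 outputs, each a length-n dot product
    bytes_moved = 3 * n**2 * bytes_per_elem  # A and B read, C written
    return flops / bytes_moved

def gemv_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n**2                               # n outputs, each a length-n dot product
    bytes_moved = (n**2 + 2 * n) * bytes_per_elem  # matrix plus input/output vectors
    return flops / bytes_moved

for n in (1024, 4096):
    print(f"n={n}: GEMM {gemm_intensity(n):.0f} FLOPs/byte, "
          f"GEMV {gemv_intensity(n):.2f} FLOPs/byte")

# GEMM intensity grows with n (compute bound); GEMV stays near
# 1 FLOP/byte regardless of n (memory bound).
```

This is why token-by-token LLM generation, which is dominated by GEMV-like operations over the full weight set, is often limited by memory bandwidth rather than by a processor's raw compute throughput.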
Table 1: Three major challenges AI PCs face when running LLMs
Table 2: LLM parameter counts before/after compression and key compression technologies
Chart 1: LLM parameters, precision, performance and memory demand
Table 4: Changes in notebook hardware to improve AI efficiency
Chart 2: Parameter scales of generative AI applications (billions of parameters, INT4 precision)
Table 5: Computing performance comparison among major AI PC platforms, 1H24
Table 6: Memory specification comparison among major AI PC platforms, 1H24
Table 7: AI notebook shipments and specifications by price segment, 2024 (million units)