Demand for generative AI (Gen AI) and large language models (LLMs) is rising rapidly, driven by the emergence of ChatGPT, a chatbot developed by OpenAI. Because LLMs are extremely large and require massive datasets and compute resources to train, cloud service providers (CSPs) generally combine inference with prompt engineering in their AI solutions to support clients' customization needs, as sketched below.
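To make this approach concrete, the following is a minimal sketch of inference-time customization via prompt engineering: the shared pre-trained model is never retrained; client-specific instructions are assembled into the prompt before it is sent to an inference endpoint. The endpoint URL, model name, and response schema here are hypothetical placeholders, not any specific CSP's API.

```python
import requests

# Hypothetical inference endpoint and model name; real CSP APIs differ.
ENDPOINT = "https://api.example-csp.com/v1/generate"
MODEL = "pretrained-llm-large"

def build_prompt(client_profile: dict, user_query: str) -> str:
    """Compose a client-customized prompt. The pre-trained model is left
    untouched; all customization lives in the prompt text itself."""
    return (
        f"You are an assistant for {client_profile['company']}, "
        f"operating in the {client_profile['industry']} industry.\n"
        f"Answer in a {client_profile['tone']} tone.\n\n"
        f"Question: {user_query}"
    )

def generate(client_profile: dict, user_query: str) -> str:
    payload = {
        "model": MODEL,
        "prompt": build_prompt(client_profile, user_query),
        "max_tokens": 256,
    }
    response = requests.post(ENDPOINT, json=payload, timeout=30)
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema

if __name__ == "__main__":
    profile = {"company": "Acme Corp", "industry": "logistics", "tone": "formal"}
    print(generate(profile, "Summarize our delayed-shipment policy options."))
```

The design point is that two clients calling the same deployed model get differently tailored behavior purely from different prompts, which is far cheaper than training or fine-tuning a model per client.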
As such, cloud inference has become the primary operating model for LLMs. However, because language applications typically require near-instant responses and must support large numbers of concurrent users, only large clusters of high-speed interconnected AI servers can perform LLM inference at a level that satisfies most usage scenarios.
First-tier CSPs are aggressively deploying Gen AI cloud services. Beyond the commonly known generation of content such as text, images, documents, and code, CSPs have also been actively promoting Gen AI platform-as-a-service (PaaS) offerings, providing users with pre-trained models, prompt engineering tools, and a range of APIs that allow enterprises to quickly build customized application tools.
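As one illustration of the prompt engineering tools such platforms expose, the sketch below shows a template-driven pattern: an enterprise authors a reusable prompt template once, and each application call only fills in its parameters before the rendered prompt is routed to a chosen pre-trained model. The class and method names are illustrative, not a real platform SDK.

```python
from string import Template

class PromptTemplateTool:
    """Toy stand-in for a Gen AI PaaS prompt-engineering tool: templates are
    authored once and reused across applications with different parameters."""

    def __init__(self) -> None:
        self._templates: dict[str, Template] = {}

    def register(self, name: str, template_text: str) -> None:
        self._templates[name] = Template(template_text)

    def render(self, name: str, **params: str) -> str:
        # safe_substitute leaves unknown placeholders intact rather than raising
        return self._templates[name].safe_substitute(**params)

tool = PromptTemplateTool()
tool.register(
    "support_reply",
    "Draft a reply to the customer message below for $product.\n"
    "Keep it under $max_words words.\n\nMessage: $message",
)

prompt = tool.render(
    "support_reply",
    product="CloudSync Pro",
    max_words="120",
    message="My sync has been failing since the last update.",
)
print(prompt)  # would then be sent to a pre-trained model via the platform's API
```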
Table 1: Generative AI adoption by cloud services and hardware
Table 2: Four major training processes needed for a complete LLM
Table 3: Comparison of LLM and deep learning model development
Table 6: Advantages and disadvantages of each LLM business model
Table 7: Comparison between traditional servers and AI servers in LLM training