The standardization of Large Language Model (LLM) sizes around 3B, 7B, 13B, 35B, and 70B parameters is largely a product of historical precedent and the need for comparability and consistency in research and performance evaluation. The LLaMA family popularized several of these sizes (7B and 13B in the original release, 70B with LLaMA 2), setting benchmarks that subsequent models have adhered to. This consistency allows researchers to compare new models against established baselines, so that any improvements can be attributed to the changes made rather than to differences in model size.
Another critical factor is hardware compatibility. Although these sizes do not align perfectly with common VRAM configurations, they were chosen to fit within the practical memory limits of hardware available when the models were introduced. This makes it easier for researchers and developers to work with these models without requiring specialized equipment, facilitating broader experimentation and adoption.
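As a rough illustration of why these sizes map onto practical hardware tiers, weight memory for a dense model scales linearly with parameter count and bytes per parameter. The sketch below is a back-of-the-envelope estimate under assumed precisions (fp16, int8, int4); it ignores activations, KV cache, and framework overhead, so real requirements are somewhat higher.

```python
# Rough estimate of weight memory for common model sizes.
# Assumes dense models; ignores activations, KV cache, and framework overhead.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gib(num_params: float, precision: str) -> float:
    """Approximate weight memory in GiB for a given parameter count and precision."""
    return num_params * BYTES_PER_PARAM[precision] / (1024 ** 3)

if __name__ == "__main__":
    for billions in (3, 7, 13, 35, 70):
        params = billions * 1e9
        row = ", ".join(
            f"{prec}: {weight_memory_gib(params, prec):5.1f} GiB"
            for prec in ("fp16", "int8", "int4")
        )
        print(f"{billions:>3}B -> {row}")
```

Under these assumptions, a 7B model's weights occupy roughly 13 GiB in fp16 (within reach of a single 16-24 GB GPU) and about 3 GiB at 4-bit, while a 70B model needs multiple GPUs or aggressive quantization, which matches the practical tiers these sizes have settled into.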
Additionally, these standardized sizes help streamline the evaluation process. When two models are close in size, such as a 7B and an 8B model, differences in benchmark performance can be attributed more readily to architecture and training methodology than to scale. This consistency is crucial for academic and commercial research, where demonstrating incremental improvements is often necessary.
In summary, the standardization of LLM sizes to 3B, 7B, 13B, 35B, and 70B is primarily due to historical precedent, the need for comparability, and practical considerations related to hardware compatibility. This approach ensures that new models can be effectively evaluated and compared, fostering ongoing advancements in the field.