Navigating the landscape of local base models can be daunting given how quickly the field moves. Fortunately, the June 2024 edition of the best local base models by size offers clear guidance based on extensive testing and community feedback. The testing, while informal, drew on test datasets, multiple LLM judges (Opus, GPT-4, and Gemini Pro), and head-to-head comparisons in the LMSYS chat arena. The chat arena is highly recommended for comparing models and getting a feel for how they perform in real-world scenarios.

For models 70B and above, Llama-3 70B stands out clearly. Despite the recent release of Qwen2, Llama-3 70B outperforms even larger local models. In the 35B-and-under category, Yi 1.5 34B is the surprising frontrunner. Although earlier versions and fine-tunes of Yi 34B were underwhelming, Yi 1.5 34B outshines its peers, leaving Command-R a distant competitor due to its lack of grouped-query attention (GQA) and correspondingly higher VRAM usage.
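The VRAM penalty for skipping GQA comes mostly from the KV cache, which scales with the number of key/value heads. A back-of-envelope sketch (the layer count, head counts, head dimension, and context length below are illustrative assumptions, not the exact specs of any model mentioned above):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_param=2):
    """Size of the KV cache: 2 tensors (K and V) per layer, fp16 by default."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_param

# Hypothetical 40-layer model at an 8192-token context:
# full multi-head attention keeps K/V for all 64 heads,
# GQA shares K/V across query heads, here keeping only 8 KV heads.
mha = kv_cache_bytes(n_layers=40, n_kv_heads=64, head_dim=128, seq_len=8192)
gqa = kv_cache_bytes(n_layers=40, n_kv_heads=8, head_dim=128, seq_len=8192)
print(f"MHA: {mha / 2**30:.2f} GiB, GQA: {gqa / 2**30:.2f} GiB")
# → MHA: 10.00 GiB, GQA: 1.25 GiB
```

Under these assumptions, a model without GQA spends roughly 8x more VRAM on its KV cache at the same context length, which is why Command-R fills a GPU far faster than a GQA model of similar parameter count.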

When it comes to models under 20B, Llama-3 8B is unparalleled. While Mistral offers excellent fine-tunes for specific tasks, the rapid progress of Llama-3 8B fine-tunes makes it an incredible choice for its size. Conversely, Phi Medium is a disappointment despite its larger size, failing to match that performance even when trained on similar datasets. Phi Mini, however, impresses in the 6B-and-under category, offering remarkable performance on low-power devices and making it an excellent choice for mobile applications.

Special mentions go to paid models like Opus, GPT-4, and the updated Gemini Pro 1.5. Gemini Pro 1.5's improvements bring it close to the leaders, with certain tasks seeing it outperform Opus. Additionally, the Yi Large preview model shows promise, often ranking close to these top-tier paid models in blind tests conducted in the chat arena.

This guide is intended to help users and fine-tuners keep pace with the rapid evolution of local language models. Whether you're looking for a model for creative writing, programming, or other specific tasks, these recommendations provide a solid starting point. Remember, as with any technology, personal experience and specific use cases will significantly influence which model is best for you.