When we talk about open source AI, the term often doesn't mean what we think it does. With traditional open source software, you can read, modify, and redistribute the source code. AI models, and large language models (LLMs) in particular, are different: what gets shared is usually the weights, the learned parameters, as one inscrutable blob. Even with the weights in hand, understanding what the model does, or modifying it in a targeted way, is not straightforward.
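To make the "inscrutable blob" point concrete, here is a minimal sketch using only the Python standard library. The eight random numbers are a made-up stand-in for a checkpoint, and the raw little-endian layout is an assumption for illustration; real weight files hold billions of parameters, usually in lower-precision formats with some metadata attached.

```python
import random
import struct

# Hypothetical stand-in for a released checkpoint: 8 made-up "learned" parameters.
rng = random.Random(0)
weights = [rng.uniform(-1.0, 1.0) for _ in range(8)]

# Serialize to raw bytes: numbers only, with no explanation attached.
blob = struct.pack(f"<{len(weights)}d", *weights)

# Anyone holding the blob can recover every parameter exactly...
recovered = list(struct.unpack(f"<{len(blob) // 8}d", blob))
print(len(blob), "bytes ->", len(recovered), "parameters")
print(recovered[:3])

# ...but nothing in the file says what any single number contributes
# to the model's behavior, or how to change it to get a desired effect.
```

Having every number is not the same as having something you can read the way you read source code.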
Robert Miles highlights this issue in his discussion, pointing out that simply having access to the weights does not equate to true openness. To genuinely replicate an AI model, one would need the entire training process: the data, the order in which the data was presented, and the specific techniques and software used. Even with all of this information, the trained model remains a black box, and it is still difficult to grasp how it functions.
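The claim that even the order of data presentation matters can be illustrated with a toy sketch. It assumes a one-variable linear model trained with plain per-example gradient descent on a made-up dataset; the `train` function and its parameters are hypothetical, not anyone's actual training code.

```python
def train(examples, lr=0.05, epochs=3):
    """Fit y ~ w*x + b with per-example SGD, visiting examples in the given order."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = (w * x + b) - y  # prediction error on this one example
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b

# Toy, made-up dataset (not from any real model).
data = [(0.0, 1.0), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]

run_a = train(data)                  # original order
run_b = train(data)                  # same data, same order
run_c = train(list(reversed(data)))  # same data, reversed order

print("same order gives identical weights:     ", run_a == run_b)
print("different order gives different weights:", run_a != run_c)
```

Here the same data, code, and hyperparameters yield different weights once the order changes. At LLM scale, hardware and software nondeterminism add further sources of divergence, which is why the data alone is not enough to reproduce a model.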
This distinction is crucial because it affects how we perceive and use these AI models. While companies like Meta may describe their models as open source, without full transparency and reproducibility the term loses its traditional meaning. This is not just a semantic issue; it has practical implications. For instance, if a company changes its terms of service or discontinues a model, users who depend on it could find themselves in a difficult position.
Moreover, the licensing of the weights matters significantly. Can you fine-tune the model for your specific needs? Are you allowed to use the outputs to train new models? Is commercial use permitted? These questions are vital for developers and businesses relying on these models. Therefore, while the spirit of open source might be present, the execution often falls short, leading to a situation where users are still at the mercy of the providing company.
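One way to keep these questions in view is to write them down as an explicit checklist before adopting a model. The sketch below is purely illustrative: `WeightLicense` and `blocked_uses` are hypothetical names, the example values are made up, and a real license needs to be read in full rather than reduced to three booleans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightLicense:
    """Hypothetical summary of a weights license; one field per question above."""
    allows_fine_tuning: bool
    allows_training_on_outputs: bool
    allows_commercial_use: bool


def blocked_uses(lic, *, fine_tune=False, train_on_outputs=False, commercial=False):
    """Return the planned uses that this license summary does not permit."""
    planned = {
        "fine-tuning": (fine_tune, lic.allows_fine_tuning),
        "training on outputs": (train_on_outputs, lic.allows_training_on_outputs),
        "commercial use": (commercial, lic.allows_commercial_use),
    }
    return [name for name, (wanted, allowed) in planned.items() if wanted and not allowed]


# Made-up example: fine-tuning and commercial use allowed, training on outputs not.
example = WeightLicense(
    allows_fine_tuning=True,
    allows_training_on_outputs=False,
    allows_commercial_use=True,
)
print(blocked_uses(example, fine_tune=True, train_on_outputs=True, commercial=True))
```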
The idea of open source AI is appealing, but it's essential to recognize its limitations. True openness would require not just access to the model weights but also full visibility into the training process and the freedom to modify and use the model as needed. Until then, the term "open source AI" remains more of an illusion than a reality.