Microsoft has unveiled Phi-3 Mini, its latest lightweight AI model and the first of three small models the company plans to release.
Phi-3 Mini has 3.8 billion parameters and was trained on a smaller dataset than large language models such as GPT-4. It is available now on Azure, Hugging Face, and Ollama. Microsoft also plans to release Phi-3 Small and Phi-3 Medium. A model’s parameter count is a rough measure of how many complex instructions it can understand.
In December, the company unveiled Phi-2, which outperformed larger models like Llama 2. Microsoft says Phi-3 performs better than the previous version and can produce responses close to those of a model 10 times its size.
Eric Boyd, corporate vice president of Microsoft Azure AI Platform, says Phi-3 Mini is as capable as LLMs like GPT-3.5, “just in a smaller form factor.”
Compared with their larger counterparts, small AI models are often cheaper to run and perform better on personal devices such as phones and laptops. Earlier this year, The Information reported that Microsoft was assembling a team dedicated to building lighter-weight AI models. In addition to Phi, the company has developed Orca-Math, a model focused on solving math problems.
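To make the size difference concrete, here is a rough back-of-the-envelope sketch of how much memory a model’s weights alone would need. The 3.8 billion figure comes from Microsoft; the “ten times larger” model is a hypothetical comparison point, and the bytes-per-parameter values are the standard sizes for common numeric formats rather than anything Microsoft has stated about how Phi-3 is deployed.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# This ignores activations, caches, and runtime overhead, so real usage is higher.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


PHI3_MINI_PARAMS = 3.8e9                   # parameter count reported for Phi-3 Mini
TEN_TIMES_LARGER = 10 * PHI3_MINI_PARAMS   # hypothetical model, for comparison only

for precision in BYTES_PER_PARAM:
    small = weight_memory_gb(PHI3_MINI_PARAMS, precision)
    large = weight_memory_gb(TEN_TIMES_LARGER, precision)
    print(f"{precision}: {small:.1f} GB vs {large:.1f} GB")
```

Under these assumptions, the weights of a 3.8-billion-parameter model take about 7.6 GB at 16-bit precision and about 1.9 GB at 4-bit precision, which is the range where phones and laptops become plausible hosts. A model ten times larger needs ten times as much.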
Microsoft’s rivals also offer small AI models, most of which target simpler tasks such as document summarization or coding assistance. Google’s Gemma 2B and 7B are suited to simple chatbots and language-related work. Anthropic’s Claude 3 Haiku can quickly explain dense research papers with graphs, while Meta’s recently released Llama 3 8B can be used for some chatbots and for coding assistance.
Boyd says developers trained Phi-3 with a “curriculum.” They were inspired by how children learn from bedtime stories, books with simpler words, and sentence structures that address larger topics.
“There aren’t enough children’s books out there, so we took a list of more than 3,000 words and asked an LLM to make ‘children’s books’ to teach Phi,” Boyd says.
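Microsoft has not published the exact prompts behind this process, but the general idea Boyd describes (start from a word list, then ask an LLM for simple stories built from it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the eight-word `VOCABULARY` stands in for the 3,000-word list, `build_story_prompt` is a hypothetical helper, and the prompt wording is invented. The sketch only builds the prompts and does not call any model.

```python
import random

# Hypothetical stand-in for the list of more than 3,000 words Boyd mentions.
VOCABULARY = ["cat", "river", "happy", "jump", "moon", "friend", "small", "garden"]


def build_story_prompt(vocabulary: list[str], words_per_story: int, rng: random.Random) -> str:
    """Pick a few distinct words and ask for a simple story that uses all of them."""
    chosen = rng.sample(vocabulary, words_per_story)
    return (
        "Write a short story for young children using simple sentences. "
        "The story must include these words: " + ", ".join(chosen) + "."
    )


rng = random.Random(0)  # fixed seed so the same prompts are produced on every run
prompts = [build_story_prompt(VOCABULARY, 3, rng) for _ in range(5)]
for prompt in prompts:
    print(prompt)
```

Each prompt would then be sent to an LLM, and the stories it returns would become training text. Sampling different word combinations is one simple way to get variety out of a fixed vocabulary.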
He added that Phi-3 simply built on what previous iterations had learned. Phi-1 focused on coding and Phi-2 began to learn to reason; Phi-3 is better at both coding and reasoning. Although the Phi-3 family has some general knowledge, it cannot match GPT-4 or another LLM in breadth: an LLM trained on the entirety of the internet will give very different answers than a smaller model like Phi-3.
Boyd says companies often find that smaller models such as Phi-3 work better for their own applications because many companies’ internal data sets are on the smaller side anyway. And because these models use less computing power, they are often far more affordable.