Earlier this year, Microsoft introduced the Phi-3 family of small language models. Today, Microsoft introduced Phi-4, a 14B-parameter state-of-the-art small language model (SLM) that even beats OpenAI’s GPT-4o large language model on the MATH and GPQA benchmarks.
Microsoft claims that Phi-4’s strong performance on math-related reasoning is due to the use of high-quality synthetic datasets, curation of high-quality organic data, and post-training improvements. The synthetic data was generated using several techniques, including multi-agent prompting, self-revision workflows, and instruction reversal, and it constitutes the bulk of Phi-4’s training data. Microsoft also used techniques such as rejection sampling to refine the model’s outputs during post-training.
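For readers unfamiliar with rejection sampling, the general idea can be sketched in a few lines of Python: draw several candidate responses for a prompt and keep only those that pass a check. This is a generic illustration, not Microsoft’s pipeline; `rejection_sample`, `toy_generate`, and `toy_accept` are hypothetical names, and the toy generator and checker stand in for a real model and a real verifier.

```python
from itertools import cycle
from typing import Callable, List


def rejection_sample(
    generate: Callable[[str], str],
    accept: Callable[[str, str], bool],
    prompt: str,
    num_candidates: int = 8,
) -> List[str]:
    """Draw several candidate responses and keep only the accepted ones."""
    candidates = [generate(prompt) for _ in range(num_candidates)]
    return [c for c in candidates if accept(prompt, c)]


# Hypothetical stand-ins: a real setup would call a language model here
# and use a verifier (for example, an answer checker) as the filter.
_toy_answers = cycle(["4", "5", "4", "22"])


def toy_generate(prompt: str) -> str:
    return next(_toy_answers)


def toy_accept(prompt: str, response: str) -> bool:
    return response == "4"


kept = rejection_sample(toy_generate, toy_accept, "What is 2 + 2?", num_candidates=4)
print(kept)
```

The surviving responses could then serve as fine-tuning examples, while rejected ones are discarded or used as negative examples.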
In the Phi-4 technical paper, Microsoft also addressed concerns about benchmark test sets leaking onto the web. Microsoft says it improved the data decontamination process for Phi-4 to avoid any unfair influence on evaluation results. To confirm this, Microsoft tested Phi-4 on the November 2024 AMC-10 and AMC-12 math competitions, which took place after Microsoft’s training data was collected.
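One common way to decontaminate training data is to drop any document that shares a long enough word n-gram with a benchmark item. The Python sketch below illustrates that general idea only; it is not the procedure described in the Phi-4 paper, and the function names, the whitespace tokenization, and the default n-gram length are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(sample: str, benchmark_items: List[str], n: int = 8) -> bool:
    """Flag a sample that shares at least one n-gram with any benchmark item."""
    sample_grams = ngrams(sample, n)
    return any(sample_grams & ngrams(item, n) for item in benchmark_items)


def decontaminate(corpus: List[str], benchmark_items: List[str], n: int = 8) -> List[str]:
    """Keep only the documents that do not overlap with the benchmark."""
    return [doc for doc in corpus if not is_contaminated(doc, benchmark_items, n)]


# Toy data, invented for this example.
benchmark = ["what is the sum of the first ten positive integers"]
corpus = [
    "Quiz: what is the sum of the first ten positive integers? Answer: 55",
    "The weather today is sunny with a light breeze from the west",
]
clean = decontaminate(corpus, benchmark, n=4)
print(clean)
```

Testing on competitions held after the data cutoff, as Microsoft did with the AMC-10 and AMC-12, complements this kind of filtering because those problems could not have appeared in the training data.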
As the image below shows, Phi-4 outperforms not only similar-size and open-weight models but also larger frontier models, including Gemini 1.5 Pro. Based on this test, Microsoft claims that Phi-4’s top-tier performance on the MATH benchmark is not due to overfitting or contamination.
Phi-4 also has weaknesses, since it is still fundamentally limited by its size. It can hallucinate on factual knowledge, and it is less proficient at rigorously following detailed instructions. For model safety evaluation, the Phi-4 team worked with the independent AI Red Team (AIRT) at Microsoft to identify safety and security risks posed by Phi-4 in both average and adversarial user scenarios.
Phi-4 is now available on Azure AI Foundry under a Microsoft Research License Agreement (MSRLA). Microsoft will also release Phi-4 on Hugging Face next week.