Connect with us

Tech

Mistral unveils Pixtral 12B, a multimodal AI model that can process both text and images – SiliconANGLE

Published

on

Mistral unveils Pixtral 12B, a multimodal AI model that can process both text and images – SiliconANGLE

Mistral AI, a Paris-based artificial intelligence startup, today unveiled its latest advanced AI model capable of processing both images and text.

The new model, called Pixtral 12B, employs about 12 billion parameters and is the first of its models capable of vision encoding, making it possible for it to “see” images alongside text.

The new model is based on Mistral’s Nemo 12B, an AI model previously released by the company capable of understanding text, with the addition of a 400 million-parameter vision adapter. The adapter allows users to add images through URLs or encode them via base64 within the inputted text.

Many other AI large language models have also added multimodal capabilities that allow users to input images such as Anthropic PBC’s Claude family, OpenAI’s GPT-4o and Google LLC’s Gemini. The addition of image reasoning capabilities to Pixtral 12B should provide it the ability similarly to answer questions about images, provide captioning, count objects and more.

The company released the parameters and code via a torrent link on GitHub and the AI distribution platform Hugging Face. The company has encouraged developers to start downloading and using it.

Now that the model is available for download, developers will be able to fine-tune and train the model for their own purposes. The company offers some of its models open-source under the Apache 2.0 license without restrictions. For others, Mistral offers a dev license that is free for development, but requires a paid license for commercial applications, but not for research uses. The company has not clarified what license Pixtral 12B will fall under.

 Sophia Yang, head of Mistral developer relations, said in a post on X, that the model will soon be available for testing on Mistral’s chatbot and application programming interface platforms, Le Chat and Le Platforme.

Image: Pixabay

Your vote of support is important to us and it helps us keep the content FREE.

One click below supports our mission to provide free, deep, and relevant content.  

Join our community on YouTube

Join the community that includes more than 15,000 #CubeAlumni experts, including Amazon.com CEO Andy Jassy, Dell Technologies founder and CEO Michael Dell, Intel CEO Pat Gelsinger, and many more luminaries and experts.

“TheCUBE is an important partner to the industry. You guys really are a part of our events and we really appreciate you coming and I know people appreciate the content you create as well” – Andy Jassy

THANK YOU

Continue Reading