Hugging Face launches Idefics2, an open-source visual language model with 8 billion parameters

In 2023, Hugging Face released Idefics, a visual language model built as an open reproduction of DeepMind's Flamingo.

Today, the company introduced an upgraded version, Idefics2, which has a tenth of the parameters of the largest original model, improved optical character recognition (OCR), and better image handling.

Idefics2 is now available on Hugging Face.

Idefics2 is a general-purpose multimodal model that accepts text and images as input and responds with text.

It is smaller than its predecessor, with 8 billion parameters compared with 80 billion for the largest version of the original Idefics.

It also handles images better, processing them at their native resolution and aspect ratio for sizes up to 980 x 980 pixels, without resizing.
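The resizing behavior described above can be illustrated with a short helper. This is a simplified, hypothetical sketch, not the actual Idefics2 preprocessing code: it assumes that an image whose longer side is at most 980 pixels is left untouched, and that a larger image is scaled down proportionally so its aspect ratio is kept. The function name `fit_within_limit` is illustrative.

```python
def fit_within_limit(width: int, height: int, max_side: int = 980) -> tuple[int, int]:
    """Return the (width, height) an image would be processed at.

    Hypothetical simplification: images whose longer side is at most
    max_side keep their native resolution; larger images are scaled down
    proportionally so the longer side equals max_side.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # native resolution: no resizing needed
    scale = max_side / longest
    # Round to whole pixels, keeping at least one pixel per side.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within_limit(640, 480))    # fits, stays (640, 480)
    print(fit_within_limit(1960, 1000))  # too large, scaled to (980, 500)
```

Keeping the aspect ratio, rather than forcing every image into a fixed square, is what avoids the stretching or cropping that would otherwise distort text and fine detail.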

OCR capabilities have been improved, making it better at transcribing text from images or documents. 

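The model's architecture has also been simplified: instead of the gated cross-attention layers used in Idefics, Idefics2 projects visual features into the language model's embedding space and feeds them in alongside the text embeddings.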
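To make this architectural change concrete, the toy sketch below shows the general idea of feeding images to a language model without cross-attention layers: each image becomes a short sequence of vectors with the same width as the text embeddings, and those vectors are placed inline with the text, so the decoder sees one ordinary sequence. Everything here is hypothetical: the toy encoders stand in for a real vision encoder and projection, and the choice of 64 visual tokens per image and 4-dimensional embeddings is for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

Embedding = List[float]
Segment = Tuple[str, object]  # ("text", str) or ("image", image data)

EMBED_DIM = 4  # toy embedding width, shared by text and image tokens
VISUAL_TOKENS_PER_IMAGE = 64  # illustrative assumption


def toy_embed_text(text: str) -> List[Embedding]:
    # Placeholder text embedding: one vector per whitespace-separated word.
    return [[float(len(word))] * EMBED_DIM for word in text.split()]


def toy_encode_image(image: object) -> List[Embedding]:
    # Placeholder for a vision encoder plus projection: a fixed number
    # of vectors already mapped to the text embedding width.
    return [[0.0] * EMBED_DIM for _ in range(VISUAL_TOKENS_PER_IMAGE)]


def build_decoder_input(
    segments: Sequence[Segment],
    embed_text: Callable[[str], List[Embedding]] = toy_embed_text,
    encode_image: Callable[[object], List[Embedding]] = toy_encode_image,
) -> List[Embedding]:
    """Concatenate text and image embeddings into one input sequence.

    No cross-attention is involved: visual vectors simply occupy positions
    in the same sequence as the text, in the order the segments appear.
    """
    sequence: List[Embedding] = []
    for kind, payload in segments:
        if kind == "text":
            sequence.extend(embed_text(payload))
        elif kind == "image":
            sequence.extend(encode_image(payload))
        else:
            raise ValueError(f"unknown segment type: {kind!r}")
    return sequence


if __name__ == "__main__":
    prompt = [("image", None), ("text", "What does this sign say?")]
    seq = build_decoder_input(prompt)
    print(len(seq))  # 64 visual vectors + 5 word vectors = 69
```

In a gated cross-attention design, by contrast, image features sit in a separate stream that extra layers inside the language model must attend to; concatenation removes those layers and lets the existing decoder handle both modalities.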

Idefics2 is built on two openly available pretrained models that are widely used in the AI community: Mistral-7B-v0.1 as the language backbone and siglip-so400m-patch14-384 as the vision encoder.

It was trained on a mix of openly available data, including web documents, image-caption pairs, OCR data, and image-to-code data.

The release follows a wave of multimodal model launches across the AI industry, including Reka Core, xAI's Grok-1.5V, and Google's Imagen 2.

Hugging Face's Idefics2 advances open multimodal AI, with improved performance on combined text and image inputs.

With fewer parameters, stronger OCR, and a simpler architecture, it is a notable step forward for models that combine natural language processing and computer vision.

The release of Idefics2 underscores Hugging Face’s commitment to innovation in AI technology. 

The company built a model that understands and processes both text and images using openly available datasets drawn from a variety of sources.

For users, Idefics2 is cheaper to run than its much larger predecessor and gives more accurate results when working with text and images together.

Because images up to 980 x 980 pixels are processed at their native resolution, they do not need to be resized, which preserves detail that downscaling would lose.

The improved OCR enables better text extraction from images and documents, which helps the model answer prompts about the text that appears in them.

Taken together, these changes make Idefics2 an efficient and accurate option for processing text and image inputs.

As the AI industry evolves, models like Idefics2 pave the way for further advances in natural language processing and computer vision.

