HPT 1.5 Air: Best open-sourced 8B Multimodal LLM with Llama 3
May 3rd, 2024 - HyperGAI Team

TL;DR: We released HPT 1.5 Air, the best open-source 8B multimodal LLM built on Llama 3. HPT 1.5 Air achieves impressive performance on a wide range of benchmarks, on some occasions even outperforming bigger, proprietary models. HPT 1.5 Air is publicly available on Hugging Face and GitHub. With a powerful contextualized understanding of multimodal inputs and superb reasoning abilities, HPT 1.5 Air is breaking down the barrier between open-source and proprietary models.

Overview

We are excited to announce the release of HPT 1.5 Air, the best open-source 8B multimodal LLM, based on our previously released HPT Air architecture. HPT 1.5 Air sets a new standard for efficacy, efficiency, and transparency. With only ~8.5B parameters in total, HPT 1.5 Air belongs to the small model category (<10B), yet it punches above its weight, outperforming bigger, proprietary models on several occasions. The main changes in HPT 1.5 Air are:

Improved visual understanding and complex reasoning. HPT 1.5 Air works well in real-world scenarios while maintaining competitive performance on other types of inputs such as charts and diagrams.
Impressive performance. HPT 1.5 Air is the best multimodal Llama 3 model on the market, even outperforming bigger, proprietary models on several benchmarks.
Transparency. We released HPT 1.5 Air with all of its components publicly available under the Apache 2.0 license.

Model Architecture

HPT 1.5 Air follows a recipe similar to that of its predecessor, HPT 1.0 Air: a visual encoder, the novel H-Former, and an LLM. Compared with HPT 1.0 Air, we upgraded the visual encoder, changed the LLM to the latest Llama 3 8B, and trained on a larger, improved dataset that mixes image and text data. As a result, the new HPT 1.5 Air is more powerful and capable. It is open source and fully available on both Hugging Face and GitHub, empowering developers to build a variety of real-world applications.
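To make the data flow between the three components concrete, here is a minimal, shape-level sketch in plain Python. It is not HPT's implementation: every dimension is a toy value, the function names are hypothetical, and the pooling-plus-projection inside `bridge` is only a placeholder for the H-Former, whose internals are not described in this post. The one thing it is meant to show is the assumed overall pattern, where visual features are converted into a short sequence of embeddings that sit alongside the text embeddings in the LLM's input.

```python
import random

# Toy sizes for illustration only; these are NOT HPT 1.5 Air's real dimensions.
NUM_PATCHES = 16        # feature vectors produced per image
VISION_DIM = 8          # width of each visual feature vector
LLM_DIM = 12            # width of the LLM's input embeddings
NUM_VISUAL_TOKENS = 4   # visual embeddings handed to the LLM (hypothetical)


def visual_encoder(image_seed: int) -> list[list[float]]:
    """Stand-in for the visual encoder: one feature vector per image patch."""
    rng = random.Random(image_seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(VISION_DIM)]
            for _ in range(NUM_PATCHES)]


def bridge(patch_features: list[list[float]]) -> list[list[float]]:
    """Placeholder for the H-Former: average groups of patches, then
    linearly project each pooled vector from VISION_DIM to LLM_DIM."""
    group = len(patch_features) // NUM_VISUAL_TOKENS
    rng = random.Random(0)  # fixed seed: stands in for learned weights
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(LLM_DIM)]
               for _ in range(VISION_DIM)]
    tokens = []
    for g in range(NUM_VISUAL_TOKENS):
        chunk = patch_features[g * group:(g + 1) * group]
        pooled = [sum(col) / len(chunk) for col in zip(*chunk)]
        tokens.append([sum(p * w for p, w in zip(pooled, column))
                       for column in zip(*weights)])
    return tokens


def embed_text(prompt: str) -> list[list[float]]:
    """Stand-in for the LLM's embedding layer: one vector per whitespace token."""
    rng = random.Random(len(prompt))
    return [[rng.uniform(-1.0, 1.0) for _ in range(LLM_DIM)]
            for _ in prompt.split()]


def build_llm_input(image_seed: int, prompt: str) -> list[list[float]]:
    """Visual embeddings first, then text embeddings, as one input sequence."""
    return bridge(visual_encoder(image_seed)) + embed_text(prompt)


sequence = build_llm_input(42, "What is in this image?")
print(len(sequence), len(sequence[0]))  # 4 visual + 5 text tokens, each LLM_DIM wide
```

In this toy setup the LLM would receive nine embeddings of width 12. The real model's token counts, widths, and bridging mechanism will differ.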

Benchmark Performance

We compare HPT 1.5 Air with many competitors across a wide range of benchmarks. Overall, HPT 1.5 Air achieves the best results among multimodal LLMs with fewer than 10B parameters. Interestingly, HPT 1.5 Air even outperforms bigger or proprietary models such as LLaVA-NeXT, GPT-4V, and Gemini 1.0 Pro on several benchmarks, including SEED-I, SQA, and MMStar. In the following table, we provide a comprehensive comparison of HPT 1.5 Air, with the best results in bold and the second-best results underlined within the open-source category.

Examples

With its improved visual understanding and complex reasoning capabilities, HPT 1.5 Air demonstrates impressive performance in many scenarios. In the following, we provide several examples showcasing its ability to understand social references, solve complex visual math problems, and operate well in real-world environments.

The Bottom Line

With the full release of HPT 1.5 Air, we're eager to see what people can create with it! Additionally, our HPT Pro models are currently in training, with many impressive features such as better OCR capabilities, multi-image understanding, support for higher-resolution inputs, and more. You can join our waitlist to get early access and the latest updates on our HPT Pro series.

How to Access HPT

Open-source release of HPT 1.5 Air

GitHub repo: https://github.com/hyperGAI/HPT
Hugging Face: https://huggingface.co/HyperGAI/HPT1_5-Air-Llama-3-8B-Instruct-multimodal
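
As one possible way to fetch the released files programmatically, the sketch below derives the repository id from the Hugging Face URL above and downloads a snapshot with the `huggingface_hub` package. It assumes that package is installed (`pip install huggingface_hub`). The helper names `repo_id_from_url` and `download_weights` are illustrative, not part of any HPT tooling. Running inference presumably still requires the code and instructions in the GitHub repo.

```python
from urllib.parse import urlparse

MODEL_URL = "https://huggingface.co/HyperGAI/HPT1_5-Air-Llama-3-8B-Instruct-multimodal"


def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository id from a Hugging Face model URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a model URL: {url}")
    return "/".join(parts[:2])


def download_weights(url: str = MODEL_URL) -> str:
    """Download the whole model repository and return the local cache path."""
    # Imported lazily so repo_id_from_url works without the package installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id_from_url(url))


print(repo_id_from_url(MODEL_URL))
```

Calling `download_weights()` triggers the actual download, which is several gigabytes for a model of this size, so it is left out of the snippet's top level.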
