Overall Findings
Released in July 2023.
Trained on smaller datasets.
Available models include 70B, 13B, and 7B.
Context length of 4,096 tokens.
Primarily a text-only LLM.
Released in April 2024.
Trained on much larger datasets.
Much larger context length of 128,000 tokens (with Llama 3.1).
Available models include 405B, 70B, and 8B.
Supports up to 30 languages.
Designed to be multi-modal eventually.
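The context-length gap in the lists above can be sketched with a rough heuristic. This is an illustrative snippet, not an official tokenizer; it assumes the common rule of thumb of roughly four characters per token, and the helper names are made up for the example.

```python
# Rough sketch: how much text fits in each model's context window.
# Assumes ~4 characters per token, a common rule of thumb; a real
# tokenizer (e.g. the model's own) would be more accurate.

LLAMA2_CONTEXT = 4_096     # tokens (Llama 2)
LLAMA3_CONTEXT = 128_000   # tokens (Llama 3.1)

def estimated_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // 4

def fits_in_context(text: str, context_tokens: int) -> bool:
    """Check whether the text fits in a given context window."""
    return estimated_tokens(text) <= context_tokens

# A ~40,000-character document (~10,000 estimated tokens)
# overflows Llama 2's window but fits easily in Llama 3.1's:
document = "x" * 40_000
print(fits_in_context(document, LLAMA2_CONTEXT))  # False
print(fits_in_context(document, LLAMA3_CONTEXT))  # True
```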
Llama 2 launched in 2023 and was, at the time, Meta's most capable large language model.
Llama 3 has since surpassed it in every way.
Training: Llama 3 Has a Much Larger Dataset
Required roughly 22,000 petaflop/s-days of compute to train.
Trained on two trillion tokens of data.
Trained on older hardware.
Trained on data up to 2023.
Mostly trained on English data.
Consumed so much hardware time that Meta had to limit training runs.
Used millions of tokens of human input for fine-tuning.
Trained on data up to 2024.
Upwards of 5% of data was not English-language.
The main advantage of Llama 3 is that it trained on more data.
It used over 15 trillion tokens, with extensive pre-training and human fine-tuning after the fact.
Meta introduced new training practices for the development of Llama 3 to optimize the process.
This process included automated error detection, as well as the use of newer hardware.
Llama 3 was much more expensive to train, though.
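The data-scale gap described above is easy to quantify. A quick back-of-the-envelope check, using only the token counts cited in this article:

```python
# Training-data scale comparison, using the token counts cited above.
llama2_tokens = 2 * 10**12    # ~2 trillion tokens (Llama 2)
llama3_tokens = 15 * 10**12   # ~15 trillion tokens (Llama 3)

ratio = llama3_tokens / llama2_tokens
print(f"Llama 3 trained on {ratio:.1f}x more tokens than Llama 2")
# Llama 3 trained on 7.5x more tokens than Llama 2
```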
Performance: Llama 3 Is Faster
Models include 70B, 13B, and 7B.
Limited context window means it can't work with large inputs.
Only competitive with older LLMs on accuracy benchmarks.
Available models include 405B, 70B, 8B.
Handles complex tasks much more effectively.
Larger context window lets it work with much larger inputs.
Wins almost all head-to-heads in LLM performance against a range of opponents.
The Llama 3 LLM, and particularly the latest 3.1 version, is far more capable than Llama 2.
All of that additional training data makes Llama 3 far faster, too.
It’s used in Meta’s Facebook Messenger, and in WhatsApp in the US.
It runs in real time and delivers prompt responses to user inputs.
Capabilities: Llama 3 Can (and Will) Do More
Coding support is limited.
Almost exclusively a text-based LLM.
Will be able to handle multi-modal inputs and outputs in the future.
Can handle complicated coding tasks.
Llama 2 is almost exclusively a tool for text generation, with some code-generation capabilities.
Llama 3, however, is designed to be multi-modal.
It’s already excellent for text generation and coding and can accept some media inputs.
In the future, it will be capable of image and video inputs and outputs, too.
It’s faster, more powerful, and can simply do more.
If Llama 4 comes out someday, we expect it will similarly outpace Llama 3.
Meanwhile, this is one of the best LLMs currently available.