Meta defends Llama 4 release against ‘reports of mixed quality,’ blames bugs




Meta’s new flagship AI language model family, Llama 4, arrived suddenly over the weekend, with the parent company of Facebook, Instagram, WhatsApp and Quest VR headsets (among other services and products) revealing not one, not two, but three versions — all upgraded to be more powerful and performant using the popular “mixture-of-experts” architecture and a new training method involving fixed hyperparameters, known as MetaP.

All three models also sport massive context windows — the amount of information an AI language model can handle in a single input/output exchange with a user or tool.

However, following the surprise announcement and public release of two of those models for download and use — the lower-parameter Llama 4 Scout and mid-tier Llama 4 Maverick — the response from the AI community on social media has been less than adoring.

Llama 4 sparks confusion and criticism among AI users

An unverified post on the North American Chinese-language community forum 1point3acres, reposted to the r/LocalLLaMA subreddit on Reddit, claimed to come from a researcher at Meta’s GenAI organization. It alleged that the model performed poorly on third-party benchmarks and that company leadership “suggested blending test sets from various benchmarks during the post-training process, aiming to meet the targets across various metrics and produce a ‘presentable’ result.”

The post was met with skepticism as to its authenticity, and a Meta spokesperson did not respond to a request for comment.

But other users found reasons to doubt the benchmarks regardless.

“At this point I highly suspect Meta botched something in the released weights … if not, they should use the money to acquire Nous,” commented @cto_junior on X, in reference to an independent user test showing Llama 4 Maverick’s poor performance (16%) on Aider Polyglot, a benchmark that runs a model through 225 coding tasks — well below the performance of comparably sized, older models such as DeepSeek V3 and Claude 3.7 Sonnet.

Referring to the 10 million-token context window Meta boasted for Llama 4 Scout, AI PhD and author Andriy Burkov wrote on X, in part: “The declared 10M context is virtual because no model was trained on prompts longer than 256k tokens.”

Also on the r/LocalLLaMA subreddit, user Dr_karminski wrote, “I am incredibly disappointed with Llama-4,” and demonstrated its poor performance compared to DeepSeek’s non-reasoning V3 model on coding tasks such as simulating balls bouncing around a heptagon.

Former Meta researcher and current AI2 (Allen Institute for Artificial Intelligence) senior research scientist Nathan Lambert took to his Interconnects blog on Monday to point out that a benchmark comparison Meta posted — pitting Llama 4 Maverick against other models on the third-party head-to-head comparison tool LMArena (aka Chatbot Arena) — actually used a different version of Llama 4 Maverick than the one the company had released to the public: one “optimized for conversationality.”

As Lambert wrote: “Sneaky. The results below are fake, and it is a major slight to Meta’s community to not release the model they used to create their major marketing push.”

Lambert went on to note that while this particular model on the arena was “tanking the technical reputation of the release,” with lots of emojis and frivolous emotional dialogue, “the actual model on other hosting providers is quite smart and has a reasonable tone!”

In response to the torrent of criticism and accusations of benchmark cooking, Meta VP and Head of GenAI Ahmad Al-Dahle took to X to state:

“We’re glad to start getting Llama 4 in all your hands. We’re already hearing lots of great results people are getting with these models.

That said, we’re also hearing some reports of mixed quality across different services. Since we dropped the models as soon as they were ready, we expect it’ll take several days for all the public implementations to get dialed in. We’ll keep working through our bug fixes and onboarding partners.

We’ve also heard claims that we trained on test sets; that’s simply not true and we would never do that. Our best understanding is that the variable quality people are seeing is due to needing to stabilize implementations.

We believe the Llama 4 models are a significant advancement, and we’re looking forward to working with the community to unlock their value.”

Yet even this response was met with complaints of poor performance and calls for further information, such as more technical documentation on the models and their training processes, along with questions about why this release in particular, compared to prior Llama releases, was so riddled with issues.

The release also comes on the heels of the departure of Joelle Pineau, head of Meta’s Fundamental AI Research (FAIR) organization, who announced her exit from the company on LinkedIn last week with “nothing but admiration and deep gratitude for each of my managers.” Pineau, it should be noted, also promoted the release of the Llama 4 model family this weekend.

Llama 4 continues to spread to other inference providers with mixed results, but it is safe to say the initial release of the model family was not an immediate hit with the AI community.

And Meta’s upcoming LlamaCon on April 29, the first celebration and gathering for third-party developers of the model family, will likely provide plenty of fodder for discussion. We’ll be tracking it all; stay tuned.



