Dive Brief:
- Apple built a multimodal large language model, experimenting with a combination of data types and model architectures, according to a research paper published Thursday. Around 30 Apple employees were cited as authors.
- To train the model, called MM1, Apple researchers used a mix of text-only data, images paired with captions, and longer documents interleaved with relevant images. Researchers also found that fine-tuning the model with high-resolution images improved outputs.
- Researchers also identified a way to scale up the model's capacity without a proportional increase in compute cost: a mixture-of-experts design that splits the model into multiple smaller sub-networks, only a few of which are activated for any given prompt (sketched below).
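The sub-network design the researchers describe is the sparse mixture-of-experts technique used widely in large models. The snippet below is a minimal, generic illustration of that idea, not Apple's implementation; the layer sizes, expert count and top-2 routing are assumptions chosen for the example.

```python
# Minimal sketch of a sparse mixture-of-experts layer: a router sends each token
# to a small number of "expert" sub-networks, so most of the model stays idle
# for any given input. Illustrative only; not MM1's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model: int = 512, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model). Pick the top-k experts per token and blend
        # their outputs, weighted by the router's (normalized) scores.
        scores = self.router(x)                            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # best experts per token
        weights = F.softmax(weights, dim=-1)

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SparseMoELayer()
    tokens = torch.randn(16, 512)   # 16 token embeddings
    print(layer(tokens).shape)      # torch.Size([16, 512])
```

The appeal of this design is that total parameter count grows with the number of experts, while the compute spent per token depends only on how many experts are activated.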
Dive Insight:
The iPhone maker's hesitancy to dive into the generative AI deep end could soon come to an end.
“In terms of generative AI… we have a lot of work going on internally,” CEO Tim Cook said during the company’s Q1 2024 earnings call in February. “Our M.O., if you will, has always been to do work and then talk about work, and not to get out in front of ourselves.”
Behind the scenes, the company has spent “a tremendous amount of time and effort” on AI, Cook said.
In May 2023, Apple reportedly restricted internal use of OpenAI’s ChatGPT and Microsoft-backed GitHub Copilot, citing concerns over leaks of confidential corporate data.
“It’s totally understandable because everything likely goes to Microsoft and Apple doesn’t want to leak all its secrets to Microsoft,” Ed Skoudis, president at SANS Technology Institute, told CIO Dive. But because off-the-shelf generative AI tools are easily accessible, companies should give employees alternatives, he said, and Apple is reportedly in the process of doing just that.
The company built a large language model and rolled out an internal chatbot to test it out, Bloomberg reported in July. The news outlet also reported Monday that Apple has discussed embedding Google’s Gemini or an OpenAI model to power iPhone AI features.
MM1 outperforms other multimodal large language models in captioning and visual question answering, according to Apple’s research.
Details on how Apple plans to introduce generative AI to its customers will come later this year, Cook said during the company’s earnings call in February. Apple’s developer conference is set to take place in June.