Building a ChatGPT for the Arab World
Meet the companies and countries that are leading the way
Welcome to FWDstart!
This week's deep dive is a big one, as we explore the companies bridging the gap between the Arabic-speaking world and advances in AI.
If you know someone who would be interested in reading, please feel free to share the newsletter with them here.
And a reminder: if you haven't subscribed yet, join readers from 500 Global, Speedinvest and Antler who get FWDstart twice weekly.
AI is taking over the world, gradually transforming how we work and live.
Yet, many Arabic speakers risk being left out.
A few years ago, Mohammad AlSharekh, the renowned Kuwaiti entrepreneur who brought Arabic to computing, noticed something alarming.
Many Arabs had stopped using dictionaries.
They were too outdated and complicated, filled with archaic words and definitions.
AlSharekh's solution was Sakhr Software Company's online Modern Arabic Dictionary, featuring 50 million Arabic vocabulary items for everyday use.
In a TEDx Talk in 2018, he delivered a clear message: only Arabs can address the challenges facing the Arabic language.
In recent years, local founders and companies have taken up this challenge to drive an Arabic AI revolution.
Bridging the Arabic AI gap
When ChatGPT was released, it was a total revelation.
But there was a big problem - it struggled enormously with Arabic.
Given that Arabic is spoken by over 400 million people, this is remarkable for all the wrong reasons.
The underperformance of major LLMs can be put down to a variety of factors:
Arabic is a complicated language: It's full of diacritical marks and has a highly inflected grammar. Its script is cursive, so letters are usually joined, and most letters take up to four shapes (isolated, initial, medial and final) depending on their position in a word, which computers can struggle to make sense of.
Lack of online content: LLMs need training on vast amounts of digital text. Although Arabic is the fourth-most-spoken language globally, it makes up less than 1% of internet content. That's not a lot for AI developers to play around with.
Diverse dialects: There are at least 25 dialects. Some are similar, but others can be difficult to understand even for Modern Standard Arabic speakers.
Map of Arabic dialects
Add all of that together and you end up with a language that's harder for language models to handle than most others.
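To make the first of those points concrete, here is a minimal Python sketch using only the standard library's unicodedata module. It shows that the four positional shapes of the letter beh exist in Unicode as separate "presentation form" characters that all normalise back to one base letter, and that a fully voweled word and its everyday unvoweled spelling are different strings until the diacritics are stripped. The helper names and the sample word are our own illustrative choices, not part of any Arabic AI product mentioned here, and real Arabic NLP pipelines do far more normalisation than this.

```python
import unicodedata

# The letter beh (U+0628) in its four positional shapes, taken from the
# Unicode "Arabic Presentation Forms-B" block: isolated, initial, medial, final.
# Normal text stores only the base letter; the shaping engine picks the shape.
BEH_FORMS = ["\uFE8F", "\uFE91", "\uFE92", "\uFE90"]


def describe_forms(forms):
    """Illustrative helper: (Unicode name, NFKC base letter) for each shape."""
    return [(unicodedata.name(ch), unicodedata.normalize("NFKC", ch)) for ch in forms]


def strip_diacritics(text):
    """Illustrative helper: drop combining marks such as fatha (U+064E)."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name, base in describe_forms(BEH_FORMS):
        print(f"{name} -> U+{ord(base):04X}")

    voweled = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba ("he wrote"), with fatha marks
    bare = "\u0643\u062A\u0628"                      # the same word as usually written
    print(len(voweled), len(bare))            # 6 3
    print(voweled == bare)                    # False
    print(strip_diacritics(voweled) == bare)  # True
```

Stripping diacritics like this is a common preprocessing step, but it is lossy: the bare spelling كتب can be read as several different words (for example "he wrote" or "books"), which is part of why Arabic is harder for models than its character count suggests.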
But if there's one common thread that has emerged throughout this newsletter to date, it's that MENA's founders love a challenge.
And they don't come much bigger than bridging the gap between the Arabic-speaking world and advances in AI.
Finding their voice
Back in 2021, two founders from Egypt launched a startup called Intella, on a mission to do just that.
Meet Nour Taher and Omar Mansour.
Their startup has developed an Arabic speech-to-text AI model designed to work across Arabic dialects.
Nour Taher and Omar Mansour
And it doesn't just talk the talk: the platform has a 95.70% average accuracy across 25 Arabic dialects.
Their speech-to-text and analytics models use advanced AI to continuously enhance accuracy and efficiency by processing large datasets of various Arabic accents and dialects.
This has helped it outperform industry leaders including Google's speech-to-text, ChatGPT maker OpenAI's Whisper, Meta Platforms' SeamlessM4T and IBM's Watson.
Intella Voice can be used in chatbots, voice assistants, customer service centres, emergency hotlines, and IVR systems.
And the company plans to expand into audio analytics, including summarisation and sentiment analysis.
Nour and Omar are by no means alone.
Many MENA founders are committed to enhancing Arabic representation in AI.
Other startups in the space include Maqsam, Uktob, ClusterLab, and LisanAI, to name but a few.
The importance of ensuring Arabic catches up cannot be overstated.
The importance of an Arabic LLM
The genie is out of the AI bottle.
There are only going to be more and more people in Arabic-speaking countries turning to AI to complete tasks over the coming months and years.
But why is it so important that Arabic doesn't fall behind in the AI race in the first place?
1. Productivity and education
Made with DALL·E
AI is amazing when it comes to automating tasks, analysing data, and helping you rethink whether sending that faintly passive-aggressive email is really the wisest decision.
Okay, but seriously: to drive productivity gains in Arabic-speaking countries, supporting a variety of dialects is crucial, as dialects are more commonly used than Modern Standard Arabic in business.
The same goes for education. If AI is to truly deliver on the promise of highly personalised, adaptive learning journeys for students, tailoring it to local dialects will be crucial to its success in the MENA context, where the need is significant: 59% of children are in learning poverty.
AI needs to meet Arabic speakers where they already are.
2. Language preservation
Made with DALL·E
As we've mentioned, there's a significant shortage of Arabic content online.
While AI has the potential to remedy this by increasing the amount of Arabic content available, an unintended consequence arises when mainstream LLMs with poor Arabic skills produce low-quality text.
Because LLMs learn from internet data, future models that consume this poor-quality text could struggle with Arabic even more, harming the language's quality and preservation.
3. Representation and cultural nuance
The bias of Stable Diffusion (Bloomberg)
LLMs are trained on billions of examples of human language in all its flawed glory.
However, not every culture is represented inclusively, especially since Arabic accounts for less than 1% of online content.
Consequently, LLMs often learn about Arab culture from non-Arab perspectives, potentially incorporating unfair or untrue biases, and missing important cultural nuances.
To ensure accurate and fair representation, it is crucial to increase high-quality Arabic content and involve native Arabic speakers in the training process.
A new peak
In recent months, the MENA region has made significant strides in advancing Arabic AI, most notably with the release of G42's Jais (named after Jebel Jais, the UAE's highest peak).
Jais chat interface
This 13-billion-parameter model was trained on a unique dataset of 116 billion Arabic tokens, capturing the complexity and richness of the language.
For broader coverage, the training data also included 279 billion English tokens, resulting in a fully bilingual LLM that supports a variety of Arabic/English digital services.
The project was a collaboration between Inception, Mohamed bin Zayed University of Artificial Intelligence, and AI chip maker Cerebras Systems.
Open-sourced under an Apache 2.0 license and available via Hugging Face, Jais enables AI professionals, developers, and researchers worldwide to create their own use cases.
Jais has already been integrated into government, finance, energy, climate, and healthcare sectors in the UAE, with launch partners including the UAE Ministry of Foreign Affairs, Ministry of Industry and Advanced Technology, Department of Health - Abu Dhabi, ADNOC, Etihad Airways, First Abu Dhabi Bank (FAB), e&, and Mubadala Investment Company.
Microsoft CEO Satya Nadella introduces Model-as-a-Service at Microsoft Ignite (Image credit: Talal Al Kaissi, Core42)
But Jais is by no means alone. In May alone, there was a flurry of Arabic AI model announcements, with Saudi Arabia, Qatar, and Huawei all unveiling major developments.
The Saudi Data and Artificial Intelligence Authority and IBM launched 'ALLaM,' an open-source Arabic LLM on IBM's watsonx platform.
Qatar introduced "Fanar," a gen-AI Arabic language model developed in collaboration with the Ministry of Communications and Information Technology, Qatar Computing Research Institute, and other partners.
Huawei unveiled a 100+ billion parameter Arabic language model with 96% accuracy in Arabic automatic speech recognition (ASR) tests. Trained on Modern Standard Arabic (MSA) and diverse data from the Arab world, it covers local culture, history, customs, and industry-specific knowledge in sectors like oil and gas and financial services.
What's next?
The introduction of Arabic language models like Jais is a game-changer.
Until recently, developers lacked high-quality Arabic AI models.
Now, they can build specifically for the Arabic-speaking world's companies, consumers, cultures, and locations.
There's still a long road ahead, but rather than diminishing Arabic, AI might actually strengthen it.
Message from the team
Thanks for reading this week's edition!
If you're enjoying the newsletter, don't forget to share it with a friend!
Have a question or any feedback? Just hit reply, or provide a rating below. We want to hear from you!
How was this newsletter edition? Rate it and share your feedback!
Was this forwarded to you? Sign up here.