Language Identification v4

Gherbal

State-of-the-art language identification that outperforms models many times its size. Among the models compared, it is the only one that identifies all 16 Arabic dialect variants tested; it also detects Arabizi and covers dozens of African languages that competitors miss.

Architecture: FastText (quantized)
Model Size: 200 MB
Languages: 214
Avg Accuracy: 0.836
Arabic Dialects: 16 identified
Arabizi Detection: 96–98%
Benchmarks Evaluated: 8
Models Compared: 10

Benchmark results

How Gherbal compares

Model          Size        Languages   Avg Accuracy   Arabic Dialects
Gherbal v4     200 MB      214         0.836          16 / 16
OpenLID v2     1,230 MB    201         0.824          8 / 16
GlotLID        1,690 MB    2,102       0.803          5 / 16
NLLB-LID       1,180 MB    218         0.711          1 / 16
OpenLID v1     1,230 MB    201         0.808          6 / 16
FastText-176   131 MB      176         0.510          0 / 16

Arabic Dialect Coverage

The only model among those compared to identify all 16 Arabic dialect variants tested, including Darija, Egyptian, Gulf, Tunisian, and Hassaniya. Competitors identify 0–8.

Arabizi Detection

96–98% accuracy on Latin-script Darija. Every competing model scores exactly 0%. Critical for North African social media analysis.

Lightweight Deployment

Deployable on mobile, serverless, and browser environments. At 200 MB, Gherbal is roughly six to eight times smaller than OpenLID, GlotLID, and NLLB-LID while posting a higher average accuracy: production-grade results at a fraction of the resource cost.
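
The accuracy-per-MB comparison can be reproduced from the table figures alone. The snippet below is a minimal sketch, not Gherbal's own evaluation code: the metric (average accuracy divided by model size, scaled to 100 MB) and the helper names `accuracy_per_100mb` and `size_ratio` are assumptions introduced here for illustration.

```python
# Figures copied from the comparison table: name -> (size in MB, average accuracy).
MODELS = {
    "Gherbal v4": (200, 0.836),
    "OpenLID v2": (1230, 0.824),
    "GlotLID": (1690, 0.803),
    "NLLB-LID": (1180, 0.711),
    "OpenLID v1": (1230, 0.808),
    "FastText-176": (131, 0.510),
}


def accuracy_per_100mb(size_mb: float, accuracy: float) -> float:
    """Illustrative metric (an assumption): average accuracy per 100 MB of model size."""
    return accuracy / size_mb * 100


def size_ratio(name: str, baseline: str = "Gherbal v4") -> float:
    """How many times larger `name` is than `baseline`, by file size."""
    return MODELS[name][0] / MODELS[baseline][0]


# Rank models from highest to lowest accuracy per 100 MB.
ranking = sorted(
    MODELS,
    key=lambda name: accuracy_per_100mb(*MODELS[name]),
    reverse=True,
)

for name in ranking:
    size_mb, accuracy = MODELS[name]
    print(
        f"{name:<13} {accuracy_per_100mb(size_mb, accuracy):.3f} acc/100MB  "
        f"{size_ratio(name):.2f}x Gherbal size"
    )
```

A ratio like this rewards small models regardless of quality, so it is best read alongside the raw accuracy column rather than on its own.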

Full coverage

214 Languages, 28 Writing Systems

Every language Gherbal can identify, grouped by writing system.

Latin

131 languages
Achinese, Afrikaans, Tosk Albanian, Moroccan Arabic, Asturian, Central Aymara, North Azerbaijani, Bambara, Balinese, Bemba, Banjar, Bosnian, Buginese, Catalan, Cebuano

Arabic

29 languages
Baharna Arabic, Achinese, Mesopotamian Arabic, Ta'izzi-Adeni Arabic, Omani Arabic, Tunisian Arabic, Gulf Arabic, Levantine Arabic, Sudanese Arabic, Standard Arabic, Algerian Arabic, Najdi Arabic, Moroccan Arabic, Egyptian Arabic, Libyan Arabic

Cyrillic

12 languages
Bashkir, Belarusian, Bulgarian, Kazakh, Halh Mongolian, Kirghiz, Macedonian, Russian, Serbian, Tatar, Tajik, Ukrainian

Devanagari

10 languages
Awadhi, Bhojpuri, Hindi, Chhattisgarhi, Kashmiri, Magahi, Maithili, Marathi, Nepali, Sanskrit

Bengali: Assamese, Bengali, Manipuri
Ethiopic: Amharic, Tigrinya
Tibetan: Tibetan, Dzongkha
Chinese: Mandarin Chinese, Chinese
Myanmar: Burmese, Shan
Tifinagh: Tamasheq, Standard Moroccan Tamazight
Hebrew: Hebrew, Eastern Yiddish
Greek: Modern Greek
Gujarati: Gujarati
Armenian: Armenian
Japanese: Japanese
Kannada: Kannada
Georgian: Georgian
Khmer: Khmer
Lao: Lao
Malayalam: Malayalam
Odia: Odia
Gurmukhi: Panjabi
Ol Chiki: Santali
Sinhala: Sinhala
Tamil: Tamil
Telugu: Telugu
Thai: Thai
Hangul: Korean

Try Gherbal now

Send text, get language identification results. Fast inference across Arabic dialects, Arabizi, and African languages.