Evaluating LLMs’ Capability in Transcribing Arabic Text to IPA
Computer Engineering Department
Large language models (LLMs) have been widely applied to natural language processing tasks ranging from translation to text generation. This study evaluates the capability of LLMs to transcribe Arabic text, spanning Modern Standard Arabic (MSA), Quranic Arabic, and dialects, into the International Phonetic Alphabet (IPA), a universal standard for representing phonemes that is crucial for linguists, speech therapists, and computational linguists. Using three datasets, we assess the transcription accuracy of GPT-4, Gemini, LLaMA, Aya, and Arabic-specific LLMs such as ALLaM and Jais at the phoneme, word, and sentence levels. Our analysis reveals that GPT-4 outperforms the other models, achieving the highest transcription accuracy for Quranic Arabic with a phoneme error rate (PER) of 0.29. Among the remaining models, Gemini demonstrates superior performance in dialect transcription, especially for Moroccan Arabic, followed by LLaMA and Aya, with a significant difference among the models. For MSA, Gemini excels with a Levenshtein distance of 18.0, and 75% of samples achieve distances of 25 or less. However, the models exhibit high error rates for word-by-word transcription, with a 91% average error rate and a ROUGE-L score of 0.24. For Quranic Arabic, GPT-4 achieves 49.27% accuracy and a ROUGE-L score of 0.94, consistently outperforming the other models. Notably, among the Arabic-specific LLMs, ALLaM outperforms Jais on the Aleph subset with a PER of 0.37, whereas Jais performs more weakly (PER = 0.72). This study highlights the strengths and weaknesses of current LLMs in Arabic IPA transcription: while the models show promising results in phoneme-level transcription, especially for Levantine Arabic and Quranic data, further work is required to improve performance.
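The phoneme error rate reported above is conventionally computed as the Levenshtein (edit) distance between the reference and hypothesis phoneme sequences, normalized by the reference length. The sketch below illustrates that standard definition; the example phoneme sequences and their tokenization are illustrative assumptions, not data from the study.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))  # DP row: distances for the empty reference prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # deletion
                dp[j - 1] + 1,                      # insertion
                prev + (ref[i - 1] != hyp[j - 1]),  # substitution (or match)
            )
            prev = cur
    return dp[n]

def per(ref_phonemes, hyp_phonemes):
    """Phoneme error rate: edit distance normalized by reference length."""
    return levenshtein(ref_phonemes, hyp_phonemes) / len(ref_phonemes)

# Hypothetical example: reference IPA for /kitaːb/ ("book") vs. a model output
ref = ["k", "i", "t", "aː", "b"]
hyp = ["k", "i", "t", "a", "b"]  # one substitution: aː -> a
print(per(ref, hyp))  # 0.2
```

Treating each phoneme (rather than each Unicode character) as a token matters for IPA, where length marks and diacritics would otherwise inflate the distance.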
Supervisor: Prof. Imtiaz Ahmad
Convener: Prof. Khalid Al-Zamel
Examination Committee: Prof. Ayed Salman