Pursuing a PhD in Machine Learning to Contribute to NLP Research on Arabic and Syriac

Having worked in the technology industry in Iraq for several years, I have noticed that language-driven technologies and products are slow to reach the local market. While considerable effort has gone into creating tools for Arabic language processing, research on Arabic and other languages such as Syriac remains inadequate to the needs of their users. Contributing to NLP research on these languages would help millions of their speakers gain access to technologies whose importance is becoming equal to that of electricity. Therefore, I plan to pursue a Ph.D. in Machine Learning through the MILA program in order to contribute to this much-needed field of study in Iraq, a country in dire need of non-traditional solutions to its countless problems, solutions that Machine Learning is uniquely able to provide.

Studying at the University of Arizona, funded by the Fulbright Scholarship, has given me a balance of theory and practice in language and speech processing. I have learned about discriminative, generative, and Bayesian models, and I have implemented important algorithms and models such as Viterbi, shift-reduce parsing, LSTMs, and CNNs using Python and Scala with frameworks such as TensorFlow. In addition, I have done acoustic analysis using MATLAB and Praat, and I have used Kaldi for simple speech processing assignments. Being part of the NLP reading group at UofA enabled me to present, discuss, and implement state-of-the-art papers and to collaborate with very talented scholars and researchers. One class I found extremely helpful was Algorithms for NLP, as it introduced topics like semi-supervised learning, domain adaptation, and sequential learning, and explored a blend of old and recent ideas such as BiLSTM-CRF, Tree Convolution, Ladder Networks, and transformer networks.

In addition to the classes that I have taken at UofA, interning at Lum.ai was a transformative experience. The staff was very helpful and flexible enough to allow me to implement my own ideas. One task I worked on aimed to retrofit word embeddings using etymological information in an unsupervised manner. For example, given the word "telescope", the algorithm aims to compose a new representation out of the vectors of its components, "tele" and "scope", while simultaneously preserving the original distributional space. While the idea was promising theoretically, it did not work in practice, mainly because the vectors of the etymological roots (obtained from Word2Vec or GloVe) do not encode their meaning. For example, the vectors of the roots "tele" and "scope" do not reflect the meanings of "remoteness" and "seeing" respectively. Another task I worked on was the SQuAD 2.0 machine comprehension competition, for which I tried a simple two-stage logistic regression classifier with linguistically rich features. It achieved an F1 score of 0.71 (not yet published), a score that competes with state-of-the-art attention-based neural models. Part of my job was replicating the results reported in research papers, and I was surprised by the number of studies that I found irreproducible for reasons such as the lack of statistical significance testing and missing details on their coding environments.

I am planning to direct my research towards semi-supervised learning methods such as bootstrapped learning and co-training (Blum and Mitchell, 1998), and to apply them to advanced tasks such as question answering and relation extraction in under-resourced languages such as Arabic and Syriac. Semi-supervised methods have demonstrated success in a wide range of NLP tasks[cite], yet very few studies in the literature have re-employed them in more recent neural contexts. Recent breakthroughs in word representation such as word embeddings, ELMo, and BERT could yield strong results if combined with semi-supervised models. I would also like to investigate the applications of reinforcement learning in natural language processing, as recent research papers show promising results on difficult tasks like abstractive summarization.

Languages like Arabic and Syriac, on the other hand, have been underserved for reasons beyond the scope of this statement. Rarely do we find studies conducted on advanced tasks in Arabic, and little to no work has been done on Syriac. The latter lacks essential tools such as tokenizers, part-of-speech taggers, and named entity recognizers, let alone systems for advanced tasks such as speech recognition and synthesis. Such languages record long periods of history, and mining these resources using current deep learning techniques would help recover important achievements of mankind across time. These advancements would also make NLP-driven products and services more accessible to the hundreds of thousands of speakers of these languages.

The combination of expertise, inspiration, and resources available at MILA is what I need to conduct cutting-edge research in a challenging and interdisciplinary area like natural language processing. In particular, I aspire to work with Professor Will Hamilton and Professor Jian Tang, as their research projects on representation learning and reinforcement learning in natural language understanding are closely aligned with my own interests, and collaborating with them would bring me closer to my long-term goals.

This example shows a draft research-focused personal statement, but it is not in a submission-ready form. The content is strong, but the writing contains formatting errors, unclear structure, and sections that read more like technical notes. Students should use it only as a rough illustration of ideas and should ensure their own personal statements are carefully edited, clearly structured, and professionally presented before applying.
