Terminology-Grounded Translation
Bridging the Gap Between Wikipedians and Scientists with Terminology-Aware Translation: A Case Study in Turkish
This project addresses the gap between the growing volume of English-to-Turkish Wikipedia translations and the shortage of contributors, particularly in technical domains. Building on the expertise behind academics’ collaborative terminology dictionary effort, we propose a pipeline to improve translation quality. We focus on bridging the academic and Wikipedia communities, creating datasets, and developing NLP models for terminology identification, terminology retrieval, and terminology-aware translation. The aim is to foster sustained contributions and improve the overall quality of Turkish Wikipedia articles.
Goals
The project will focus on the following tasks:
- High-quality parallel corpora for terminology-aware translation: We aim to generate 3,000 English-Turkish parallel sentences containing: i) English text annotated with technical terms, ii) links to the correct terminology entries in the database, and iii) edited translations that use the correct Turkish terms.
- Term Identification: Build models to identify technical terms in a multilingual setup.
- Term Linking: Build models to ground the identified terms in a terminology database where possible. If the database does not contain a term, a notification system will alert the domain experts.
- Terminology-Aware Translation: We will build post-editing and translation systems constrained by the terminology database.
- Build an Effective Communication Channel: We will survey both communities (Wikipedians and scientists) to identify best practices for building bridges between them and ways the two communities can help each other sustainably. We will publish reports, best practices, and guidelines.
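To make the first three goals more concrete, the following is a minimal sketch of what an annotated parallel example and a simple term-linking step could look like. It is not the project's implementation: the record layout, the names `TERM_DB`, `TermAnnotation`, `ParallelExample`, `link_terms`, and `missing_terms`, the entry IDs, and the sample terms are all hypothetical. The project plans to build models for term identification and linking, and this sketch substitutes a plain dictionary lookup for them.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy terminology database: English term -> (entry ID, Turkish term).
# Entries and IDs are illustrative placeholders, not taken from the real database.
TERM_DB = {
    "neural network": ("T001", "sinir ağı"),
    "gradient": ("T002", "gradyan"),
}


@dataclass
class TermAnnotation:
    start: int                # character offset of the term in the English text
    end: int                  # exclusive end offset
    surface: str              # the term exactly as it appears in the text
    entry_id: Optional[str]   # database entry ID, or None if not in the database
    turkish: Optional[str]    # Turkish term from the database, if any


@dataclass
class ParallelExample:
    english: str                                # i) English source sentence
    turkish: str                                # iii) edited Turkish translation
    terms: list = field(default_factory=list)   # ii) links to database entries


def link_terms(text, db):
    """Dictionary-based stand-in for term identification and linking.

    Finds whole-word, case-insensitive matches of database terms,
    preferring longer terms when matches overlap.
    """
    annotations = []
    for term in sorted(db, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(a.start < m.end() and m.start() < a.end
                           for a in annotations)
            if not overlaps:
                entry_id, turkish = db[term]
                annotations.append(
                    TermAnnotation(m.start(), m.end(), m.group(0),
                                   entry_id, turkish))
    return sorted(annotations, key=lambda a: a.start)


def missing_terms(candidates, db):
    """Return candidate terms absent from the database.

    These are the terms that would be reported to domain experts.
    """
    return [c for c in candidates if c.lower() not in db]


if __name__ == "__main__":
    sentence = "A neural network is trained by following the gradient."
    example = ParallelExample(
        english=sentence,
        turkish="",  # would be filled in by a human editor
        terms=link_terms(sentence, TERM_DB),
    )
    for ann in example.terms:
        print(ann.surface, "->", ann.entry_id, ann.turkish)
    print("To report:", missing_terms(["gradient", "backpropagation"], TERM_DB))
```

An exact-match lookup like this misses inflected or paraphrased terms, which is one reason learned models are attractive for the identification and linking steps.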
Team
- PI: Gözde Gül Şahin
- Graduate student(s): Ali Gebeşçe
- Interns: Ege Uğur Amasya, Mina Durhasan
- Duration: 01.06.2024 - 31.05.2025
Funding
This project is funded by the Wikimedia Research Fund. The official URL for the funded project is here.