Greek NLP Research

Greek NLP

Resources and Tools for Modern Greek Language Technology

Overview

CLLT specializes in developing computational resources and tools for Modern Greek and its dialectal varieties. Our work addresses the critical gap in NLP resources for Greek, which remains significantly under-resourced compared to major European languages.

We focus on creating high-quality datasets, benchmarks, and models that capture the linguistic richness of Greek, including its standard form and regional dialects.

Greek Regional Dialects Dataset (GRDD)

GRDD & GRDD+

The Greek Regional Dialects Dataset (GRDD) is a comprehensive corpus covering four major dialectal varieties of Greek: Cypriot, Pontic, Cretan, and Northern Greek. This resource enables NLP research on low-resource language varieties.

Key Features:

  • Multi-dialectal coverage
  • Fine-tuning experiments across multiple model architectures
  • Publicly available for research

View Paper Access Dataset

Greek Dialects

Dialectal Llama-krikri

Specialized for Cypriot, Cretan, Pontic and Northern Varieties

Cretan Dialect Model

Cretan Dialect

Fine-tuned model for Cretan Greek

Cypriot Dialect Model

Cypriot Dialect

Specialized model for Cypriot Greek

Northern Dialect Model

Northern Greek Dialect

Model trained on Northern Greek varieties

Pontic Dialect Model

Pontic Dialect

Model for Pontic Greek

OYXOY Benchmark

OYXOY is a modern NLP test suite for Modern Greek, featuring expert-verified evaluation tasks for natural language inference, word sense disambiguation, and metaphor detection.

Innovation: First finer-grained and multi-label NLI dataset for Greek, and the first dataset to use a traditional resource to automatically extract information and construct polysemy and metaphor datasets.

View Paper