Datasets & Resources

Greek and Dialectal Greek NLP Resources

2024 NLI

OYXOY Test Suite

Modern Greek NLI benchmark with multi-label annotations. 1,763 pairs covering entailment, contradiction, and neutrality with full word sense disambiguation.

Format: JSON
License: CC BY 4.0
Citation: Kogkalidis et al. (EACL 2024)

Links: GitHub, Zenodo, Paper
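Because the suite is distributed as JSON with multi-label annotations, a first step when working with it is usually to tally labels and pick out pairs that carry more than one. The following is a minimal sketch of that step, not the actual OYXOY schema: the field names `premise`, `hypothesis`, and `labels` are assumptions, and the two records are made-up placeholders rather than dataset entries. Check the repository for the real layout before adapting it.

```python
import json
from collections import Counter

# Made-up placeholder records. The field names are hypothetical and
# chosen for illustration only; they are not taken from the dataset.
SAMPLE = """
[
  {"premise": "Η Μαρία αγόρασε ένα βιβλίο.",
   "hypothesis": "Η Μαρία αγόρασε κάτι.",
   "labels": ["entailment"]},
  {"premise": "Ο Γιάννης ίσως έρθει.",
   "hypothesis": "Ο Γιάννης δεν θα έρθει.",
   "labels": ["neutral", "contradiction"]}
]
"""


def label_counts(records):
    """Count how many pairs carry each label (a pair may carry several)."""
    counts = Counter()
    for record in records:
        # set() guards against a label being listed twice on one pair.
        counts.update(set(record["labels"]))
    return counts


def is_multi_label(record):
    """Return True when a pair has more than one distinct label."""
    return len(set(record["labels"])) > 1


records = json.loads(SAMPLE)
counts = label_counts(records)
multi = [r for r in records if is_multi_label(r)]

print(dict(counts))
print(f"{len(multi)} of {len(records)} pairs have multiple labels")
```

To run this against a downloaded file, replace `json.loads(SAMPLE)` with `json.load(open(path, encoding="utf-8"))` and rename the fields to match the released format.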

2023 Dialectology

GRDD: Greek Regional Dialects Dataset

Comprehensive corpus of Greek dialectal varieties: Cypriot, Pontic, Cretan, and Northern Greek. First large-scale resource for Greek dialectal NLP.

Varieties: 4 major dialects
License: CC BY 4.0
Citation: Chatzikyriakidis et al. (2023)

Links: GitHub, Paper

2022 NLI

Extended Greek FraCaS

Greek translation and extension of the FraCaS test suite for natural language inference. 774 inference examples covering quantifiers, plurals, adjectives, and more.

Examples: 774
Phenomena: 9 categories
Citation: Amanaki et al. (LREC 2022)

Links: GitHub, Paper

2022 NLI

De-dropped Greek XNLI

Modified version of Greek XNLI with dropped subjects restored, addressing a key morphosyntactic property of Greek that affects NLI performance.

Based on: XNLI
Modification: Pro-drop restoration
Citation: Amanaki et al. (LREC 2022)

Links: GitHub, Paper

Ongoing DH

MEDEA-NEUMOUSA Platform

Platform for computational analysis of ancient Greek texts with knowledge graph extraction and neuro-symbolic reasoning capabilities.

Focus: Classical texts
Methods: KG extraction, NLI
Status: Active development

Links: Platform

2024 Dialogue

Natural Language Dialogue Inferences

Dataset of inferences drawn from natural language dialogues, including disfluencies, hesitations, and interactive phenomena often absent from written text.

Features: Disfluencies, repairs
Citation: Ek et al. (SemDial 2024)
Collaboration: CLASP

Links: Conference

Using Our Datasets

License Information

CLLT datasets are released under permissive open licenses, typically CC BY 4.0, to encourage research and development. Check each dataset's repository for its specific license terms.

Citation

If you use our datasets in your research, please cite the corresponding papers. BibTeX entries are available in each dataset's GitHub repository and on our publications page.

Contribute

We welcome contributions, error corrections, and extensions to our datasets. Please submit issues or pull requests on the respective GitHub repositories.

Contact

For questions, collaboration inquiries, or access to unreleased resources, please contact the lab.

Upcoming Resources