Xertus AI Research Foundation
Welcome to the Xertus AI Research Foundation documentation.
Featured Projects
Indonesia-MTEB Benchmark
A comprehensive text embedding benchmark for the Indonesian language.
11 documents covering:
- A 3-pronged data strategy (aggregation, translation, and AI generation)
- A framework for preserving cultural terms
- Code-mixing evaluation
- Regional language support
Quick Navigation
Indonesia-MTEB Benchmark
In progress: this is an active research project, and its documentation is updated regularly.
| Document | Description |
|---|---|
| Project Overview | Problem statement and the 3-pronged data strategy |
| MTEB Structure | The 8 task categories, data formats, and metrics |
| Indonesian Datasets | Inventory of 50+ datasets |
| Regional MTEBs | Analysis of C-MTEB, VN-MTEB, and TR-MTEB |
| Translation Models | Comparison of English-to-Indonesian (EN-ID) translation models |
| AI Generation | LLM-based dataset generation |
| Validation | 3-stage validation pipeline and quality thresholds |
| ACL Standards | ACL Rolling Review (ARR) submission and licensing requirements |
| Novelty & Publication | The project's novel contributions |
| Implementation | 12-month timeline |
| Python Package | Building the package |
About
Xertus AI focuses on advancing NLP capabilities for Indonesian and Southeast Asian languages through open research.
- GitHub: xertusai
- Website: xertusai.com