Document 09: Novelty Angle & Publication Strategy¶
Overview¶
This document outlines the novel contributions of Indonesia-MTEB, positions it relative to existing benchmarks, and provides a comprehensive publication strategy for top-tier NLP venues. It addresses the critical question: "What is new and why does it matter?" from the perspective of reviewers, program committees, and the broader NLP community.
1. Core Novelty Contributions¶
1.1 Primary Novelty Claims¶
| Novelty Dimension | Indonesia-MTEB Contribution | Differentiation |
|---|---|---|
| Language Coverage | First comprehensive Indonesian text embedding benchmark covering all 8 MTEB task categories | Existing resources (IndoNLU, NusaCrowd) focus on classification/generation; no embedding benchmark exists |
| Regional Language Integration | Incorporates Javanese, Sundanese, Malay, and regional code-mixing evaluation | Regional MTEBs (VN-MTEB, TR-MTEB) are monolingual; SEA-BED covers 10 languages but only 71% of its datasets are human-curated |
| Cultural Preservation Framework | Novel evaluation framework for Indonesian cultural terms, register detection, and code-mixing | No existing benchmark evaluates cultural term preservation in translation |
| 3-Pronged Data Strategy | Combines aggregation (50+ existing), translation (full MTEB), AI generation (novel tasks) | Other benchmarks primarily use translation only |
| Kept Ratio Analysis | First systematic analysis of EN-ID translation quality by task type with empirical thresholds | VN-MTEB reports kept ratios but lacks linguistic proximity analysis |
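To make the kept-ratio analysis above concrete, here is a minimal sketch of how a kept ratio could be computed under a task-specific similarity threshold. The threshold values, the `kept_ratio` function name, and the choice of LaBSE as the cross-lingual encoder are illustrative assumptions, not the project's final pipeline.

# Hypothetical sketch: kept ratio under task-specific similarity thresholds
from sentence_transformers import SentenceTransformer, util

# Assumed task-specific thresholds for illustration only
KEPT_THRESHOLDS = {"classification": 0.75, "retrieval": 0.80, "sts": 0.85}

def kept_ratio(source_texts, translated_texts, task: str,
               model_name: str = "sentence-transformers/LaBSE") -> float:
    """Fraction of translated samples whose cross-lingual similarity
    to the source meets the task-specific threshold."""
    model = SentenceTransformer(model_name)
    src = model.encode(source_texts, convert_to_tensor=True)
    tgt = model.encode(translated_texts, convert_to_tensor=True)
    sims = util.cos_sim(src, tgt).diagonal()  # aligned EN-ID pairs
    return float((sims >= KEPT_THRESHOLDS[task]).float().mean())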
1.2 Novel Technical Contributions¶
1.2.1 Cultural Term Preservation Validation¶
# Novel evaluation component: Cultural term preservation
INDONESIAN_CULTURAL_TERMS = {
    # Social concepts
    "gotong royong", "pancasila", "rukun", "siskamling",
    # Religious/cultural
    "lebaran", "puasa", "halal bil halal", "nyepi", "waisak",
    # Culinary
    "warung", "nasi goreng", "rendang", "sate", "bakso",
    # Arts/crafts
    "batik", "wayang", "gamelan", "keris", "ikat",
    # Geographic/identity
    "merantau", "kampung", "desa", "kos"
}

def evaluate_cultural_preservation(source: str, translation: str) -> dict:
    """
    Novel evaluation metric for Indonesia-MTEB.
    Not found in VN-MTEB, TR-MTEB, or C-MTEB.
    """
    source_lower = source.lower()
    translation_lower = translation.lower()
    # Cultural terms that appear in the source text
    source_terms = [t for t in INDONESIAN_CULTURAL_TERMS if t in source_lower]
    # Terms that survive translation verbatim
    preserved = [t for t in source_terms if t in translation_lower]
    return {
        "preservation_rate": len(preserved) / len(source_terms) if source_terms else 1.0,
        "missing_terms": set(source_terms) - set(preserved),
    }
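For illustration, a minimal usage sketch with a hypothetical source/translation pair (not benchmark data):

# Hypothetical example; "desa" is lost in translation while the other terms survive
source = "Warga desa mengadakan gotong royong sebelum lebaran."
translation = "The villagers held a gotong royong before lebaran."
result = evaluate_cultural_preservation(source, translation)
# result["preservation_rate"] == 2/3 and result["missing_terms"] == {"desa"}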
1.2.2 Register Detection Evaluation¶
# Novel: Indonesian register (formal/informal) validation
FORMAL_MARKERS = ["yang", "dengan", "untuk", "tersebut", "melakukan"]
INFORMAL_MARKERS = ["yg", "dgn", "utk", "tu", "lakuin", "gan", "deh"]

def classify_register(text: str) -> str:
    """Heuristic register classifier based on Indonesian marker-word counts."""
    tokens = text.lower().split()
    formal = sum(t in FORMAL_MARKERS for t in tokens)
    informal = sum(t in INFORMAL_MARKERS for t in tokens)
    return "informal" if informal > formal else "formal"

def evaluate_register_preservation(source: str, translation: str) -> dict:
    """
    Evaluates whether translation preserves formality level.
    Critical for Indonesian, which distinguishes formal (baku) and
    informal/colloquial registers.
    """
    source_formality = classify_register(source)
    translation_formality = classify_register(translation)
    return {
        "register_preserved": source_formality == translation_formality,
        "source_register": source_formality,
        "translation_register": translation_formality,
    }
1.2.3 Code-Mixing Validation¶
# Novel: Indonesian-English code-mixing detection
# A phenomenon increasingly common in Indonesian social media
def detect_code_mixing(text: str) -> dict:
    """
    Detects and characterizes Indonesian-English code-mixing.
    Novel evaluation dimension not present in other MTEBs.

    Assumes `word_tokenize` (e.g., nltk.tokenize.word_tokenize) and a
    token-level `language_id` helper returning "id" or "en" (sketched below).
    """
    tokens = word_tokenize(text)
    if not tokens:
        return {"has_code_mixing": False, "switch_count": 0,
                "dominant_lang": None, "mixing_ratio": 0.0}
    lang_ids = [language_id(t) for t in tokens]
    switches = sum(1 for i in range(1, len(lang_ids)) if lang_ids[i] != lang_ids[i - 1])
    return {
        "has_code_mixing": switches > 0,
        "switch_count": switches,
        "dominant_lang": max(set(lang_ids), key=lang_ids.count),
        "mixing_ratio": lang_ids.count("en") / len(tokens),
    }
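The token-level `language_id` helper above is assumed rather than specified; a real pipeline would likely use a trained identifier (e.g., a fastText language-ID model). A minimal wordlist-based sketch, with hypothetical seed lexicons, could look like:

# Hypothetical helper: seed-lexicon token language identification (sketch only)
INDONESIAN_FUNCTION_WORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu",
                             "tidak", "ada", "dengan", "untuk", "pada"}
ENGLISH_FUNCTION_WORDS = {"the", "and", "of", "to", "in", "is", "that",
                          "for", "with", "on", "this", "not"}

def language_id(token: str) -> str:
    """Label a single token as "en" or "id" using small seed lexicons."""
    t = token.lower()
    if t in ENGLISH_FUNCTION_WORDS:
        return "en"
    if t in INDONESIAN_FUNCTION_WORDS:
        return "id"
    # Weak orthographic back-off: x/q are rare in Indonesian spelling,
    # so default ambiguous tokens to "id" in an Indonesian-centric corpus.
    return "en" if any(c in t for c in "xq") else "id"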
1.3 Novel Methodological Contributions¶
| Contribution | Description | Why Novel |
|---|---|---|
| EN-ID Linguistic Proximity Analysis | Systematic analysis of how linguistic proximity affects kept ratios | VN-MTEB (EN-VI) and TR-MTEB (EN-TR) analyze different language pairs; EN-ID has unique characteristics |
| Cultural Term Impact Study | First study on how cultural terms affect embedding similarity | No existing benchmark evaluates this |
| Regional Language Cross-Lingual Transfer | Evaluation of how Indonesian embeddings transfer to Javanese/Sundanese | SEA-BED covers multiple languages but doesn't study transfer learning |
| AI-Generated Dataset Validation Protocol | Framework for validating LLM-generated datasets for novel tasks | Goes beyond translation-focused approaches |
2. Positioning Relative to Existing Benchmarks¶
2.1 Comparative Analysis¶
| Benchmark | Languages | Tasks | Datasets | Translation Source | Novelty Gap |
|---|---|---|---|---|---|
| MTEB | 112 | 8 | 1,308+ | N/A (original) | No Indonesian focus |
| MMTEB | 250+ | 8 | 500+ | Mixed | Indonesian coverage sparse |
| C-MTEB | 1 (zh) | 6 | 35 | Translation | Single language focus |
| VN-MTEB | 1 (vi) | 6 | 41 | Translation | No regional languages |
| TR-MTEB | 1 (tr) | 6 | 26 | Translation | No cultural evaluation |
| SEA-BED | 10 | 9 | 169 | 29% translated | Limited Indonesian datasets |
| ArabicMTEB | Arabic dialects | 8 | 47 | Mixed | Dialect focus, not typological |
| AfriMTEB | 59 | 14 | 38 | Mixed | African language focus |
| Indonesia-MTEB | 5+ (id, jv, su, ms, en) | 8 | 50-100+ | Aggregation + Translation + AI | First comprehensive Indonesian embedding benchmark with cultural evaluation |
2.2 Unique Value Proposition¶
Indonesia-MTEB is the only benchmark that:
- Covers all 8 MTEB task categories for Indonesian (existing benchmarks cover 0-6 tasks)
- Evaluates cultural term preservation in machine-translated datasets (no existing benchmark has this)
- Incorporates regional language evaluation (Javanese, Sundanese, Malay) alongside Indonesian
- Validates code-mixing detection for Indonesian-English social media text
- Combines three data strategies (aggregation, translation, AI generation) in a unified framework
- Provides linguistic proximity analysis for the EN-ID language pair (English to an Austronesian target)
- Offers register-aware evaluation for the formal/informal Indonesian distinction
2.3 Competitive Advantages¶
| Dimension | Indonesia-MTEB Advantages |
|---|---|
| Coverage | Only benchmark with 8/8 tasks for Indonesian |
| Quality | 3-stage validation pipeline with cultural term preservation |
| Cultural Sensitivity | Novel evaluation of Indonesian cultural concepts |
| Regional Integration | First to evaluate Javanese/Sundanese cross-lingual transfer |
| Sociolinguistic Awareness | Code-mixing and register evaluation |
| Data Diversity | Combines existing datasets + translation + AI generation |
3. Publication Strategy¶
3.1 Target Venues¶
3.1.1 Primary Targets (Tier 1)¶
| Venue | Deadline | Acceptance Rate | Fit |
|---|---|---|---|
| ACL 2026 | Feb 2026 | ~22% | Highest prestige, strong fit for resource paper |
| EMNLP 2026 | May/June 2026 | ~22% | Strong fit for empirical benchmark paper |
| NAACL 2026 | Sep/Oct 2025 | ~23% | Regional relevance (Americas focus decreasing) |
| COLING 2026 | Early 2026 | ~25% | International focus, good for multilingual work |
3.1.2 Secondary Targets (Specialized Tracks)¶
| Venue | Track | Deadline | Fit |
|---|---|---|---|
| NeurIPS Datasets & Benchmarks | Datasets Track | ~May 2026 | Strong fit for novelty |
| ICLR | Datasets & Benchmarks | ~Sep 2026 | Growing NLP presence |
| AAAI | Senior/General Track | ~August 2026 | AI breadth, good fit |
| AACL | Main Track | ~Mid 2026 | Asian language focus |
3.1.3 Backup Options¶
| Venue | Notes |
|---|---|
| TACL | Journal format; after conference rejection |
| Findings of ACL/EMNLP/NAACL | After main conference rejection |
| LREC | Language resources focus |
| INTERSPEECH | If focusing on spoken/parallel datasets |
3.2 Submission Timeline Strategy¶
Phase 1: Preparation (3-4 months before target deadline)
├── Complete dataset creation and validation
├── Run benchmark experiments
├── Draft paper
└── Pre-print on arXiv (builds citations)
Phase 2: ARR Submission (2-3 months before deadline)
├── Submit to ACL Rolling Review
├── Select target venue (e.g., EMNLP 2026)
├── Address reviewer feedback
└── Resubmit if needed
Phase 3: Conference Selection
├── Upon acceptance, select from ACL/EMNLP/NAACL
├── Prepare camera-ready version
└── Prepare presentation materials
3.3 Publication Phases (Recommended)¶
| Phase | Venue | Purpose | Timeline |
|---|---|---|---|
| 1. Pre-print | arXiv | Establish priority, gather feedback | Month 1 |
| 2. Workshop | AACL/CL/WS | Refine methodology, build community | Month 3-4 |
| 3. Main Conference | ACL/EMNLP | Primary publication target | Month 6-8 |
| 4. Journal Extension | TACL/CL | Extended version with additional analysis | Month 12+ |
4. Addressing Reviewer Concerns¶
4.1 Common Reviewer Criticisms and Responses¶
| Concern | Anticipated Question | Response Strategy |
|---|---|---|
| "Just translation of MTEB" | What's novel beyond translation? | Emphasize: (1) Cultural term preservation framework, (2) Regional language integration, (3) Aggregation of 50+ existing Indonesian datasets, (4) AI-generated datasets for novel tasks |
| "Limited to Indonesian" | Why does this matter globally? | Position as: (1) 4th most spoken language (270M+ speakers), (2) Proxy for Austronesian languages, (3) Case study in cultural preservation, (4) Bridge between SEA and global NLP |
| "Translation quality issues" | How do you ensure quality? | Cite: (1) 3-stage validation pipeline, (2) Empirical similarity thresholds, (3) Kept ratio transparency, (4) Human calibration data |
| "Existing work covers this" | What about NusaCrowd/SEACrowd? | Differentiate: (1) NusaCrowd is a data hub, not embedding benchmark, (2) SEACrowd/SEA-BED lack Indonesian cultural evaluation, (3) Indonesia-MTEB is embedding-specific with 8/8 task coverage |
| "LLM-generated data concerns" | Is synthetic data valid? | Response: (1) Only for novel tasks (Clustering, Reranking) with no existing Indonesian data, (2) Rigorous validation pipeline, (3) Transparency in data cards |
| "Benchmark drift" | Won't this become outdated quickly? | Response: (1) Open-source framework allows community updates, (2) MTEB integration ensures longevity, (3) Version control and documentation |
4.2 "So What?" Test¶
Reviewers will ask: Why does this matter?
Tier 1 Responses (Societal Impact):
1. Language Justice: 270M Indonesian speakers deserve embedding evaluation parity with English/Chinese
2. Cultural Preservation: Framework for preserving cultural concepts in ML pipelines
3. Regional Language Support: First benchmark to evaluate Javanese (98M speakers) and Sundanese (42M speakers) embeddings

Tier 2 Responses (Technical Contributions):
1. Linguistic Proximity Insights: EN-ID kept ratios differ from EN-VI (VN-MTEB) and EN-TR (TR-MTEB) due to typological factors
2. Cultural Term Evaluation: Novel framework applicable to other cultures
3. Code-Mixing Evaluation: First systematic code-mixing validation for embeddings

Tier 3 Responses (Research Infrastructure):
1. MTEB Integration: Adds Indonesian to the global embedding evaluation ecosystem
2. Open Resource: Enables Indonesian researchers to evaluate locally-relevant models
3. Benchmark Diversity: Addresses geographic imbalance in NLP benchmarks
4.3 Positioning Statement¶
"Indonesia-MTEB addresses a critical gap in the embedding evaluation landscape: the absence of comprehensive benchmarks for the world's fourth most spoken language. Beyond translating existing resources, we introduce novel evaluation frameworks for cultural preservation, code-mixing, and register—phenomena central to Indonesian sociolinguistics but absent from existing benchmarks. Our work provides both immediate utility for Indonesian NLP and generalizable insights for culturally-aware embedding evaluation."
5. Novelty Deep Dives¶
5.1 Cultural Term Preservation Framework¶
Novel Research Questions:
- How do embedding similarity scores correlate with cultural term preservation?
- Which translation models best preserve Indonesian cultural concepts?
- Does cultural term loss affect downstream embedding performance?
Methodology:
# Novel evaluation: Cultural term semantic drift
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def measure_cultural_semantic_drift(source: str, translation: str,
                                    embedder, cultural_terms: set) -> dict:
    """
    Measures whether cultural terms maintain semantic similarity
    across translation. Novel contribution not in other MTEBs.

    Assumes `extract_cultural_sentences` (sketched below) returns the
    sentences that contain at least one cultural term.
    """
    source_embedding = embedder.encode([source])
    translation_embedding = embedder.encode([translation])
    baseline_similarity = float(cosine_similarity(source_embedding,
                                                  translation_embedding)[0, 0])

    # Extract sentences containing cultural terms, aligned by position
    # (assumes translation preserves sentence order)
    cultural_sentences = extract_cultural_sentences(source, cultural_terms)
    translated_cultural = extract_cultural_sentences(translation, cultural_terms)
    if not cultural_sentences or not translated_cultural:
        return {"baseline_similarity": baseline_similarity,
                "cultural_similarity": baseline_similarity,
                "drift": 0.0}

    n = min(len(cultural_sentences), len(translated_cultural))
    sims = cosine_similarity(embedder.encode(cultural_sentences[:n]),
                             embedder.encode(translated_cultural[:n]))
    # Mean similarity over aligned source/translation cultural sentences
    cultural_similarity = float(np.mean(np.diag(sims)))
    return {
        "baseline_similarity": baseline_similarity,
        "cultural_similarity": cultural_similarity,
        "drift": baseline_similarity - cultural_similarity,
    }
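The `extract_cultural_sentences` helper assumed above is left unspecified; a minimal sketch using a naive regex sentence split (a production pipeline would likely use a proper sentence segmenter) could be:

# Hypothetical helper assumed by measure_cultural_semantic_drift (sketch only)
import re

def extract_cultural_sentences(text: str, cultural_terms: set) -> list:
    """Return the sentences that mention at least one cultural term."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences
            if any(term in s.lower() for term in cultural_terms)]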
Publication Angle:
- Empirical study of cultural concept preservation in machine translation
- Framework applicable to other under-resourced cultures
- Insights for multilingual embedding model training
5.2 Code-Mixing Evaluation¶
Novel Research Questions:
- Do embedding models treat code-mixed Indonesian-English differently from monolingual text?
- How does code-mixing affect retrieval and semantic similarity performance?
- Can embeddings identify language boundaries in code-mixed text?
Novel Dataset Component:
# Novel: Code-mixing evaluation dataset
class IndonesianCodeMixingDataset:
"""
Novel dataset for evaluating code-mixed Indonesian-English embeddings.
First of its kind in the MTEB ecosystem.
"""
def __init__(self):
self.tasks = [
"code_mixing_classification", # Detect if text is code-mixed
"language_boundary_detection", # Identify ID/EN boundaries
"code_mixed_similarity", # Semantic similarity in code-mixed pairs
"code_mixed_retrieval" # Retrieval with code-mixed queries
]
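For illustration, a hypothetical record for the `code_mixed_similarity` task; the sentences, field names, and 0-5 score scale are assumptions, not released data:

# Hypothetical code_mixed_similarity record (illustrative only)
example_pair = {
    "sentence1": "Gue lagi nunggu meeting, mungkin delay 30 menit.",        # code-mixed
    "sentence2": "Aku sedang menunggu rapat, mungkin terlambat 30 menit.",  # monolingual
    "score": 4.5,  # assumed human similarity rating on a 0-5 scale
}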
Publication Angle:
- First systematic code-mixing evaluation for text embeddings
- Indonesian as a case study for global code-mixing phenomena
- Insights for embedding model development in multilingual societies
5.3 Regional Language Cross-Lingual Transfer¶
Novel Research Questions:
- Can Indonesian embeddings transfer to Javanese/Sundanese without retraining?
- How does cross-lingual transfer performance compare to dedicated models?
- Which tasks show better cross-lingual transfer?
Novel Evaluation:
# Novel: Regional language cross-lingual transfer evaluation
def evaluate_cross_lingual_transfer(indonesian_model, javanese_texts,
                                    javanese_labels, task: str) -> dict:
    """
    Evaluates how well Indonesian-trained embeddings perform on Javanese.
    Novel contribution: no existing benchmark studies Austronesian transfer.

    Assumes an `evaluate_task` helper (sketched below) that scores embeddings
    against labels for the given task.
    """
    # Encode Javanese texts with the Indonesian model
    embeddings = indonesian_model.encode(javanese_texts)
    # Evaluate on the task (classification, clustering, etc.)
    performance = evaluate_task(embeddings, javanese_labels, task)
    return {
        "cross_lingual_performance": performance,
        "task": task,
        "source_language": "id",
        "target_language": "jv",
    }
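The `evaluate_task` helper above is assumed; a minimal sketch covering only the classification case (other task types would follow MTEB's standard evaluators) might be:

# Hypothetical helper: score embeddings for a task (classification case only)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate_task(embeddings: np.ndarray, labels, task: str) -> float:
    """Return a single score; only "classification" is sketched here."""
    if task == "classification":
        clf = LogisticRegression(max_iter=1000)
        # Mean accuracy under 5-fold cross-validation on the embedded texts
        return float(cross_val_score(clf, embeddings, labels, cv=5).mean())
    raise NotImplementedError(f"Task type not sketched: {task}")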
Publication Angle:
- Austronesian language family transfer learning
- Resource efficiency for regional languages
- Implications for multilingual embedding model design
6. Novelty Claims by Paper Section¶
6.1 Abstract Novelty Hooks¶
We introduce Indonesia-MTEB, the first comprehensive text embedding benchmark
for Indonesian covering all 8 MTEB task categories. Despite 270+ million
speakers, Indonesian lacks embedding evaluation infrastructure. Our benchmark
offers three novel contributions: (1) a [CULTURAL TERM PRESERVATION FRAMEWORK]
for evaluating translation quality, (2) first systematic [CODE-MIXING EVALUATION]
for Indonesian-English embeddings, and (3) [REGIONAL LANGUAGE TRANSFER ANALYSIS]
for Javanese and Sundanese. Through a three-pronged data strategy—aggregating
50+ existing datasets, translating MTEB resources, and AI-generating novel tasks—
we provide 50-100+ datasets across all embedding task types. Experiments on
18 models reveal unique patterns in [AUSTRONESIAN EMBEDDING PERFORMANCE], with
implications for multilingual embedding development.
6.2 Introduction Novelty Claims¶
| Paragraph | Novelty Emphasis |
|---|---|
| Motivation | Indonesian = 4th most spoken language; zero comprehensive embedding benchmarks |
| Gap Analysis | Existing work (IndoNLU, NusaCrowd) focuses on classification; embedding evaluation is missing |
| Our Contribution | 8/8 MTEB tasks + cultural framework + regional languages |
| Implications | Model for other under-resourced languages; cultural preservation insights |
6.3 Methodology Novelty Claims¶
| Component | Novelty Description |
|---|---|
| 3-Stage Validation | Adapted from VN-MTEB but with Indonesian-specific thresholds (≥0.75 vs ≥0.80) |
| Cultural Term Validation | Novel; no equivalent in any MTEB variant |
| Code-Mixing Detection | Novel for embedding evaluation |
| Register Preservation | Novel; leverages Indonesian formal/informal distinction |
| Regional Transfer | Novel; Austronesian language family analysis |
6.4 Experiments Novelty Claims¶
| Experiment | Novelty Hook |
|---|---|
| Kept Ratio by Task | First EN-ID analysis; differs from EN-VI (VN-MTEB) and EN-TR (TR-MTEB) |
| Cultural Term Impact | Novel study of cultural concepts on embedding similarity |
| Model Comparison | RoPE vs APE analysis in Indonesian context |
| Regional Transfer | First Javanese/Sundanese embedding evaluation |
| Code-Mixing Performance | Novel task type for Indonesian |
7. Angle-Specific Publication Strategies¶
7.1 Angle 1: Cultural Preservation in NLP¶
Core Thesis: Machine translation and embedding evaluation must account for cultural concepts that lack direct translations.
Target Venues: AACL, ACL, EMNLP
Novel Emphasis:
- Cultural term preservation framework
- Empirical analysis of translation's impact on cultural concepts
- Implications for culturally-aware NLP systems
Potential Title:
"Indonesia-MTEB: A Cultural Preservation Framework for Text Embedding Benchmarks"
7.2 Angle 2: Code-Mixing and Sociolinguistics¶
Core Thesis: Embedding models must handle real-world sociolinguistic phenomena like code-mixing and register variation.
Target Venues: EMNLP, NAACL, ACL
Novel Emphasis:
- Code-mixing detection and evaluation
- Register-aware embedding evaluation
- Real-world Indonesian social media validation
Potential Title:
"Beyond Monolingual: Evaluating Embeddings on Code-Mixed Indonesian Text"
7.3 Angle 3: Regional Language Support¶
Core Thesis: Benchmarks for major languages should enable evaluation for related regional languages.
Target Venues: AACL, COLING, LREC
Novel Emphasis:
- Javanese and Sundanese evaluation
- Cross-lingual transfer within the Austronesian family
- Resource-efficient multilingual embedding development
Potential Title:
"Indonesia-MTEB: Benchmarking Embeddings for Indonesian and Regional Languages"
7.4 Angle 4: Translation Quality for Embeddings¶
Core Thesis: Machine translation for embedding evaluation requires different quality metrics than traditional MT.
Target Venues: WMT, EMNLP, ACL
Novel Emphasis:
- Semantic similarity thresholds for EN-ID
- Kept ratio analysis by task type
- Comparison with VN-MTEB (EN-VI) and TR-MTEB (EN-TR)
Potential Title:
"Translation Quality for Embedding Evaluation: The EN-ID Case Study"
8. Building Citation Potential¶
8.1 Citable Components¶
Each dataset and methodology component should be independently citable:
| Component | Citation Type | Expected Citations |
|---|---|---|
| Individual Datasets | Dataset (HuggingFace) | Per dataset use |
| Benchmark Framework | Conference paper | Primary citation |
| Cultural Framework | Method paper | Cultural NLP work |
| Code-Mixing Protocol | Method paper | Code-mixing research |
| Kept Ratio Analysis | Analysis paper | Translation research |
8.2 Pre-Publication Strategy¶
Before Conference Submission:
- arXiv Pre-print (Month 1): Establish priority
- HuggingFace Datasets (Month 2): Enable early adoption
- Blog Post (Month 2): Build community awareness
- Workshop Paper (Month 3-4): Refine methodology
Target Workshops:
- AACL Workshop (Asian Language Resources)
- Workshop on NLP for Indigenous Languages
- Workshop on Translation and Semantics
- Workshop on Benchmarking
8.3 Building a Citation Ecosystem¶
% Primary citation
@inproceedings{indonesia_mteb_2026,
title={Indonesia-MTEB: A Comprehensive Text Embedding Benchmark for Indonesian},
author={Authors},
booktitle={ACL/EMNLP},
year={2026}
}
% Dataset citation
@dataset{indonesia_mteb_datasets,
title={Indonesia-MTEB Dataset Collection},
author={Authors},
year={2026},
publisher={Hugging Face},
url={https://huggingface.co/indonesia-mteb}
}
% Cultural framework citation
@inproceedings{cultural_preservation_2026,
title={Cultural Term Preservation in Machine Translation for Embedding Evaluation},
author={Authors},
booktitle={Workshop on Culturally-Aware NLP},
year={2026}
}
9. Risk Mitigation¶
9.1 Identified Risks and Mitigations¶
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Reviewer: "Just translation" | High | High | Emphasize cultural framework, regional languages, AI-generated datasets |
| Reviewer: "Not generalizable" | Medium | Medium | Position as case study for Austronesian, cultural preservation framework |
| MTEB rejection | Low | Medium | Engage with MTEB maintainers early; prepare independent release |
| Translation quality concerns | Medium | Medium | Transparent kept ratios; human calibration data; multiple validation stages |
| Competition from SEA-BED | Medium | Low | SEA-BED covers 10 languages shallowly; Indonesia-MTEB goes deep on Indonesian |
| Code-mixing criticism | Low | Low | Limited to specific datasets; not main contribution |
9.2 Contingency Plans¶
If Rejected from Main Conference:
- Findings Track: Resubmit to same venue's Findings
- Alternative Venue: Submit to different tier-1 conference
- Workshop + Arxiv: Build citations, resubmit to main venue
- Journal Submission: Convert to TACL/CL journal submission
If MTEB Integration Fails:
- Independent Release: Publish on HuggingFace with custom evaluation code
- Community Engagement: Build Indonesia-MTEB community
- Alternative Framework: Explore integration with other evaluation frameworks
10. Novelty Checklist¶
10.1 Pre-Submission Novelty Verification¶
- Cultural Term Preservation Framework is clearly described and empirically validated
- Code-Mixing Evaluation component is included (even if limited)
- Regional Language Analysis (Javanese/Sundanese) is present
- Kept Ratio Analysis includes comparison with VN-MTEB and TR-MTEB
- 3-Pronged Strategy (aggregation + translation + AI) is emphasized
- MTEB Integration is documented or in progress
- Open Source Release on HuggingFace is completed
- Cultural Sensitivity is acknowledged and addressed
- Linguistic Proximity Analysis (EN-ID vs EN-VI vs EN-TR) is included
- Broader Impact Statement addresses language justice
10.2 Novelty Narrative Checklist¶
- Abstract clearly states unique contributions
- Introduction quantifies the gap (270M speakers, 0 comprehensive benchmarks)
- Related Work positions relative to all major MTEBs
- Methodology section describes novel validation frameworks
- Experiments section includes unique analyses (cultural, code-mixing, regional)
- Discussion generalizes findings beyond Indonesian
- Conclusion articulates implications for other under-resourced languages
11. Summary: Core Novelty Statement¶
Indonesia-MTEB's unique contributions:
- Coverage: First comprehensive Indonesian embedding benchmark (all 8 MTEB tasks)
- Cultural Framework: Novel evaluation of cultural term preservation in translation
- Sociolinguistic Awareness: First code-mixing and register evaluation for Indonesian embeddings
- Regional Integration: Evaluation of Javanese and Sundanese alongside Indonesian
- Linguistic Proximity Insights: Empirical analysis of EN-ID translation quality by task type
- 3-Pronged Strategy: Combines aggregation, translation, and AI generation
- Open Resource: Community-driven integration with MTEB ecosystem
Positioning: Indonesia-MTEB serves both as an immediate resource for Indonesian NLP and as a case study for culturally-aware, sociolinguistically-informed embedding evaluation applicable to other under-resourced languages and cultures.
12. References and Related Work¶
12.1 Must-Cite Benchmarks¶
| Benchmark | Citation | Key Differentiation |
|---|---|---|
| MTEB | Muennighoff et al., 2023 | Original framework |
| MMTEB | Enevoldsen et al., 2025 | Multilingual expansion |
| VN-MTEB | Pham et al., 2025 | Translation pipeline |
| TR-MTEB | Baysan et al., 2025 | Turkish benchmark |
| SEA-BED | Ponwitayarat et al., 2025 | SEA regional coverage |
| C-MTEB | – | Chinese benchmark |
12.2 Must-Cite Indonesian Resources¶
| Resource | Citation | Use |
|---|---|---|
| IndoNLU | Wilie et al., 2020 | Indonesian NLP tasks |
| Indo4B | Wilie et al., 2020 | Pre-training corpus |
| NusaCrowd | Cahyawijaya et al., 2023 | Indonesian dataset hub |
| SEACrowd | SEACrowd Consortium, 2024 | SEA dataset hub |
12.3 Supporting Literature¶
| Topic | Key Citations |
|---|---|
| Cultural NLP | |
| Code-Mixing | |
| Embedding Evaluation | |
| Low-Resource Languages | |
| Machine Translation Quality | |
Next Steps:
1. Finalize datasets and validation
2. Run benchmark experiments
3. Draft paper following novelty framework
4. Submit to ARR for target venue
5. Engage with MTEB community for integration