Document 11: Python Package Development¶
Overview¶
This document provides a comprehensive, step-by-step guide for building indonesia-mteb, a community-managed Python package for Indonesian text embedding benchmarking. It assumes no prior Python packaging experience and covers everything from initial setup to community management.
Part 1: Understanding the MTEB Package Architecture¶
1.1 What MTEB Does¶
MTEB is a framework for:

1. Loading datasets in standardized formats
2. Running evaluations on embedding models
3. Aggregating results across tasks
4. Managing benchmarks (collections of tasks)

A sketch of a typical evaluation run follows.
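For orientation, here is roughly what an upstream MTEB evaluation looks like (a minimal sketch; exact signatures vary across mteb releases, and Banking77Classification is just an illustrative built-in task):

```python
import mteb
from sentence_transformers import SentenceTransformer

# Load a model, select tasks, run them; mteb aggregates scores per task
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
tasks = mteb.get_tasks(tasks=["Banking77Classification"])  # illustrative task
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```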
1.2 Key Components to Implement¶
| Component | Purpose | Priority |
|---|---|---|
| Dataset Loaders | Load Indonesian datasets in MTEB format | Critical |
| Task Definitions | Define evaluation tasks (Classification, etc.) | Critical |
| Benchmark Class | Group tasks into Indonesia-MTEB benchmark | Critical |
| CLI | Command-line interface for running evaluations | High |
| Leaderboard Integration | Submit results to MTEB leaderboard | Medium |
| Documentation | API docs and user guides | High |
1.3 Package Architecture Decision¶
Recommended Approach: Subset + Extension Pattern
Rather than forking the entire MTEB package, build a lightweight package that:

1. Depends on MTEB as a base dependency
2. Adds Indonesian datasets as extensions
3. Defines the Indonesia-MTEB benchmark
4. Provides convenience functions for Indonesian use cases

Benefits:

- Smaller codebase to maintain
- Automatic MTEB updates
- Community can contribute datasets independently
- Can be integrated into main MTEB later

A sketch of what this pattern looks like from the user's side is shown below.
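Concretely, the extension pattern means indonesia-mteb only defines tasks and benchmarks, while all evaluation machinery comes from upstream mteb. A minimal sketch of the intended user experience (function names follow the API defined later in this guide):

```python
import mteb
import indonesia_mteb  # thin extension layer on top of mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# indonesia_mteb contributes task definitions; mteb does the heavy lifting
tasks = indonesia_mteb.get_tasks(languages=["id"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```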
Part 2: Package Structure and Setup¶
2.1 Complete Directory Structure¶
indonesia-mteb/
├── pyproject.toml # Modern Python packaging config
├── README.md # Package overview
├── LICENSE # Apache-2.0 (matches MTEB)
├── CHANGELOG.md # Version history
├── CONTRIBUTING.md # Contribution guidelines
├── .pre-commit-config.yaml # Pre-commit hooks
├── .github/
│ ├── workflows/
│ │ ├── test.yml # CI/CD testing
│ │ ├── publish.yml # PyPI publishing
│ │ └── labeler.yml # PR labeler
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug_report.md
│ │ ├── dataset_request.md
│ │ └── feature_request.md
│ └── PULL_REQUEST_TEMPLATE.md
├── docs/
│ ├── index.md # Documentation home
│ ├── getting-started.md
│ ├── tasks.md
│ ├── adding-datasets.md
│ └── api/ # Auto-generated API docs
├── indonesia_mteb/ # Main package
│ ├── __init__.py # Public API exports
│ ├── _version.py # Version info (generated by setuptools-scm)
│ ├── cli.py # Command-line interface (Part 3.4)
│ ├── tasks/ # Task definitions
│ │ ├── __init__.py
│ │ ├── classification/ # Classification tasks
│ │ │ ├── __init__.py
│ │ │ ├── indo_sentiment.py
│ │ │ └── ...
│ │ ├── clustering/
│ │ ├── reranking/
│ │ ├── retrieval/
│ │ ├── sts/
│ │ └── summarization/
│ ├── benchmarks/ # Benchmark definitions
│ │ ├── __init__.py
│ │ └── indonesia_mteb.py
│ ├── models/ # Model adapters (optional)
│ │ ├── __init__.py
│ │ └── indonesian_models.py
│ └── resources/ # Static resources
│ ├── cultural_terms.txt
│ └── stopwords_id.txt
├── tests/ # Test suite
│ ├── __init__.py
│ ├── conftest.py # Pytest fixtures
│ ├── test_tasks.py
│ ├── test_benchmarks.py
│ └── test_integration.py
├── scripts/ # Utility scripts
│ ├── validate_datasets.py
│ └── generate_leaderboard.py
└── examples/ # Usage examples
├── basic_usage.py
└── custom_model.py
2.2 pyproject.toml (Complete Template)¶
```toml
# pyproject.toml - Modern Python packaging configuration
[build-system]
requires = ["setuptools>=61.0", "wheel", "setuptools-scm>=8.0"]
build-backend = "setuptools.build_meta"
[project]
name = "indonesia-mteb"
dynamic = ["version"]  # derived from git tags by setuptools-scm (configured below)
description = "Indonesian Massive Text Embedding Benchmark"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "Apache-2.0"}
authors = [
{name = "Indonesia-MTEB Team", email = "contact@indonesia-mteb.org"}
]
maintainers = [
{name = "Indonesia-MTEB Team", email = "contact@indonesia-mteb.org"}
]
keywords = [
"embedding",
"benchmark",
"indonesian",
"nlp",
"text-embedding",
"evaluation",
"mteb"
]
classifiers = [
"Development Status :: 3 - Alpha",
"Intended Audience :: Developers",
"Intended Audience :: Science/Research",
"License :: OSI Approved :: Apache Software License",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
]
# Dependencies
dependencies = [
"mteb>=2.7.0",
"datasets>=2.14.0",
"sentence-transformers>=2.2.0",
"numpy>=1.21.0",
"requests>=2.28.0",
"tqdm>=4.65.0",
"pyarrow>=12.0.0", # For parquet support
]
# Optional dependencies
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-cov>=4.1.0",
"ruff>=0.1.0",
"mypy>=1.5.0",
"pre-commit>=3.3.0",
]
docs = [
"mkdocs>=1.5.0",
"mkdocs-material>=9.0.0",
"mkdocstrings[python]>=0.22.0",
"mkdocs-gen-files>=0.5.0",
"mkdocs-literate-nav>=0.6.0",
"mkdocs-section-index>=0.3.0",
]
benchmark = [
"openai>=1.0.0", # For API-based models
"anthropic>=0.18.0",
"transformers[torch]>=4.35.0",
"accelerate>=0.24.0",
]
vision = [
"pillow>=10.0.0",
]
all = [
"indonesia-mteb[dev,docs,benchmark,vision]",
]
[project.urls]
Homepage = "https://github.com/indonesia-mteb/indonesia-mteb"
Documentation = "https://indonesia-mteb.github.io/indonesia-mteb"
Repository = "https://github.com/indonesia-mteb/indonesia-mteb"
"Bug Tracker" = "https://github.com/indonesia-mteb/indonesia-mteb/issues"
Changelog = "https://github.com/indonesia-mteb/indonesia-mteb/blob/main/CHANGELOG.md"
[project.scripts]
indonesia-mteb = "indonesia_mteb.cli:main"
# Tool configurations
[tool.setuptools.packages.find]
include = ["indonesia_mteb*"]  # include subpackages (tasks, benchmarks, ...)
[tool.setuptools_scm]
write_to = "indonesia_mteb/_version.py"
version_scheme = "guess-next-dev"
local_scheme = "no-local-version"
# Ruff configuration
[tool.ruff]
target-version = "py39"
line-length = 100
indent-width = 4
[tool.ruff.lint]
select = [
"E", # pycodestyle errors
"W", # pycodestyle warnings
"F", # Pyflakes
"I", # isort
"B", # flake8-bugbear
"C4", # flake8-comprehensions
"UP", # pyupgrade
"ARG", # flake8-unused-arguments
"SIM", # flake8-simplify
]
ignore = [
"E501", # line too long (handled by formatter)
"B008", # do not perform function calls in argument defaults
"W191", # indentation contains tabs
]
[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"
[tool.ruff.lint.isort]
known-first-party = ["indonesia_mteb"]
# Pytest configuration
[tool.pytest.ini_options]
minversion = "7.0"
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
"--cov=indonesia_mteb",
"--cov-report=term-missing",
"--cov-report=html",
"--strict-markers",
]
markers = [
"slow: marks tests as slow (deselect with '-m \"not slow\"')",
"integration: marks tests as integration tests",
"unit: marks tests as unit tests",
]
# Coverage configuration
[tool.coverage.run]
source = ["indonesia_mteb"]
omit = [
"*/tests/*",
"*/__pycache__/*",
"*/site-packages/*",
]
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"raise AssertionError",
"raise NotImplementedError",
"if __name__ == .__main__.:",
"if TYPE_CHECKING:",
"@abstractmethod",
]
# MyPy configuration
[tool.mypy]
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false # Can be enabled later
disallow_incomplete_defs = false
check_untyped_defs = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
follow_imports = "normal"
ignore_missing_imports = true
[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false
```
2.3 Version Management¶
```python
# indonesia_mteb/__version__.py — only needed if you opt for manual versioning
__version__ = "0.1.0"
```

With the setuptools-scm configuration above, skip this file: the version is derived from git tags and written to indonesia_mteb/_version.py at build time (do not edit that generated file by hand).
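A common pattern (a sketch, not a requirement) is to expose whichever version source exists, with a fallback for source checkouts that have not been built yet:

```python
# indonesia_mteb/__init__.py — version handling excerpt
try:
    from indonesia_mteb._version import __version__  # generated by setuptools-scm
except ImportError:  # raw source tree that has not been built yet
    __version__ = "0.0.0.dev0"
```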
Part 3: Core Implementation¶
3.1 Package __init__.py (Public API)¶
```python
# indonesia_mteb/__init__.py
"""
Indonesia-MTEB: Indonesian Massive Text Embedding Benchmark

A community-managed benchmark for evaluating Indonesian text embeddings.
"""

__version__ = "0.1.0"  # or import from _version.py as shown in section 2.3

# Import main components for the public API
from indonesia_mteb.tasks import get_indonesian_tasks

__all__ = [
    "__version__",
    "get_indonesian_tasks",
    "get_tasks",
    "get_benchmark",
]


# Convenience functions
def get_tasks(tasks=None, languages=None, domains=None):
    """Get Indonesian MTEB tasks with optional filtering.

    Args:
        tasks: List of task names to include (None = all)
        languages: List of languages to filter by (e.g., ["id", "jv", "su"])
        domains: List of domains to filter by (e.g., ["Social", "News"])

    Returns:
        List of MTEB task objects
    """
    all_tasks = get_indonesian_tasks()
    if tasks:
        all_tasks = [t for t in all_tasks if t.metadata.name in tasks]
    if languages:
        all_tasks = [
            t for t in all_tasks
            if any(lang in t.metadata.eval_langs for lang in languages)
        ]
    if domains:
        all_tasks = [
            t for t in all_tasks
            if any(domain in t.metadata.domains for domain in domains)
        ]
    return all_tasks


def get_benchmark(name="Indonesia-MTEB"):
    """Get an Indonesia-MTEB benchmark.

    Args:
        name: Benchmark name ("Indonesia-MTEB" or "Indonesia-MTEB-lite")

    Returns:
        MTEB benchmark object
    """
    # Imported lazily so importing the package stays cheap
    from indonesia_mteb.benchmarks.indonesia_mteb import (
        INDONESIA_MTEB,
        INDONESIA_MTEB_LITE,
    )

    benchmarks = {
        "Indonesia-MTEB": INDONESIA_MTEB,
        "Indonesia-MTEB-lite": INDONESIA_MTEB_LITE,
    }
    return benchmarks.get(name, INDONESIA_MTEB)
```
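get_indonesian_tasks is imported above but never defined in this guide; here is a minimal sketch of the indonesia_mteb/tasks/__init__.py that backs it (the registry list is an assumption — extend it as task modules land):

```python
# indonesia_mteb/tasks/__init__.py — minimal sketch
from indonesia_mteb.tasks.classification.indo_sentiment import (
    IndoSentimentClassification,
)

# Registry of available task classes; append new tasks here
_TASK_CLASSES = [
    IndoSentimentClassification,
]


def get_indonesian_tasks():
    """Instantiate and return all registered Indonesian tasks."""
    return [cls() for cls in _TASK_CLASSES]
```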
3.2 Task Definition Template¶
```python
# indonesia_mteb/tasks/classification/indo_sentiment.py
"""Indonesian Sentiment Classification Task"""
from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata

# This import pattern allows datasets to be loaded from HuggingFace
# or locally during development


class IndoSentimentClassification(AbsTaskClassification):
    """Indonesian sentiment classification task.

    Dataset sourced from Indonesian e-commerce and social media reviews.
    Predicts sentiment (positive, negative, neutral).
    """

    metadata = TaskMetadata(
        name="IndoSentimentClassification",
        description="Indonesian sentiment classification from reviews and social media",
        reference="https://huggingface.co/datasets/indonesia-mteb/indo-sentiment",
        dataset={
            "path": "indonesia-mteb/indo-sentiment",
            "revision": "main",
            # For local development during dataset creation:
            # "path": "/path/to/local/data",
        },
        type="Classification",
        category="s2s",  # sentence-to-sentence
        eval_splits=["test"],
        eval_langs=["id-ID"],
        main_score="accuracy",
        date=("2024-01-01", "2024-12-31"),
        form=["written"],
        domains=["Social", "Reviews"],
        task_subtypes=["Sentiment analysis"],
        license="CC-BY-4.0",
        annotations_creators="human-verified",
        dialect=[],
        sample_creation="found",
        bibtex_citation="""@dataset{indo_sentiment_2024,
  title={Indonesian Sentiment Classification Dataset},
  author={Indonesia-MTEB Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/datasets/indonesia-mteb/indo-sentiment}
}""",
    )

    def dataset_transform(self):
        """Optional: Transform the dataset after loading.

        Use this to rename columns, convert labels, etc.
        """
        # Example transformation, if needed:
        # self.dataset = self.dataset.rename_column("label", "label_text")
        pass
```
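Before wiring a dataset into a task like the one above, it is worth checking that the uploaded data matches the schema the classification task expects. A small check script (a sketch: the "text"/"label" column names follow the usual MTEB classification convention, and the dataset path is the one assumed in the metadata):

```python
from datasets import load_dataset

# Dataset path from the task metadata above (assumed to exist on the Hub)
ds = load_dataset("indonesia-mteb/indo-sentiment", split="test")

# MTEB classification tasks conventionally expect "text" and "label" columns
assert "text" in ds.column_names, ds.column_names
assert "label" in ds.column_names, ds.column_names
print(ds[0])
```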
3.3 Benchmark Definition¶
```python
# indonesia_mteb/benchmarks/indonesia_mteb.py
"""Indonesia-MTEB Benchmark Definitions"""
from mteb.benchmarks import Benchmark

from indonesia_mteb.tasks.classification import (
    IndoSentimentClassification,
    IndoEmotionClassification,
    IndoTopicsClassification,
)
from indonesia_mteb.tasks.clustering import (
    IndoWikiClustering,
    IndoNewsClustering,
)
from indonesia_mteb.tasks.retrieval import (
    IndoMSMARCO,
    IndoNFCorpus,
)
from indonesia_mteb.tasks.sts import (
    IndoSTS,
    IndoSICKRel,
)

# Define benchmarks (tasks are passed as instances)
INDONESIA_MTEB_TASKS = [
    # Classification
    IndoSentimentClassification(),
    IndoEmotionClassification(),
    IndoTopicsClassification(),
    # Clustering
    IndoWikiClustering(),
    IndoNewsClustering(),
    # Retrieval
    IndoMSMARCO(),
    IndoNFCorpus(),
    # STS
    IndoSTS(),
    IndoSICKRel(),
    # Add more tasks as they are created...
]

INDONESIA_MTEB = Benchmark(
    name="Indonesia-MTEB",
    tasks=INDONESIA_MTEB_TASKS,
    description="Indonesian Massive Text Embedding Benchmark - Full version",
    citation="""@inproceedings{indonesia_mteb_2026,
  title={Indonesia-MTEB: A Comprehensive Text Embedding Benchmark for Indonesian},
  author={Authors},
  booktitle={Conference},
  year={2026}
}""",
)

# Lite version with faster tasks
INDONESIA_MTEB_LITE_TASKS = [
    IndoSentimentClassification(),
    IndoEmotionClassification(),
    IndoTopicsClassification(),
    IndoSTS(),
]

INDONESIA_MTEB_LITE = Benchmark(
    name="Indonesia-MTEB-lite",
    tasks=INDONESIA_MTEB_LITE_TASKS,
    description="Indonesian Massive Text Embedding Benchmark - Lite version (quick evaluation)",
    citation=INDONESIA_MTEB.citation,
)

# Domain-specific benchmarks
INDONESIA_MTEB_SOCIAL = Benchmark(
    name="Indonesia-MTEB-social",
    tasks=[
        IndoSentimentClassification(),
        IndoEmotionClassification(),
    ],
    description="Indonesia-MTEB - Social Media domain only",
)

INDONESIA_MTEB_RETRIEVAL = Benchmark(
    name="Indonesia-MTEB-retrieval",
    tasks=[
        IndoMSMARCO(),
        IndoNFCorpus(),
    ],
    description="Indonesia-MTEB - Retrieval tasks only",
)
```
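The imports above rely on each task subpackage re-exporting its classes. A minimal sketch of indonesia_mteb/tasks/classification/__init__.py (modules other than indo_sentiment are assumed future additions):

```python
# indonesia_mteb/tasks/classification/__init__.py — minimal sketch
from indonesia_mteb.tasks.classification.indo_sentiment import (
    IndoSentimentClassification,
)

# Assumed future modules; uncomment as they are implemented:
# from indonesia_mteb.tasks.classification.indo_emotion import IndoEmotionClassification
# from indonesia_mteb.tasks.classification.indo_topics import IndoTopicsClassification

__all__ = ["IndoSentimentClassification"]
```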
3.4 CLI Implementation¶
```python
# indonesia_mteb/cli.py
"""Command-line interface for Indonesia-MTEB"""
import argparse

import mteb

from indonesia_mteb import get_benchmark, get_tasks


def main():
    """Main CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Indonesia-MTEB: Evaluate Indonesian text embeddings"
    )
    # Model arguments
    parser.add_argument(
        "-m", "--model",
        type=str,
        required=True,
        help="Model name or path (e.g., 'sentence-transformers/all-MiniLM-L6-v2')",
    )
    # Task selection
    parser.add_argument(
        "-t", "--tasks",
        type=str,
        nargs="+",
        help="Specific tasks to run (default: all Indonesia-MTEB tasks)",
    )
    parser.add_argument(
        "-b", "--benchmark",
        type=str,
        default="Indonesia-MTEB",
        choices=[
            "Indonesia-MTEB",
            "Indonesia-MTEB-lite",
            "Indonesia-MTEB-social",
            "Indonesia-MTEB-retrieval",
        ],
        help="Benchmark to run (default: Indonesia-MTEB)",
    )
    # Evaluation options
    parser.add_argument(
        "--output-folder",
        type=str,
        default="results",
        help="Output folder for results",
    )
    parser.add_argument(
        "--batch-size",
        type=int,
        default=32,
        help="Batch size for encoding",
    )
    parser.add_argument(
        "--eval-splits",
        type=str,
        nargs="+",
        default=["test"],
        help="Evaluation splits (default: ['test'])",
    )
    parser.add_argument(
        "--co2-tracker",
        action="store_true",
        help="Track CO2 emissions during evaluation",
    )
    args = parser.parse_args()

    # Get model
    model = mteb.get_model(args.model)

    # Get tasks or benchmark
    if args.tasks:
        tasks = get_tasks(tasks=args.tasks)
        print(f"Running {len(tasks)} specific tasks")
    else:
        benchmark = get_benchmark(args.benchmark)
        tasks = benchmark.tasks
        print(f"Running {args.benchmark} with {len(tasks)} tasks")

    # Run evaluation (recent mteb passes the batch size through encode_kwargs)
    evaluation = mteb.MTEB(tasks=tasks)
    results = evaluation.run(
        model,
        output_folder=args.output_folder,
        eval_splits=args.eval_splits,
        encode_kwargs={"batch_size": args.batch_size},
        co2_tracker=args.co2_tracker,
    )

    # Print summary; recent mteb returns a list of TaskResult objects
    print("\n" + "=" * 60)
    print("EVALUATION COMPLETE")
    print("=" * 60)
    for task_result in results:
        print(f"{task_result.task_name}: {task_result.get_score():.4f}")
    print(f"\nResults saved to: {args.output_folder}")


if __name__ == "__main__":
    main()
```
Part 4: Testing Infrastructure¶
4.1 Test Configuration¶
```python
# tests/conftest.py
"""Pytest configuration and fixtures"""
import pytest
from sentence_transformers import SentenceTransformer


@pytest.fixture(scope="session")
def sample_model():
    """Fixture providing a small, fast model for testing."""
    return SentenceTransformer("average_word_embeddings_levy_dependency")


@pytest.fixture(scope="session")
def sample_texts():
    """Fixture providing sample Indonesian texts."""
    return [
        "Produk ini sangat bagus dan pengiriman cepat.",
        "Saya sangat kecewa dengan kualitas barang ini.",
        "Harga terjangkau dengan kualitas yang oke.",
    ]


@pytest.fixture
def temp_output_dir(tmp_path):
    """Fixture providing a temporary output directory."""
    output_dir = tmp_path / "results"
    output_dir.mkdir()
    return output_dir


# Markers for different test types
def pytest_configure(config):
    """Register custom pytest markers."""
    config.addinivalue_line("markers", "slow: mark test as slow running")
    config.addinivalue_line("markers", "integration: mark test as integration test")
    config.addinivalue_line("markers", "unit: mark test as unit test")
```
4.2 Sample Tests¶
```python
# tests/test_tasks.py
"""Tests for task definitions"""
import mteb
import pytest

from indonesia_mteb import get_tasks, get_benchmark


class TestTaskDiscovery:
    """Test task discovery and loading."""

    def test_get_indonesian_tasks(self):
        """Test that Indonesian tasks can be retrieved."""
        tasks = get_tasks()
        assert len(tasks) > 0
        assert all(hasattr(t, "metadata") for t in tasks)

    def test_tasks_have_required_metadata(self):
        """Test that all tasks have required metadata fields."""
        for task in get_tasks():
            metadata = task.metadata
            assert metadata.name is not None
            assert metadata.description is not None
            assert metadata.main_score is not None
            assert "id" in str(metadata.eval_langs)

    def test_filter_tasks_by_language(self):
        """Test filtering tasks by language."""
        indonesian_tasks = get_tasks(languages=["id"])
        javanese_tasks = get_tasks(languages=["jv"])
        assert len(indonesian_tasks) > 0
        # Javanese may be empty until those datasets are added
        assert isinstance(javanese_tasks, list)

    def test_filter_tasks_by_domain(self):
        """Test filtering tasks by domain."""
        social_tasks = get_tasks(domains=["Social"])
        assert len(social_tasks) > 0


class TestBenchmarks:
    """Test benchmark definitions."""

    def test_get_default_benchmark(self):
        """Test getting the default benchmark."""
        benchmark = get_benchmark()
        assert benchmark is not None
        assert "Indonesia-MTEB" in benchmark.name

    def test_benchmark_has_tasks(self):
        """Test that the benchmark contains tasks."""
        benchmark = get_benchmark()
        assert len(benchmark.tasks) > 0

    def test_lite_benchmark(self):
        """Test the lite benchmark."""
        benchmark = get_benchmark("Indonesia-MTEB-lite")
        assert "lite" in benchmark.name.lower()
        assert len(benchmark.tasks) < len(get_benchmark().tasks)


@pytest.mark.integration
class TestTaskEvaluation:
    """Integration tests for task evaluation."""

    def test_classification_task_evaluation(self, sample_model, temp_output_dir):
        """Test actual evaluation on a classification task."""
        tasks = get_tasks(tasks=["IndoSentimentClassification"])
        if not tasks:
            pytest.skip("Task not yet implemented")
        evaluation = mteb.MTEB(tasks=tasks)
        results = evaluation.run(
            sample_model,
            output_folder=str(temp_output_dir),
            eval_splits=["test"],
        )
        assert results


@pytest.mark.unit
class TestDatasetLoading:
    """Unit tests for dataset loading."""

    def test_dataset_can_be_loaded(self):
        """Test that datasets can be loaded from HuggingFace."""
        tasks = get_tasks()
        if not tasks:
            pytest.skip("No tasks available")
        task = tasks[0]
        # This will fail if the dataset doesn't exist on HF yet
        try:
            task.load_data()
        except Exception as e:
            pytest.skip(f"Dataset not available: {e}")
```
Part 5: Documentation Setup¶
5.1 MkDocs Configuration¶
```yaml
# mkdocs.yml
site_name: Indonesia-MTEB
site_description: Indonesian Massive Text Embedding Benchmark
site_url: https://indonesia-mteb.github.io/indonesia-mteb
repo_url: https://github.com/indonesia-mteb/indonesia-mteb
repo_name: indonesia-mteb/indonesia-mteb

theme:
  name: material
  language: en
  palette:
    - scheme: default
      primary: red
      accent: red
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    - scheme: slate
      primary: red
      accent: red
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
  features:
    - navigation.instant
    - navigation.tracking
    - navigation.tabs
    - navigation.sections
    - navigation.expand
    - navigation.indexes
    - search.suggest
    - search.highlight
    - content.code.copy
    - content.tabs.link
  icon:
    logo: material/library
    repo: fontawesome/brands/github

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: google
            show_source: true
            show_root_heading: true
            show_root_members_full_path: false
            members_order: source
  - gen-files:
      scripts:
        - docs/gen_ref_pages.py  # generates the API reference pages (sketch below)
  - literate-nav:
      nav_file: SUMMARY.md
  - section-index
  - tags

nav:
  - Home: index.md
  - Getting Started:
      - Installation: getting-started/installation.md
      - Quick Start: getting-started/quickstart.md
      - Basic Usage: getting-started/basic-usage.md
  - Tasks:
      - Overview: tasks/overview.md
      - Classification: tasks/classification.md
      - Clustering: tasks/clustering.md
      - Retrieval: tasks/retrieval.md
      - STS: tasks/sts.md
  - Guides:
      - Adding Datasets: guides/adding-datasets.md
      - Custom Models: guides/custom-models.md
      - Benchmarking: guides/benchmarking.md
  - API Reference:
      - api/index.md
  - Community:
      - Contributing: community/contributing.md
      - Code of Conduct: community/code-of-conduct.md

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/indonesia-mteb/indonesia-mteb
    - icon: fontawesome/brands/python
      link: https://pypi.org/project/indonesia-mteb/
```
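The gen-files plugin above expects a script that writes the API pages at build time. A sketch of docs/gen_ref_pages.py, adapted from the standard mkdocstrings recipe (the api/ output path matches the "API Reference" nav entry):

```python
# docs/gen_ref_pages.py — generate one doc page per module
from pathlib import Path

import mkdocs_gen_files

nav = mkdocs_gen_files.Nav()

for path in sorted(Path("indonesia_mteb").rglob("*.py")):
    module_path = path.with_suffix("")
    doc_path = path.with_suffix(".md")
    full_doc_path = Path("api", doc_path)

    parts = tuple(module_path.parts)
    if parts[-1] == "__init__":
        parts = parts[:-1]
    elif parts[-1].startswith("_"):
        continue  # skip private modules such as the generated _version.py

    nav[parts] = doc_path.as_posix()
    with mkdocs_gen_files.open(full_doc_path, "w") as fd:
        fd.write(f"::: {'.'.join(parts)}")

with mkdocs_gen_files.open("api/SUMMARY.md", "w") as nav_file:
    nav_file.writelines(nav.build_literate_nav())
```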
5.2 Documentation Example¶
The quickstart page (docs/getting-started/quickstart.md) should cover installation, Python usage, and the CLI:

Installation¶
Install indonesia-mteb via pip:

```bash
pip install indonesia-mteb
```

Or with uv for faster installation:

```bash
uv pip install indonesia-mteb
```

Basic Usage¶
Running the Full Benchmark¶

```python
import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Get the Indonesia-MTEB benchmark and run it through mteb
benchmark = indonesia_mteb.get_benchmark("Indonesia-MTEB")
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="results")

# Print results (run() returns one result object per task)
for task_result in results:
    print(f"{task_result.task_name}: {task_result.get_score():.4f}")
```

Running Specific Tasks¶

```python
import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Get specific tasks
tasks = indonesia_mteb.get_tasks(tasks=[
    "IndoSentimentClassification",
    "IndoEmotionClassification",
])

# Run evaluation
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="results")
```

Using the CLI¶

```bash
# Run full benchmark
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
    -b Indonesia-MTEB \
    --output-folder results

# Run specific tasks
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
    -t IndoSentimentClassification IndoEmotionClassification \
    --output-folder results

# Run lite benchmark (faster)
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
    -b Indonesia-MTEB-lite \
    --output-folder results
```
Part 6: CI/CD Pipeline¶
6.1 GitHub Actions Test Workflow¶
```yaml
# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.9", "3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"
      - name: Run linting
        run: |
          ruff check indonesia_mteb
          ruff format --check indonesia_mteb
      - name: Run type checking
        run: mypy indonesia_mteb
        continue-on-error: true
      - name: Run tests
        run: |
          pytest --cov=indonesia_mteb --cov-report=xml --cov-report=term
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-${{ matrix.os }}-py${{ matrix.python-version }}

  test-minimal:
    # Run with minimal dependencies
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .
      - name: Run import test
        run: python -c "import indonesia_mteb; print(indonesia_mteb.__version__)"
```
6.2 Publish Workflow¶
```yaml
# .github/workflows/publish.yml
name: Publish to PyPI

on:
  push:
    tags:
      - 'v*'

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine
      - name: Build package
        run: python -m build
      - name: Check distribution
        run: twine check dist/*
      - name: Store distribution
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

  publish:
    runs-on: ubuntu-latest
    needs: build
    permissions:
      id-token: write  # Required for trusted publishing
    steps:
      - name: Download distribution
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist/

  github-release:
    runs-on: ubuntu-latest
    needs: publish
    permissions:
      contents: write
    steps:
      - name: Download distribution
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/
      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: dist/*
          generate_release_notes: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```
Part 7: Pre-Commit Configuration¶
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.0
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies:
          - types-requests
          - types-setuptools
        args: [--ignore-missing-imports]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-docstring-first
      - id: debug-statements
      - id: mixed-line-ending
```
Setup command:
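```bash
pre-commit install          # install the git hooks
pre-commit run --all-files  # optional: run all hooks once over the repo
```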
Part 8: Community Management¶
8.1 CONTRIBUTING.md Template¶
Contributing to Indonesia-MTEB¶
First, thank you for considering contributing to Indonesia-MTEB!
How to Contribute¶
Reporting Bugs¶
Report bugs using [GitHub Issues](https://github.com/indonesia-mteb/indonesia-mteb/issues).
Suggesting New Datasets¶
We welcome new Indonesian datasets! Before proposing:

1. Check if the dataset is already on the roadmap
2. Ensure the dataset has appropriate licensing
3. Verify the dataset is in an MTEB-compatible format

See [Adding Datasets](docs/guides/adding-datasets.md) for details.
Development Setup¶
```bash
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/indonesia-mteb.git
cd indonesia-mteb
# Create a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install in development mode
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
```

Pull Request Process¶

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Make your changes
4. Run tests and linting
5. Commit your changes (`git commit -m 'Add amazing feature'`)
6. Push to the branch (`git push origin feature/amazing-feature`)
7. Open a Pull Request
Code Style¶
- Follow PEP 8
- Use Ruff for formatting
- Write docstrings in Google style
- Add tests for new features
Dataset Contribution Requirements¶
- License: Must be compatible with Apache-2.0
- Format: Must match MTEB schema
- Documentation: Complete dataset card
- Testing: Must pass validation
Maintainer Responsibilities¶
- Review PRs within 7 days
- Respond to issues within 14 days
- Release monthly versions
- Maintain documentation
Community Guidelines¶
Be respectful, constructive, and inclusive. See the project's Code of Conduct.
8.2 Issue Templates¶

```markdown
# .github/ISSUE_TEMPLATE/dataset_request.md
---
name: Dataset Request
about: Suggest a new Indonesian dataset to add
title: '[DATASET] '
labels: dataset, enhancement
---
## Dataset Information
**Dataset Name:**
**Dataset URL:**
**Task Type:** (Classification/Clustering/Reranking/Retrieval/STS/Summarization)
## Licensing
**License:**
**Link to License:**
## Dataset Details
- **Number of samples:**
- **Splits available:** (train/validation/test)
- **Languages:** (Indonesian/Javanese/Sundanese/etc.)
- **Domain:** (Social/News/Wikipedia/etc.)
## Why should we add this dataset?
## Additional Context
```
8.3 Pull Request Template¶
```markdown
# .github/PULL_REQUEST_TEMPLATE.md
## Description
Briefly describe your changes.

## Type of Change
- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] New dataset

## Dataset Information (if adding a dataset)
- **Dataset Name:**
- **HuggingFace URL:**
- **Task Category:**
- **License:**
- **Number of samples:**

## Testing
- [ ] Tests added/updated
- [ ] All tests pass
- [ ] Pre-commit hooks passed

## Checklist
- [ ] Code follows project style
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new warnings generated
- [ ] Added tests/changed existing tests
- [ ] All tests pass

## Additional Notes
```
Part 9: Step-by-Step Implementation Guide¶
Phase 1: Initial Setup (Week 1)¶
Step 1: Create Repository Structure¶
```bash
# Create the repository
mkdir indonesia-mteb
cd indonesia-mteb

# Initialize git
git init

# Create the basic structure
mkdir -p indonesia_mteb/{tasks/{classification,clustering,retrieval,reranking,sts,summarization},benchmarks,models,resources}
mkdir -p tests
mkdir -p docs
mkdir -p examples
mkdir -p scripts
```
Step 2: Create Configuration Files¶
Create these files with the templates provided above (a minimal .gitignore sketch follows the list):
- pyproject.toml
- README.md
- LICENSE
- .gitignore
- .pre-commit-config.yaml
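A minimal .gitignore for this layout (a sketch covering the artifacts generated by the tools used in this guide):

```
# Python build/runtime artifacts
__pycache__/
*.py[cod]
dist/
build/
*.egg-info/

# Generated by setuptools-scm
indonesia_mteb/_version.py

# Tooling output
.venv/
.coverage
htmlcov/
results/
site/
```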
Step 3: Create GitHub Repository¶
```bash
# Create the repository on GitHub first, then:
git remote add origin https://github.com/YOUR_ORG/indonesia-mteb.git
git add .
git commit -m "Initial commit"
git push -u origin main
```
Phase 2: Core Implementation (Weeks 2-4)¶
Step 1: Implement Package Skeleton¶
```bash
# Create __init__.py files
touch indonesia_mteb/__init__.py
touch indonesia_mteb/tasks/__init__.py
touch indonesia_mteb/tasks/classification/__init__.py
touch indonesia_mteb/benchmarks/__init__.py
```
Step 2: Implement First Task¶
Start with one classification task:
```python
# indonesia_mteb/tasks/classification/indo_sentiment.py
from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata


class IndoSentimentClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="IndoSentimentClassification",
        # ... fill in from the template above
    )
```
Step 3: Create Benchmark¶
```python
# indonesia_mteb/benchmarks/indonesia_mteb.py
from mteb.benchmarks import Benchmark

from indonesia_mteb.tasks.classification.indo_sentiment import IndoSentimentClassification

INDONESIA_MTEB = Benchmark(
    name="Indonesia-MTEB",
    tasks=[IndoSentimentClassification()],
    description="Indonesian Massive Text Embedding Benchmark",
)
```
Step 4: Test Locally¶
```python
# test.py
import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Test import
print(indonesia_mteb.__version__)

# Test benchmark
benchmark = indonesia_mteb.get_benchmark()
print(f"Benchmark: {benchmark.name}")
print(f"Tasks: {len(benchmark.tasks)}")

# Test evaluation (if the dataset exists)
model = SentenceTransformer("average_word_embeddings_levy_dependency")
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(model, output_folder="test_results")
print(f"Results: {results}")
```
Phase 3: Dataset Integration (Weeks 5-8)¶
Step 1: Create HuggingFace Organization¶
- Go to https://huggingface.co/
- Create an organization: indonesia-mteb
Step 2: Upload First Dataset¶
```bash
# Install huggingface_hub
pip install huggingface_hub

# Login
huggingface-cli login

# Create the dataset repo under the organization
huggingface-cli repo create indo-sentiment --type dataset --organization indonesia-mteb

# Upload the local files
cd indo-sentiment
huggingface-cli upload indonesia-mteb/indo-sentiment . . --repo-type dataset
```
Step 3: Update Task Definition¶
```python
# Update the task's dataset path to point at HuggingFace
dataset={
    "path": "indonesia-mteb/indo-sentiment",
    "revision": "main",
},
```
Phase 4: Documentation (Weeks 9-10)¶
Step 1: Set Up MkDocs¶
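Install the documentation extras declared in pyproject.toml (this pulls in mkdocs-material and the mkdocstrings plugins used by the config in Part 5), then create mkdocs.yml at the repository root:

```bash
pip install -e ".[docs]"
```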
Step 2: Create Documentation Files¶
Following the structure in Part 5.
Step 3: Deploy Documentation¶
```bash
# Install the documentation tooling
pip install mkdocs-material

# Preview locally at http://127.0.0.1:8000
mkdocs serve

# Deploy to GitHub Pages (enable Pages in the repo settings first)
mkdocs gh-deploy
```
Phase 5: CI/CD (Week 11)¶
Step 1: Set Up GitHub Actions¶
Create the workflow files in .github/workflows/
Step 2: Configure PyPI¶
- Go to https://pypi.org/
- Create account
- Configure "Trusted Publishers" for your GitHub repo
Step 3: Test Release¶
```bash
# Tag the release
git tag v0.1.0
git push origin v0.1.0

# GitHub Actions will automatically publish to PyPI
```
Phase 6: Community Launch (Week 12)¶
Step 1: Prepare Announcement¶
Indonesia-MTEB v0.1.0 Release¶
We're excited to announce the first release of Indonesia-MTEB!
Features¶

- X classification tasks
- Y clustering tasks
- Z retrieval tasks

Installation¶

```bash
pip install indonesia-mteb
```

Quick Start¶
...
Contributing¶
We welcome contributions! See CONTRIBUTING.md
Citation¶
...
Step 2: Announce on Channels¶
- GitHub Discussion
- Indonesian NLP communities
- Academic networks
- Social media
Part 10: Common Issues and Solutions¶
Issue 1: Dataset Not Found on HuggingFace¶
Problem: Import errors when trying to load datasets
Solution: Create datasets locally first and use absolute paths during development:
```python
# During development
dataset = {
    "path": "/absolute/path/to/local/data",
    # Not on HuggingFace yet
}

# After upload to HuggingFace
dataset = {
    "path": "indonesia-mteb/my-dataset",
    "revision": "main",
}
```
Issue 2: MTEB Version Compatibility¶
Problem: Tasks don't work with installed MTEB version
Solution: Pin MTEB version in dependencies:
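For example, in pyproject.toml (the upper bound is an assumption about where breaking changes would land):

```toml
dependencies = [
    "mteb>=2.7.0,<3.0.0",  # stay below the next major release
]
```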
Issue 3: Tests Fail on GitHub CI¶
Problem: Tests pass locally but fail on CI
Solution: Ensure all dependencies are listed and use same Python version:
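A quick way to reproduce CI locally is a clean virtual environment on the same Python version as the failing matrix entry (the venv name here is arbitrary):

```bash
python3.11 -m venv ci-venv
source ci-venv/bin/activate
pip install -e ".[dev]"  # anything CI needs must be declared in pyproject.toml
pytest
```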
Part 11: Best Practices Summary¶
Code Organization¶
| Practice | Recommendation |
|---|---|
| Imports | Use absolute `indonesia_mteb.*` imports, as in the examples above |
| Public API | Export the public surface explicitly via `__all__` in `__init__.py` |
| Type Hints | Add to all public functions |
| Docstrings | Google style, include examples |
| Tests | At least 80% coverage |
Versioning¶
Use semantic versioning:

- MAJOR: Breaking changes
- MINOR: New features, backward compatible
- PATCH: Bug fixes

Example: 0.1.0 → 0.1.1 (bug fix) → 0.2.0 (new feature) → 1.0.0 (stable)
Release Frequency¶
- Patch releases: As needed for bug fixes
- Minor releases: Monthly with new datasets
- Major releases: Quarterly with breaking changes
Summary¶
Building indonesia-mteb as a Python package involves:
- Package Structure: Modern pyproject.toml-based setup
- Task Definitions: Extending MTEB's AbsTask classes
- Benchmark Class: Grouping tasks into Indonesia-MTEB
- CLI: Command-line tool for evaluation
- Testing: Pytest with coverage
- CI/CD: GitHub Actions for testing and publishing
- Documentation: MkDocs with API reference
- Community: Contribution guidelines and templates
Next Steps:

1. Create the repository with the basic structure
2. Implement the first task
3. Set up CI/CD
4. Add documentation
5. Release v0.1.0
6. Engage the community for contributions
This approach creates a maintainable, community-friendly package that leverages the MTEB ecosystem while providing Indonesian-specific functionality.