Document 11: Python Package Development

Overview

This document provides a comprehensive, step-by-step guide for building indonesia-mteb, a community-managed Python package for Indonesian text embedding benchmarking. Since you haven't built a Python package before, this guide covers everything from initial setup to community management.


Part 1: Understanding the MTEB Package Architecture

1.1 What MTEB Does

MTEB is a framework for:

1. Loading datasets in standardized formats
2. Running evaluations on embedding models
3. Aggregating results across tasks
4. Managing benchmarks (collections of tasks)
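
As a rough illustration of the "aggregating results across tasks" step, the sketch below averages main scores within each task type and then averages those type-level means. The scores are made up, and the two-level mean is an assumption of this sketch; MTEB's own aggregation may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task main scores; the numbers are illustrative only.
task_scores = [
    {"task": "IndoSentimentClassification", "type": "Classification", "main_score": 0.80},
    {"task": "IndoEmotionClassification", "type": "Classification", "main_score": 0.60},
    {"task": "IndoSTS", "type": "STS", "main_score": 0.70},
]


def aggregate_by_type(scores):
    """Average main scores within each task type, then average the type means."""
    by_type = defaultdict(list)
    for row in scores:
        by_type[row["type"]].append(row["main_score"])
    type_means = {task_type: mean(values) for task_type, values in by_type.items()}
    overall = mean(type_means.values())
    return type_means, overall


type_means, overall = aggregate_by_type(task_scores)
print(type_means)
print(round(overall, 3))
```

Averaging per type first keeps a task type with many datasets from dominating the headline number.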

1.2 Key Components to Implement

| Component | Purpose | Priority |
|---|---|---|
| Dataset Loaders | Load Indonesian datasets in MTEB format | Critical |
| Task Definitions | Define evaluation tasks (Classification, etc.) | Critical |
| Benchmark Class | Group tasks into Indonesia-MTEB benchmark | Critical |
| CLI | Command-line interface for running evaluations | High |
| Leaderboard Integration | Submit results to MTEB leaderboard | Medium |
| Documentation | API docs and user guides | High |

1.3 Package Architecture Decision

Recommended Approach: Subset + Extension Pattern

Rather than forking the entire MTEB package, build a lightweight package that:

1. Depends on MTEB as a base dependency
2. Adds Indonesian datasets as extensions
3. Defines the Indonesia-MTEB benchmark
4. Provides convenience functions for Indonesian use cases

# indonesia-mteb structure
# Leverages mteb as base, adds Indonesian-specific content

Benefits:

- Smaller codebase to maintain
- Automatic MTEB updates
- Community can contribute datasets independently
- Can be integrated into main MTEB later


Part 2: Package Structure and Setup

2.1 Complete Directory Structure

indonesia-mteb/
├── pyproject.toml              # Modern Python packaging config
├── README.md                   # Package overview
├── LICENSE                     # Apache-2.0 (matches MTEB)
├── CHANGELOG.md                # Version history
├── CONTRIBUTING.md             # Contribution guidelines
├── .pre-commit-config.yaml     # Pre-commit hooks
├── .github/
│   ├── workflows/
│   │   ├── test.yml            # CI/CD testing
│   │   ├── publish.yml         # PyPI publishing
│   │   └── labeler.yml         # PR labeler
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   ├── dataset_request.md
│   │   └── feature_request.md
│   └── PULL_REQUEST_TEMPLATE.md
├── docs/
│   ├── index.md                # Documentation home
│   ├── getting-started.md
│   ├── tasks.md
│   ├── adding-datasets.md
│   └── api/                    # Auto-generated API docs
├── indonesia_mteb/             # Main package
│   ├── __init__.py             # Public API exports
│   ├── __version__.py          # Version info
│   ├── tasks/                  # Task definitions
│   │   ├── __init__.py
│   │   ├── classification/     # Classification tasks
│   │   │   ├── __init__.py
│   │   │   ├── indo_sentiment.py
│   │   │   └── ...
│   │   ├── clustering/
│   │   ├── reranking/
│   │   ├── retrieval/
│   │   ├── sts/
│   │   └── summarization/
│   ├── benchmarks/             # Benchmark definitions
│   │   ├── __init__.py
│   │   └── indonesia_mteb.py
│   ├── models/                 # Model adapters (optional)
│   │   ├── __init__.py
│   │   └── indonesian_models.py
│   └── resources/              # Static resources
│       ├── cultural_terms.txt
│       └── stopwords_id.txt
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── conftest.py             # Pytest fixtures
│   ├── test_tasks.py
│   ├── test_benchmarks.py
│   └── test_integration.py
├── scripts/                    # Utility scripts
│   ├── validate_datasets.py
│   └── generate_leaderboard.py
└── examples/                   # Usage examples
    ├── basic_usage.py
    └── custom_model.py

2.2 pyproject.toml (Complete Template)

# pyproject.toml - Modern Python packaging configuration
[build-system]
requires = ["setuptools>=61.0", "wheel", "setuptools-scm>=8.0"]
build-backend = "setuptools.build_meta"

[project]
name = "indonesia-mteb"
version = "0.1.0"  # Static version; to let setuptools-scm manage it, replace this line with: dynamic = ["version"]
description = "Indonesian Massive Text Embedding Benchmark"
readme = "README.md"
requires-python = ">=3.9"
license = {text = "Apache-2.0"}
authors = [
    {name = "Indonesia-MTEB Team", email = "contact@indonesia-mteb.org"}
]
maintainers = [
    {name = "Indonesia-MTEB Team", email = "contact@indonesia-mteb.org"}
]
keywords = [
    "embedding",
    "benchmark",
    "indonesian",
    "nlp",
    "text-embedding",
    "evaluation",
    "mteb"
]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "License :: OSI Approved :: Apache Software License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Topic :: Scientific/Engineering :: Artificial Intelligence",
]

# Dependencies
dependencies = [
    "mteb>=2.7.0",
    "datasets>=2.14.0",
    "sentence-transformers>=2.2.0",
    "numpy>=1.21.0",
    "requests>=2.28.0",
    "tqdm>=4.65.0",
    "pyarrow>=12.0.0",  # For parquet support
]

# Optional dependencies
[project.optional-dependencies]
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
    "pre-commit>=3.3.0",
]
docs = [
    "mkdocs>=1.5.0",
    "mkdocs-material>=9.0.0",
    "mkdocstrings[python]>=0.22.0",
    "mkdocs-gen-files>=0.5.0",
    "mkdocs-literate-nav>=0.6.0",
    "mkdocs-section-index>=0.3.0",
]
benchmark = [
    "openai>=1.0.0",  # For API-based models
    "anthropic>=0.18.0",
    "transformers[torch]>=4.35.0",
    "accelerate>=0.24.0",
]
vision = [
    "pillow>=10.0.0",
]
all = [
    "indonesia-mteb[dev,docs,benchmark,vision]",
]

[project.urls]
Homepage = "https://github.com/indonesia-mteb/indonesia-mteb"
Documentation = "https://indonesia-mteb.github.io/indonesia-mteb"
Repository = "https://github.com/indonesia-mteb/indonesia-mteb"
"Bug Tracker" = "https://github.com/indonesia-mteb/indonesia-mteb/issues"
Changelog = "https://github.com/indonesia-mteb/indonesia-mteb/blob/main/CHANGELOG.md"

[project.scripts]
indonesia-mteb = "indonesia_mteb.cli:main"

# Tool configurations
[tool.setuptools.packages.find]
include = ["indonesia_mteb*"]  # also picks up subpackages such as indonesia_mteb.tasks

[tool.setuptools_scm]
write_to = "indonesia_mteb/_version.py"
version_scheme = "guess-next-dev"
local_scheme = "no-local-version"

# Ruff configuration
[tool.ruff]
target-version = "py39"
line-length = 100
indent-width = 4

[tool.ruff.lint]
select = [
    "E",      # pycodestyle errors
    "W",      # pycodestyle warnings
    "F",      # Pyflakes
    "I",      # isort
    "B",      # flake8-bugbear
    "C4",     # flake8-comprehensions
    "UP",     # pyupgrade
    "ARG",    # flake8-unused-arguments
    "SIM",    # flake8-simplify
]
ignore = [
    "E501",   # line too long (handled by formatter)
    "B008",   # do not perform function calls in argument defaults
    "W191",   # indentation contains tabs
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

[tool.ruff.lint.isort]
known-first-party = ["indonesia_mteb"]

# Pytest configuration
[tool.pytest.ini_options]
minversion = "7.0"
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
    "--cov=indonesia_mteb",
    "--cov-report=term-missing",
    "--cov-report=html",
    "--strict-markers",
]
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: marks tests as integration tests",
    "unit: marks tests as unit tests",
]

# Coverage configuration
[tool.coverage.run]
source = ["indonesia_mteb"]
omit = [
    "*/tests/*",
    "*/__pycache__/*",
    "*/site-packages/*",
]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
    "@abstractmethod",
]

# MyPy configuration
[tool.mypy]
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = false  # Can be enabled later
disallow_incomplete_defs = false
check_untyped_defs = true
no_implicit_optional = true
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
follow_imports = "normal"
ignore_missing_imports = true

[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false

2.3 Version Management

# indonesia_mteb/__version__.py
__version__ = "0.1.0"

# Or use setuptools-scm for automatic versioning from git tags

Part 3: Core Implementation

3.1 Package __init__.py (Public API)

# indonesia_mteb/__init__.py
"""
Indonesia-MTEB: Indonesian Massive Text Embedding Benchmark

A community-managed benchmark for evaluating Indonesian text embeddings.
"""

__version__ = "0.1.0"

# Import main components for public API
from indonesia_mteb.benchmarks import IndonesiaMTEB
from indonesia_mteb.tasks import get_indonesian_tasks

# get_tasks and get_benchmark are defined below in this module
__all__ = [
    "__version__",
    "IndonesiaMTEB",
    "get_indonesian_tasks",
    "get_tasks",
    "get_benchmark",
]

# Convenience functions
def get_tasks(tasks=None, languages=None, domains=None):
    """Get Indonesian MTEB tasks with optional filtering.

    Args:
        tasks: List of task names to include (None = all)
        languages: List of languages to filter by (e.g., ["id", "jv", "su"])
        domains: List of domains to filter by (e.g., ["Social", "News"])

    Returns:
        List of MTEB task objects
    """
    all_tasks = get_indonesian_tasks()

    if tasks:
        all_tasks = [t for t in all_tasks if t.metadata.name in tasks]
    if languages:
        # Match on the primary subtag so that "id" selects tasks tagged "id-ID"
        all_tasks = [t for t in all_tasks if any(
            code.split("-")[0] in languages for code in t.metadata.eval_langs
        )]
    if domains:
        all_tasks = [t for t in all_tasks if any(
            domain in t.metadata.domains for domain in domains
        )]

    return all_tasks


def get_benchmark(name="Indonesia-MTEB"):
    """Get an Indonesia-MTEB benchmark.

    Args:
        name: Benchmark name ("Indonesia-MTEB" or "Indonesia-MTEB-lite")

    Returns:
        MTEB benchmark object
    """
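    from indonesia_mteb.benchmarks.indonesia_mteb import (
        INDONESIA_MTEB,
        INDONESIA_MTEB_LITE,
        INDONESIA_MTEB_RETRIEVAL,
        INDONESIA_MTEB_SOCIAL,
    )

    benchmarks = {
        "Indonesia-MTEB": INDONESIA_MTEB,
        "Indonesia-MTEB-lite": INDONESIA_MTEB_LITE,
        "Indonesia-MTEB-social": INDONESIA_MTEB_SOCIAL,
        "Indonesia-MTEB-retrieval": INDONESIA_MTEB_RETRIEVAL,
    }

    # Fail loudly instead of silently falling back to the full benchmark
    if name not in benchmarks:
        raise ValueError(f"Unknown benchmark {name!r}; choose from {sorted(benchmarks)}")
    return benchmarks[name]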
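
To make the filter semantics of `get_tasks` concrete, here is a minimal sketch that runs the same kind of filtering on stand-in objects, so it works without MTEB installed. `make_task`, `filter_tasks`, and the task "JavaneseNewsTopics" are hypothetical; the sketch also assumes language filters are compared against the part of the code before the hyphen.

```python
from types import SimpleNamespace


def make_task(name, eval_langs, domains):
    """Build a stand-in exposing the `metadata` attributes that the filters read."""
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name, eval_langs=eval_langs, domains=domains)
    )


# Stand-in tasks for illustration only
TASKS = [
    make_task("IndoSentimentClassification", ["id-ID"], ["Social", "Reviews"]),
    make_task("JavaneseNewsTopics", ["jv-ID"], ["News"]),
    make_task("IndoNewsClustering", ["id-ID"], ["News"]),
]


def filter_tasks(tasks, names=None, languages=None, domains=None):
    """Keep tasks that satisfy every filter that was provided."""
    selected = list(tasks)
    if names:
        selected = [t for t in selected if t.metadata.name in names]
    if languages:
        selected = [
            t for t in selected
            if any(code.split("-")[0] in languages for code in t.metadata.eval_langs)
        ]
    if domains:
        selected = [t for t in selected if any(d in domains for d in t.metadata.domains)]
    return selected


news_in_indonesian = filter_tasks(TASKS, languages=["id"], domains=["News"])
print([t.metadata.name for t in news_in_indonesian])
```

Filters of different kinds combine with AND, while values inside one filter combine with OR, so `languages=["id", "jv"]` would keep all three stand-in tasks.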

3.2 Task Definition Template

# indonesia_mteb/tasks/classification/indo_sentiment.py
"""Indonesian Sentiment Classification Task"""

from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata

# This import pattern allows datasets to be loaded from HuggingFace
# or locally for development


class IndoSentimentClassification(AbsTaskClassification):
    """Indonesian sentiment classification task.

    Dataset sourced from Indonesian e-commerce and social media reviews.
    Predicts sentiment (positive, negative, neutral).
    """

    metadata = TaskMetadata(
        name="IndoSentimentClassification",
        description="Indonesian sentiment classification from reviews and social media",
        reference="https://huggingface.co/datasets/indonesia-mteb/indo-sentiment",
        dataset={
            "path": "indonesia-mteb/indo-sentiment",
            "revision": "main",
            # For local development during dataset creation:
            # "path": "/path/to/local/data",
        },
        type="Classification",
        category="s2s",  # sentence-to-sentence
        eval_splits=["test"],
        eval_langs=["id-ID"],
        main_score="accuracy",
        date=("2024-01-01", "2024-12-31"),
        form=["written"],
        domains=["Social", "Reviews"],
        task_subtypes=["Sentiment analysis"],
        license="CC-BY-4.0",
        annotations_creators="human-verified",
        dialect=[],
        sample_creation="found",
        bibtex_citation="""@dataset{indo_sentiment_2024,
            title={Indonesian Sentiment Classification Dataset},
            author={Indonesia-MTEB Team},
            year={2024},
            publisher={Hugging Face},
            url={https://huggingface.co/datasets/indonesia-mteb/indo-sentiment}
        }""",
    )

    def dataset_transform(self):
        """Optional: Transform dataset after loading.

        Use this to rename columns, convert labels, etc.
        """
        # Example transformations if needed
        # self.dataset = self.dataset.rename_column("label", "label_text")
        pass
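
Before opening a pull request for a new task, it can help to check that the metadata is filled in. The sketch below does this on a plain dictionary rather than a real `TaskMetadata` object; `REQUIRED_FIELDS` and `missing_metadata_fields` are assumptions of this sketch and do not reflect MTEB's actual schema validation.

```python
# Fields this sketch treats as mandatory (an assumption, not MTEB's schema)
REQUIRED_FIELDS = ("name", "description", "main_score", "eval_splits", "eval_langs")


def missing_metadata_fields(metadata):
    """Return the required keys that are absent or empty in a metadata mapping."""
    return [key for key in REQUIRED_FIELDS if not metadata.get(key)]


# Plain-dict stand-in with values copied from the template above
complete = {
    "name": "IndoSentimentClassification",
    "description": "Indonesian sentiment classification from reviews and social media",
    "main_score": "accuracy",
    "eval_splits": ["test"],
    "eval_langs": ["id-ID"],
}

# A draft entry with an empty description and several fields not yet written
incomplete = {"name": "DraftTask", "description": "", "eval_langs": []}

print(missing_metadata_fields(complete))
print(missing_metadata_fields(incomplete))
```

A check like this could live in `scripts/validate_datasets.py` and run in CI next to the unit tests.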

3.3 Benchmark Definition

# indonesia_mteb/benchmarks/indonesia_mteb.py
"""Indonesia-MTEB Benchmark Definitions"""

from mteb.benchmarks import Benchmark
from indonesia_mteb.tasks.classification import (
    IndoSentimentClassification,
    IndoEmotionClassification,
    IndoTopicsClassification,
)
from indonesia_mteb.tasks.clustering import (
    IndoWikiClustering,
    IndoNewsClustering,
)
from indonesia_mteb.tasks.retrieval import (
    IndoMSMARCO,
    IndoNFCorpus,
)
from indonesia_mteb.tasks.sts import (
    IndoSTS,
    IndoSICKRel,
)

# Define benchmarks
INDONESIA_MTEB_TASKS = [
    # Classification
    IndoSentimentClassification,
    IndoEmotionClassification,
    IndoTopicsClassification,
    # Clustering
    IndoWikiClustering,
    IndoNewsClustering,
    # Retrieval
    IndoMSMARCO,
    IndoNFCorpus,
    # STS
    IndoSTS,
    IndoSICKRel,
    # Add more tasks as they are created...
]

INDONESIA_MTEB = Benchmark(
    name="Indonesia-MTEB",
    tasks=INDONESIA_MTEB_TASKS,
    description="Indonesian Massive Text Embedding Benchmark - Full version",
    citation="""@inproceedings{indonesia_mteb_2026,
        title={Indonesia-MTEB: A Comprehensive Text Embedding Benchmark for Indonesian},
        author={Authors},
        booktitle={Conference},
        year={2026}
    }""",
)

# Lite version with faster tasks
INDONESIA_MTEB_LITE_TASKS = [
    IndoSentimentClassification,
    IndoEmotionClassification,
    IndoTopicsClassification,
    IndoSTS,
]

INDONESIA_MTEB_LITE = Benchmark(
    name="Indonesia-MTEB-lite",
    tasks=INDONESIA_MTEB_LITE_TASKS,
    description="Indonesian Massive Text Embedding Benchmark - Lite version (quick evaluation)",
    citation=INDONESIA_MTEB.citation,
)

# Domain-specific benchmarks
INDONESIA_MTEB_SOCIAL = Benchmark(
    name="Indonesia-MTEB-social",
    tasks=[
        IndoSentimentClassification,
        IndoEmotionClassification,
    ],
    description="Indonesia-MTEB - Social Media domain only",
)

INDONESIA_MTEB_RETRIEVAL = Benchmark(
    name="Indonesia-MTEB-retrieval",
    tasks=[
        IndoMSMARCO,
        IndoNFCorpus,
    ],
    description="Indonesia-MTEB - Retrieval tasks only",
)

3.4 CLI Implementation

# indonesia_mteb/cli.py
"""Command-line interface for Indonesia-MTEB"""

import argparse

import mteb
from indonesia_mteb import get_benchmark, get_tasks


def main():
    """Main CLI entry point."""
    parser = argparse.ArgumentParser(
        description="Indonesia-MTEB: Evaluate Indonesian text embeddings"
    )

    # Model arguments
    parser.add_argument(
        "-m", "--model",
        type=str,
        required=True,
        help="Model name or path (e.g., 'sentence-transformers/all-MiniLM-L6-v2')"
    )

    # Task selection
    parser.add_argument(
        "-t", "--tasks",
        type=str,
        nargs="+",
        help="Specific tasks to run (default: all Indonesia-MTEB tasks)"
    )

    parser.add_argument(
        "-b", "--benchmark",
        type=str,
        default="Indonesia-MTEB",
        choices=["Indonesia-MTEB", "Indonesia-MTEB-lite", "Indonesia-MTEB-social", "Indonesia-MTEB-retrieval"],
        help="Benchmark to run (default: Indonesia-MTEB)"
    )

    # Evaluation options
    parser.add_argument(
        "--output-folder",
        type=str,
        default="results",
        help="Output folder for results"
    )

    parser.add_argument(
        "--batch-size",
        type=int,
        default=32,
        help="Batch size for encoding"
    )

    parser.add_argument(
        "--eval-splits",
        type=str,
        nargs="+",
        default=["test"],
        help="Evaluation splits (default: ['test'])"
    )

    parser.add_argument(
        "--co2-tracker",
        action="store_true",
        help="Track CO2 emissions during evaluation"
    )

    args = parser.parse_args()

    # Get model
    model = mteb.get_model(args.model)

    # Get tasks or benchmark
    if args.tasks:
        tasks = get_tasks(tasks=args.tasks)
        print(f"Running {len(tasks)} specific tasks")
    else:
        benchmark = get_benchmark(args.benchmark)
        tasks = benchmark.tasks
        print(f"Running {args.benchmark} with {len(tasks)} tasks")

    # Run evaluation
    evaluation = mteb.MTEB(tasks=tasks)

    results = evaluation.run(
        model,
        output_folder=args.output_folder,
        eval_splits=args.eval_splits,
        batch_size=args.batch_size,
        co2_tracker=args.co2_tracker,
    )

    # Print summary
    print("\n" + "="*60)
    print("EVALUATION COMPLETE")
    print("="*60)

    for task_name, task_results in results.items():
        main_score = task_results.get("main_score", "N/A")
        print(f"{task_name}: {main_score}")

    print(f"\nResults saved to: {args.output_folder}")


if __name__ == "__main__":
    main()
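
The argument handling above can be exercised without loading a model if parser construction is split from evaluation. The sketch below is one hypothetical way to do that: `build_parser` and `describe_run` are illustrative names, and only a subset of the options is reproduced.

```python
import argparse

BENCHMARKS = ["Indonesia-MTEB", "Indonesia-MTEB-lite"]


def build_parser():
    """Build a reduced version of the CLI parser (subset of the real options)."""
    parser = argparse.ArgumentParser(prog="indonesia-mteb")
    parser.add_argument("-m", "--model", required=True)
    parser.add_argument("-t", "--tasks", nargs="+")
    parser.add_argument("-b", "--benchmark", default="Indonesia-MTEB", choices=BENCHMARKS)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


def describe_run(argv):
    """Parse argv and report which selection the CLI logic would use."""
    args = build_parser().parse_args(argv)
    if args.tasks:
        # Explicit task names take precedence over the benchmark choice
        return f"tasks: {', '.join(args.tasks)}"
    return f"benchmark: {args.benchmark}"


print(describe_run(["-m", "my-model"]))
print(describe_run(["-m", "my-model", "-t", "IndoSTS", "-b", "Indonesia-MTEB-lite"]))
```

Passing a list to `parse_args` instead of relying on `sys.argv` is what makes this testable; it also shows that `-t` silently overrides `-b`, which may be worth stating in the help text.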

Part 4: Testing Infrastructure

4.1 Test Configuration

# tests/conftest.py
"""Pytest configuration and fixtures"""

import pytest
from sentence_transformers import SentenceTransformer

@pytest.fixture(scope="session")
def sample_model():
    """Fixture providing a sample model for testing."""
    return SentenceTransformer("average_word_embeddings_levy_dependency")


@pytest.fixture(scope="session")
def sample_texts():
    """Fixture providing sample Indonesian texts."""
    return [
        "Produk ini sangat bagus dan pengiriman cepat.",
        "Saya sangat kecewa dengan kualitas barang ini.",
        "Harga terjangkau dengan kualitas yang oke.",
    ]


@pytest.fixture
def temp_output_dir(tmp_path):
    """Fixture providing a temporary output directory."""
    output_dir = tmp_path / "results"
    output_dir.mkdir()
    return output_dir


# Markers for different test types
def pytest_configure(config):
    """Configure custom pytest markers."""
    config.addinivalue_line("markers", "slow: mark test as slow running")
    config.addinivalue_line("markers", "integration: mark test as integration test")
    config.addinivalue_line("markers", "unit: mark test as unit test")
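
For unit tests that should not download a model, a deterministic stand-in encoder can replace the `sample_model` fixture. The class below is a hypothetical sketch using only the standard library; it assumes the code under test only needs an `encode` method, and a real evaluation run would likely need the vectors wrapped in a NumPy array.

```python
import hashlib


class HashingEncoder:
    """Deterministic stand-in encoder for unit tests (not a real embedding model)."""

    def __init__(self, dim=8):
        # A SHA-256 digest has 32 bytes, so dim must not exceed 32
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def encode(self, sentences, **kwargs):
        """Return one fixed-length list of floats in [0, 1] per sentence."""
        vectors = []
        for sentence in sentences:
            digest = hashlib.sha256(sentence.encode("utf-8")).digest()
            vectors.append([byte / 255 for byte in digest[: self.dim]])
        return vectors


encoder = HashingEncoder()
vectors = encoder.encode(["Produk ini sangat bagus.", "Saya sangat kecewa."])
print(len(vectors), len(vectors[0]))
```

The vectors carry no semantic meaning, so this is only suitable for checking plumbing such as shapes, file output, and task discovery.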

4.2 Sample Tests

# tests/test_tasks.py
"""Tests for task definitions"""

import mteb
import pytest

from indonesia_mteb import get_benchmark, get_tasks


class TestTaskDiscovery:
    """Test task discovery and loading."""

    def test_get_indonesian_tasks(self):
        """Test that Indonesian tasks can be retrieved."""
        tasks = get_tasks()
        assert len(tasks) > 0
        assert all(hasattr(t, 'metadata') for t in tasks)

    def test_tasks_have_required_metadata(self):
        """Test that all tasks have required metadata fields."""
        tasks = get_tasks()

        for task in tasks:
            metadata = task.metadata
            assert metadata.name is not None
            assert metadata.description is not None
            assert metadata.main_score is not None
            assert "id" in str(metadata.eval_langs)

    def test_filter_tasks_by_language(self):
        """Test filtering tasks by language."""
        indonesian_tasks = get_tasks(languages=["id"])
        javanese_tasks = get_tasks(languages=["jv"])

        assert len(indonesian_tasks) > 0
        # Javanese may be empty until those datasets are added
        assert isinstance(javanese_tasks, list)

    def test_filter_tasks_by_domain(self):
        """Test filtering tasks by domain."""
        social_tasks = get_tasks(domains=["Social"])
        assert len(social_tasks) > 0


class TestBenchmarks:
    """Test benchmark definitions."""

    def test_get_default_benchmark(self):
        """Test getting the default benchmark."""
        benchmark = get_benchmark()
        assert benchmark is not None
        assert "Indonesia-MTEB" in benchmark.name

    def test_benchmark_has_tasks(self):
        """Test that benchmark contains tasks."""
        benchmark = get_benchmark()
        assert len(benchmark.tasks) > 0

    def test_lite_benchmark(self):
        """Test the lite benchmark."""
        benchmark = get_benchmark("Indonesia-MTEB-lite")
        assert "lite" in benchmark.name.lower()
        assert len(benchmark.tasks) < len(get_benchmark().tasks)


@pytest.mark.integration
class TestTaskEvaluation:
    """Integration tests for task evaluation."""

    def test_classification_task_evaluation(self, sample_model, temp_output_dir):
        """Test actual evaluation on a classification task."""
        tasks = get_tasks(tasks=["IndoSentimentClassification"])

        if not tasks:
            pytest.skip("Task not yet implemented")

        evaluation = mteb.MTEB(tasks=tasks)
        results = evaluation.run(
            sample_model,
            output_folder=str(temp_output_dir),
            eval_splits=["test"]  # Use test split
        )

        assert "IndoSentimentClassification" in results


@pytest.mark.unit
class TestDatasetLoading:
    """Unit tests for dataset loading."""

    def test_dataset_can_be_loaded(self):
        """Test that datasets can be loaded from HuggingFace."""
        tasks = get_tasks()

        if not tasks:
            pytest.skip("No tasks available")

        task = tasks[0]

        # Skip (rather than fail) when the dataset is not on the Hub yet
        try:
            task.load_data()
        except Exception as e:
            pytest.skip(f"Dataset not available: {e}")

        # load_data() returns None and marks the task as loaded on success
        assert task.data_loaded

Part 5: Documentation Setup

5.1 MkDocs Configuration

# mkdocs.yml
site_name: Indonesia-MTEB
site_description: Indonesian Massive Text Embedding Benchmark
site_url: https://indonesia-mteb.github.io/indonesia-mteb
repo_url: https://github.com/indonesia-mteb/indonesia-mteb
repo_name: indonesia-mteb/indonesia-mteb

theme:
  name: material
  language: en
  palette:
    - scheme: default
      primary: red
      accent: red
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    - scheme: slate
      primary: red
      accent: red
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
  features:
    - navigation.instant
    - navigation.tracking
    - navigation.tabs
    - navigation.sections
    - navigation.expand
    - navigation.indexes
    - search.suggest
    - search.highlight
    - content.code.copy
    - content.tabs.link
  icon:
    logo: material/logo.svg
    repo: fontawesome/brands/github

plugins:
  - search
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: google
            show_source: true
            show_root_heading: true
            show_root_members_full_path: false
            members_order: source
  - gen-files:
      scripts:
        - scripts/gen_ref_pages.py  # hypothetical script that writes the API pages
  - literate-nav:
      nav_file: SUMMARY.md
  - section-index
  - tags

nav:
  - Home: index.md
  - Getting Started:
    - Installation: getting-started/installation.md
    - Quick Start: getting-started/quickstart.md
    - Basic Usage: getting-started/basic-usage.md
  - Tasks:
    - Overview: tasks/overview.md
    - Classification: tasks/classification.md
    - Clustering: tasks/clustering.md
    - Retrieval: tasks/retrieval.md
    - STS: tasks/sts.md
  - Guides:
    - Adding Datasets: guides/adding-datasets.md
    - Custom Models: guides/custom-models.md
    - Benchmarking: guides/benchmarking.md
  - API Reference:
    - api/index.md
  - Community:
    - Contributing: community/contributing.md
    - Code of Conduct: community/code-of-conduct.md

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/indonesia-mteb/indonesia-mteb
    - icon: fontawesome/brands/python
      link: https://pypi.org/project/indonesia-mteb/

5.2 Documentation Example

# docs/getting-started/quickstart.md

# Quick Start

## Installation

Install indonesia-mteb via pip:

pip install indonesia-mteb

Or with uv for faster installation:

uv add indonesia-mteb

Basic Usage

Running the Full Benchmark

import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Get Indonesia-MTEB benchmark
benchmark = indonesia_mteb.get_benchmark("Indonesia-MTEB")

# Run evaluation on the benchmark's tasks (same pattern as the CLI)
evaluation = mteb.MTEB(tasks=benchmark.tasks)
results = evaluation.run(
    model,
    output_folder="results",
)

# Print results
for task_name, task_results in results.items():
    print(f"{task_name}: {task_results['main_score']}")

Running Specific Tasks

import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Load model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Get specific tasks
tasks = indonesia_mteb.get_tasks(tasks=[
    "IndoSentimentClassification",
    "IndoEmotionClassification",
])

# Run evaluation
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(
    model,
    output_folder="results",
)

Using the CLI

# Run full benchmark
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
              -b Indonesia-MTEB \
              --output-folder results

# Run specific tasks
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
              -t IndoSentimentClassification IndoEmotionClassification \
              --output-folder results

# Run lite benchmark (faster)
indonesia-mteb -m sentence-transformers/all-MiniLM-L6-v2 \
              -b Indonesia-MTEB-lite \
              --output-folder results

Part 6: CI/CD Pipeline

6.1 GitHub Actions Test Workflow

# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main, develop]

jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e ".[dev]"

      - name: Run linting
        run: |
          ruff check indonesia_mteb
          ruff format --check indonesia_mteb

      - name: Run type checking
        run: mypy indonesia_mteb
        continue-on-error: true

      - name: Run tests
        run: |
          pytest --cov=indonesia_mteb --cov-report=xml --cov-report=term

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v4
        with:
          file: ./coverage.xml
          flags: unittests
          name: codecov-${{ matrix.os }}-py${{ matrix.python-version }}

  test-minimal:
    # Run with minimal dependencies
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .

      - name: Run import test
        run: python -c "import indonesia_mteb; print(indonesia_mteb.__version__)"

6.2 Publish Workflow

# .github/workflows/publish.yml
name: Publish to PyPI

on:
  push:
    tags:
      - 'v*'

permissions:
  contents: read
  id-token: write  # Required for trusted publishing

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"

      - name: Install build dependencies
        run: |
          python -m pip install --upgrade pip
          pip install build twine

      - name: Build package
        run: python -m build

      - name: Check distribution
        run: twine check dist/*

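      - name: Upload distributions for the release job
        uses: actions/upload-artifact@v4
        with:
          name: dist
          path: dist/

      - name: Publish to PyPI
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist/

  build-github-release:
    runs-on: ubuntu-latest
    needs: build

    permissions:
      contents: write

    steps:
      - uses: actions/checkout@v4

      # Each job starts on a fresh runner, so fetch the files built above
      - name: Download distributions
        uses: actions/download-artifact@v4
        with:
          name: dist
          path: dist/

      - name: Create GitHub Release
        uses: softprops/action-gh-release@v2
        with:
          files: dist/*
          generate_release_notes: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}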

Part 7: Pre-Commit Configuration

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.1.0
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies:
          - types-requests
          - types-setuptools
        args: [--ignore-missing-imports]

  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-added-large-files
      - id: check-merge-conflict
      - id: check-case-conflict
      - id: check-docstring-first
      - id: debug-statements
      - id: mixed-line-ending

Setup command:

pip install pre-commit
pre-commit install


Part 8: Community Management

8.1 CONTRIBUTING.md Template

# Contributing to Indonesia-MTEB

First, thank you for considering contributing to Indonesia-MTEB!

## How to Contribute

### Reporting Bugs

Report bugs using [GitHub Issues](https://github.com/indonesia-mteb/indonesia-mteb/issues).

### Suggesting New Datasets

We welcome new Indonesian datasets! Before proposing:

1. Check if the dataset is already in the roadmap
2. Ensure the dataset has appropriate licensing
3. Verify the dataset is in MTEB-compatible format

See [Adding Datasets](docs/guides/adding-datasets.md) for details.

### Development Setup

# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/indonesia-mteb.git
cd indonesia-mteb

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

Pull Request Process

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Run tests and linting
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

Code Style

  • Follow PEP 8
  • Use Ruff for formatting
  • Write docstrings in Google style
  • Add tests for new features

Dataset Contribution Requirements

  • License: Must be compatible with Apache-2.0
  • Format: Must match MTEB schema
  • Documentation: Complete dataset card
  • Testing: Must pass validation

Maintainer Responsibilities

  • Review PRs within 7 days
  • Respond to issues within 14 days
  • Release monthly versions
  • Maintain documentation

Community Guidelines

Be respectful, constructive, and inclusive. See the project's Code of Conduct for details.

8.2 Issue Templates

# .github/ISSUE_TEMPLATE/dataset_request.md
---
name: Dataset Request
about: Suggest a new Indonesian dataset to add
title: '[DATASET] '
labels: dataset, enhancement
---

## Dataset Information

**Dataset Name:**
**Dataset URL:**
**Task Type:** (Classification/Clustering/Reranking/Retrieval/STS/Summarization)

## Licensing

**License:**
**Link to License:**

## Dataset Details

- **Number of samples:**
- **Splits available:** (train/validation/test)
- **Languages:** (Indonesian/Javanese/Sundanese/etc.)
- **Domain:** (Social/News/Wikipedia/etc.)

## Why should we add this dataset?

## Additional Context

8.3 Pull Request Template

```markdown
# .github/PULL_REQUEST_TEMPLATE.md
## Description

Briefly describe your changes.

## Type of Change

- [ ] Bug fix
- [ ] New feature
- [ ] Documentation update
- [ ] New dataset

## Dataset Information (if adding a dataset)

- **Dataset Name:**
- **HuggingFace URL:**
- **Task Category:**
- **License:**
- **Number of samples:**

## Testing

- [ ] Tests added/updated
- [ ] All tests pass
- [ ] Pre-commit hooks passed

## Checklist

- [ ] Code follows project style
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] No new warnings generated

## Additional Notes
```

---

## Part 9: Step-by-Step Implementation Guide

### Phase 1: Initial Setup (Week 1)

#### Step 1: Create Repository Structure

```bash
# Create the repository
mkdir indonesia-mteb
cd indonesia-mteb

# Initialize git
git init

# Create basic structure
mkdir -p indonesia_mteb/{tasks/{classification,clustering,retrieval,reranking,sts,summarization},benchmarks,models,resources}
mkdir -p tests
mkdir -p docs
mkdir -p examples
mkdir -p scripts
```
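
The `mkdir -p` command with nested braces relies on shell brace expansion, which is not available in every shell (for example, Windows `cmd`). As a hedged alternative, the same layout could be created with a short Python script. The function name `create_skeleton` is hypothetical, and the directory lists simply mirror the shell commands in this step.

```python
from pathlib import Path

TASK_TYPES = ["classification", "clustering", "retrieval", "reranking", "sts", "summarization"]
PACKAGE_SUBDIRS = ["benchmarks", "models", "resources"]
TOP_LEVEL_DIRS = ["tests", "docs", "examples", "scripts"]


def create_skeleton(root: Path) -> list[Path]:
    """Create the directory layout under `root` and return the created paths."""
    package = root / "indonesia_mteb"
    directories = [package / "tasks" / task_type for task_type in TASK_TYPES]
    directories += [package / name for name in PACKAGE_SUBDIRS]
    directories += [root / name for name in TOP_LEVEL_DIRS]

    for directory in directories:
        # parents=True mirrors `mkdir -p`; exist_ok=True makes reruns safe.
        directory.mkdir(parents=True, exist_ok=True)
    return directories
```

Calling `create_skeleton(Path("indonesia-mteb"))` from the parent directory would produce the same tree as the shell version.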

#### Step 2: Create Configuration Files

Create these files with the templates provided above:

- `pyproject.toml`
- `README.md`
- `LICENSE`
- `.gitignore`
- `.pre-commit-config.yaml`

#### Step 3: Create GitHub Repository

```bash
# Create repository on GitHub first, then:
git remote add origin https://github.com/YOUR_ORG/indonesia-mteb.git
git add .
git commit -m "Initial commit"
git branch -M main  # Ensure the local branch is named main before pushing
git push -u origin main
```

### Phase 2: Core Implementation (Weeks 2-4)

#### Step 1: Implement Package Skeleton

```bash
# Create __init__.py files
touch indonesia_mteb/__init__.py
touch indonesia_mteb/tasks/__init__.py
touch indonesia_mteb/tasks/classification/__init__.py
touch indonesia_mteb/benchmarks/__init__.py
```

#### Step 2: Implement First Task

Start with one classification task:

```python
# indonesia_mteb/tasks/classification/indo_sentiment.py
# Import paths differ between MTEB releases; adjust them to the installed version.
from mteb.abstasks.AbsTaskClassification import AbsTaskClassification
from mteb.abstasks.TaskMetadata import TaskMetadata

class IndoSentimentClassification(AbsTaskClassification):
    metadata = TaskMetadata(
        name="IndoSentimentClassification",
        # ... fill in from template above
    )
```

#### Step 3: Create Benchmark

```python
# indonesia_mteb/benchmarks/indonesia_mteb.py
from mteb.benchmarks import Benchmark
from indonesia_mteb.tasks.classification.indo_sentiment import IndoSentimentClassification

INDONESIA_MTEB = Benchmark(
    name="Indonesia-MTEB",
    # Pass task instances rather than the task classes themselves
    tasks=[IndoSentimentClassification()],
    description="Indonesian Massive Text Embedding Benchmark",
)
```

#### Step 4: Test Locally

```python
# test.py
import indonesia_mteb
import mteb
from sentence_transformers import SentenceTransformer

# Test import
print(indonesia_mteb.__version__)

# Test benchmark
benchmark = indonesia_mteb.get_benchmark()
print(f"Benchmark: {benchmark.name}")
print(f"Tasks: {len(benchmark.tasks)}")

# Test evaluation (if dataset exists)
model = SentenceTransformer("average_word_embeddings_levy_dependency")
tasks = benchmark.tasks
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="test_results")
print(f"Results: {results}")
```

### Phase 3: Dataset Integration (Weeks 5-8)

#### Step 1: Create HuggingFace Organization

1. Go to https://huggingface.co/
2. Create organization: `indonesia-mteb`

#### Step 2: Upload First Dataset

```bash
# Install huggingface_hub
pip install huggingface_hub

# Login
huggingface-cli login

# Create dataset
huggingface-cli repo create indonesia-mteb/indo-sentiment --type dataset

# Upload the current directory to the root of the dataset repository
cd indo-sentiment
huggingface-cli upload indonesia-mteb/indo-sentiment . . --repo-type dataset
```

#### Step 3: Update Task Definition

```python
# Update the dataset path to use HuggingFace
dataset={
    "path": "indonesia-mteb/indo-sentiment",
    "revision": "main",  # A commit hash is more reproducible than a branch name
}
```
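
A branch name such as `main` moves whenever the dataset is updated, so benchmark results can drift over time. One option is to pin `revision` to a commit hash. The sketch below assumes `huggingface_hub` is installed and uses `HfApi.dataset_info`, whose returned object exposes the current commit as `sha`. The helper name `pinned_dataset_config` is hypothetical and not part of MTEB or this package.

```python
from typing import Any, Optional


def pinned_dataset_config(repo_id: str, api: Optional[Any] = None) -> dict[str, str]:
    """Build a dataset config whose revision is the repo's current commit hash."""
    if api is None:
        # Imported lazily so the helper can be exercised with a stub client.
        from huggingface_hub import HfApi

        api = HfApi()

    info = api.dataset_info(repo_id)  # Requires network access and an existing repo
    if not info.sha:
        raise ValueError(f"No commit hash returned for {repo_id!r}")
    return {"path": repo_id, "revision": info.sha}
```

The returned dictionary has the same shape as the `dataset` entry in the task definition, so it can be printed once and pasted into the metadata.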

### Phase 4: Documentation (Weeks 9-10)

#### Step 1: Set Up MkDocs

```bash
pip install -e ".[docs]"
```

#### Step 2: Create Documentation Files

Follow the structure in Part 5.

#### Step 3: Deploy Documentation

```bash
# Install mkdocs-material
pip install mkdocs-material

# Preview locally
mkdocs serve

# Deploy to GitHub Pages (set up in GitHub repo settings first)
mkdocs gh-deploy
```

### Phase 5: CI/CD (Week 11)

#### Step 1: Set Up GitHub Actions

Create the workflow files in `.github/workflows/`.

#### Step 2: Configure PyPI

1. Go to https://pypi.org/
2. Create account
3. Configure "Trusted Publishers" for your GitHub repo

#### Step 3: Test Release

```bash
# Tag the release
git tag v0.1.0
git push origin v0.1.0

# GitHub Actions will automatically publish to PyPI
```
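
A common failure in tag-driven releases is a tag that does not match the version declared by the package. A small guard run inside the release workflow can catch this before publishing. This is a sketch under assumptions: the function names are hypothetical, the distribution name `indonesia-mteb` is taken from the install command used elsewhere in this guide, and the tag is read from the `GITHUB_REF_NAME` environment variable that GitHub Actions sets for the ref that triggered the run.

```python
import os
from importlib import metadata


def tag_matches_version(tag: str, package_version: str) -> bool:
    """Return True when a tag such as 'v0.1.0' names the given package version."""
    return tag.removeprefix("v") == package_version


def check_release(distribution: str = "indonesia-mteb") -> None:
    """Exit with an error if the pushed tag and the installed version disagree."""
    tag = os.environ.get("GITHUB_REF_NAME", "")
    # Raises PackageNotFoundError if the distribution is not installed.
    installed = metadata.version(distribution)
    if not tag_matches_version(tag, installed):
        raise SystemExit(f"Tag {tag!r} does not match package version {installed!r}")
```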

### Phase 6: Community Launch (Week 12)

#### Step 1: Prepare Announcement

````markdown
# Indonesia-MTEB v0.1.0 Release

We're excited to announce the first release of Indonesia-MTEB!

## Features

- X classification tasks
- Y clustering tasks
- Z retrieval tasks

## Installation

```bash
pip install indonesia-mteb
```

## Quick Start

...

## Contributing

We welcome contributions! See CONTRIBUTING.md

## Citation

...
````

#### Step 2: Announce on Channels

- GitHub Discussion
- Indonesian NLP communities
- Academic networks
- Social media

---

## Part 10: Common Issues and Solutions

### Issue 1: Dataset Not Found on HuggingFace

**Problem**: Errors when trying to load a dataset that is not yet on HuggingFace

**Solution**: Create datasets locally first and use absolute paths during development:

```python
# During development
dataset = {
    "path": "/absolute/path/to/local/data",
    # Not on HuggingFace yet
}

# After upload to HuggingFace
dataset = {
    "path": "indonesia-mteb/my-dataset",
    "revision": "main",
}
```
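
Switching between the two configurations by hand is easy to forget before a release. As a sketch under assumptions, the choice could be driven by whether a local copy exists. The helper name `select_dataset_config` and its parameters are hypothetical, and whether a local directory path is accepted depends on how the task loads its data.

```python
from pathlib import Path
from typing import Optional


def select_dataset_config(
    hub_id: str, local_dir: Optional[str] = None, revision: str = "main"
) -> dict[str, str]:
    """Prefer a local directory during development, else fall back to the Hub."""
    if local_dir is not None and Path(local_dir).is_dir():
        # Resolve to an absolute path so the config does not depend on the
        # current working directory.
        return {"path": str(Path(local_dir).resolve())}
    return {"path": hub_id, "revision": revision}
```

Because the fallback is the Hub identifier, a missing or mistyped local directory silently selects the published dataset, which may or may not be the desired behaviour during development.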

### Issue 2: MTEB Version Compatibility

**Problem**: Tasks don't work with the installed MTEB version

**Solution**: Constrain the MTEB version in dependencies:

```toml
dependencies = [
    "mteb>=2.7.0,<3.0.0",
]
```
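
The dependency constraint only takes effect at install time. A runtime check can give a clearer error when an environment has drifted. The following is a standard-library sketch with hypothetical function names; it compares only the numeric release segment, so pre-release suffixes such as `rc1` are ignored. The `packaging` library handles full version semantics if that precision is needed.

```python
import re
from importlib import metadata

# Bounds mirror the dependency range: >=2.7.0,<3.0.0
MINIMUM = (2, 7, 0)
UPPER_EXCLUSIVE = (3, 0, 0)


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric segment, e.g. '2.7.1.dev0' -> (2, 7, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    parts = tuple(int(piece) for piece in match.group().split("."))
    # Pad short versions such as '2.7' so tuple comparison behaves as expected.
    return parts + (0,) * max(0, 3 - len(parts))


def is_supported(version: str) -> bool:
    """Return True when the version falls inside the supported range."""
    return MINIMUM <= parse_release(version) < UPPER_EXCLUSIVE


def check_installed_mteb() -> None:
    """Fail early with a readable message instead of an obscure task error."""
    try:
        installed = metadata.version("mteb")
    except metadata.PackageNotFoundError as error:
        raise RuntimeError("mteb is not installed") from error
    if not is_supported(installed):
        raise RuntimeError(f"mteb {installed} is outside the supported range")
```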

### Issue 3: Tests Fail on GitHub CI

**Problem**: Tests pass locally but fail on CI

**Solution**: Ensure all dependencies are listed and use the same Python version as your local environment:

```yaml
python-version: "3.11"  # Match your local version
```

---

## Part 11: Best Practices Summary

### Code Organization

| Practice | Recommendation |
|----------|----------------|
| Imports | Use relative imports within package |
| Public API | Export everything in `__init__.py` |
| Type Hints | Add to all public functions |
| Docstrings | Google style, include examples |
| Tests | At least 80% coverage |
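
To make the recommendations in the table concrete, here is a small sketch of a public helper that combines an explicit `__all__`, type hints, and a Google-style docstring with an example. The function `normalize_label` is hypothetical and exists only to illustrate the conventions.

```python
"""Sketch of a public helper module following the conventions in the table."""

__all__ = ["normalize_label"]  # Names exposed as the module's public API


def normalize_label(label: str, mapping: dict[str, int]) -> int:
    """Map a raw sentiment label to its integer class id.

    Args:
        label: Raw label text, for example ``"Positif"``.
        mapping: Lower-case label names mapped to class ids.

    Returns:
        The class id for the label.

    Raises:
        KeyError: If the label is not present in ``mapping``.

    Example:
        >>> normalize_label(" Positif ", {"positif": 1, "negatif": 0})
        1
    """
    return mapping[label.strip().lower()]
```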

### Versioning

Use semantic versioning:

- MAJOR: Breaking changes
- MINOR: New features, backward compatible
- PATCH: Bug fixes

Example: 0.1.0 → 0.1.1 (bug fix) → 0.2.0 (new feature) → 1.0.0 (stable)
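
The bump rules can be expressed in a few lines. This is an illustrative sketch with a hypothetical function name; it handles plain `MAJOR.MINOR.PATCH` strings only and ignores pre-release or build suffixes.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next version after bumping 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown part: {part!r}")
```

Applied to the example sequence, `bump_version("0.1.0", "patch")` gives `0.1.1` and `bump_version("0.1.1", "minor")` gives `0.2.0`.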

### Release Frequency

- Patch releases: As needed for bug fixes
- Minor releases: Monthly with new datasets
- Major releases: Quarterly with breaking changes

---

## Summary

Building indonesia-mteb as a Python package involves:

1. Package Structure: Modern `pyproject.toml`-based setup
2. Task Definitions: Extending MTEB's AbsTask classes
3. Benchmark Class: Grouping tasks into Indonesia-MTEB
4. CLI: Command-line tool for evaluation
5. Testing: Pytest with coverage
6. CI/CD: GitHub Actions for testing and publishing
7. Documentation: MkDocs with API reference
8. Community: Contribution guidelines and templates

Next Steps:

1. Create repository with basic structure
2. Implement first task
3. Set up CI/CD
4. Add documentation
5. Release v0.1.0
6. Engage community for contributions

This approach creates a maintainable, community-friendly package that leverages the MTEB ecosystem while providing Indonesian-specific functionality.