Libcll
==================================

What is Libcll?
---------------------

`libcll` is an extensible Python toolkit for running complementary-label learning (CLL) experiments in a standardized way. The library provides:

- synthetic complementary-label datasets generated from user-specified distributions,
- state-of-the-art complementary-label learning methods,
- metrics for evaluating complementary-label learning, and
- support for multiple complementary-label learning approaches.

Getting Started
---------------

This section walks through a complete example, from data preparation to testing, to show how little code a CLL experiment needs with `libcll`. For beginners, we recommend the SCL-NL method for its simplicity and speed, together with the CLMNIST dataset, on which models train quickly.

Installation
````````````

- Python version >= 3.8, <= 3.12
- PyTorch version >= 1.11, <= 2.0
- PyTorch Lightning version >= 2.0
- To install `libcll` and develop locally:

  .. code-block:: shell

      git clone git@github.com:ntucllab/libcll.git
      cd libcll
      pip install -e .

Data Preparation
````````````````

The CLMNIST dataset can be imported directly from `libcll`. To support diverse complementary-label distribution settings, the complementary labels are not generated automatically: the user needs to call ``libcll.datasets.CLBaseDataset.gen_complementary_target(num_cl, Q)``, where ``num_cl`` is the number of complementary labels for each instance and ``Q`` is the class transition probability matrix.

.. code-block:: python

    from torch.utils.data import random_split, DataLoader
    from libcll.datasets import CLMNIST
    from libcll.datasets.utils import collate_fn_multi_label

    train_set = CLMNIST(root="./data/mnist", train=True)
    test_set = CLMNIST(root="./data/mnist", train=False)

    # Generate complementary labels for the training data.
    train_set.gen_complementary_target()

    # Read dataset attributes before splitting; random_split returns Subset objects.
    input_dim = train_set.input_dim
    num_classes = train_set.num_classes
    batch_size = 256

    # Hold out 10% for validation (fractional lengths require PyTorch >= 1.13).
    train_set, valid_set = random_split(train_set, [0.9, 0.1])

    train_loader = DataLoader(train_set, batch_size=batch_size, collate_fn=collate_fn_multi_label, shuffle=True, num_workers=4)
    valid_loader = DataLoader(valid_set, batch_size=batch_size, collate_fn=collate_fn_multi_label, shuffle=False, num_workers=4)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=4)

Build Model and Strategy
````````````````````````

`libcll` provides easy access to well-known complementary-label learning methods. To mimic real-world scenarios, we assume the validation set contains only complementary labels, so we use the SCEL loss instead of accuracy as the validation metric.

.. code-block:: python

    from libcll.models import MLP
    from libcll.strategies import SCL

    # MLP with one hidden layer of 512 units.
    model = MLP(input_dim, 512, num_classes)

    # SCL strategy with the NL variant, validated with the SCEL loss.
    strategy = SCL(
        model=model,
        valid_type="SCEL",
        num_classes=num_classes,
        type="NL",
        lr=1e-4,
    )

Start Training
``````````````

To train the model, we use PyTorch Lightning, which provides an easy-to-build training pipeline, GPU acceleration, and callbacks such as early stopping, model checkpointing, and TensorBoard logging.

.. code-block:: python

    import pytorch_lightning as pl

    trainer = pl.Trainer(
        max_epochs=300,
        accelerator="gpu",
    )

    # Train on complementary labels and validate after each epoch.
    trainer.fit(
        strategy,
        train_dataloaders=train_loader,
        val_dataloaders=valid_loader,
    )

    # Evaluate the trained model on the test set.
    trainer.test(
        dataloaders=test_loader,
    )

----------

.. toctree::
   :caption: User Guide
   :maxdepth: 2

   user-guide/Datasets
   user-guide/Models
   user-guide/Strategies

.. toctree::
   :caption: API Reference
   :maxdepth: 1

   api/Datasets
   api/Models
   api/Strategies
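
As a supplement to the Data Preparation step, the roles of ``num_cl`` and ``Q`` can be illustrated with a minimal, library-independent sketch. It assumes a uniform transition matrix and sampling with replacement; the helper names ``uniform_transition_matrix`` and ``sample_complementary_labels`` are hypothetical and are not part of `libcll`, whose own sampling procedure may differ.

.. code-block:: python

    import random

    def uniform_transition_matrix(num_classes):
        """Row i holds P(complementary label = j | true label = i).

        The diagonal is 0 because a complementary label is, by definition,
        a class the instance does NOT belong to.
        """
        off_diag = 1.0 / (num_classes - 1)
        return [
            [0.0 if i == j else off_diag for j in range(num_classes)]
            for i in range(num_classes)
        ]

    def sample_complementary_labels(true_label, Q, num_cl=1, rng=random):
        """Draw num_cl complementary labels from row Q[true_label].

        Assumption: labels are drawn with replacement, so duplicates can occur.
        """
        return rng.choices(range(len(Q)), weights=Q[true_label], k=num_cl)

    Q = uniform_transition_matrix(10)
    rng = random.Random(0)  # fixed seed for reproducibility
    labels = sample_complementary_labels(true_label=3, Q=Q, num_cl=2, rng=rng)

Each row of ``Q`` sums to 1, and the zero diagonal means the true class (``3`` here) is never returned as a complementary label. A non-uniform ``Q`` would model annotators who are biased toward particular wrong classes.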