Libcll

What is Libcll?

libcll is an extensible Python toolkit for running complementary-label learning experiments in a standardized way. The library provides synthetic complementary-label datasets generated from user-specified distributions, state-of-the-art complementary-label learning methods, evaluation metrics for complementary-label learning, and support for multiple complementary-label learning approaches.

Getting Started

In this section, we walk through an example that gets libcll running and guides you through the complementary-label learning (CLL) process, showing how simple a CLL implementation is with libcll. For beginners, we recommend the SCL-NL method for its simplicity and speed, together with the CLMNIST dataset, which is quick to train on.

Installation

  • Python version >= 3.8, <= 3.12

  • PyTorch version >= 1.11, <= 2.0

  • PyTorch Lightning version >= 2.0

  • To install libcll and develop locally:

git clone git@github.com:ntucllab/libcll.git
cd libcll
pip install -e .

Data Preparation

First, import the CLMNIST dataset directly from libcll. To support different complementary-label distribution settings, the user then needs to call libcll.datasets.CLBaseDataset.gen_complementary_target(num_cl, Q), where num_cl is the number of complementary labels for each instance and Q is the class transition probability matrix.

from torch.utils.data import random_split, DataLoader
from libcll.datasets import CLMNIST
from libcll.datasets.utils import collate_fn_multi_label

train_set = CLMNIST(root="./data/mnist", train=True)
test_set = CLMNIST(root="./data/mnist", train=False)
train_set.gen_complementary_target()
input_dim = train_set.input_dim
num_classes = train_set.num_classes

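batch_size = 256
# Fractional lengths in random_split need a recent PyTorch (1.13 or later, to our knowledge);
# on older versions, pass integer lengths that sum to len(train_set) instead.
train_set, valid_set = random_split(train_set, [0.9, 0.1])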
train_loader = DataLoader(train_set, batch_size=batch_size, collate_fn=collate_fn_multi_label, shuffle=True, num_workers=4)
valid_loader = DataLoader(valid_set, batch_size=batch_size, collate_fn=collate_fn_multi_label, shuffle=False, num_workers=4)
test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False, num_workers=4)
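
To make the roles of num_cl and Q more concrete, here is a minimal standalone sketch of how complementary labels could be drawn from a class transition probability matrix. It uses only the Python standard library and does not call libcll. The helper names uniform_transition_matrix and sample_complementary_labels are hypothetical, and sampling without replacement is an assumption made for illustration, so the library's own generation procedure may differ.

```python
import random


def uniform_transition_matrix(num_classes):
    """Build Q where Q[i][j] = P(complementary label j | true label i).

    The diagonal is zero because a complementary label is, by definition,
    a class the instance does NOT belong to.
    """
    off_diagonal = 1.0 / (num_classes - 1)
    return [
        [0.0 if i == j else off_diagonal for j in range(num_classes)]
        for i in range(num_classes)
    ]


def sample_complementary_labels(true_label, Q, num_cl=1, rng=random):
    """Draw num_cl distinct complementary labels from row Q[true_label].

    Assumption: labels are drawn without replacement (illustration only).
    """
    candidates = [j for j, p in enumerate(Q[true_label]) if p > 0]
    weights = [Q[true_label][j] for j in candidates]
    if num_cl > len(candidates):
        raise ValueError("num_cl exceeds the number of available classes")

    chosen = []
    for _ in range(num_cl):
        pick = rng.choices(candidates, weights=weights, k=1)[0]
        idx = candidates.index(pick)
        # Remove the picked class so it cannot be drawn again.
        candidates.pop(idx)
        weights.pop(idx)
        chosen.append(pick)
    return chosen


Q = uniform_transition_matrix(10)
labels = sample_complementary_labels(true_label=3, Q=Q, num_cl=3)
print(labels)  # three distinct classes, never 3
```

A non-uniform Q (for example, one that favors visually similar digits) can be passed to the same sampler, which is the kind of setting the Q argument is meant to express.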

Build Model and Strategy

libcll provides easy access to well-known complementary-label learning methods. To mimic real-world scenarios, we assume the validation set contains only complementary labels and thus use SCEL loss instead of accuracy as the validation metric.

from libcll.models import MLP
from libcll.strategies import SCL

model = MLP(input_dim, 512, num_classes)
strategy = SCL(
   model=model,
   valid_type="SCEL",
   num_classes=num_classes,
   type="NL",
   lr=1e-4,
)
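
For intuition about what the SCL strategy with type="NL" optimizes, the following sketch computes a negative-learning style loss for a single example. It assumes the NL variant takes the form -log(1 - p), where p is the predicted probability of the complementary label, as commonly described in the CLL literature. The function names softmax and scl_nl_loss are hypothetical and written in plain Python, so this is not libcll's implementation, which may differ in details such as batching and numerical stabilization.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scl_nl_loss(logits, complementary_label, eps=1e-12):
    """Assumed SCL-NL form: penalize probability mass on the complementary label.

    The loss is small when the model assigns little probability to the class
    the instance is known NOT to belong to, and grows as that probability
    approaches 1.
    """
    probs = softmax(logits)
    return -math.log(1.0 - probs[complementary_label] + eps)


logits = [5.0, 0.0, 0.0]
# The model is confident in class 0.
print(scl_nl_loss(logits, complementary_label=1))  # low: class 1 is already unlikely
print(scl_nl_loss(logits, complementary_label=0))  # high: model favors the excluded class
```

Because the loss only needs a complementary label, the same kind of quantity can be monitored on a validation set that has no ordinary labels, which is the situation the SCEL validation metric above is meant to handle.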

Start Training

To train the model, we use PyTorch Lightning, which makes the training pipeline easy to build and provides GPU acceleration along with callbacks for early stopping, model checkpointing, and logging to TensorBoard.

import pytorch_lightning as pl

trainer = pl.Trainer(
   max_epochs=300,
   accelerator="gpu",
)
trainer.fit(
   strategy,
   train_dataloaders=train_loader,
   val_dataloaders=valid_loader,
)
trainer.test(
   dataloaders=test_loader,
)