backprop.tasks

backprop.tasks.base

class Task(model, local=False, api_key=None, task: Optional[str] = None, device: Optional[str] = None, models: Optional[Dict] = None, default_local_model: Optional[str] = None, local_aliases: Optional[Dict] = None)[source]

Bases: pytorch_lightning.core.lightning.LightningModule

Base Task superclass used to implement new tasks.

model

Model name string for the task in use.

local

Run locally. Defaults to False.

api_key

Backprop API key for non-local inference.

device

Device to run inference on. Defaults to “cuda” if available.

models

All supported models for a given task (pulls from config).

default_local_model

Model the task will default to if none is provided. Defined per-task.

configure_optimizers()[source]

Sets up optimizers for the model. Must be defined in each task: there is no base default.

finetune(dataset=None, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, dataset_train: Optional[torch.utils.data.dataset.Dataset] = None, dataset_valid: Optional[torch.utils.data.dataset.Dataset] = None, step=None, configure_optimizers=None)[source]
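
The tuple form of validation_split is not expanded on above; a minimal sketch of how it could be used, assuming the tuple supplies explicit (training indices, validation indices) into the provided data, and using the Emotion subclass purely for illustration:

import backprop

emote = backprop.Emotion()

# Two items: index 0 assumed to be used for training, index 1 for validation
params = {"input_text": ["I really liked the service I received!", "Meh, it was not impressive."],
          "output_text": ["positive", "negative"]}

emote.finetune(params, validation_split=([0], [1]), epochs=1)
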
save(name: str, description: Optional[str] = None, details: Optional[Dict] = None)[source]

Saves the model used by task to ~/.cache/backprop/name

Parameters
  • name – string identifier for the model. Lowercase letters and numbers. No spaces/special characters except dashes.

  • description – String description of the model.

  • details – Valid JSON dictionary of additional details about the model.
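
A minimal usage sketch (the name, description, and details values are illustrative, shown here on the TextClassification task):

import backprop

tc = backprop.TextClassification()
# ... finetune the task first ...

# Saves the model to ~/.cache/backprop/my-text-classifier
tc.save("my-text-classifier",
        description="Classifies support emails by topic",
        details={"num_labels": 3})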

step(batch, batch_idx)[source]

Implemented per-task, passes batch into model and returns loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader()[source]

Returns a default dataloader of training data.

training: bool

training_step(batch, batch_idx)[source]

Performs the step function with training data and gets training loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

upload(name: str, description: Optional[str] = None, details: Optional[Dict] = None, api_key: Optional[str] = None)[source]

Saves the model used by task to ~/.cache/backprop/name and deploys to backprop

Parameters
  • name – string identifier for the model. Lowercase letters and numbers. No spaces/special characters except dashes.

  • description – String description of the model.

  • details – Valid JSON dictionary of additional details about the model.

  • api_key – Backprop API key
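
A minimal deployment sketch (the name, description, and API key are placeholders, shown here on the TextClassification task):

import backprop

tc = backprop.TextClassification()
# ... finetune the task first ...

# Saves locally to ~/.cache/backprop/my-text-classifier and deploys to Backprop
tc.upload("my-text-classifier",
          description="Classifies support emails by topic",
          api_key="abc123")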

val_dataloader()[source]

Returns a default dataloader of validation data.

validation_step(batch, batch_idx)[source]

Performs the step function with validation data and gets validation loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

backprop.tasks.emotion

class Emotion(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for emotion detection.

model
  1. Model name

  2. Model name on Backprop’s emotion endpoint

  3. Model object that implements the emotion task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]])[source]

Perform emotion detection on input text.

Parameters

text – string or list of strings to detect emotion from. Keep each string under a few sentences for best performance.

Returns

Emotion string or list of emotion strings.
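
A minimal usage sketch (the input sentences are illustrative; the exact emotion label returned depends on the model):

import backprop

emote = backprop.Emotion()

# Single string in, emotion string out
emotion = emote("I hope this works out, fingers crossed!")

# List in, list of emotion strings out
emotions = emote(["This is brilliant news.", "I can't believe they cancelled it."])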

configure_optimizers()[source]

Returns default optimizer for text generation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a generative model for sentiment detection.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

emote = backprop.Emotion()

# Provide sentiment data for training
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
emote.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.
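
A short sketch of the output modes (shown on the Emotion task; every task exposes the same static method):

import backprop

# Print the supported models directly
backprop.Emotion.list_models(display=True)

# Or get up to five models programmatically, in dict form
models = backprop.Emotion.list_models(return_dict=True, limit=5)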

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.image_classification

class ImageClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for image classification.

model
  1. Model name

  2. Model name on Backprop’s image-classification endpoint

  3. Model object that implements the image-classification task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]

Classify image according to given labels.

Parameters
  • image – image or list of images to classify. Can be PIL Image objects or paths to images.

  • labels – list of labels, or a list of label lists (one per image), for zero-shot classification

  • top_k – return probabilities only for top_k predictions. Use 0 to get all.

Returns

dict where each key is a label and value is probability between 0 and 1 or list of dicts
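
A minimal zero-shot usage sketch (the image path and labels are illustrative):

import backprop

ic = backprop.ImageClassification()

# Zero-shot classification against a custom label set:
# returns a dict mapping each label to a probability
probs = ic("images/dog.jpg", labels=["beagle", "malamute", "cat"])

# Keep probabilities only for the single most likely label
top = ic("images/dog.jpg", labels=["beagle", "malamute", "cat"], top_k=1)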

configure_optimizers()[source]

Returns default optimizer for image classification (SGD, learning rate 1e-1, weight decay 1e-4)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'single_label', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for image classification.

Parameters
  • params – Dictionary of model inputs. Contains ‘images’ and ‘labels’ keys, with values as lists of images/labels.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • variant – Determines whether to do single or multi-label classification: “single_label” (default) or “multi_label”

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

ic = backprop.ImageClassification()

# Prep training images/labels. Labels are automatically used to set up model with number of classes for classification.
images = ["images/beagle/photo.jpg", "images/dachsund/photo.jpg", "images/malamute/photo.jpg"]
labels = ["beagle", "dachsund", "malamute"]
params = {"images": images, "labels": labels}

# Finetune
ic.finetune(params, variant="single_label")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_multi_label(batch, batch_idx)[source]

Performs a training step for multi-label classification and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

step_single_label(batch, batch_idx)[source]

Performs a training step for single-label classification and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.image_text_vectorisation

class ImageTextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for combined image-text vectorisation.

model
  1. Model name

  2. Model name on Backprop’s image-text-vectorisation endpoint

  3. Model object that implements the image-text-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, List[str]], text: Union[str, List[str]], return_tensor=False)[source]

Vectorise input image and text pairs.

Parameters
  • image – image or list of images to vectorise. Can be PIL Image objects or paths to images.

  • text – text or list of text to vectorise. Must match image ordering.

Returns

Vector or list of vectors
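
A minimal usage sketch (the path and text reuse the illustrative product data from the finetuning example below):

import backprop

itv = backprop.ImageTextVectorisation()

# One combined vector for the image/text pair
vector = itv("product_images/crowbars/photo.jpg",
             "Steel crowbar with angled beak, 300mm")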

configure_optimizers()[source]

Returns default optimizer for image-text vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for combined image & text vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “texts”, “images”, and “groups”. If using cosine_similarity variant, contains keys “texts1”, “texts2”, “imgs1”, “imgs2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • variant – How loss will be calculated: “triplet” (default) or “cosine_similarity”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

itv = backprop.ImageTextVectorisation()

# Prep training data & finetune (triplet variant)
images = ["product_images/crowbars/photo.jpg", "product_images/crowbars/photo1.jpg", "product_images/mugs/photo.jpg"]
texts = ["Steel crowbar with angled beak, 300mm", "Crowbar tempered steel 300m angled", "Sturdy ceramic mug, microwave-safe"]
groups = [0, 0, 1]
params = {"images": images, "texts": texts, "groups": groups}

itv.finetune(params, variant="triplet")

# Prep training data & finetune (cosine_similarity variant)
imgs1 = ["product_images/crowbars/photo.jpg", "product_images/mugs/photo.jpg"]
texts1 = ["Steel crowbar with angled beak, 300mm", "Sturdy ceramic mug, microwave-safe"]
imgs2 = ["product_images/crowbars/photo1.jpg", "product_images/hats/photo.jpg]
texts2 = ["Crowbar tempered steel 300m angled", "Dad hat with funny ghost picture on the front"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}

itv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.

backprop.tasks.image_vectorisation

class ImageVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for image vectorisation.

model
  1. Model name

  2. Model name on Backprop’s image-vectorisation endpoint

  3. Model object that implements the image-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, PIL.Image.Image, List[str], List[PIL.Image.Image]], return_tensor=False)[source]

Vectorise input image.

Parameters

image – image or list of images to vectorise. Can be PIL Image objects or paths to images.

Returns

Vector or list of vectors
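
A minimal usage sketch (the image paths are illustrative):

import backprop

iv = backprop.ImageVectorisation()

# Single image in, single vector out
vector = iv("images/beagle/photo.jpg")

# List in, list of vectors out
vectors = iv(["images/beagle/photo.jpg", "images/malamute/photo.jpg"])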

configure_optimizers()[source]

Returns default optimizer for image vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for image vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “images” and “groups”. If using cosine_similarity variant, contains keys “imgs1”, “imgs2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • variant – How loss will be calculated: “triplet” (default) or “cosine_similarity”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

iv = backprop.ImageVectorisation()

# Set up training data & finetune (triplet variant)
images = ["images/beagle/photo.jpg",  "images/shiba_inu/photo.jpg", "images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
groups = [0, 1, 0, 2]
params = {"images": images, "groups": groups}

iv.finetune(params, variant="triplet")

# Set up training data & finetune (cosine_similarity variant)
imgs1 = ["images/beagle/photo.jpg", "images/shiba_inu/photo.jpg"]
imgs2 = ["images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "similarity_scores": similarity_scores}

iv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.

backprop.tasks.qa

class QA(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for Question Answering.

model
  1. Model name

  2. Model name on Backprop’s qa endpoint

  3. Model object that implements the qa task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(question: Union[str, List[str]], context: Union[str, List[str]], prev_qa: Union[List[Tuple[str, str]], List[List[Tuple[str, str]]]] = [])[source]

Perform QA, either on docstore or on provided context.

Parameters
  • question – Question (string or list of strings) for qa model.

  • context – Context (string or list of strings) to ask question from.

  • prev_qa (optional) – List of previous question, answer tuples or list of prev_qa.

Returns

Answer string or list of answer strings
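
A minimal usage sketch, including a follow-up question that passes the earlier exchange via prev_qa:

import backprop

qa = backprop.QA()

context = "Backprop is a Python library that makes training and using models easier."

answer = qa("What's Backprop?", context)

# Follow-up: pass previous (question, answer) tuples as prev_qa
follow_up = qa("What language is it in?", context,
               prev_qa=[("What's Backprop?", answer)])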

configure_optimizers()[source]

Returns default optimizer for Q&A (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for Q&A tasks.

Parameters
  • params – dictionary of lists: ‘questions’, ‘answers’, ‘contexts’. Optionally includes ‘prev_qas’: list of lists containing (q, a) tuples to prepend to context.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

# Initialise task
qa = backprop.QA()

# Set up training data for QA. Note that repeated contexts are needed, along with empty prev_qas to match.
# Input must be completely 1:1, each question has an associated answer, context, and prev_qa (if prev_qa is to be used).
questions = ["What's Backprop?", "What language is it in?", "When was the Moog synthesizer invented?"]
answers = ["A library that trains models", "Python", "1964"]
contexts = ["Backprop is a Python library that makes training and using models easier.",
            "Backprop is a Python library that makes training and using models easier.",
            "Bob Moog was a physicist. He invented the Moog synthesizer in 1964."]

prev_qas = [[],
            [("What's Backprop?", "A library that trains models")],
            []]

params = {"questions": questions,
          "answers": answers,
          "contexts": contexts,
          "prev_qas": prev_qas}

# Finetune
qa.finetune(params=params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.summarisation

class Summarisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for summarisation.

model
  1. Model name

  2. Model name on Backprop’s summarisation endpoint

  3. Model object that implements the summarisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]])[source]

Perform summarisation on input text.

Parameters

text – string or list of strings to be summarised - keep each string below 500 words.

Returns

Summary string or list of summary strings.
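
A minimal usage sketch (the input text is illustrative):

import backprop

summarise = backprop.Summarisation()

article = "This is a long news article about recent political happenings."

# Single string in, summary string out
summary = summarise(article)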

configure_optimizers()[source]

Returns default optimizer for summarisation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 512, max_output_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a generative model for summarisation.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation

  • epochs – Integer specifying how many training iterations to run

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

summary = backprop.Summarisation()

# Provide training data for task
inp = ["This is a long news article about recent political happenings.", "This is an article about some recent scientific research."]
out = ["Short political summary.", "Short scientific summary."]
params = {"input_text": inp, "output_text": out}

# Finetune
summary.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_classification

class TextClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text classification.

model
  1. Model name

  2. Model name on Backprop’s text-classification endpoint

  3. Model object that implements the text-classification task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]

Classify input text based on previous training (user-tuned models) or according to given list of labels (zero-shot)

Parameters
  • text – string or list of strings to be classified

  • labels – list of labels for zero-shot classification (on our out-of-the-box models). If using a user-trained model (e.g. XLNet), this is not used.

  • top_k – return probabilities only for top_k predictions. Use 0 to get all.

Returns

dict where each key is a label and value is probability between 0 and 1, or list of dicts.
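
A minimal zero-shot usage sketch (the input text and labels reuse the illustrative data from the finetuning example below):

import backprop

tc = backprop.TextClassification()

# Zero-shot classification: returns a dict mapping each label to a probability
probs = tc("This is a political news article",
           labels=["Politics", "Science", "Entertainment"])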

configure_optimizers()[source]

Returns default optimizer for text classification (AdamW, learning rate 2e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a text classification model on provided data.

Parameters
  • params – Dict containing keys “texts” and “labels”, with values being input/output data lists.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • max_length – Int determining the maximum token length of input strings.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tc = backprop.TextClassification()

# Set up input data. Labels will automatically be used to set up model with number of classes for classification.
inp = ["This is a political news article", "This is a computer science research paper", "This is a movie review"]
out = ["Politics", "Science", "Entertainment"]
params = {"texts": inp, "labels": out}

# Finetune
tc.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_generation

class TextGeneration(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text generation.

model
  1. Model name

  2. Model name on Backprop’s text-generation endpoint

  3. Model object that implements the text-generation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], min_length: Optional[int] = None, max_length: Optional[int] = None, temperature: Optional[float] = None, top_k: Optional[int] = None, top_p: Optional[float] = None, repetition_penalty: Optional[float] = None, length_penalty: Optional[float] = None, num_beams: Optional[int] = None, num_generations: Optional[int] = None, do_sample: Optional[bool] = None)[source]

Generates text to continue from the given input.

Parameters
  • text (string) – Text from which the model will begin generating.

  • min_length (int) – Minimum number of tokens to generate (1 token ~ 1 word).

  • max_length (int) – Maximum number of tokens to generate (1 token ~ 1 word).

  • temperature (float) – Value that alters the randomness of generation (0.0 is no randomness, higher values introduce randomness. 0.5 - 0.7 is a good starting point).

  • top_k (int) – Only choose from the top_k tokens when generating (0 is no limit).

  • top_p (float) – Only choose from the top tokens with combined probability greater than top_p.

  • repetition_penalty (float) – Penalty to be applied to tokens present in the input text and tokens already generated in the sequence (>1 discourages repetition while <1 encourages).

  • length_penalty (float) – Penalty applied to overall sequence length. Set >1 for longer sequences, or <1 for shorter ones.

  • num_beams (int) – Number of beams to be used in beam search. Does a number of generations to pick the best one. (1: no beam search)

  • num_generations (int) – How many times to run generation. Results are returned as a list.

  • do_sample (bool) – Whether or not sampling strategies (temperature, top_k, top_p) should be used.

Example:

import backprop

tg = backprop.TextGeneration()
tg("Geralt knew the sings, the monster was a", min_length=20, max_length=50, temperature=0.7)
> " real danger, and he was the only one in the village who knew how to defend himself."
configure_optimizers()[source]

Returns default optimizer for text generation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 128, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for a text generation task.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tg = backprop.TextGeneration()

# Any text works as training data
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
tg.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_vectorisation

class TextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text vectorisation.

model
  1. Model name

  2. Model name on Backprop’s text-vectorisation endpoint

  3. Model object that implements the text-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], return_tensor=False)[source]

Vectorise input text.

Parameters

text – string or list of strings to vectorise.

Returns

Vector or list of vectors
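
A minimal usage sketch (the input strings are illustrative; the return_tensor behaviour is inferred from the parameter name):

import backprop

tv = backprop.TextVectorisation()

# Single string in, single vector out
vector = tv("I went to the store and bought some bread")

# return_tensor=True is assumed to return the vectors as a tensor rather than a list
vectors = tv(["I bought bread from the store", "I am getting a cat soon"],
             return_tensor=True)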

configure_optimizers()[source]

Returns default optimizer for text vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: Optional[int] = None, variant: str = 'cosine_similarity', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for text vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “texts” and “groups”. If using cosine_similarity variant, contains keys “texts1”, “texts2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • max_length – Int determining the maximum token length of input strings.

  • variant – How loss will be calculated: “cosine_similarity” (default) or “triplet”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tv = backprop.TextVectorisation()

# Set up training data & finetune (cosine_similarity variant)
texts1 = ["I went to the store and bought some bread", "I am getting a cat soon"]
texts2 = ["I bought bread from the store", "I took my dog for a walk"]
similarity_scores = [1.0, 0.0]
params = {"texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}

tv.finetune(params, variant="cosine_similarity")

# Set up training data & finetune (triplet variant)
texts = ["I went to the store and bought some bread", "I bought bread from the store", "I'm going to go walk my dog"]
groups = [0, 0, 1]
params = {"texts": texts, "groups": groups}

tv.finetune(params, variant="triplet")
static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.