backprop.tasks

backprop.tasks.base

class Task(model, local=False, api_key=None, task: Optional[str] = None, device: Optional[str] = None, models: Optional[Dict] = None, default_local_model: Optional[str] = None, local_aliases: Optional[Dict] = None)[source]

Bases: pytorch_lightning.core.lightning.LightningModule

Base Task superclass used to implement new tasks.

model

Model name string for the task in use.

local

Run locally. Defaults to False.

api_key

Backprop API key for non-local inference.

device

Device to run inference on. Defaults to “cuda” if available.

models

All supported models for a given task (pulls from config).

default_local_model

Model the task will default to if none is provided. Defined per-task.

configure_optimizers()[source]

Sets up optimizers for the model. Must be defined in each task: there is no base default.

finetune(dataset=None, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, dataset_train: Optional[torch.utils.data.dataset.Dataset] = None, dataset_valid: Optional[torch.utils.data.dataset.Dataset] = None, step=None, configure_optimizers=None)[source]
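
The tuple form of validation_split is not expanded on above; a minimal sketch of how it could be used, assuming the tuple supplies explicit (training indices, validation indices) into the provided data, and using the Emotion subclass purely for illustration:

import backprop

emote = backprop.Emotion()

# Two items: index 0 assumed to be used for training, index 1 for validation
params = {"input_text": ["I really liked the service I received!", "Meh, it was not impressive."],
          "output_text": ["positive", "negative"]}

emote.finetune(params, validation_split=([0], [1]), epochs=1)
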
save(name: str, description: Optional[str] = None, details: Optional[Dict] = None)[source]

Saves the model used by task to ~/.cache/backprop/name

Parameters
  • name – string identifier for the model. Lowercase letters and numbers. No spaces/special characters except dashes.

  • description – String description of the model.

  • details – Valid JSON dictionary of additional details about the model.
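
A minimal usage sketch (the name, description, and details values are illustrative, shown here on the TextClassification task):

import backprop

tc = backprop.TextClassification()
# ... finetune the task first ...

# Saves the model to ~/.cache/backprop/my-text-classifier
tc.save("my-text-classifier",
        description="Classifies support emails by topic",
        details={"num_labels": 3})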

step(batch, batch_idx)[source]

Implemented per-task, passes batch into model and returns loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader()[source]

Returns a default dataloader of training data.

training: bool

training_step(batch, batch_idx)[source]

Performs the step function with training data and gets training loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

upload(name: str, description: Optional[str] = None, details: Optional[Dict] = None, api_key: Optional[str] = None)[source]

Saves the model used by task to ~/.cache/backprop/name and deploys to backprop

Parameters
  • name – string identifier for the model. Lowercase letters and numbers. No spaces/special characters except dashes.

  • description – String description of the model.

  • details – Valid JSON dictionary of additional details about the model.

  • api_key – Backprop API key
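
A minimal deployment sketch (the name, description, and API key are placeholders, shown here on the TextClassification task):

import backprop

tc = backprop.TextClassification()
# ... finetune the task first ...

# Saves locally to ~/.cache/backprop/my-text-classifier and deploys to Backprop
tc.upload("my-text-classifier",
          description="Classifies support emails by topic",
          api_key="abc123")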

val_dataloader()[source]

Returns a default dataloader of validation data.

validation_step(batch, batch_idx)[source]

Performs the step function with validation data and gets validation loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

backprop.tasks.emotion

class Emotion(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for emotion detection.

model
  1. Model name

  2. Model name on Backprop’s emotion endpoint

  3. Model object that implements the emotion task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]])[source]

Perform emotion detection on input text.

Parameters

text – string or list of strings to detect emotion from. Keep each string under a few sentences for best performance.

Returns

Emotion string or list of emotion strings.
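
A minimal usage sketch (the input sentences are illustrative; the exact emotion label returned depends on the model):

import backprop

emote = backprop.Emotion()

# Single string in, emotion string out
emotion = emote("I hope this works out, fingers crossed!")

# List in, list of emotion strings out
emotions = emote(["This is brilliant news.", "I can't believe they cancelled it."])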

configure_optimizers()[source]

Returns default optimizer for text generation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a generative model for sentiment detection.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

emote = backprop.Emotion()

# Provide sentiment data for training
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
emote.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.
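
A short sketch of the output modes (shown on the Emotion task; every task exposes the same static method):

import backprop

# Print the supported models directly
backprop.Emotion.list_models(display=True)

# Or get up to five models programmatically, in dict form
models = backprop.Emotion.list_models(return_dict=True, limit=5)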

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.image_classification

class ImageClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for image classification.

model
  1. Model name

  2. Model name on Backprop’s image-classification endpoint

  3. Model object that implements the image-classification task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]

Classify image according to given labels.

Parameters
  • image – image or list of images to classify. Can be PIL Image objects or paths to images.

  • labels – list of labels, or a list of label lists (one per image), for zero-shot classification

  • top_k – return probabilities only for top_k predictions. Use 0 to get all.

Returns

dict where each key is a label and value is probability between 0 and 1 or list of dicts
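
A minimal zero-shot usage sketch (the image path and labels are illustrative):

import backprop

ic = backprop.ImageClassification()

# Zero-shot classification against a custom label set:
# returns a dict mapping each label to a probability
probs = ic("images/dog.jpg", labels=["beagle", "malamute", "cat"])

# Keep probabilities only for the single most likely label
top = ic("images/dog.jpg", labels=["beagle", "malamute", "cat"], top_k=1)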

configure_optimizers()[source]

Returns default optimizer for image classification (SGD, learning rate 1e-1, weight decay 1e-4)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'single_label', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for image classification.

Parameters
  • params – Dictionary of model inputs. Contains ‘images’ and ‘labels’ keys, with values as lists of images/labels.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • variant – Determines whether to do single or multi-label classification: “single_label” (default) or “multi_label”

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

ic = backprop.ImageClassification()

# Prep training images/labels. Labels are automatically used to set up model with number of classes for classification.
images = ["images/beagle/photo.jpg", "images/dachsund/photo.jpg", "images/malamute/photo.jpg"]
labels = ["beagle", "dachsund", "malamute"]
params = {"images": images, "labels": labels}

# Finetune
ic.finetune(params, variant="single_label")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_multi_label(batch, batch_idx)[source]

Performs a training step for multi-label classification and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

step_single_label(batch, batch_idx)[source]

Performs a training step for single-label classification and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.image_text_vectorisation

class ImageTextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for combined image-text vectorisation.

model
  1. Model name

  2. Model name on Backprop’s image-text-vectorisation endpoint

  3. Model object that implements the image-text-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, List[str]], text: Union[str, List[str]], return_tensor=False)[source]

Vectorise input image and text pairs.

Parameters
  • image – image or list of images to vectorise. Can be PIL Image objects or paths to images.

  • text – text or list of text to vectorise. Must match image ordering.

Returns

Vector or list of vectors
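
A minimal usage sketch (the path and text reuse the illustrative product data from the finetuning example below):

import backprop

itv = backprop.ImageTextVectorisation()

# One combined vector for the image/text pair
vector = itv("product_images/crowbars/photo.jpg",
             "Steel crowbar with angled beak, 300mm")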

configure_optimizers()[source]

Returns default optimizer for image-text vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for combined image & text vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “texts”, “images”, and “groups”. If using cosine_similarity variant, contains keys “texts1”, “texts2”, “imgs1”, “imgs2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • variant – How loss will be calculated: “triplet” (default) or “cosine_similarity”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

itv = backprop.ImageTextVectorisation()

# Prep training data & finetune (triplet variant)
images = ["product_images/crowbars/photo.jpg", "product_images/crowbars/photo1.jpg", "product_images/mugs/photo.jpg"]
texts = ["Steel crowbar with angled beak, 300mm", "Crowbar tempered steel 300m angled", "Sturdy ceramic mug, microwave-safe"]
groups = [0, 0, 1]
params = {"images": images, "texts": texts, "groups": groups}

itv.finetune(params, variant="triplet")

# Prep training data & finetune (cosine_similarity variant)
imgs1 = ["product_images/crowbars/photo.jpg", "product_images/mugs/photo.jpg"]
texts1 = ["Steel crowbar with angled beak, 300mm", "Sturdy ceramic mug, microwave-safe"]
imgs2 = ["product_images/crowbars/photo1.jpg", "product_images/hats/photo.jpg]
texts2 = ["Crowbar tempered steel 300m angled", "Dad hat with funny ghost picture on the front"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}

itv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.

backprop.tasks.image_vectorisation

class ImageVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for image vectorisation.

model
  1. Model name

  2. Model name on Backprop’s image-vectorisation endpoint

  3. Model object that implements the image-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(image: Union[str, PIL.Image.Image, List[str], List[PIL.Image.Image]], return_tensor=False)[source]

Vectorise input image.

Parameters

image – image or list of images to vectorise. Can be PIL Image objects or paths to images.

Returns

Vector or list of vectors
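
A minimal usage sketch (the image paths are illustrative):

import backprop

iv = backprop.ImageVectorisation()

# Single image in, single vector out
vector = iv("images/beagle/photo.jpg")

# List in, list of vectors out
vectors = iv(["images/beagle/photo.jpg", "images/malamute/photo.jpg"])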

configure_optimizers()[source]

Returns default optimizer for image vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for image vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “images” and “groups”. If using cosine_similarity variant, contains keys “imgs1”, “imgs2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • variant – How loss will be calculated: “triplet” (default) or “cosine_similarity”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

iv = backprop.ImageVectorisation()

# Set up training data & finetune (triplet variant)
images = ["images/beagle/photo.jpg",  "images/shiba_inu/photo.jpg", "images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
groups = [0, 1, 0, 2]
params = {"images": images, "groups": groups}

iv.finetune(params, variant="triplet")

# Set up training data & finetune (cosine_similarity variant)
imgs1 = ["images/beagle/photo.jpg", "images/shiba_inu/photo.jpg"]
imgs2 = ["images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "similarity_scores": similarity_scores}

iv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.

backprop.tasks.qa

class QA(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for Question Answering.

model
  1. Model name

  2. Model name on Backprop’s qa endpoint

  3. Model object that implements the qa task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(question: Union[str, List[str]], context: Union[str, List[str]], prev_qa: Union[List[Tuple[str, str]], List[List[Tuple[str, str]]]] = [])[source]

Perform QA, either on docstore or on provided context.

Parameters
  • question – Question (string or list of strings) for qa model.

  • context – Context (string or list of strings) to ask question from.

  • prev_qa (optional) – List of previous question, answer tuples or list of prev_qa.

Returns

Answer string or list of answer strings
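
A minimal usage sketch, including a follow-up question that passes the earlier exchange via prev_qa:

import backprop

qa = backprop.QA()

context = "Backprop is a Python library that makes training and using models easier."

answer = qa("What's Backprop?", context)

# Follow-up: pass previous (question, answer) tuples as prev_qa
follow_up = qa("What language is it in?", context,
               prev_qa=[("What's Backprop?", answer)])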

configure_optimizers()[source]

Returns default optimizer for Q&A (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for Q&A tasks.

Parameters
  • params – dictionary of lists: ‘questions’, ‘answers’, ‘contexts’. Optionally includes ‘prev_qas’: list of lists containing (q, a) tuples to prepend to context.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

# Initialise task
qa = backprop.QA()

# Set up training data for QA. Note that repeated contexts are needed, along with empty prev_qas to match.
# Input must be completely 1:1, each question has an associated answer, context, and prev_qa (if prev_qa is to be used).
questions = ["What's Backprop?", "What language is it in?", "When was the Moog synthesizer invented?"]
answers = ["A library that trains models", "Python", "1964"]
contexts = ["Backprop is a Python library that makes training and using models easier.",
            "Backprop is a Python library that makes training and using models easier.",
            "Bob Moog was a physicist. He invented the Moog synthesizer in 1964."]

prev_qas = [[],
            [("What's Backprop?", "A library that trains models")],
            []]

params = {"questions": questions,
          "answers": answers,
          "contexts": contexts,
          "prev_qas": prev_qas}

# Finetune
qa.finetune(params=params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.summarisation

class Summarisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for summarisation.

model
  1. Model name

  2. Model name on Backprop’s summarisation endpoint

  3. Model object that implements the summarisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]])[source]

Perform summarisation on input text.

Parameters

text – string or list of strings to be summarised - keep each string below 500 words.

Returns

Summary string or list of summary strings.
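
A minimal usage sketch (the input text is illustrative):

import backprop

summarise = backprop.Summarisation()

article = "This is a long news article about recent political happenings."

# Single string in, summary string out
summary = summarise(article)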

configure_optimizers()[source]

Returns default optimizer for summarisation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 512, max_output_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a generative model for summarisation.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation

  • epochs – Integer specifying how many training iterations to run

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

summary = backprop.Summarisation()

# Provide training data for task
inp = ["This is a long news article about recent political happenings.", "This is an article about some recent scientific research."]
out = ["Short political summary.", "Short scientific summary."]
params = {"input_text": inp, "output_text": out}

# Finetune
summary.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_classification

class TextClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text classification.

model
  1. Model name

  2. Model name on Backprop’s text-classification endpoint

  3. Model object that implements the text-classification task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]

Classify input text based on previous training (user-tuned models) or according to given list of labels (zero-shot)

Parameters
  • text – string or list of strings to be classified

  • labels – list of labels for zero-shot classification (on our out-of-the-box models). If using a user-trained model (e.g. XLNet), this is not used.

  • top_k – return probabilities only for top_k predictions. Use 0 to get all.

Returns

dict where each key is a label and value is probability between 0 and 1, or list of dicts.
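
A minimal zero-shot usage sketch (the input text and labels reuse the illustrative data from the finetuning example below):

import backprop

tc = backprop.TextClassification()

# Zero-shot classification: returns a dict mapping each label to a probability
probs = tc("This is a political news article",
           labels=["Politics", "Science", "Entertainment"])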

configure_optimizers()[source]

Returns default optimizer for text classification (AdamW, learning rate 2e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a text classification model on provided data.

Parameters
  • params – Dict containing keys “texts” and “labels”, with values being input/output data lists.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • max_length – Int determining the maximum token length of input strings.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tc = backprop.TextClassification()

# Set up input data. Labels will automatically be used to set up model with number of classes for classification.
inp = ["This is a political news article", "This is a computer science research paper", "This is a movie review"]
out = ["Politics", "Science", "Entertainment"]
params = {"texts": inp, "labels": out}

# Finetune
tc.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_generation

class TextGeneration(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text generation.

model
  1. Model name

  2. Model name on Backprop’s text-generation endpoint

  3. Model object that implements the text-generation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], min_length: Optional[int] = None, max_length: Optional[int] = None, temperature: Optional[float] = None, top_k: Optional[int] = None, top_p: Optional[float] = None, repetition_penalty: Optional[float] = None, length_penalty: Optional[float] = None, num_beams: Optional[int] = None, num_generations: Optional[int] = None, do_sample: Optional[bool] = None)[source]

Generates text to continue from the given input.

Parameters
  • text (string) – Text from which the model will begin generating.

  • min_length (int) – Minimum number of tokens to generate (1 token ~ 1 word).

  • max_length (int) – Maximum number of tokens to generate (1 token ~ 1 word).

  • temperature (float) – Value that alters the randomness of generation (0.0 is no randomness, higher values introduce randomness. 0.5 - 0.7 is a good starting point).

  • top_k (int) – Only choose from the top_k tokens when generating (0 is no limit).

  • top_p (float) – Only choose from the top tokens with combined probability greater than top_p.

  • repetition_penalty (float) – Penalty to be applied to tokens present in the input text and tokens already generated in the sequence (>1 discourages repetition while <1 encourages).

  • length_penalty (float) – Penalty applied to overall sequence length. Set >1 for longer sequences, or <1 for shorter ones.

  • num_beams (int) – Number of beams to be used in beam search. Does a number of generations to pick the best one. (1: no beam search)

  • num_generations (int) – How many times to run generation. Results are returned as a list.

  • do_sample (bool) – Whether or not sampling strategies (temperature, top_k, top_p) should be used.

Example:

import backprop

tg = backprop.TextGeneration()
tg("Geralt knew the sings, the monster was a", min_length=20, max_length=50, temperature=0.7)
> " real danger, and he was the only one in the village who knew how to defend himself."
configure_optimizers()[source]

Returns default optimizer for text generation (AdaFactor, learning rate 1e-3)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 128, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for a text generation task.

Note

input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output)

Parameters
  • params – Dictionary of model inputs. Contains ‘input_text’ and ‘output_text’ keys, with values as lists of input/output data.

  • max_input_length – Maximum number of tokens (1 token ~ 1 word) in input. Anything higher will be truncated. Max 512.

  • max_output_length – Maximum number of tokens (1 token ~ 1 word) in output. Anything higher will be truncated. Max 512.

  • validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tg = backprop.TextGeneration()

# Any text works as training data
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
tg.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step(batch, batch_idx)[source]

Performs a training step and returns loss.

Parameters
  • batch – Batch output from the dataloader

  • batch_idx – Batch index.

training: bool

backprop.tasks.text_vectorisation

class TextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]

Bases: backprop.tasks.base.Task

Task for text vectorisation.

model
  1. Model name

  2. Model name on Backprop’s text-vectorisation endpoint

  3. Model object that implements the text-vectorisation task

local

Run locally. Defaults to False

Type

optional

api_key

Backprop API key for non-local inference

Type

optional

device

Device to run inference on. Defaults to “cuda” if available.

Type

optional

__call__(text: Union[str, List[str]], return_tensor=False)[source]

Vectorise input text.

Parameters

text – string or list of strings to vectorise.

Returns

Vector or list of vectors
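
A minimal usage sketch (the input strings are illustrative; the return_tensor behaviour is inferred from the parameter name):

import backprop

tv = backprop.TextVectorisation()

# Single string in, single vector out
vector = tv("I went to the store and bought some bread")

# return_tensor=True is assumed to return the vectors as a tensor rather than a list
vectors = tv(["I bought bread from the store", "I am getting a cat soon"],
             return_tensor=True)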

configure_optimizers()[source]

Returns default optimizer for text vectorisation (AdamW, learning rate 1e-5)

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: Optional[int] = None, variant: str = 'cosine_similarity', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]

Finetunes a model for text vectorisation. Includes different variants for calculating loss.

Parameters
  • params – Dictionary of model inputs. If using triplet variant, contains keys “texts” and “groups”. If using cosine_similarity variant, contains keys “texts1”, “texts2”, and “similarity_scores”.

  • validation_split – Float between 0 and 1 that determines percentage of data to use for validation.

  • max_length – Int determining the maximum token length of input strings.

  • variant – How loss will be calculated: “cosine_similarity” (default) or “triplet”.

  • epochs – Integer specifying how many training iterations to run.

  • batch_size – Batch size when training. Leave as None to automatically determine batch size.

  • optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.

  • early_stopping_epochs – Integer determining how many epochs will run before stopping without an improvement in validation loss.

  • train_dataloader – Dataloader for providing training data when finetuning. Defaults to inbuilt dataloader.

  • val_dataloader – Dataloader for providing validation data when finetuning. Defaults to inbuilt dataloader.

  • step – Function determining how to call model for a training step. Defaults to step defined in this task class.

  • configure_optimizers – Function that sets up the optimizer for training. Defaults to optimizer defined in this task class.

Examples:

import backprop

tv = backprop.TextVectorisation()

# Set up training data & finetune (cosine_similarity variant)
texts1 = ["I went to the store and bought some bread", "I am getting a cat soon"]
texts2 = ["I bought bread from the store", "I took my dog for a walk"]
similarity_scores = [1.0, 0.0]
params = {"texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}

tv.finetune(params, variant="cosine_similarity")

# Set up training data & finetune (triplet variant)
texts = ["I went to the store and bought some bread", "I bought bread from the store", "I'm going to go walk my dog"]
groups = [0, 0, 1]
params = {"texts": texts, "groups": groups}

tv.finetune(params, variant="triplet")
static list_models(return_dict=False, display=False, limit=None)[source]

Returns the list of models that can be used and finetuned with this task.

Parameters
  • return_dict – Default False. True if you want to return in dict form. Otherwise returns list form.

  • display – Default False. True if you want output printed directly (overrides return_dict, and returns nothing).

  • limit – Default None. Maximum number of models to return – leave None to get all models.

step_cosine(batch, batch_idx)[source]

Performs a training step and calculates cosine similarity loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]

Performs a training step and calculates triplet loss.

Parameters
  • batch – Batch output from dataloader.

  • batch_idx – Batch index.

train_dataloader_triplet()[source]

Returns training dataloader with triplet loss sampling strategy.

training: bool

val_dataloader_triplet()[source]

Returns validation dataloader with triplet loss sampling strategy.