backprop.tasks¶
backprop.tasks.base¶
class Task(model, local=False, api_key=None, task: Optional[str] = None, device: Optional[str] = None, models: Optional[Dict] = None, default_local_model: Optional[str] = None, local_aliases: Optional[Dict] = None)[source]¶
Bases: pytorch_lightning.core.lightning.LightningModule

Base Task superclass used to implement new tasks.
model¶
Model name string for the task in use.

local¶
Run locally. Defaults to False.

api_key¶
Backprop API key for non-local inference.

device¶
Device to run inference on. Defaults to "cuda" if available.

models¶
All supported models for a given task (pulled from config).

default_local_model¶
The model the task defaults to when initialized without one provided. Defined per task.

configure_optimizers()[source]¶
Sets up optimizers for the model. Must be defined per task; there is no base default.
finetune(dataset=None, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, dataset_train: Optional[torch.utils.data.dataset.Dataset] = None, dataset_valid: Optional[torch.utils.data.dataset.Dataset] = None, step=None, configure_optimizers=None)[source]¶
save(name: str, description: Optional[str] = None, details: Optional[Dict] = None)[source]¶
Saves the model used by the task to ~/.cache/backprop/name.
Parameters
name – string identifier for the model. Lowercase letters and numbers; no spaces or special characters except dashes.
description – String description of the model.
details – Valid JSON dictionary of additional details about the model.
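
For illustration, a minimal save sketch; the model name, description, and details here are placeholder values, and the model is assumed to have been finetuned first:
import backprop

emote = backprop.Emotion()
# ... finetune or otherwise set up the model, then persist it locally.
# This writes to ~/.cache/backprop/my-emotion-model
emote.save("my-emotion-model",
           description="Emotion model finetuned on my data",
           details={"num_training_samples": 100})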

step(batch, batch_idx)[source]¶
Implemented per task; passes the batch into the model and returns the loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

training: bool¶

training_step(batch, batch_idx)[source]¶
Performs the step function with training data and returns the training loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

upload(name: str, description: Optional[str] = None, details: Optional[Dict] = None, api_key: Optional[str] = None)[source]¶
Saves the model used by the task to ~/.cache/backprop/name and deploys it to Backprop.
Parameters
name – string identifier for the model. Lowercase letters and numbers; no spaces or special characters except dashes.
description – String description of the model.
details – Valid JSON dictionary of additional details about the model.
api_key – Backprop API key
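
A minimal deploy sketch; the task, model name, and api_key are placeholders:
import backprop

tg = backprop.TextGeneration()
# ... after finetuning, save locally and deploy to Backprop in one call
tg.upload("my-text-generator",
          description="Generator finetuned on my data",
          api_key="abc123")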

backprop.tasks.emotion¶

class Emotion(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for emotion detection.

model¶
Model name, a model name on Backprop's emotion endpoint, or a model object that implements the emotion task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(text: Union[str, List[str]])[source]¶
Perform emotion detection on input text.
Parameters
text – string or list of strings to detect emotion from. Keep each under a few sentences for best performance.
Returns
Emotion string or list of emotion strings.
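
A minimal usage sketch; the input and the predicted emotion shown are illustrative:
import backprop

emote = backprop.Emotion()
emote("I hope this gets sorted out quickly!")
# e.g. "optimism"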

configure_optimizers()[source]¶
Returns the default optimizer for text generation (AdaFactor, learning rate 1e-3).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a generative model for emotion detection.

Note
input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output).

Parameters
params – Dictionary of model inputs. Contains 'input_text' and 'output_text' keys, with values as lists of input/output data.
max_input_length – Maximum number of tokens (1 token ~ 1 word) in the input. Anything longer is truncated. Max 512.
max_output_length – Maximum number of tokens (1 token ~ 1 word) in the output. Anything longer is truncated. Max 512.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

emote = backprop.Emotion()

# Provide sentiment data for training
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
emote.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.
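
A quick sketch of inspecting the available models for a task:
import backprop

# Print the models directly
backprop.Emotion.list_models(display=True)

# Or keep them for programmatic use
models = backprop.Emotion.list_models(limit=10)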

step(batch, batch_idx)[source]¶
Performs a training step and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.image_classification¶

class ImageClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for image classification.

model¶
Model name, a model name on Backprop's image-classification endpoint, or a model object that implements the image-classification task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(image: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]¶
Classify image according to given labels.
Parameters
image – image or list of images to classify. Can be PIL Image objects or paths to images.
labels – list of strings, or list of label lists (for zero-shot classification)
top_k – return probabilities only for the top_k predictions. Use 0 to get all.
Returns
dict where each key is a label and each value is a probability between 0 and 1, or a list of such dicts.
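
A minimal zero-shot usage sketch; the image path, labels, and probabilities are illustrative:
import backprop

ic = backprop.ImageClassification()
ic("images/dog.jpg", labels=["beagle", "malamute"])
# e.g. {"beagle": 0.92, "malamute": 0.08}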

configure_optimizers()[source]¶
Returns the default optimizer for image classification (SGD, learning rate 1e-1, weight decay 1e-4).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'single_label', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for image classification.

Parameters
params – Dictionary of model inputs. Contains 'images' and 'labels' keys, with values as lists of images/labels.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
variant – Determines whether to do single- or multi-label classification: "single_label" (default) or "multi_label".
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

ic = backprop.ImageClassification()

# Prep training images/labels. Labels are automatically used to set up
# the model with the number of classes for classification.
images = ["images/beagle/photo.jpg", "images/dachshund/photo.jpg", "images/malamute/photo.jpg"]
labels = ["beagle", "dachshund", "malamute"]
params = {"images": images, "labels": labels}

# Finetune
ic.finetune(params, variant="single_label")

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step_multi_label(batch, batch_idx)[source]¶
Performs a training step for multi-label classification and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

step_single_label(batch, batch_idx)[source]¶
Performs a training step for single-label classification and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.image_text_vectorisation¶

class ImageTextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for combined image-text vectorisation.

model¶
Model name, a model name on Backprop's image-text-vectorisation endpoint, or a model object that implements the image-text-vectorisation task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(image: Union[str, List[str]], text: Union[str, List[str]], return_tensor=False)[source]¶
Vectorise input image and text pairs.
Parameters
image – image or list of images to vectorise. Can be PIL Image objects or paths to images.
text – text or list of text to vectorise. Must match image ordering.
Returns
Vector or list of vectors
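
A minimal usage sketch; the image path and text are placeholders:
import backprop

itv = backprop.ImageTextVectorisation()
vector = itv("product_images/crowbars/photo.jpg", "Steel crowbar with angled beak, 300mm")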

configure_optimizers()[source]¶
Returns the default optimizer for image-text vectorisation (AdamW, learning rate 1e-5).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for combined image & text vectorisation. Includes different variants for calculating loss.

Parameters
params – Dictionary of model inputs. If using the triplet variant, contains keys "texts", "images", and "groups". If using the cosine_similarity variant, contains keys "texts1", "texts2", "imgs1", "imgs2", and "similarity_scores".
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
variant – How loss will be calculated: "triplet" (default) or "cosine_similarity".
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

itv = backprop.ImageTextVectorisation()

# Prep training data & finetune (triplet variant)
images = ["product_images/crowbars/photo.jpg", "product_images/crowbars/photo1.jpg", "product_images/mugs/photo.jpg"]
texts = ["Steel crowbar with angled beak, 300mm", "Crowbar tempered steel 300mm angled", "Sturdy ceramic mug, microwave-safe"]
groups = [0, 0, 1]
params = {"images": images, "texts": texts, "groups": groups}
itv.finetune(params, variant="triplet")

# Prep training data & finetune (cosine_similarity variant)
imgs1 = ["product_images/crowbars/photo.jpg", "product_images/mugs/photo.jpg"]
texts1 = ["Steel crowbar with angled beak, 300mm", "Sturdy ceramic mug, microwave-safe"]
imgs2 = ["product_images/crowbars/photo1.jpg", "product_images/hats/photo.jpg"]
texts2 = ["Crowbar tempered steel 300mm angled", "Dad hat with funny ghost picture on the front"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}
itv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step_cosine(batch, batch_idx)[source]¶
Performs a training step and calculates cosine similarity loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]¶
Performs a training step and calculates triplet loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

train_dataloader_triplet()[source]¶
Returns a training dataloader with the triplet loss sampling strategy.

training: bool¶

backprop.tasks.image_vectorisation¶

class ImageVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for image vectorisation.

model¶
Model name, a model name on Backprop's image-vectorisation endpoint, or a model object that implements the image-vectorisation task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(image: Union[str, PIL.Image.Image, List[str], List[PIL.Image.Image]], return_tensor=False)[source]¶
Vectorise input image.
Parameters
image – image or list of images to vectorise. Can be PIL Image objects or paths to images.
Returns
Vector or list of vectors
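
A minimal usage sketch; the image path is a placeholder:
import backprop

iv = backprop.ImageVectorisation()
vector = iv("images/beagle/photo.jpg")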

configure_optimizers()[source]¶
Returns the default optimizer for image vectorisation (AdamW, learning rate 1e-5).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, variant: str = 'triplet', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for image vectorisation. Includes different variants for calculating loss.

Parameters
params – Dictionary of model inputs. If using the triplet variant, contains keys "images" and "groups". If using the cosine_similarity variant, contains keys "imgs1", "imgs2", and "similarity_scores".
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
variant – How loss will be calculated: "triplet" (default) or "cosine_similarity".
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

iv = backprop.ImageVectorisation()

# Set up training data & finetune (triplet variant)
images = ["images/beagle/photo.jpg", "images/shiba_inu/photo.jpg", "images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
groups = [0, 1, 0, 2]
params = {"images": images, "groups": groups}
iv.finetune(params, variant="triplet")

# Set up training data & finetune (cosine_similarity variant)
imgs1 = ["images/beagle/photo.jpg", "images/shiba_inu/photo.jpg"]
imgs2 = ["images/beagle/photo1.jpg", "images/malamute/photo.jpg"]
similarity_scores = [1.0, 0.0]
params = {"imgs1": imgs1, "imgs2": imgs2, "similarity_scores": similarity_scores}
iv.finetune(params, variant="cosine_similarity")

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step_cosine(batch, batch_idx)[source]¶
Performs a training step and calculates cosine similarity loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]¶
Performs a training step and calculates triplet loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

train_dataloader_triplet()[source]¶
Returns a training dataloader with the triplet loss sampling strategy.

training: bool¶

backprop.tasks.qa¶

class QA(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for Question Answering.

model¶
Model name, a model name on Backprop's qa endpoint, or a model object that implements the qa task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(question: Union[str, List[str]], context: Union[str, List[str]], prev_qa: Union[List[Tuple[str, str]], List[List[Tuple[str, str]]]] = [])[source]¶
Perform QA, either on docstore or on provided context.
Parameters
question – Question (string or list of strings) for the qa model.
context – Context (string or list of strings) to ask the question from.
prev_qa (optional) – List of previous (question, answer) tuples, or a list of such prev_qa lists.
Returns
Answer string or list of answer strings
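
A minimal usage sketch; the context and the answer shown are illustrative:
import backprop

qa = backprop.QA()
context = "Backprop is a Python library that makes training and using models easier."
qa("What is Backprop?", context)
# e.g. "a Python library"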

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 256, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for Q&A tasks.

Parameters
params – dictionary of lists: 'questions', 'answers', 'contexts'. Optionally includes 'prev_qas': a list of lists containing (q, a) tuples to prepend to the context.
max_input_length – Maximum number of tokens (1 token ~ 1 word) in the input. Anything longer is truncated. Max 512.
max_output_length – Maximum number of tokens (1 token ~ 1 word) in the output. Anything longer is truncated. Max 512.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

# Initialise task
qa = backprop.QA()

# Set up training data for QA. Note that repeated contexts are needed, along with empty prev_qas to match.
# Input must be completely 1:1: each question has an associated answer, context, and prev_qa (if prev_qa is to be used).
questions = ["What's Backprop?", "What language is it in?", "When was the Moog synthesizer invented?"]
answers = ["A library that trains models", "Python", "1964"]
contexts = ["Backprop is a Python library that makes training and using models easier.",
            "Backprop is a Python library that makes training and using models easier.",
            "Bob Moog was a physicist. He invented the Moog synthesizer in 1964."]
prev_qas = [[], [("What's Backprop?", "A library that trains models")], []]
params = {"questions": questions, "answers": answers, "contexts": contexts, "prev_qas": prev_qas}

# Finetune
qa.finetune(params=params)

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step(batch, batch_idx)[source]¶
Performs a training step and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.summarisation¶

class Summarisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for summarisation.

model¶
Model name, a model name on Backprop's summarisation endpoint, or a model object that implements the summarisation task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(text: Union[str, List[str]])[source]¶
Perform summarisation on input text.
Parameters
text – string or list of strings to be summarised – keep each string below 500 words.
Returns
Summary string or list of summary strings.
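
A minimal usage sketch; the input and output are illustrative:
import backprop

summarise = backprop.Summarisation()
summarise("This is a long news article about recent political happenings...")
# e.g. "Short political summary."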

configure_optimizers()[source]¶
Returns the default optimizer for summarisation (AdaFactor, learning rate 1e-3).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 512, max_output_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a generative model for summarisation.

Note
input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output).

Parameters
params – Dictionary of model inputs. Contains 'input_text' and 'output_text' keys, with values as lists of input/output data.
max_input_length – Maximum number of tokens (1 token ~ 1 word) in the input. Anything longer is truncated. Max 512.
max_output_length – Maximum number of tokens (1 token ~ 1 word) in the output. Anything longer is truncated. Max 512.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

summary = backprop.Summarisation()

# Provide training data for task
inp = ["This is a long news article about recent political happenings.", "This is an article about some recent scientific research."]
out = ["Short political summary.", "Short scientific summary."]
params = {"input_text": inp, "output_text": out}

# Finetune
summary.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step(batch, batch_idx)[source]¶
Performs a training step and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.text_classification¶

class TextClassification(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for text classification.

model¶
Model name, a model name on Backprop's text-classification endpoint, or a model object that implements the text-classification task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(text: Union[str, List[str]], labels: Optional[Union[List[str], List[List[str]]]] = None, top_k: int = 0)[source]¶
Classify input text based on previous training (user-tuned models) or according to a given list of labels (zero-shot).
Parameters
text – string or list of strings to be classified
labels – list of labels for zero-shot classification (on our out-of-the-box models). If using a user-trained model (e.g. XLNet), this is not used.
top_k – return probabilities only for the top_k predictions. Use 0 to get all.
Returns
dict where each key is a label and each value is a probability between 0 and 1, or a list of such dicts.
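
A minimal zero-shot usage sketch; the labels and probabilities are illustrative:
import backprop

tc = backprop.TextClassification()
tc("I am mad because my product broke.", labels=["product issue", "nature"])
# e.g. {"product issue": 0.98, "nature": 0.02}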

configure_optimizers()[source]¶
Returns the default optimizer for text classification (AdamW, learning rate 2e-5).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: int = 128, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a text classification model on provided data.

Parameters
params – Dict containing keys "texts" and "labels", with values being input/output data lists.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
max_length – Int determining the maximum token length of input strings.
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

tc = backprop.TextClassification()

# Set up input data. Labels will automatically be used to set up
# the model with the number of classes for classification.
inp = ["This is a political news article", "This is a computer science research paper", "This is a movie review"]
out = ["Politics", "Science", "Entertainment"]
params = {"texts": inp, "labels": out}

# Finetune
tc.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step(batch, batch_idx)[source]¶
Performs a training step and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.text_generation¶

class TextGeneration(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for text generation.

model¶
Model name, a model name on Backprop's text-generation endpoint, or a model object that implements the text-generation task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(text: Union[str, List[str]], min_length: Optional[int] = None, max_length: Optional[int] = None, temperature: Optional[float] = None, top_k: Optional[int] = None, top_p: Optional[float] = None, repetition_penalty: Optional[float] = None, length_penalty: Optional[float] = None, num_beams: Optional[int] = None, num_generations: Optional[int] = None, do_sample: Optional[bool] = None)[source]¶
Generates text to continue from the given input.
Parameters
text (string) – Text from which the model will begin generating.
min_length (int) – Minimum number of tokens to generate (1 token ~ 1 word).
max_length (int) – Maximum number of tokens to generate (1 token ~ 1 word).
temperature (float) – Value that alters the randomness of generation (0.0 is no randomness; higher values introduce more randomness. 0.5 - 0.7 is a good starting point).
top_k (int) – Only choose from the top_k tokens when generating (0 is no limit).
top_p (float) – Only choose from the top tokens with combined probability greater than top_p.
repetition_penalty (float) – Penalty applied to tokens present in the input text and tokens already generated in the sequence (>1 discourages repetition, <1 encourages it).
length_penalty (float) – Penalty applied to overall sequence length. Set >1 for longer sequences or <1 for shorter ones.
num_beams (int) – Number of beams to use in beam search. Runs several generations and picks the best one. (1: no beam search)
num_generations (int) – How many times to run generation. Results are returned as a list.
do_sample (bool) – Whether or not sampling strategies (temperature, top_k, top_p) should be used.

Example:
import backprop

tg = backprop.TextGeneration()
tg("Geralt knew the signs, the monster was a", min_length=20, max_length=50, temperature=0.7)
> " real danger, and he was the only one in the village who knew how to defend himself."

configure_optimizers()[source]¶
Returns the default optimizer for text generation (AdaFactor, learning rate 1e-3).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_input_length: int = 128, max_output_length: int = 32, epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for a text generation task.

Note
input_text and output_text in params must have matching ordering (item 1 of input must match item 1 of output).

Parameters
params – Dictionary of model inputs. Contains 'input_text' and 'output_text' keys, with values as lists of input/output data.
max_input_length – Maximum number of tokens (1 token ~ 1 word) in the input. Anything longer is truncated. Max 512.
max_output_length – Maximum number of tokens (1 token ~ 1 word) in the output. Anything longer is truncated. Max 512.
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

tg = backprop.TextGeneration()

# Any text works as training data
inp = ["I really liked the service I received!", "Meh, it was not impressive."]
out = ["positive", "negative"]
params = {"input_text": inp, "output_text": out}

# Finetune
tg.finetune(params)

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step(batch, batch_idx)[source]¶
Performs a training step and returns loss.
Parameters
batch – Batch output from the dataloader.
batch_idx – Batch index.

training: bool¶

backprop.tasks.text_vectorisation¶

class TextVectorisation(model: Optional[Union[str, backprop.models.generic_models.BaseModel]] = None, local: bool = False, api_key: Optional[str] = None, device: Optional[str] = None)[source]¶
Bases: backprop.tasks.base.Task

Task for text vectorisation.

model¶
Model name, a model name on Backprop's text-vectorisation endpoint, or a model object that implements the text-vectorisation task.

local¶
Run locally. Defaults to False.
Type: optional

api_key¶
Backprop API key for non-local inference.
Type: optional

device¶
Device to run inference on. Defaults to "cuda" if available.
Type: optional
__call__(text: Union[str, List[str]], return_tensor=False)[source]¶
Vectorise input text.
Parameters
text – string or list of strings to vectorise.
Returns
Vector or list of vectors
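
A minimal usage sketch; the input string is a placeholder:
import backprop

tv = backprop.TextVectorisation()
vector = tv("iPhone 12 128GB")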

configure_optimizers()[source]¶
Returns the default optimizer for text vectorisation (AdamW, learning rate 1e-5).

finetune(params, validation_split: Union[float, Tuple[List[int], List[int]]] = 0.15, max_length: Optional[int] = None, variant: str = 'cosine_similarity', epochs: int = 20, batch_size: Optional[int] = None, optimal_batch_size: Optional[int] = None, early_stopping_epochs: int = 1, train_dataloader=None, val_dataloader=None, step=None, configure_optimizers=None)[source]¶
Finetunes a model for text vectorisation. Includes different variants for calculating loss.

Parameters
params – Dictionary of model inputs. If using the triplet variant, contains keys "texts" and "groups". If using the cosine_similarity variant, contains keys "texts1", "texts2", and "similarity_scores".
validation_split – Float between 0 and 1 that determines what percentage of the data to use for validation.
max_length – Int determining the maximum token length of input strings.
variant – How loss will be calculated: "cosine_similarity" (default) or "triplet".
epochs – Integer specifying how many training iterations to run.
batch_size – Batch size when training. Leave as None to automatically determine batch size.
optimal_batch_size – Optimal batch size for the model being trained – defaults to model settings.
early_stopping_epochs – Integer determining how many epochs without an improvement in validation loss will run before training stops.
train_dataloader – Dataloader for providing training data when finetuning. Defaults to the inbuilt dataloader.
val_dataloader – Dataloader for providing validation data when finetuning. Defaults to the inbuilt dataloader.
step – Function determining how to call the model for a training step. Defaults to the step defined in this task class.
configure_optimizers – Function that sets up the optimizer for training. Defaults to the optimizer defined in this task class.

Examples:
import backprop

tv = backprop.TextVectorisation()

# Set up training data & finetune (cosine_similarity variant)
texts1 = ["I went to the store and bought some bread", "I am getting a cat soon"]
texts2 = ["I bought bread from the store", "I took my dog for a walk"]
similarity_scores = [1.0, 0.0]
params = {"texts1": texts1, "texts2": texts2, "similarity_scores": similarity_scores}
tv.finetune(params, variant="cosine_similarity")

# Set up training data & finetune (triplet variant)
texts = ["I went to the store and bought some bread", "I bought bread from the store", "I'm going to go walk my dog"]
groups = [0, 0, 1]
params = {"texts": texts, "groups": groups}
tv.finetune(params, variant="triplet")

static list_models(return_dict=False, display=False, limit=None)[source]¶
Returns the list of models that can be used and finetuned with this task.
Parameters
return_dict – Default False. True if you want the result in dict form; otherwise a list is returned.
display – Default False. True if you want the output printed directly (overrides return_dict, and returns nothing).
limit – Default None. Maximum number of models to return – leave as None to get all models.

step_cosine(batch, batch_idx)[source]¶
Performs a training step and calculates cosine similarity loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

step_triplet(batch, batch_idx)[source]¶
Performs a training step and calculates triplet loss.
Parameters
batch – Batch output from dataloader.
batch_idx – Batch index.

train_dataloader_triplet()[source]¶
Returns a training dataloader with the triplet loss sampling strategy.

training: bool¶