API Reference

The following constitutes the public API of PACE.

pace.evaluate(algorithm_class, dataset=<pace.data.BuiltinDataset object>, folds=5, selected_alleles=None, selected_lengths=None, nbr_train=1, test_alleles=None, test_lengths=None, nbr_test=10, scorers={'accuracy': <pace.evaluation.AccuracyScorer object>, 'ppv': <pace.evaluation.PpvScorer object>}, random_seed=127)

Evaluate an algorithm.

Given a dataset and an algorithm, this evaluates the algorithm by repeatedly splitting the dataset into training and testing subsets, training a new algorithm instance, asking it to make predictions about the testing subset, and scoring those predictions.

Parameters

algorithm_class (pace.PredictionAlgorithm) – a function taking no arguments that returns a new instance of the algorithm to test - If the algorithm class has a default constructor, you can simply pass in the class itself. Otherwise, pass in a lambda that fills in the constructor arguments appropriately. The algorithm must implement the interface specified by pace.PredictionAlgorithm.
dataset (pace.Dataset) – the dataset to use for testing - If omitted, the builtin dataset is used. The dataset must implement the interface specified by pace.Dataset.
folds (int) – the number of folds (i.e., iterations) to perform (default is 5)
selected_alleles (List[str], optional) – a list of alleles to use for training - If a value is given here, the dataset is filtered so that only samples for those alleles are used for training. (By default, no filtering is done.) Note that this will also determine the filtering of the test data unless a different filter is explicitly specified.
selected_lengths (List[int], optional) – a list of peptide lengths to use for training - If a value is given here, the dataset is filtered so that only samples for those lengths are used for training. (By default, no filtering is done.) Note that this will also determine the filtering of the test data unless a different filter is explicitly specified.
nbr_train (float, optional) – the nonbinder ratio for training - This determines the ratio of nonbinders to binders in the set of samples used for training the algorithm. It defaults to 1.
test_alleles (List[str], optional) – a list of alleles to use for testing - This is equivalent to selected_alleles but determines the filtering for the testing phase. By default, the same set that was used for training is also used for testing.
test_lengths (List[int], optional) – a list of peptide lengths to use for testing - This is equivalent to selected_lengths but determines the filtering for the testing phase. By default, the same set that was used for training is also used for testing.
nbr_test (float, optional) – the nonbinder ratio for testing - This determines the ratio of nonbinders to binders in the set of samples used for testing the algorithm. It defaults to 10. Using a value much higher than 10 with the default dataset, without subselecting, will exhaust the pool of nonbinders.
scorers (Dict[str,pace.Scorer]) – a mapping from labels to scorers - If omitted, pace.evaluation.default_scorers is used.
random_seed (int, optional) – the seed used to initialize the random state, so that the same splits are obtained across different runs

Returns

a mapping from scorer labels to the results returned by that scorer (one per fold)

Return type

Dict[str,List[Any]]
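
The procedure described above (split, train a fresh instance, predict, score, once per fold) can be sketched in plain Python. This is only an illustration of the idea under simplifying assumptions, not PACE's implementation: `evaluate_sketch`, `ConstantAlgorithm`, `accuracy`, and the toy peptides are hypothetical, samples are reduced to bare strings, and the way the real pace.evaluate splits and filters the data may differ.

```python
import random


class ConstantAlgorithm:
    """Hypothetical stand-in for a prediction algorithm: always predicts 0.5."""

    def train(self, binders, nonbinders):
        pass  # nothing to learn in this toy example

    def predict(self, samples):
        return [0.5 for _ in samples]


def accuracy(truths, predictions, cutoff=0.5):
    """Fraction of samples whose thresholded prediction matches the truth."""
    hits = sum((p >= cutoff) == bool(t) for t, p in zip(truths, predictions))
    return hits / len(truths)


def evaluate_sketch(algorithm_class, binders, nonbinders, folds=5, random_seed=127):
    """Illustrative k-fold loop (assumed splitting scheme, not PACE's own)."""
    rng = random.Random(random_seed)  # seeded so that splits are reproducible
    labeled = [(s, 1) for s in binders] + [(s, 0) for s in nonbinders]
    rng.shuffle(labeled)
    scores = []
    for fold in range(folds):
        # Items whose index is congruent to `fold` form the test subset.
        test = labeled[fold::folds]
        train = [x for i, x in enumerate(labeled) if i % folds != fold]
        algorithm = algorithm_class()  # a new instance for every fold
        algorithm.train([s for s, y in train if y == 1],
                        [s for s, y in train if y == 0])
        predictions = algorithm.predict([s for s, _ in test])
        scores.append(accuracy([y for _, y in test], predictions))
    return {"accuracy": scores}  # one score per fold, keyed by scorer label


# Made-up peptides with arbitrary labels, for illustration only.
toy_binders = ["LLLLLLLLL", "LLLLVLLLL", "LLMLLLLLV", "LMLLLLLLL", "LLLLLLLVL"]
toy_nonbinders = ["GGGGGGGGG", "GGGPGGGGG", "PGGGGGGGG", "GGGGGGPGG", "GGPGGGGGG"]
results = evaluate_sketch(ConstantAlgorithm, toy_binders, toy_nonbinders)
print(results)
```

With PACE installed, the corresponding call would be along the lines of `pace.evaluate(MyAlgorithm)`, or `pace.evaluate(lambda: MyAlgorithm(some_argument))` when the constructor needs arguments; `MyAlgorithm` and `some_argument` are placeholders here.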

pace.encode(sequences, aafeatmat='onehot')

Create a numerical encoding for the input peptide sequences. All input sequences are assumed to have the same length. (TODO: decide how error handling should be integrated.)

Parameters

sequences – List of peptide sequences. Either a list of strings or a list of lists, where each inner list holds single amino acids, is accepted. All sequences need to be the same length.
aafeatmat – Either the name of one of the builtin peptide encodings or a pandas DataFrame with one amino acid per row, and columns with features. (Rows: 20 amino acids; columns: the encoding of each amino acid.)

Returns

encoded sequences

Return type

numpy.ndarray
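
To make the idea of a one-hot peptide encoding concrete, here is a minimal pure-Python sketch. It is not the code behind pace.encode: `onehot_encode_sketch` is a hypothetical name, and the alphabetical amino acid order, the flattened row layout, and the `ValueError` on mismatched lengths are all assumptions. It also returns nested lists rather than a numpy.ndarray.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (order assumed)


def onehot_encode_sketch(sequences):
    """Encode equal-length peptides as flat 0/1 rows, 20 entries per position."""
    # Accept both strings and lists of single-letter residues.
    sequences = [list(seq) for seq in sequences]
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for seq in sequences:
        row = []
        for aa in seq:
            onehot = [0] * len(AMINO_ACIDS)
            onehot[index[aa]] = 1  # raises KeyError for a non-standard residue
            row.extend(onehot)
        encoded.append(row)
    return encoded


codes = onehot_encode_sketch(["ACD", ["Y", "W", "V"]])
print(len(codes), len(codes[0]))
```

Each row has 20 entries per peptide position, so two 3-residue peptides yield two rows of 60 values.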

pace.get_allele_similarity_mat(allele_similarity_name)

Get a matrix of pre-computed allele similarities

Parameters: allele_similarity_name (str) – the name of the pre-computed allele similarity matrix to use: either ‘motifs’ (based on observed peptide binding motifs) or ‘pockets’ (based on HLA protein binding pocket residues).
Returns: allele similarity matrix
Return type: pandas.core.frame.DataFrame

pace.get_similar_alleles(allele_similarity_name, allele, similarity_threshold)

Get the most similar alleles to a given allele, based on a specified allele similarity matrix and similarity threshold.

Parameters

allele_similarity_name (str) – the name of the pre-computed allele similarity matrix to use: either ‘motifs’ (based on observed peptide binding motifs) or ‘pockets’ (based on HLA protein binding pocket residues).
allele (str) – The allele for which to determine similar alleles
similarity_threshold – Numerical threshold value that determines the cutoff for considering an allele similar to the given allele.

Returns

The similar alleles satisfying the specified threshold, along with their numerical similarity values. Note that the given allele is also returned.

Return type

pandas.core.frame.DataFrame
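
The thresholding step can be illustrated without pandas. In the sketch below the similarity matrix is a plain dict of dicts with made-up values and placeholder allele labels, and `similar_alleles_sketch` is a hypothetical helper. It assumes that larger values mean greater similarity and that the cutoff is inclusive; the actual comparison used by pace.get_similar_alleles is not stated here.

```python
# Made-up similarity values; the allele labels are placeholders.
toy_similarity = {
    "A0201": {"A0201": 1.0, "A0203": 0.9, "B0702": 0.2},
    "A0203": {"A0201": 0.9, "A0203": 1.0, "B0702": 0.3},
    "B0702": {"A0201": 0.2, "A0203": 0.3, "B0702": 1.0},
}


def similar_alleles_sketch(similarity, allele, similarity_threshold):
    """Return alleles whose similarity to `allele` meets the threshold."""
    row = similarity[allele]
    hits = {
        other: value
        for other, value in row.items()
        if value >= similarity_threshold  # assumed: higher means more similar
    }
    # Most similar first; the query allele itself is included.
    return dict(sorted(hits.items(), key=lambda kv: kv[1], reverse=True))


similar = similar_alleles_sketch(toy_similarity, "A0201", 0.8)
print(similar)
```

As in the documented return value, the query allele appears in its own result because its self-similarity passes any reasonable threshold.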

class pace.Sample

a sample to predict

property allele: the allele code for the MHC molecule

property peptide: the amino acid sequence for the peptide (as a string)

class pace.Dataset

an abstract base class defining the interface required of a dataset

abstract get_binders(length)

Get all binders with the specified length.

Parameters: length (int) – the peptide length the caller is interested in
Returns: all binders with that length - Note that this is allowed to return a single-use iterable.
Return type: Iterable[pace.Sample]

abstract get_nonbinders(length)

Get all nonbinders with the specified length.

Parameters: length (int) – the peptide length the caller is interested in
Returns: all nonbinder peptides with that length
Return type: List[str]
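
A minimal in-memory dataset following this interface might look like the sketch below. It is deliberately standalone: `ToySample` is a hypothetical stand-in for pace.Sample, and `InMemoryDataset` does not inherit from pace.Dataset so that the snippet runs without PACE; a real dataset would presumably subclass pace.Dataset. The allele label and peptides are made up.

```python
from collections import namedtuple

# Stand-in for pace.Sample, which exposes `allele` and `peptide`.
ToySample = namedtuple("ToySample", ["allele", "peptide"])


class InMemoryDataset:
    """Hypothetical dataset holding binders and nonbinder peptides in lists."""

    def __init__(self, binders, nonbinder_peptides):
        self._binders = list(binders)
        self._nonbinder_peptides = list(nonbinder_peptides)

    def get_binders(self, length):
        # A generator is a single-use iterable, which the interface permits.
        return (s for s in self._binders if len(s.peptide) == length)

    def get_nonbinders(self, length):
        # Nonbinders are returned as plain peptide strings, per the return type.
        return [p for p in self._nonbinder_peptides if len(p) == length]


dataset = InMemoryDataset(
    binders=[ToySample("A0201", "LLLLLLLLL"), ToySample("A0201", "LLLLLLLL")],
    nonbinder_peptides=["GGGGGGGGG", "GGGGGGGG", "PPPPPPPPP"],
)
print([s.peptide for s in dataset.get_binders(9)])
print(dataset.get_nonbinders(9))
```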

class pace.PredictionAlgorithm

an abstract base class defining the interface required of prediction algorithms that are to be evaluated by PACE

abstract predict(samples)

Predict whether or not a list of samples will bind.

Parameters: samples (List[pace.Sample]) – the samples to predict
Returns: predictions for each sample - Each prediction is a number between 0 and 1 indicating how likely the sample is to bind.
Return type: NumPy array-like object (e.g., list of floats)

abstract train(binders, nonbinders)

Train this instance using the supplied training data.

Parameters

binders – samples that are known to bind
nonbinders – samples that are known to not bind
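
As an illustration of the train/predict contract, here is a toy algorithm that runs without PACE. `ResidueOverlapAlgorithm` and `ToySample` are hypothetical, the scoring rule is intentionally naive, and the sketch assumes that both binders and nonbinders arrive as objects with a `peptide` attribute. A real implementation would presumably subclass pace.PredictionAlgorithm.

```python
from collections import Counter, namedtuple

# Stand-in for pace.Sample, which exposes `allele` and `peptide`.
ToySample = namedtuple("ToySample", ["allele", "peptide"])


class ResidueOverlapAlgorithm:
    """Toy predictor: the share of a peptide's residues that were seen
    more often in training binders than in training nonbinders."""

    def __init__(self):
        self._favored = set()

    def train(self, binders, nonbinders):
        binder_counts = Counter(aa for s in binders for aa in s.peptide)
        nonbinder_counts = Counter(aa for s in nonbinders for aa in s.peptide)
        # A Counter returns 0 for residues it has never seen.
        self._favored = {
            aa for aa, n in binder_counts.items() if n > nonbinder_counts[aa]
        }

    def predict(self, samples):
        # One value in [0, 1] per sample; max(..., 1) guards empty peptides.
        return [
            sum(aa in self._favored for aa in s.peptide) / max(len(s.peptide), 1)
            for s in samples
        ]


algorithm = ResidueOverlapAlgorithm()
algorithm.train([ToySample("A0201", "LLLL")], [ToySample("A0201", "GGGG")])
predictions = algorithm.predict(
    [ToySample("A0201", "LLGG"), ToySample("A0201", "LLLL"), ToySample("A0201", "GGGG")]
)
print(predictions)
```

Because the class has a constructor that takes no arguments, it is the kind of object that could be passed directly as algorithm_class.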

class pace.PredictionResult

the result of predicting a single sample

property prediction: the algorithm’s prediction (between 0 and 1)

property sample: the sample that was predicted

property truth: the true answer (either 0 or 1)

class pace.Scorer

an abstract base class defining the interface required of scorers - A scorer quantifies (or summarizes) the quality of the prediction results.

abstract score(results)

Generate the score for a set of prediction results.

Parameters: results (Iterable[PredictionResult]) – the prediction results to score
Returns: whatever summary info the scorer would like to generate for the results
Return type: Any
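
A custom scorer only needs a score method that consumes prediction results. The sketch below is a hypothetical example that runs without PACE: `ToyResult` stands in for pace.PredictionResult, and `ThresholdAccuracyScorer` with its 0.5 cutoff is an assumption for illustration, not a description of how pace.evaluation.AccuracyScorer works.

```python
from collections import namedtuple

# Stand-in for pace.PredictionResult (sample, truth, prediction).
ToyResult = namedtuple("ToyResult", ["sample", "truth", "prediction"])


class ThresholdAccuracyScorer:
    """Hypothetical scorer: fraction of results classified correctly
    when predictions are thresholded at `cutoff`."""

    def __init__(self, cutoff=0.5):
        self.cutoff = cutoff

    def score(self, results):
        results = list(results)  # the input may be a single-use iterable
        if not results:
            return float("nan")
        correct = sum(
            (r.prediction >= self.cutoff) == (r.truth == 1) for r in results
        )
        return correct / len(results)


toy_results = [
    ToyResult(sample=None, truth=1, prediction=0.9),  # correct
    ToyResult(sample=None, truth=0, prediction=0.2),  # correct
    ToyResult(sample=None, truth=1, prediction=0.3),  # missed binder
    ToyResult(sample=None, truth=0, prediction=0.8),  # false positive
]
score = ThresholdAccuracyScorer().score(toy_results)
print(score)
```

Such a scorer would be supplied through the scorers mapping, for example under a label like 'threshold_accuracy' (a placeholder name), and its per-fold return values would appear in the result under that label.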