API Reference
The following constitutes the public API of PACE.
-
pace.evaluate(algorithm_class, dataset=<pace.data.BuiltinDataset object>, folds=5, selected_alleles=None, selected_lengths=None, nbr_train=1, test_alleles=None, test_lengths=None, nbr_test=10, scorers={'accuracy': <pace.evaluation.AccuracyScorer object>, 'ppv': <pace.evaluation.PpvScorer object>}, random_seed=127)
Evaluate an algorithm.
Given a dataset and an algorithm, this evaluates the algorithm by repeatedly splitting the dataset into training and testing subsets, training a new algorithm instance, asking it to make predictions about the testing subset, and scoring those predictions.
- Parameters
algorithm_class (pace.PredictionAlgorithm) – a function taking no arguments that returns a new instance of the algorithm to test - If the algorithm class has a default constructor, you can simply pass in the class itself. Otherwise, pass in a lambda that fills in the constructor arguments appropriately. The algorithm must implement the interface specified by pace.PredictionAlgorithm.
dataset (pace.Dataset) – the dataset to use for testing - If omitted, the builtin dataset is used. The dataset must implement the interface specified by pace.Dataset.
folds (int) – the number of folds (i.e., iterations) to perform (default is 5)
selected_alleles (List[str], optional) – a list of alleles to use for training - If a value is given here, the dataset is filtered so that only samples for those alleles are used for training. (By default, no filtering is done.) Note that this will also determine the filtering of the test data unless a different filter is explicitly specified.
selected_lengths (List[int], optional) – a list of peptide lengths to use for training - If a value is given here, the dataset is filtered so that only samples for those lengths are used for training. (By default, no filtering is done.) Note that this will also determine the filtering of the test data unless a different filter is explicitly specified.
nbr_train (float, optional) – the nonbinder ratio for training - This determines the ratio of nonbinders to binders in the set of samples used for training the algorithm. It defaults to 1.
test_alleles (List[str], optional) – a list of alleles to use for testing - This is equivalent to selected_alleles but determines the filtering for the testing phase. By default, the same set that was used for training is also used for testing.
test_lengths (List[int], optional) – a list of peptide lengths to use for testing - This is equivalent to selected_lengths but determines the filtering for the testing phase. By default, the same set that was used for training is also used for testing.
nbr_test – the nonbinder ratio for testing - This determines the ratio of nonbinders to binders in the set of samples used for testing the algorithm. It defaults to 10. (Using a value much higher than 10 with the default dataset (without subselecting) will exhaust the pool of nonbinders.)
scorers (Dict[str,pace.Scorer]) – a mapping from labels to scorers - If omitted, pace.evaluation.default_scorers is used.
random_seed (int, optional) – the random seed used to initialize the random state, so that the same splits are obtained on every run
- Returns
a mapping from scorer labels to the results returned by that scorer (one per fold)
- Return type
Dict[str,List[Any]]
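As a rough illustration of the procedure described above (split, train a fresh instance, predict, score), the following self-contained sketch mimics the shape of such an evaluation loop. It does not import PACE and is not PACE's implementation: evaluate_sketch, MeanTruthAlgorithm, and accuracy_at_half are hypothetical names, the data is made-up toy data, and the shuffled 80/20 split is an assumption, since this reference does not say how folds are drawn. Allele and length filtering and the nonbinder ratios are left out.

```python
import random
from typing import Any, Callable, Dict, List, Optional, Tuple

# Toy stand-in for a labeled sample: (peptide, truth), where truth is 0 or 1.
Labeled = Tuple[str, int]
Scored = Tuple[float, int]  # (prediction, truth)


class MeanTruthAlgorithm:
    """Hypothetical algorithm: predicts the binder fraction seen in training."""

    def __init__(self) -> None:
        self.rate = 0.5

    def train(self, binders: List[str], nonbinders: List[str]) -> None:
        total = len(binders) + len(nonbinders)
        self.rate = len(binders) / total if total else 0.5

    def predict(self, samples: List[str]) -> List[float]:
        return [self.rate for _ in samples]


def accuracy_at_half(results: List[Scored]) -> float:
    """Hypothetical scorer: fraction of predictions on the correct side of 0.5."""
    hits = sum(1 for pred, truth in results if (pred >= 0.5) == (truth == 1))
    return hits / len(results)


def evaluate_sketch(
    algorithm_class: Callable[[], Any],
    data: List[Labeled],
    folds: int = 5,
    scorers: Optional[Dict[str, Callable[[List[Scored]], Any]]] = None,
    random_seed: int = 127,
) -> Dict[str, List[Any]]:
    scorers = scorers or {"accuracy": accuracy_at_half}
    rng = random.Random(random_seed)  # seeded so the splits repeat across runs
    scores: Dict[str, List[Any]] = {label: [] for label in scorers}
    for _ in range(folds):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(0.8 * len(shuffled))  # assumed 80/20 split, not PACE's rule
        train, test = shuffled[:cut], shuffled[cut:]
        algorithm = algorithm_class()  # a new instance for every fold
        algorithm.train(
            [p for p, t in train if t == 1],
            [p for p, t in train if t == 0],
        )
        predictions = algorithm.predict([p for p, _ in test])
        results = list(zip(predictions, (t for _, t in test)))
        for label, scorer in scorers.items():
            scores[label].append(scorer(results))  # one entry per fold
    return scores


toy_data = [("PEPTIDE%02d" % i, i % 2) for i in range(20)]
scores = evaluate_sketch(MeanTruthAlgorithm, toy_data, folds=3)
```

The returned mapping has one key per scorer label and one list entry per fold, which matches the documented return type Dict[str,List[Any]].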
-
pace.encode(sequences, aafeatmat='onehot')
Create a numerical encoding for the input peptide sequences. Assumes that all input sequences have the same length. (TO DO: how should we integrate error handling?)
- Parameters
sequences – List of peptide sequences. A list of strings is accepted as well as a list of lists where the inner lists are single amino acids. All sequences need to be the same length.
aafeatmat – Either the name of one of the builtin peptide encodings or a pandas DataFrame with one amino acid per row, and columns with features. (Rows: 20 amino acids; columns: the encoding of each amino acid.)
- Returns
encoded sequences
- Return type
numpy.ndarray
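To make the idea of a numerical encoding concrete, here is a minimal, self-contained sketch of one-hot encoding for equal-length peptides. It is an assumption about what a one-hot encoding of peptides could look like, not PACE's code: the amino acid ordering in AMINO_ACIDS, the flattened row layout, the ValueError on unequal lengths, and the helper name one_hot_sketch are all hypothetical, and the sketch returns nested lists rather than a numpy.ndarray.

```python
from typing import List, Sequence, Union

# Assumed ordering of the 20 standard amino acids (alphabetical by one-letter code).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_sketch(sequences: Sequence[Union[str, Sequence[str]]]) -> List[List[int]]:
    """Encode each peptide as a flat 0/1 vector of length 20 * peptide_length."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for seq in sequences:
        row = [0] * (len(AMINO_ACIDS) * length)
        for position, aa in enumerate(seq):
            # Each position owns a block of 20 slots; set the slot for its residue.
            row[position * len(AMINO_ACIDS) + index[aa]] = 1
        encoded.append(row)
    return encoded


# Both documented input forms: strings, or lists of single amino acids.
from_strings = one_hot_sketch(["ACD", "YWV"])
from_lists = one_hot_sketch([["A", "C", "D"], ["Y", "W", "V"]])
```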
-
pace.get_allele_similarity_mat(allele_similarity_name)
Get a matrix of pre-computed allele similarities.
- Parameters
allele_similarity_name (str) – Pre-computed allele similarity matrices are available based on observed peptide binding motifs (‘motifs’) or HLA protein binding pocket residues (‘pockets’).
- Returns
allele similarity matrix
- Return type
pandas.core.frame.DataFrame
-
pace.get_similar_alleles(allele_similarity_name, allele, similarity_threshold)
Get the alleles most similar to a given allele, based on a specified allele similarity matrix and similarity threshold.
- Parameters
allele_similarity_name (str) – Pre-computed allele similarity matrices are available based on observed peptide binding motifs (‘motifs’) or HLA protein binding pocket residues (‘pockets’).
allele (str) – The allele for which to determine similar alleles
similarity_threshold – Numerical threshold value that determines the cutoff for considering an allele similar to the given allele.
- Returns
The similar alleles satisfying the specified threshold, along with their numerical similarity values. Note that the given allele is also returned.
- Return type
pandas.core.frame.DataFrame
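The thresholding described above can be sketched on a plain dictionary. This is only an illustration under stated assumptions: the allele names and similarity values in TOY_SIMILARITY are made up, similar_alleles_sketch is a hypothetical helper, and treating "similar" as "similarity at or above the threshold" with results sorted best-first is a guess, since this reference does not specify the comparison or the ordering. The real function returns a pandas DataFrame.

```python
from typing import Dict, List, Tuple

# Made-up names and similarity values, for illustration only.
TOY_SIMILARITY: Dict[str, Dict[str, float]] = {
    "allele-1": {"allele-1": 1.0, "allele-2": 0.4, "allele-3": 0.8},
    "allele-2": {"allele-1": 0.4, "allele-2": 1.0, "allele-3": 0.5},
    "allele-3": {"allele-1": 0.8, "allele-2": 0.5, "allele-3": 1.0},
}


def similar_alleles_sketch(
    matrix: Dict[str, Dict[str, float]], allele: str, similarity_threshold: float
) -> List[Tuple[str, float]]:
    """Return (allele, similarity) pairs at or above the threshold, best first."""
    row = matrix[allele]
    kept = [(other, value) for other, value in row.items()
            if value >= similarity_threshold]  # assumed ">=" comparison
    # The query allele is kept too, mirroring the note in the Returns entry.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


similar = similar_alleles_sketch(TOY_SIMILARITY, "allele-1", 0.7)
```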
-
class pace.Sample
a sample to predict
-
property allele
the allele code for the MHC molecule
-
property peptide
the amino acid sequence for the peptide (as a string)
-
class pace.Dataset
an abstract base class defining the interface required of a dataset
-
abstract get_binders(length)
Get all binders with the specified length.
- Parameters
length (int) – the peptide length the caller is interested in
- Returns
all binders with that length - Note that this is allowed to return a single-use iterable.
- Return type
Iterable[pace.Sample]
-
abstract get_nonbinders(length)
Get all nonbinders with the specified length.
- Parameters
length (int) – the peptide length the caller is interested in
- Returns
all nonbinder peptides with that length
- Return type
List[str]
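A minimal sketch of what an implementation of this interface might look like, kept self-contained so it runs without PACE installed. SampleStandIn and InMemoryDataset are hypothetical names, the peptides and allele strings are made-up placeholders, and a real dataset would subclass pace.Dataset and yield pace.Sample objects instead.

```python
from typing import Iterable, Iterator, List, NamedTuple


class SampleStandIn(NamedTuple):
    """Stand-in for pace.Sample, exposing the two documented properties."""
    allele: str
    peptide: str


class InMemoryDataset:
    """Hypothetical dataset; a real one would subclass pace.Dataset."""

    def __init__(self, binders: Iterable[SampleStandIn],
                 nonbinders: Iterable[str]) -> None:
        self._binders = list(binders)
        self._nonbinders = list(nonbinders)

    def get_binders(self, length: int) -> Iterator[SampleStandIn]:
        # A generator is a single-use iterable, which the interface permits.
        return (s for s in self._binders if len(s.peptide) == length)

    def get_nonbinders(self, length: int) -> List[str]:
        return [p for p in self._nonbinders if len(p) == length]


dataset = InMemoryDataset(
    binders=[
        SampleStandIn("allele-1", "AAAAAAAAL"),   # 9 residues
        SampleStandIn("allele-1", "CCCCCCCCV"),   # 9 residues
        SampleStandIn("allele-2", "DDDDDDDDDL"),  # 10 residues
    ],
    nonbinders=["GGGGGGGGG", "HHHHHHHHHH"],
)
nine_mers = list(dataset.get_binders(9))
ten_mer_nonbinders = dataset.get_nonbinders(10)
```

Because get_binders may return a single-use iterable, callers that need to traverse the binders more than once should materialize them first, as list(...) does here.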
-
class pace.PredictionAlgorithm
an abstract base class defining the interface required of prediction algorithms that are to be evaluated by PACE
-
abstract predict(samples)
Predict whether or not a list of samples will bind.
- Parameters
samples (List[pace.Sample]) – the samples to predict
- Returns
predictions for each sample - Each prediction is a number between 0 and 1 indicating how likely the sample is to bind.
- Return type
NumPy array-like object (e.g., list of floats)
-
abstract train(binders, nonbinders)
Train this instance using the supplied training data.
- Parameters
binders – samples that are known to bind
nonbinders – samples that are known to not bind
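The train/predict contract can be illustrated with a toy implementation. The sketch below runs without PACE: PredictionAlgorithmStandIn is a local mirror of the documented interface, and SampleStandIn, LastResidueAlgorithm, and the peptides are hypothetical. It also assumes that both binders and nonbinders arrive as objects with a peptide property. The scoring rule is deliberately naive and is not a method used by PACE.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import List, NamedTuple


class SampleStandIn(NamedTuple):
    """Stand-in for pace.Sample."""
    allele: str
    peptide: str


class PredictionAlgorithmStandIn(ABC):
    """Local mirror of the documented interface, so the sketch runs without PACE."""

    @abstractmethod
    def train(self, binders, nonbinders): ...

    @abstractmethod
    def predict(self, samples): ...


class LastResidueAlgorithm(PredictionAlgorithmStandIn):
    """Toy model: scores a peptide by how often its last residue ended a binder."""

    def __init__(self) -> None:
        self._bind: Counter = Counter()
        self._nonbind: Counter = Counter()

    def train(self, binders, nonbinders) -> None:
        self._bind = Counter(s.peptide[-1] for s in binders)
        self._nonbind = Counter(s.peptide[-1] for s in nonbinders)

    def predict(self, samples) -> List[float]:
        scores = []
        for s in samples:
            hits = self._bind[s.peptide[-1]]
            total = hits + self._nonbind[s.peptide[-1]]
            scores.append(hits / total if total else 0.5)  # 0.5 when unseen
        return scores


algorithm = LastResidueAlgorithm()
algorithm.train(
    binders=[SampleStandIn("allele-1", "AAAL"), SampleStandIn("allele-1", "CCCL")],
    nonbinders=[SampleStandIn("allele-1", "AAAK"), SampleStandIn("allele-1", "DDDL")],
)
predictions = algorithm.predict([
    SampleStandIn("allele-1", "GGGL"),
    SampleStandIn("allele-1", "GGGK"),
    SampleStandIn("allele-1", "GGGW"),
])
```

Each prediction lies between 0 and 1, as the interface requires. A class like this has a no-argument constructor, so once it subclasses pace.PredictionAlgorithm the class itself could be passed as algorithm_class.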
-
class pace.PredictionResult
the result of predicting a single sample
-
property prediction
the algorithm’s prediction (between 0 and 1)
-
property sample
the sample that was predicted
-
property truth
the true answer (either 0 or 1)
-
class pace.Scorer
an abstract base class defining the interface required of scorers - A scorer quantifies (or summarizes) the quality of the prediction results.
-
abstract score(results)
Generate the score for a set of prediction results.
- Parameters
results (Iterable[PredictionResult]) – the prediction results to score
- Returns
whatever summary info the scorer would like to generate for the results
- Return type
Any
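As an illustration of the scorer contract, the following self-contained sketch computes a simple thresholded accuracy over prediction results. ResultStandIn and ThresholdAccuracyScorer are hypothetical names, the 0.5 cutoff is an assumption, and the results are made-up toy values. This is not a description of how pace.evaluation.AccuracyScorer works, and a real scorer would subclass pace.Scorer.

```python
from typing import Iterable, NamedTuple


class ResultStandIn(NamedTuple):
    """Stand-in for pace.PredictionResult with the three documented properties."""
    sample: str
    truth: int
    prediction: float


class ThresholdAccuracyScorer:
    """Hypothetical scorer; a real one would subclass pace.Scorer."""

    def __init__(self, cutoff: float = 0.5) -> None:
        self.cutoff = cutoff  # assumed decision threshold, not a PACE default

    def score(self, results: Iterable[ResultStandIn]) -> float:
        total = 0
        correct = 0
        for r in results:  # single pass, so any iterable works
            total += 1
            predicted_binder = r.prediction >= self.cutoff
            correct += int(predicted_binder == (r.truth == 1))
        return correct / total if total else 0.0


results = [
    ResultStandIn("AAAL", 1, 0.9),  # correct
    ResultStandIn("AAAK", 0, 0.2),  # correct
    ResultStandIn("CCCL", 1, 0.3),  # missed binder
    ResultStandIn("DDDK", 0, 0.6),  # false positive
]
accuracy = ThresholdAccuracyScorer().score(results)
```

Since the documented return type is Any, a scorer is free to return richer summaries (for example a dictionary of several metrics) instead of a single float.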