acton package¶
Subpackages¶
Submodules¶
acton.acton module¶
Main processing script for Acton.
-
acton.acton.
draw
(n: int, lst: typing.List[T], replace: bool = True) → typing.List[T][source]¶ Draws n random elements from a list.
Parameters: - n – Number of elements to draw.
- lst – List of elements to draw from.
- replace – Draw with replacement.
Returns: n random elements.
Return type: List[T]
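The difference between drawing with and without replacement can be sketched with the standard library. This is a minimal illustration, not Acton's implementation; `draw_sketch` is a hypothetical name.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def draw_sketch(n: int, lst: List[T], replace: bool = True) -> List[T]:
    """Hypothetical stand-in for draw(): returns n random elements of lst."""
    if replace:
        # With replacement: the same element may be picked more than once.
        return random.choices(lst, k=n)
    # Without replacement: each position is used at most once,
    # so n must not exceed len(lst) or random.sample raises ValueError.
    return random.sample(lst, n)
```

With `replace=False` the result is a random subset of `lst`, which is the usual choice when picking the initial instances to label.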
-
acton.acton.
get_DB
(data_path: str, pandas_key: str = None) → (acton.database.Database, dict)[source]¶ Gets a Database that will handle the given data table.
Parameters: - data_path – Path to file.
- pandas_key – Key for pandas HDF5. Specify iff using pandas.
Returns: - Database – Database that will handle the given data table.
- dict – Keyword arguments for the Database constructor.
-
acton.acton.
label
(recommendations: acton.proto.wrappers.Recommendations) → acton.proto.wrappers.LabelPool[source]¶ Simulates a labelling task.
Parameters: recommendations – Recommendations of instances to label.
Returns: Labels for the recommended instances.
Return type: acton.proto.wrappers.LabelPool
-
acton.acton.
main
(data_path: str, feature_cols: typing.List[str], label_col: str, output_path: str, n_epochs: int = 10, initial_count: int = 10, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', pandas_key: str = '', n_recommendations: int = 1)[source]¶ Simulate an active learning experiment.
Parameters: - data_path – Path to data file.
- feature_cols – List of column names of the features. If empty, all non-label and non-ID columns will be used.
- label_col – Column name of the labels.
- output_path – Path to output file. Will be overwritten.
- n_epochs – Number of epochs to run.
- initial_count – Number of random instances to label initially.
- recommender – Name of recommender to make recommendations.
- predictor – Name of predictor to make predictions.
- pandas_key – Key for pandas HDF5. Specify iff using pandas.
- n_recommendations – Number of recommendations to make at once.
-
acton.acton.
predict
(labels: acton.proto.wrappers.LabelPool, predictor: str) → acton.proto.wrappers.Predictions[source]¶ Train a predictor and predict labels.
Parameters: - labels – IDs of labelled instances.
- predictor – Name of predictor to make predictions.
-
acton.acton.
recommend
(predictions: acton.proto.wrappers.Predictions, recommender: str = 'RandomRecommender', n_recommendations: int = 1) → acton.proto.wrappers.Recommendations[source]¶ Recommends instances to label based on predictions.
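Parameters: - predictions – Predictions to base the recommendations on.
- recommender – Name of recommender to make recommendations.
- n_recommendations – Number of recommendations to make at once. Default 1.
Returns: Recommended instances to label.
Return type: acton.proto.wrappers.Recommendations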
-
acton.acton.
simulate_active_learning
(ids: typing.Iterable[int], db: acton.database.Database, db_kwargs: dict, output_path: str, n_initial_labels: int = 10, n_epochs: int = 10, test_size: int = 0.2, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', n_recommendations: int = 1)[source]¶ Simulates an active learning task.
Parameters: - ids – IDs of instances in the unlabelled pool.
- db – Database with features and labels.
- db_kwargs – Keyword arguments for the database constructor.
- output_path – Path to output intermediate predictions to. Will be overwritten.
- n_initial_labels – Number of initial labels to draw.
- n_epochs – Number of epochs.
- test_size – Fraction of instances to hold out as the testing set (default 0.2).
- recommender – Name of recommender to make recommendations.
- predictor – Name of predictor to make predictions.
- n_recommendations – Number of recommendations to make at once.
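The parameters above map onto a generic pool-based active learning loop, sketched here. This is an assumption-laden outline, not Acton's code: `active_learning_loop`, `oracle`, `fit_predict` and `pick` are hypothetical names, and the test split and output file are omitted.

```python
import random
from typing import Callable, Dict, Iterable, List, Sequence


def active_learning_loop(
    ids: Iterable[int],
    oracle: Callable[[int], int],
    fit_predict: Callable[[Dict[int, int], List[int]], Sequence[float]],
    pick: Callable[[List[int], Sequence[float], int], List[int]],
    n_initial_labels: int = 10,
    n_epochs: int = 10,
    n_recommendations: int = 1,
    seed: int = 0,
) -> Dict[int, int]:
    """Hypothetical sketch: label a random seed set, then query per epoch."""
    rng = random.Random(seed)
    pool = list(ids)
    # Label a few random instances to start with.
    initial = rng.sample(pool, min(n_initial_labels, len(pool)))
    labels = {i: oracle(i) for i in initial}
    for _ in range(n_epochs):
        unlabelled = [i for i in pool if i not in labels]
        if not unlabelled:
            break
        # Train on the current labels and predict for the unlabelled pool.
        predictions = fit_predict(labels, unlabelled)
        # Ask the recommender which instances to label next.
        for i in pick(unlabelled, predictions, n_recommendations):
            labels[i] = oracle(i)
    return labels
```

Each epoch adds at most `n_recommendations` labels, so the labelled set grows from `n_initial_labels` to at most `n_initial_labels + n_epochs * n_recommendations`.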
-
acton.acton.
try_pandas
(data_path: str) → bool[source]¶ Guesses if a file is a pandas file.
Parameters: data_path – Path to file. Returns: True if the file is pandas. Return type: bool
acton.cli module¶
Command-line interface for Acton.
-
acton.cli.
read_binary
() → bytes[source]¶ Reads binary data from stdin.
Notes
The first eight bytes are expected to be the length of the input data as an unsigned long long.
Returns: Binary data. Return type: bytes
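The length-prefixed framing described in the notes can be sketched as follows. This is a minimal illustration, not the library's code: `read_length_prefixed` is a hypothetical name, and the little-endian byte order is an assumption, since the text only says "unsigned long long".

```python
import io
import struct
from typing import BinaryIO


def read_length_prefixed(stream: BinaryIO) -> bytes:
    """Hypothetical sketch: read an 8-byte length header, then the payload."""
    header = stream.read(8)
    if len(header) != 8:
        raise EOFError("expected an 8-byte length header")
    # "<Q" is a little-endian unsigned long long; the byte order is assumed.
    (length,) = struct.unpack("<Q", header)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("input ended before the declared length")
    return data


# Round trip through an in-memory stream.
payload = b"hello"
framed = struct.pack("<Q", len(payload)) + payload
assert read_length_prefixed(io.BytesIO(framed)) == payload
```

To read from stdin as the function above does, the stream would be `sys.stdin.buffer`.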
acton.database module¶
Wrapper class for databases.
-
class
acton.database.
ASCIIReader
(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]¶ Bases:
acton.database.Database
Reads ASCII databases.
-
feature_cols
¶ List[str] – List of feature columns.
-
label_col
¶ str – Name of label column.
-
max_id_length
¶ int – Maximum length of IDs.
-
n_features
¶ int – Number of features.
-
n_instances
¶ int – Number of instances.
-
n_labels
¶ int – Number of labels per instance.
-
path
¶ str – Path to ASCII file.
-
encode_labels
¶ bool – Whether to encode labels as integers.
-
label_encoder
¶ sklearn.preprocessing.LabelEncoder – Encodes labels as integers.
-
_db
¶ Database – Underlying ManagedHDF5Database.
-
_db_filepath
¶ str – Path of underlying HDF5 database.
-
_tempdir
¶ str – Temporary directory where the underlying HDF5 database is stored.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x F array of label vectors.
Return type: numpy.ndarray
-
-
class
acton.database.
Database
[source]¶ Bases:
abc.ABC
Base class for database wrappers.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x F array of label vectors.
Return type: numpy.ndarray
-
to_proto
() → DatabasePB[source]¶ Serialises this database as a protobuf.
Returns: Protobuf representing this database. Return type: DatabasePB
-
write_features
(ids: typing.Sequence[int], features: numpy.ndarray)[source]¶ Writes feature vectors to the database.
Parameters: - ids – Iterable of IDs.
- features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.
-
write_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]¶ Writes label vectors to the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
- labels – T x N x D array of label vectors. The ith row corresponds to the ith labeller ID in labeller_ids and the jth column corresponds to the jth instance ID in instance_ids.
-
-
class
acton.database.
FITSReader
(path: str, feature_cols: typing.List[str], label_col: str, hdu_index: int = 1, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]¶ Bases:
acton.database.Database
Reads FITS databases.
-
hdu_index
¶ int – Index of HDU in the FITS file.
-
feature_cols
¶ List[str] – List of feature columns.
-
label_col
¶ str – Name of label column.
-
n_features
¶ int – Number of features.
-
n_instances
¶ int – Number of instances.
-
n_labels
¶ int – Number of labels per instance.
-
path
¶ str – Path to FITS file.
-
encode_labels
¶ bool – Whether to encode labels as integers.
-
label_encoder
¶ sklearn.preprocessing.LabelEncoder – Encodes labels as integers.
-
_hdulist
¶ astropy.io.fits.HDUList – FITS HDUList.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x 1 array of label vectors.
Return type: numpy.ndarray
-
-
class
acton.database.
HDF5Database
(path: str)[source]¶ Bases:
acton.database.Database
Database wrapping an HDF5 file as a context manager.
-
path
¶ str – Path to HDF5 file.
-
_h5_file
¶ h5py.File – HDF5 file object.
-
-
class
acton.database.
HDF5Reader
(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]¶ Bases:
acton.database.HDF5Database
Reads HDF5 databases.
-
feature_cols
¶ List[str] – List of feature datasets.
-
label_col
¶ str – Name of label dataset.
-
n_features
¶ int – Number of features.
-
n_instances
¶ int – Number of instances.
-
n_labels
¶ int – Number of labels per instance.
-
path
¶ str – Path to HDF5 file.
-
encode_labels
¶ bool – Whether to encode labels as integers.
-
label_encoder
¶ sklearn.preprocessing.LabelEncoder – Encodes labels as integers.
-
_h5_file
¶ h5py.File – HDF5 file object.
-
_is_multidimensional
¶ bool – Whether the features are in a multidimensional dataset.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x F array of label vectors.
Return type: numpy.ndarray
-
-
class
acton.database.
ManagedHDF5Database
(path: str, label_dtype: str = None, feature_dtype: str = None)[source]¶ Bases:
acton.database.HDF5Database
Database using an HDF5 file.
Notes
This database uses an internal schema. For reading files from disk, use another Database.
-
path
¶ str – Path to HDF5 file.
-
label_dtype
¶ str – Data type of labels.
-
feature_dtype
¶ str – Data type of features.
-
_h5_file
¶ h5py.File – Opened HDF5 file.
-
_sync_attrs
¶ List[str] – List of instance attributes to sync with the HDF5 file’s attributes.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x F array of label vectors.
Return type: numpy.ndarray
-
to_proto
() → DatabasePB[source]¶ Serialises this database as a protobuf.
Returns: Protobuf representing this database. Return type: DatabasePB
-
write_features
(ids: typing.Sequence[int], features: numpy.ndarray)[source]¶ Writes feature vectors to the database.
Parameters: - ids – Iterable of IDs.
- features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.
Returns: N x D array of feature vectors.
Return type: numpy.ndarray
-
write_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]¶ Writes label vectors to the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
- labels – T x N x D array of label vectors. The ith row corresponds to the ith labeller ID in labeller_ids and the jth column corresponds to the jth instance ID in instance_ids.
-
-
class
acton.database.
PandasReader
(path: str, feature_cols: typing.List[str], label_col: str, key: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]¶ Bases:
acton.database.Database
Reads pandas HDF5 databases.
-
feature_cols
¶ List[str] – List of feature datasets.
-
label_col
¶ str – Name of label dataset.
-
n_features
¶ int – Number of features.
-
n_instances
¶ int – Number of instances.
-
n_labels
¶ int – Number of labels per instance.
-
path
¶ str – Path to HDF5 file.
-
encode_labels
¶ bool – Whether to encode labels as integers.
-
label_encoder
¶ sklearn.preprocessing.LabelEncoder – Encodes labels as integers.
-
_df
¶ pandas.DataFrame – Pandas dataframe.
-
get_known_instance_ids
() → typing.List[int][source]¶ Returns a list of known instance IDs.
Returns: A list of known instance IDs. Return type: List[str]
-
get_known_labeller_ids
() → typing.List[int][source]¶ Returns a list of known labeller IDs.
Returns: A list of known labeller IDs. Return type: List[str]
-
read_features
(ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads feature vectors from the database.
Parameters: ids – Iterable of IDs. Returns: N x D array of feature vectors. Return type: numpy.ndarray
-
read_labels
(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]¶ Reads label vectors from the database.
Parameters: - labeller_ids – Iterable of labeller IDs.
- instance_ids – Iterable of instance IDs.
Returns: T x N x 1 array of label vectors.
Return type: numpy.ndarray
-
acton.kde_predictor module¶
A predictor that uses KDE to classify instances.
-
class
acton.kde_predictor.
KDEClassifier
(bandwidth=1.0)[source]¶ Bases:
BaseEstimator, ClassifierMixin
A classifier using kernel density estimation to classify instances.
-
fit
(X, y)[source]¶ Fits kernel density models to the data.
Parameters: - X (array-like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.
- y (array-like, shape (n_samples,)) – Target vector relative to X.
-
acton.labellers module¶
Labeller classes.
-
class
acton.labellers.
ASCIITableLabeller
(path: str, id_col: str, label_col: str)[source]¶ Bases:
acton.labellers.Labeller
Labeller that obtains labels from an ASCII table.
-
path
¶ str – Path to table.
-
id_col
¶ str – Name of the column where IDs are stored.
-
label_col
¶ str – Name of the column where binary labels are stored.
-
_table
¶ astropy.table.Table – Table object.
-
-
class
acton.labellers.
DatabaseLabeller
(db: acton.database.Database)[source]¶ Bases:
acton.labellers.Labeller
Labeller that obtains labels from a Database.
-
_db
¶ acton.database.Database – Database with labels.
-
acton.plot module¶
Script to plot a dump of predictions.
acton.predictors module¶
Predictor classes.
-
acton.predictors.
AveragePredictions
(predictor: acton.predictors.Predictor) → acton.predictors.Predictor[source]¶ Wrapper for a predictor that averages predicted probabilities.
Notes
This effectively reduces the number of predictors to 1.
Parameters: predictor – Predictor to wrap. Returns: Predictor with averaged predictions. Return type: Predictor
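Averaging over predictors can be pictured as collapsing the T axis of an N x T x C prediction array into a single entry. The sketch below uses nested lists to stay dependency-free; `average_over_predictors` is a hypothetical name and this is not Acton's wrapper itself.

```python
from typing import List

Predictions = List[List[List[float]]]  # N x T x C as nested lists


def average_over_predictors(predictions: Predictions) -> Predictions:
    """Hypothetical sketch: mean over the T axis, giving an N x 1 x C result."""
    averaged = []
    for per_instance in predictions:  # per_instance is T x C
        n_predictors = len(per_instance)
        n_classes = len(per_instance[0])
        mean = [
            sum(row[c] for row in per_instance) / n_predictors
            for c in range(n_classes)
        ]
        # Keep the middle axis so the shape stays N x 1 x C.
        averaged.append([mean])
    return averaged
```

Keeping the singleton middle axis means downstream code that expects an N x T x C array still works, with T equal to 1.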
-
class
acton.predictors.
Committee
(Predictor: type, db: acton.database.Database, n_classifiers: int = 10, subset_size: float = 0.6, **kwargs: dict)[source]¶ Bases:
acton.predictors.Predictor
A predictor using a committee of other predictors.
-
n_classifiers
¶ int – Number of logistic regression classifiers in the committee.
-
subset_size
¶ float – Fraction of the known labels sampled to train each classifier. Lower values increase variety.
-
_db
¶ acton.database.Database – Database storing features and labels.
-
_committee
¶ List[sklearn.linear_model.LogisticRegression] – Underlying committee of logistic regression classifiers.
-
_reference_predictor
¶ Predictor – Reference predictor trained on all known labels.
-
fit
(ids: typing.Iterable[int])[source]¶ Fits the predictor to labelled data.
Parameters: ids – List of IDs of instances to train from.
-
predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels of instances.
Notes
Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x T x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
-
reference_predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels using the best possible method.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x 1 x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
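One way to read `n_classifiers` and `subset_size` is that each committee member trains on its own random subset of the labelled IDs. The sketch below shows only that subset selection. It assumes sampling without replacement within each subset, which the text does not confirm, and `committee_subsets` is a hypothetical name.

```python
import random
from typing import Iterable, List


def committee_subsets(
    ids: Iterable[int],
    n_classifiers: int = 10,
    subset_size: float = 0.6,
    seed: int = 0,
) -> List[List[int]]:
    """Hypothetical sketch: one random training subset per committee member."""
    rng = random.Random(seed)
    pool = list(ids)
    # Number of IDs per subset, at least 1 and at most the pool size.
    k = min(len(pool), max(1, int(len(pool) * subset_size)))
    # Each member sees a different subset, so the fitted models disagree more.
    return [rng.sample(pool, k) for _ in range(n_classifiers)]
```

Smaller values of `subset_size` leave less overlap between subsets, which matches the note that lower numbers increase variety.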
-
-
class
acton.predictors.
GPClassifier
(db: acton.database.Database, max_iters: int = 50000, n_jobs: int = 1)[source]¶ Bases:
acton.predictors.Predictor
Classifier using Gaussian processes.
-
max_iters
¶ int – Maximum optimisation iterations.
-
label_encoder
¶ sklearn.preprocessing.LabelEncoder – Encodes labels as integers.
-
model_
¶ gpy.models.GPClassification – GP model.
-
_db
¶ acton.database.Database – Database storing features and labels.
-
fit
(ids: typing.Iterable[int])[source]¶ Fits the predictor to labelled data.
Parameters: ids – List of IDs of instances to train from.
-
predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels of instances.
Notes
Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x 1 x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
-
reference_predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels using the best possible method.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x 1 x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
-
-
class
acton.predictors.
Predictor
[source]¶ Bases:
abc.ABC
Base class for predictors.
-
prediction_type
¶ str – What kind of predictions this class generates, e.g. classification.
-
fit
(ids: typing.Iterable[int])[source]¶ Fits the predictor to labelled data.
Parameters: ids – List of IDs of instances to train from.
-
predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels of instances.
Notes
Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x T x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
-
prediction_type
= 'classification'
-
reference_predict
(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]¶ Predicts labels using the best possible method.
Parameters: ids – List of IDs of instances to predict labels for. Returns: - numpy.ndarray – An N x 1 x C array of corresponding predictions.
- numpy.ndarray – An N array of confidences (or None if not applicable).
-
-
acton.predictors.
from_class
(Predictor: type, regression: bool = False) → type[source]¶ Converts a scikit-learn predictor class into a Predictor class.
Parameters: - Predictor – scikit-learn predictor class.
- regression – Whether this predictor does regression (as opposed to classification).
Returns: Predictor class wrapping the scikit-learn class.
Return type: type
-
acton.predictors.
from_instance
(predictor: BaseEstimator, db: acton.database.Database, regression: bool = False) → acton.predictors.Predictor[source]¶ Converts a scikit-learn predictor instance into a Predictor instance.
Parameters: - predictor – scikit-learn predictor.
- db – Database storing features and labels.
- regression – Whether this predictor does regression (as opposed to classification).
Returns: Predictor instance wrapping the scikit-learn predictor.
Return type: acton.predictors.Predictor
acton.recommenders module¶
Recommender classes.
-
class
acton.recommenders.
EntropyRecommender
(db: acton.database.Database)[source]¶ Bases:
acton.recommenders.Recommender
Recommends instances by entropy-based uncertainty sampling.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
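Assuming this recommender ranks instances by the Shannon entropy of their predicted class probabilities (suggested by the class name, not stated in the text), the scoring step might look like the sketch below. `entropy_ranking` is a hypothetical name and the diversity parameter is ignored here.

```python
import math
from typing import List, Sequence


def entropy_ranking(
    ids: Sequence[int],
    predictions: Sequence[Sequence[Sequence[float]]],
    n: int = 1,
) -> List[int]:
    """Hypothetical sketch: return the n IDs with the highest entropy."""

    def entropy(probs: Sequence[float]) -> float:
        # Terms with zero probability contribute nothing to the sum.
        return -sum(p * math.log(p) for p in probs if p > 0)

    # predictions is N x 1 x C, so pred[0] holds the C class probabilities.
    scored = [(entropy(pred[0]), i) for i, pred in zip(ids, predictions)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:n]]
```

Entropy is highest when the class probabilities are close to uniform, so the instances returned are the ones the predictor is least sure about.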
-
-
class
acton.recommenders.
MarginRecommender
(db: acton.database.Database)[source]¶ Bases:
acton.recommenders.Recommender
Recommends instances by margin-based uncertainty sampling.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Notes
Assumes predictions are probabilities of positive binary label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
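Margin-based sampling is commonly defined as preferring instances where the gap between the two most probable classes is smallest. A minimal sketch under that assumption follows. `margin_ranking` is a hypothetical name, at least two classes are assumed, and the diversity parameter is ignored.

```python
from typing import List, Sequence


def margin_ranking(
    ids: Sequence[int],
    predictions: Sequence[Sequence[Sequence[float]]],
    n: int = 1,
) -> List[int]:
    """Hypothetical sketch: return the n IDs with the smallest margin."""

    def margin(probs: Sequence[float]) -> float:
        # Gap between the largest and second-largest class probability.
        top_two = sorted(probs, reverse=True)[:2]
        return top_two[0] - top_two[1]

    # predictions is N x 1 x C, so pair[1][0] holds the class probabilities.
    ranked = sorted(zip(ids, predictions), key=lambda pair: margin(pair[1][0]))
    return [i for i, _ in ranked[:n]]
```

A small margin means the top two classes are nearly tied, so a label for that instance is likely to be informative.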
-
-
class
acton.recommenders.
QBCRecommender
(db: acton.database.Database)[source]¶ Bases:
acton.recommenders.Recommender
Recommends instances by committee disagreement.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Notes
Assumes predictions are probabilities of positive binary label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x T x C array of predictions. The ith row must correspond with the ith ID in the sequence.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
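Committee disagreement can be measured in several ways, and the text does not say which one this class uses. The sketch below uses vote entropy, one common choice: each of the T committee members votes for its most probable class, and instances with the most evenly split votes rank first. `vote_entropy` and `qbc_ranking` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Sequence


def vote_entropy(member_probs: Sequence[Sequence[float]]) -> float:
    """Entropy of the committee's hard votes for one instance (T x C input)."""
    votes = Counter(
        max(range(len(probs)), key=probs.__getitem__) for probs in member_probs
    )
    total = len(member_probs)
    return -sum((c / total) * math.log(c / total) for c in votes.values())


def qbc_ranking(
    ids: Sequence[int],
    predictions: Sequence[Sequence[Sequence[float]]],
    n: int = 1,
) -> List[int]:
    """Hypothetical sketch: return the n IDs the committee disagrees on most."""
    ranked = sorted(
        zip(ids, predictions),
        key=lambda pair: vote_entropy(pair[1]),
        reverse=True,
    )
    return [i for i, _ in ranked[:n]]
```

When every member votes for the same class the vote entropy is zero, so unanimous instances are recommended last.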
-
-
class
acton.recommenders.
RandomRecommender
(db: acton.database.Database)[source]¶ Bases:
acton.recommenders.Recommender
Recommends instances at random.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x T x C array of predictions.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
-
-
class
acton.recommenders.
Recommender
[source]¶ Bases:
abc.ABC
Base class for recommenders.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x T x C array of predictions.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
-
-
class
acton.recommenders.
UncertaintyRecommender
(db: acton.database.Database)[source]¶ Bases:
acton.recommenders.Recommender
Recommends instances by confidence-based uncertainty sampling.
-
recommend
(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]¶ Recommends an instance to label.
Notes
Assumes predictions are probabilities of positive binary label.
Parameters: - ids – Sequence of IDs in the unlabelled data pool.
- predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
- n – Number of recommendations to make.
- diversity – Recommendation diversity in [0, 1].
Returns: IDs of the instances to label.
Return type: Sequence[int]
-
-
acton.recommenders.
choose_boltzmann
(features: numpy.ndarray, scores: numpy.ndarray, n: int, temperature: float = 1.0) → typing.Sequence[int][source]¶ Chooses n scores using a Boltzmann distribution.
Notes
Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.
Parameters: - scores – 1D array of scores.
- n – Number of scores to choose.
- temperature – Temperature parameter for sampling. Higher temperatures give more diversity.
Returns: List of indices of scores chosen.
Return type: Sequence[int]
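Boltzmann selection can be sketched as repeated sampling without replacement, with each remaining score weighted by exp(score / temperature). This is an illustration under that assumption, not the library's code: `boltzmann_choice` and its `rng` argument are hypothetical, and the `features` argument is left out.

```python
import math
import random
from typing import List, Optional, Sequence


def boltzmann_choice(
    scores: Sequence[float],
    n: int,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Hypothetical sketch: sample up to n indices with Boltzmann weights."""
    rng = rng or random.Random()
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    while remaining and len(chosen) < n:
        # Shift by the current maximum so the largest weight is exactly 1,
        # which avoids overflow and keeps the total weight positive.
        top = max(scores[i] for i in remaining)
        weights = [math.exp((scores[i] - top) / temperature) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen
```

As the temperature approaches zero the weights concentrate on the highest remaining score, so the result approaches a plain ranking. Higher temperatures flatten the weights and give more diverse picks.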
-
acton.recommenders.
choose_mmr
(features: numpy.ndarray, scores: numpy.ndarray, n: int, l: float = 0.5) → typing.Sequence[int][source]¶ Chooses n scores using maximal marginal relevance.
Notes
Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.
Parameters: - scores – 1D array of scores.
- n – Number of scores to choose.
- l – Lambda parameter for MMR. l = 1 gives a relevance-ranked list and l = 0 gives a maximal diversity ranking.
Returns: List of indices of scores chosen.
Return type: Sequence[int]
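Maximal marginal relevance greedily picks the item that maximises l * relevance - (1 - l) * (similarity to the items already chosen). The sketch below assumes cosine similarity between feature vectors, which the text does not specify. `cosine` and `mmr_choice` are hypothetical names.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_choice(
    features: Sequence[Sequence[float]],
    scores: Sequence[float],
    n: int,
    l: float = 0.5,
) -> List[int]:
    """Hypothetical sketch: greedy MMR selection of up to n indices."""
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    while remaining and len(chosen) < n:

        def mmr(i: int) -> float:
            # Redundancy is the highest similarity to anything already chosen.
            redundancy = max(
                (cosine(features[i], features[j]) for j in chosen), default=0.0
            )
            return l * scores[i] - (1 - l) * redundancy

        best = max(remaining, key=mmr)
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `l = 1` the redundancy term vanishes and the result is the relevance ranking. Lower values of `l` penalise candidates that resemble earlier picks.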