acton package

Submodules

acton.acton module

Main processing script for Acton.

acton.acton.draw(n: int, lst: typing.List[T], replace: bool = True) → typing.List[T][source]

Draws n random elements from a list.

Parameters:
  • n – Number of elements to draw.
  • lst – List of elements to draw from.
  • replace – Draw with replacement.
Returns:

n random elements.

Return type:

List[T]
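The sketch below reproduces the documented behaviour of draw using Python's standard random module. It is an illustration under that assumption, not Acton's implementation, and draw_sketch is a hypothetical name. Without replacement, n must not exceed len(lst).

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def draw_sketch(n: int, lst: List[T], replace: bool = True) -> List[T]:
    """Hypothetical re-implementation of draw, for illustration only."""
    if replace:
        # Each draw is independent, so the same element may appear repeatedly.
        return [random.choice(lst) for _ in range(n)]
    # Sampling without replacement; raises ValueError if n > len(lst).
    return random.sample(lst, n)


print(draw_sketch(3, ["a", "b", "c", "d"], replace=False))
```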

acton.acton.get_DB(data_path: str, pandas_key: str = None) → (acton.database.Database, dict)[source]

Gets a Database that will handle the given data table.

Parameters:
  • data_path – Path to file.
  • pandas_key – Key for pandas HDF5. Specify iff using pandas.
Returns:

  • Database – Database that will handle the given data table.
  • dict – Keyword arguments for the Database constructor.

acton.acton.label(recommendations: acton.proto.wrappers.Recommendations) → acton.proto.wrappers.LabelPool[source]

Simulates a labelling task.

Parameters:
  • data_path – Path to data file.
  • feature_cols – List of column names of features. If empty, all columns will be used.
  • label_col – Column name of the labels.
  • pandas_key – Key for pandas HDF5. Specify iff using pandas.
Returns:

Return type:

acton.proto.wrappers.LabelPool

acton.acton.main(data_path: str, feature_cols: typing.List[str], label_col: str, output_path: str, n_epochs: int = 10, initial_count: int = 10, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', pandas_key: str = '', n_recommendations: int = 1)[source]

Simulate an active learning experiment.

Parameters:
  • data_path – Path to data file.
  • feature_cols – List of column names of the features. If empty, all non-label and non-ID columns will be used.
  • label_col – Column name of the labels.
  • output_path – Path to output file. Will be overwritten.
  • n_epochs – Number of epochs to run.
  • initial_count – Number of random instances to label initially.
  • recommender – Name of recommender to make recommendations.
  • predictor – Name of predictor to make predictions.
  • pandas_key – Key for pandas HDF5. Specify iff using pandas.
  • n_recommendations – Number of recommendations to make at once.
acton.acton.predict(labels: acton.proto.wrappers.LabelPool, predictor: str) → acton.proto.wrappers.Predictions[source]

Train a predictor and predict labels.

Parameters:
  • labels – IDs of labelled instances.
  • predictor – Name of predictor to make predictions.
acton.acton.recommend(predictions: acton.proto.wrappers.Predictions, recommender: str = 'RandomRecommender', n_recommendations: int = 1) → acton.proto.wrappers.Recommendations[source]

Recommends instances to label based on predictions.

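Parameters:
  • predictions – Predictions for the instances in the unlabelled pool.
  • recommender – Name of recommender to make recommendations.
  • n_recommendations – Number of recommendations to make at once. Default 1.
Returns:

Recommended instances to label.
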
Return type:

acton.proto.wrappers.Recommendations

acton.acton.simulate_active_learning(ids: typing.Iterable[int], db: acton.database.Database, db_kwargs: dict, output_path: str, n_initial_labels: int = 10, n_epochs: int = 10, test_size: float = 0.2, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', n_recommendations: int = 1)[source]

Simulates an active learning task.

Parameters:
  • ids – IDs of instances in the unlabelled pool.
  • db – Database with features and labels.
  • db_kwargs – Keyword arguments for the database constructor.
  • output_path – Path to write intermediate predictions to. Will be overwritten.
  • n_initial_labels – Number of initial labels to draw.
  • n_epochs – Number of epochs.
  • test_size – Fraction of instances to hold out as the testing set.
  • recommender – Name of recommender to make recommendations.
  • predictor – Name of predictor to make predictions.
  • n_recommendations – Number of recommendations to make at once.
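The loop that simulate_active_learning runs can be pictured as follows. This is a simplified sketch, not Acton's code. The label and score callables are hypothetical stand-ins for the Database-backed labeller, predictor and recommender. The recommender here greedily takes the highest-scoring instances, and both the test split controlled by test_size and the output file are omitted.

```python
import random
from typing import Callable, Dict, List, Sequence


def active_learning_loop(
    ids: Sequence[int],
    label: Callable[[int], int],
    score: Callable[[Dict[int, int], List[int]], List[float]],
    n_initial_labels: int = 10,
    n_epochs: int = 10,
    n_recommendations: int = 1,
    seed: int = 0,
) -> Dict[int, int]:
    """Hypothetical pool-based active learning loop (illustration only)."""
    rng = random.Random(seed)
    # Start from a random set of labelled instances.
    labelled = {i: label(i) for i in rng.sample(list(ids), n_initial_labels)}
    for _ in range(n_epochs):
        pool = [i for i in ids if i not in labelled]
        if not pool:
            break
        # "score" stands in for predict + recommend: higher means more worth labelling.
        scores = score(labelled, pool)
        ranked = sorted(range(len(pool)), key=lambda k: scores[k], reverse=True)
        for k in ranked[:n_recommendations]:
            labelled[pool[k]] = label(pool[k])
    return labelled


result = active_learning_loop(range(100), label=lambda i: i % 2,
                              score=lambda known, pool: [0.0] * len(pool),
                              n_initial_labels=10, n_epochs=5, n_recommendations=2)
print(len(result))  # 20
```

Each epoch grows the labelled set by n_recommendations, so the run above ends with 10 + 5 × 2 labels.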
acton.acton.try_pandas(data_path: str) → bool[source]

Guesses if a file is a pandas file.

Parameters:data_path – Path to file.
Returns:True if the file appears to be a pandas file.
Return type:bool
acton.acton.validate_predictor(predictor: str)[source]

Raises an exception if the predictor is not valid.

Parameters:predictor – Name of predictor.
Raises:ValueError
acton.acton.validate_recommender(recommender: str)[source]

Raises an exception if the recommender is not valid.

Parameters:recommender – Name of recommender.
Raises:ValueError

acton.cli module

Command-line interface for Acton.

acton.cli.lines_from_stdin() → typing.Iterable[str][source]

Yields lines from stdin.

acton.cli.read_binary() → bytes[source]

Reads binary data from stdin.

Notes

The first eight bytes are expected to be the length of the input data as an unsigned long long.

Returns:Binary data.
Return type:bytes
acton.cli.read_bytes_from_buffer(n: int, buffer: typing.BinaryIO) → bytes[source]

Reads n bytes from the given buffer, blocking until all bytes are received.

Parameters:
  • n – How many bytes to read.
  • buffer – Which buffer to read from.
Returns:

Exactly n bytes.

Return type:

bytes

acton.cli.write_binary(string: bytes)[source]

Writes binary data to stdout.

Notes

The output will be preceded by the length as an unsigned long long.
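Taken together, read_binary, read_bytes_from_buffer and write_binary describe a length-prefixed framing: an 8-byte unsigned long long length followed by the payload. The sketch below reproduces that framing with the struct module against in-memory buffers. With real pipes you would pass sys.stdin.buffer or sys.stdout.buffer instead. The byte order is not documented here, so the little-endian format "<Q" is an assumption. frame, unframe and read_exactly are hypothetical names.

```python
import io
import struct
from typing import BinaryIO

# Assumption: little-endian; "Q" is an 8-byte unsigned long long.
HEADER = struct.Struct("<Q")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length, as write_binary is documented to do."""
    return HEADER.pack(len(payload)) + payload


def read_exactly(n: int, buffer: BinaryIO) -> bytes:
    """Keep reading until exactly n bytes have arrived (cf. read_bytes_from_buffer)."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = buffer.read(remaining)
        if not chunk:
            raise EOFError(f"expected {remaining} more bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def unframe(buffer: BinaryIO) -> bytes:
    """Read the 8-byte length header, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(read_exactly(HEADER.size, buffer))
    return read_exactly(length, buffer)


stream = io.BytesIO(frame(b"hello") + frame(b""))
print(unframe(stream), unframe(stream))
```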

acton.database module

Wrapper class for databases.

class acton.database.ASCIIReader(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]

Bases: acton.database.Database

Reads ASCII databases.

feature_cols

List[str] – List of feature columns.

label_col

str – Name of label column.

max_id_length

int – Maximum length of IDs.

n_features

int – Number of features.

n_instances

int – Number of instances.

n_labels

int – Number of labels per instance.

path

str – Path to ASCII file.

encode_labels

bool – Whether to encode labels as integers.

label_encoder

sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_db

Database – Underlying ManagedHDF5Database.

_db_filepath

str – Path of underlying HDF5 database.

_tempdir

str – Temporary directory where the underlying HDF5 database is stored.
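When encode_labels is set, string labels are mapped to integers by a scikit-learn LabelEncoder. LabelEncoder assigns codes in the sorted order of the distinct classes. NumPy's np.unique with return_inverse=True reproduces that order, so the sketch below uses it to show the mapping without needing scikit-learn. The example labels are made up.

```python
import numpy as np

labels = np.array(["star", "galaxy", "star", "quasar"])

# classes_ is sorted; codes[i] is the index of labels[i] in classes_,
# matching what LabelEncoder().fit_transform(labels) would return.
classes_, codes = np.unique(labels, return_inverse=True)
print(classes_)  # ['galaxy' 'quasar' 'star']
print(codes)     # [2 0 2 1]

# Decoding is a lookup, like LabelEncoder.inverse_transform.
decoded = classes_[codes]
```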

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x F array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]
class acton.database.Database[source]

Bases: abc.ABC

Base class for database wrappers.

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x F array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]

Writes feature vectors to the database.

Parameters:
  • ids – Iterable of IDs.
  • features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]

Writes label vectors to the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
  • labels – T x N x F array of label vectors. Entry [t, n] is the label vector given by the tth labeller ID in labeller_ids to the nth instance ID in instance_ids.
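To make the array conventions concrete, here is a hypothetical in-memory class that exposes the same methods as Database. It is not part of Acton and does not subclass it. It only illustrates two layouts: features are N x D with one row per ID, and labels are T x N x F, with the first axis indexing labeller IDs and the second indexing instance IDs.

```python
from typing import Dict, List, Sequence, Tuple

import numpy as np


class InMemoryDatabase:
    """Hypothetical stand-in illustrating the Database array conventions."""

    def __init__(self) -> None:
        self._features: Dict[int, np.ndarray] = {}
        self._labels: Dict[Tuple[int, int], np.ndarray] = {}

    def write_features(self, ids: Sequence[int], features: np.ndarray) -> None:
        # Row i of the N x D array belongs to ids[i].
        for i, id_ in enumerate(ids):
            self._features[id_] = np.asarray(features[i])

    def read_features(self, ids: Sequence[int]) -> np.ndarray:
        return np.stack([self._features[id_] for id_ in ids])  # N x D

    def write_labels(self, labeller_ids: Sequence[int],
                     instance_ids: Sequence[int], labels: np.ndarray) -> None:
        # labels[t, n] is the F-vector from labeller_ids[t] for instance_ids[n].
        for t, labeller in enumerate(labeller_ids):
            for n, instance in enumerate(instance_ids):
                self._labels[labeller, instance] = np.asarray(labels[t, n])

    def read_labels(self, labeller_ids: Sequence[int],
                    instance_ids: Sequence[int]) -> np.ndarray:
        return np.stack([
            np.stack([self._labels[labeller, instance] for instance in instance_ids])
            for labeller in labeller_ids
        ])  # T x N x F

    def get_known_instance_ids(self) -> List[int]:
        return sorted(self._features)

    def get_known_labeller_ids(self) -> List[int]:
        return sorted({labeller for labeller, _ in self._labels})


db = InMemoryDatabase()
db.write_features([3, 5], np.array([[1.0, 2.0], [3.0, 4.0]]))
db.write_labels([0], [3, 5], np.array([[[1], [0]]]))  # T=1, N=2, F=1
print(db.read_features([5]).shape, db.read_labels([0], [5, 3]).ravel())
```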
class acton.database.FITSReader(path: str, feature_cols: typing.List[str], label_col: str, hdu_index: int = 1, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]

Bases: acton.database.Database

Reads FITS databases.

hdu_index

int – Index of HDU in the FITS file.

feature_cols

List[str] – List of feature columns.

label_col

str – Name of label column.

n_features

int – Number of features.

n_instances

int – Number of instances.

n_labels

int – Number of labels per instance.

path

str – Path to FITS file.

encode_labels

bool – Whether to encode labels as integers.

label_encoder

sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_hdulist

astropy.io.fits.HDUList – FITS HDUList.

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x 1 array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]
class acton.database.HDF5Database(path: str)[source]

Bases: acton.database.Database

Database wrapping an HDF5 file as a context manager.

path

str – Path to HDF5 file.

_h5_file

h5py.File – HDF5 file object.

class acton.database.HDF5Reader(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]

Bases: acton.database.HDF5Database

Reads HDF5 databases.

feature_cols

List[str] – List of feature datasets.

label_col

str – Name of label dataset.

n_features

int – Number of features.

n_instances

int – Number of instances.

n_labels

int – Number of labels per instance.

path

str – Path to HDF5 file.

encode_labels

bool – Whether to encode labels as integers.

label_encoder

sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_h5_file

h5py.File – HDF5 file object.

_is_multidimensional

bool – Whether the features are in a multidimensional dataset.

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x F array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]
class acton.database.ManagedHDF5Database(path: str, label_dtype: str = None, feature_dtype: str = None)[source]

Bases: acton.database.HDF5Database

Database using an HDF5 file.

Notes

This database uses an internal schema. For reading files from disk, use another Database.

path

str – Path to HDF5 file.

label_dtype

str – Data type of labels.

feature_dtype

str – Data type of features.

_h5_file

h5py.File – Opened HDF5 file.

_sync_attrs

List[str] – List of instance attributes to sync with the HDF5 file’s attributes.

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x F array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]

Writes feature vectors to the database.

Parameters:
  • ids – Iterable of IDs.
  • features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]

Writes label vectors to the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
  • labels – T x N x F array of label vectors. Entry [t, n] is the label vector given by the tth labeller ID in labeller_ids to the nth instance ID in instance_ids.
class acton.database.PandasReader(path: str, feature_cols: typing.List[str], label_col: str, key: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)[source]

Bases: acton.database.Database

Reads pandas HDF5 databases.

feature_cols

List[str] – List of feature columns.

label_col

str – Name of label column.

n_features

int – Number of features.

n_instances

int – Number of instances.

n_labels

int – Number of labels per instance.

path

str – Path to HDF5 file.

encode_labels

bool – Whether to encode labels as integers.

label_encoder

sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_df

pandas.DataFrame – Pandas dataframe.

get_known_instance_ids() → typing.List[int][source]

Returns a list of known instance IDs.

Returns:A list of known instance IDs.
Return type:List[int]
get_known_labeller_ids() → typing.List[int][source]

Returns a list of known labeller IDs.

Returns:A list of known labeller IDs.
Return type:List[int]
read_features(ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads feature vectors from the database.

Parameters:ids – Iterable of IDs.
Returns:N x D array of feature vectors.
Return type:numpy.ndarray
read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray[source]

Reads label vectors from the database.

Parameters:
  • labeller_ids – Iterable of labeller IDs.
  • instance_ids – Iterable of instance IDs.
Returns:

T x N x 1 array of label vectors.

Return type:

numpy.ndarray

to_proto() → DatabasePB[source]

Serialises this database as a protobuf.

Returns:Protobuf representing this database.
Return type:DatabasePB
write_features(ids: typing.Sequence[int], features: numpy.ndarray)[source]
write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)[source]
acton.database.product(seq: typing.Iterable[int])[source]

Finds the product of a list of ints.

Parameters:seq – List of ints.
Returns:Product.
Return type:int
acton.database.serialise_encoder(encoder: sklearn.preprocessing.LabelEncoder) → LabelEncoderPB[source]

Serialises a LabelEncoder as a protobuf.

Parameters:encoder – LabelEncoder.
Returns:Protobuf representing the LabelEncoder.
Return type:LabelEncoderPB

acton.kde_predictor module

A predictor that uses KDE to classify instances.

class acton.kde_predictor.KDEClassifier(bandwidth=1.0)[source]

Bases: BaseEstimator, ClassifierMixin

A classifier using kernel density estimation to classify instances.

fit(X, y)[source]

Fits kernel density models to the data.

Parameters:
  • X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.
  • y (array-like, shape (n_samples,)) – Target vector relative to X.
predict(X)[source]

Predicts class labels.

Parameters:X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.
predict_proba(X)[source]

Predicts class probabilities.

Class probabilities are normalised log densities of the kernel density estimates.

Parameters:X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.
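KDEClassifier fits one kernel density estimate per class and compares the densities. The sketch below shows that idea with a Gaussian kernel written directly in NumPy. Several details are assumptions rather than a copy of Acton's code: the kernel choice, the class name KDEClassifierSketch, and the reading of "normalised log densities" as a softmax over the per-class log densities (equivalently, normalising the densities to sum to 1). Class priors are ignored.

```python
import numpy as np


class KDEClassifierSketch:
    """Hypothetical per-class Gaussian KDE classifier (illustration only)."""

    def __init__(self, bandwidth: float = 1.0) -> None:
        self.bandwidth = bandwidth

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Keep the training points of each class; they are the kernel centres.
        self.samples_ = [X[y == c] for c in self.classes_]
        return self

    def _log_density(self, X: np.ndarray, centres: np.ndarray) -> np.ndarray:
        """Log of the Gaussian KDE at each row of X, computed stably."""
        h, d = self.bandwidth, X.shape[1]
        sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        a = -sq_dist / (2.0 * h ** 2)
        peak = a.max(axis=1, keepdims=True)
        log_sum = peak[:, 0] + np.log(np.exp(a - peak).sum(axis=1))  # log-sum-exp
        return log_sum - np.log(len(centres)) - 0.5 * d * np.log(2.0 * np.pi * h ** 2)

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        log_dens = np.stack([self._log_density(X, s) for s in self.samples_], axis=1)
        # Softmax over classes: normalise the densities so each row sums to 1.
        log_dens -= log_dens.max(axis=1, keepdims=True)
        dens = np.exp(log_dens)
        return dens / dens.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


clf = KDEClassifierSketch(bandwidth=0.5).fit([[0.0], [0.1], [5.0], [5.1]], [0, 0, 1, 1])
print(clf.predict([[0.05], [5.05]]))  # [0 1]
```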

acton.labellers module

Labeller classes.

class acton.labellers.ASCIITableLabeller(path: str, id_col: str, label_col: str)[source]

Bases: acton.labellers.Labeller

Labeller that obtains labels from an ASCII table.

path

str – Path to table.

id_col

str – Name of the column where IDs are stored.

label_col

str – Name of the column where binary labels are stored.

_table

astropy.table.Table – Table object.

query(id_: int) → numpy.ndarray[source]

Queries the labeller.

Parameters:id_ – ID of instance to label.
Returns:1 x 1 label array.
Return type:numpy.ndarray
class acton.labellers.DatabaseLabeller(db: acton.database.Database)[source]

Bases: acton.labellers.Labeller

Labeller that obtains labels from a Database.

_db

acton.database.Database – Database with labels.

query(id_: int) → numpy.ndarray[source]

Queries the labeller.

Parameters:id_ – ID of instance to label.
Returns:1 x 1 label array.
Return type:numpy.ndarray
class acton.labellers.Labeller[source]

Bases: abc.ABC

Base class for labellers.

query(id_: int) → numpy.ndarray[source]

Queries the labeller.

Parameters:id_ – ID of instance to label.
Returns:T x F label array.
Return type:numpy.ndarray

acton.plot module

Script to plot a dump of predictions.

acton.plot.plot(predictions: typing.Iterable[typing.BinaryIO])[source]

Plots predictions from a file.

Parameters:predictions – Files containing predictions.

acton.predictors module

Predictor classes.

acton.predictors.AveragePredictions(predictor: acton.predictors.Predictor) → acton.predictors.Predictor[source]

Wrapper for a predictor that averages predicted probabilities.

Notes

This effectively reduces the number of predictors to 1.

Parameters:predictor – Predictor to wrap.
Returns:Predictor with averaged predictions.
Return type:Predictor
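Averaging over committee members collapses the T axis of an N x T x C prediction array, which is why the wrapped predictor behaves like a single predictor. Below is a minimal NumPy sketch of that reduction. It assumes the average is an unweighted mean, and average_predictions is a hypothetical helper.

```python
import numpy as np


def average_predictions(predictions: np.ndarray) -> np.ndarray:
    """Average an N x T x C array over its T predictors, keeping an N x 1 x C shape."""
    return np.asarray(predictions).mean(axis=1, keepdims=True)


# One instance, two predictors, two classes.
committee = np.array([[[0.2, 0.8], [0.4, 0.6]]])
print(average_predictions(committee))
```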
class acton.predictors.Committee(Predictor: type, db: acton.database.Database, n_classifiers: int = 10, subset_size: float = 0.6, **kwargs: dict)[source]

Bases: acton.predictors.Predictor

A predictor using a committee of other predictors.

n_classifiers

int – Number of predictors in the committee.

subset_size

float – Fraction of the known labels used to train each committee member. Lower values increase variety.

_db

acton.database.Database – Database storing features and labels.

_committee

List[Predictor] – Underlying committee of predictors.

_reference_predictor

Predictor – Reference predictor trained on all known labels.
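Here is one way to read n_classifiers and subset_size. Each committee member is trained on a random subset containing roughly subset_size of the known labelled IDs, and the members' N x C outputs are stacked into the N x T x C array that predict returns. The sketch below illustrates this bagging-style scheme. Sampling without replacement and the helper names are assumptions.

```python
from typing import List, Sequence

import numpy as np


def committee_subsets(ids: Sequence[int], n_classifiers: int = 10,
                      subset_size: float = 0.6, seed: int = 0) -> List[np.ndarray]:
    """Draw one training subset of IDs per committee member (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    k = max(1, int(round(subset_size * len(ids))))
    return [rng.choice(np.asarray(ids), size=k, replace=False)
            for _ in range(n_classifiers)]


def stack_committee(member_predictions: Sequence[np.ndarray]) -> np.ndarray:
    """Combine T arrays of shape N x C into one N x T x C array."""
    return np.stack(member_predictions, axis=1)


subsets = committee_subsets(list(range(10)), n_classifiers=3)
print([sorted(s.tolist()) for s in subsets])
```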

fit(ids: typing.Iterable[int])[source]

Fits the predictor to labelled data.

Parameters:ids – List of IDs of instances to train from.
predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x T x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels using the best possible method.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x 1 x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
class acton.predictors.GPClassifier(db: acton.database.Database, max_iters: int = 50000, n_jobs: int = 1)[source]

Bases: acton.predictors.Predictor

Classifier using Gaussian processes.

max_iters

int – Maximum optimisation iterations.

label_encoder

sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

model_

GPy.models.GPClassification – GP model.

_db

acton.database.Database – Database storing features and labels.

fit(ids: typing.Iterable[int])[source]

Fits the predictor to labelled data.

Parameters:ids – List of IDs of instances to train from.
predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x 1 x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels using the best possible method.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x 1 x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
class acton.predictors.Predictor[source]

Bases: abc.ABC

Base class for predictors.

prediction_type

str – What kind of predictions this class generates, e.g. classification.

fit(ids: typing.Iterable[int])[source]

Fits the predictor to labelled data.

Parameters:ids – List of IDs of instances to train from.
predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x T x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
prediction_type = 'classification'
reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)[source]

Predicts labels using the best possible method.

Parameters:ids – List of IDs of instances to predict labels for.
Returns:
  • numpy.ndarray – An N x 1 x C array of corresponding predictions.
  • numpy.ndarray – An N-element array of confidences (or None if not applicable).
acton.predictors.from_class(Predictor: type, regression: bool = False) → type[source]

Converts a scikit-learn predictor class into a Predictor class.

Parameters:
  • Predictor – scikit-learn predictor class.
  • regression – Whether this predictor does regression (as opposed to classification).
Returns:

Predictor class wrapping the scikit-learn class.

Return type:

type

acton.predictors.from_instance(predictor: BaseEstimator, db: acton.database.Database, regression: bool = False) → acton.predictors.Predictor[source]

Converts a scikit-learn predictor instance into a Predictor instance.

Parameters:
  • predictor – scikit-learn predictor.
  • db – Database storing features and labels.
  • regression – Whether this predictor does regression (as opposed to classification).
Returns:

Predictor instance wrapping the scikit-learn predictor.

Return type:

Predictor
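from_instance bridges two calling conventions. scikit-learn estimators take arrays, while Acton predictors take instance IDs and read the arrays from a Database. The sketch below shows one way such an adapter could work for classification. It is hypothetical: the class names, reading labels from a single labeller with ID 0, and returning None for confidences are all assumptions. The stub database and prior estimator exist only to make the example runnable.

```python
from typing import Iterable, Optional, Sequence, Tuple

import numpy as np


class SklearnAdapterSketch:
    """Hypothetical adapter giving a scikit-learn-style classifier an ID-based interface."""

    def __init__(self, estimator, db, labeller_id: int = 0) -> None:
        self._estimator = estimator
        self._db = db
        self._labeller_id = labeller_id  # assumption: labels come from one labeller

    def fit(self, ids: Iterable[int]) -> "SklearnAdapterSketch":
        ids = list(ids)
        X = self._db.read_features(ids)                        # N x D
        y = self._db.read_labels([self._labeller_id], ids)     # 1 x N x 1
        self._estimator.fit(X, y[0, :, 0])
        return self

    def predict(self, ids: Sequence[int]) -> Tuple[np.ndarray, Optional[np.ndarray]]:
        X = self._db.read_features(list(ids))
        proba = self._estimator.predict_proba(X)               # N x C
        return proba[:, np.newaxis, :], None                   # N x 1 x C, no confidences


# Stubs so the example runs without Acton or scikit-learn.
class _StubDB:
    def read_features(self, ids):
        return np.array([[float(i)] for i in ids])

    def read_labels(self, labeller_ids, ids):
        return np.array([[[i % 2] for i in ids] for _ in labeller_ids])


class _PriorEstimator:
    """Predicts the training class frequencies for every input."""

    def fit(self, X, y):
        self.p_ = np.bincount(y.astype(int), minlength=2) / len(y)
        return self

    def predict_proba(self, X):
        return np.tile(self.p_, (len(X), 1))


model = SklearnAdapterSketch(_PriorEstimator(), _StubDB()).fit([0, 1, 2, 3])
predictions, confidences = model.predict([5, 6])
print(predictions.shape, confidences)  # (2, 1, 2) None
```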

acton.recommenders module

Recommender classes.

class acton.recommenders.EntropyRecommender(db: acton.database.Database)[source]

Bases: acton.recommenders.Recommender

Recommends instances by entropy-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]
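Entropy-based uncertainty sampling scores each instance by the Shannon entropy of its predicted class distribution and recommends the highest-scoring instances. Below is a minimal NumPy sketch for an N x 1 x C array. It assumes a plain top-n ranking and ignores the diversity parameter; entropy_recommend is a hypothetical name.

```python
from typing import List, Sequence

import numpy as np


def entropy_recommend(ids: Sequence[int], predictions: np.ndarray, n: int = 1) -> List[int]:
    """Return the n IDs with the highest predictive entropy (hypothetical helper)."""
    p = np.asarray(predictions, dtype=float)[:, 0, :]          # N x C
    # Clip to avoid log(0); entries with p = 0 then contribute 0.
    entropy = -np.sum(p * np.log(np.clip(p, 1e-12, 1.0)), axis=1)
    order = np.argsort(-entropy, kind="stable")                # most uncertain first
    return [ids[i] for i in order[:n]]


preds = np.array([[[0.9, 0.1]], [[0.5, 0.5]], [[0.7, 0.3]]])
print(entropy_recommend([10, 11, 12], preds, n=2))  # [11, 12]
```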

class acton.recommenders.MarginRecommender(db: acton.database.Database)[source]

Bases: acton.recommenders.Recommender

Recommends instances by margin-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]
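Margin-based sampling prefers instances whose two most probable classes are nearly tied. In the binary case assumed in the note, the margin is simply the gap between the two class probabilities. The sketch below works for any C ≥ 2, uses a plain top-n ranking, ignores diversity, and uses a hypothetical function name.

```python
from typing import List, Sequence

import numpy as np


def margin_recommend(ids: Sequence[int], predictions: np.ndarray, n: int = 1) -> List[int]:
    """Return the n IDs with the smallest top-two probability margin (hypothetical helper)."""
    p = np.sort(np.asarray(predictions, dtype=float)[:, 0, :], axis=1)  # ascending per row
    margin = p[:, -1] - p[:, -2]                 # top class minus runner-up
    order = np.argsort(margin, kind="stable")    # smallest margin first
    return [ids[i] for i in order[:n]]


preds = np.array([[[0.9, 0.1]], [[0.5, 0.5]], [[0.7, 0.3]]])
print(margin_recommend([10, 11, 12], preds, n=2))  # [11, 12]
```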

class acton.recommenders.QBCRecommender(db: acton.database.Database)[source]

Bases: acton.recommenders.Recommender

Recommends instances by committee disagreement.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x T x C array of predictions. The ith row must correspond with the ith ID in the sequence.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]
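Query by committee ranks instances by how much the T committee members disagree. The documentation does not say which disagreement measure QBCRecommender uses. As one simple stand-in, the sketch below uses the variance of the members' class probabilities, summed over classes. Vote entropy and KL divergence to the mean prediction are common alternatives. The function name is hypothetical and diversity is ignored.

```python
from typing import List, Sequence

import numpy as np


def qbc_recommend(ids: Sequence[int], predictions: np.ndarray, n: int = 1) -> List[int]:
    """Return the n IDs on which the committee disagrees most (hypothetical helper)."""
    p = np.asarray(predictions, dtype=float)         # N x T x C
    disagreement = p.var(axis=1).sum(axis=1)         # spread across the T members
    order = np.argsort(-disagreement, kind="stable")
    return [ids[i] for i in order[:n]]


preds = np.array([
    [[0.9, 0.1], [0.9, 0.1]],   # members agree
    [[0.9, 0.1], [0.1, 0.9]],   # members disagree
])
print(qbc_recommend([1, 2], preds))  # [2]
```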

class acton.recommenders.RandomRecommender(db: acton.database.Database)[source]

Bases: acton.recommenders.Recommender

Recommends instances at random.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x T x C array of predictions.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]

class acton.recommenders.Recommender[source]

Bases: abc.ABC

Base class for recommenders.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x T x C array of predictions.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]

class acton.recommenders.UncertaintyRecommender(db: acton.database.Database)[source]

Bases: acton.recommenders.Recommender

Recommends instances by confidence-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int][source]

Recommends instances to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:
  • ids – Sequence of IDs in the unlabelled data pool.
  • predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence.
  • n – Number of recommendations to make.
  • diversity – Recommendation diversity in [0, 1].
Returns:

IDs of the instances to label.

Return type:

Sequence[int]
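Confidence-based uncertainty sampling, often called least-confidence sampling, recommends the instances whose most probable class has the lowest predicted probability. Below is a minimal sketch for an N x 1 x C array. It assumes a plain top-n ranking, ignores diversity, and uses a hypothetical function name.

```python
from typing import List, Sequence

import numpy as np


def least_confidence_recommend(ids: Sequence[int], predictions: np.ndarray,
                               n: int = 1) -> List[int]:
    """Return the n IDs whose top predicted probability is lowest (hypothetical helper)."""
    p = np.asarray(predictions, dtype=float)[:, 0, :]   # N x C
    uncertainty = 1.0 - p.max(axis=1)
    order = np.argsort(-uncertainty, kind="stable")     # least confident first
    return [ids[i] for i in order[:n]]


preds = np.array([[[0.9, 0.1]], [[0.5, 0.5]], [[0.7, 0.3]]])
print(least_confidence_recommend([10, 11, 12], preds, n=2))  # [11, 12]
```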

acton.recommenders.choose_boltzmann(features: numpy.ndarray, scores: numpy.ndarray, n: int, temperature: float = 1.0) → typing.Sequence[int][source]

Chooses n scores using a Boltzmann distribution.

Notes

Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.

Parameters:
  • scores – 1D array of scores.
  • n – Number of scores to choose.
  • temperature – Temperature parameter for sampling. Higher temperatures give more diversity.
Returns:

List of indices of scores chosen.

Return type:

Sequence[int]
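Boltzmann sampling picks index i with probability proportional to exp(scores[i] / temperature), without replacement. The sketch below uses the Gumbel-top-k trick, which draws an ordered sample without replacement from that distribution in one pass: add independent Gumbel noise to scores / temperature, then sort. The function name is hypothetical. The sketch omits features because this sampling step does not need it.

```python
from typing import List, Optional, Sequence

import numpy as np


def choose_boltzmann_sketch(scores: Sequence[float], n: int, temperature: float = 1.0,
                            rng: Optional[np.random.Generator] = None) -> List[int]:
    """Sample up to n indices without replacement, P(i) proportional to exp(score_i / T)."""
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(scores, dtype=float) / temperature
    # Gumbel-top-k: perturb the logits and keep the n largest, in order.
    keys = logits + rng.gumbel(size=logits.shape)
    order = np.argsort(-keys)
    return order[:n].tolist()   # all indices if n exceeds the number of scores


print(choose_boltzmann_sketch([0.0, 10.0, 5.0], 3, temperature=1e-3,
                              rng=np.random.default_rng(0)))  # [1, 2, 0]
```

As the temperature approaches 0, the noise becomes negligible and the result approaches a plain highest-to-lowest ranking. Higher temperatures flatten the distribution and give more diverse picks.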

acton.recommenders.choose_mmr(features: numpy.ndarray, scores: numpy.ndarray, n: int, l: float = 0.5) → typing.Sequence[int][source]

Chooses n scores using maximal marginal relevance.

Notes

Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.

Parameters:
  • scores – 1D array of scores.
  • n – Number of scores to choose.
  • l – Lambda parameter for MMR. l = 1 gives a relevance-ranked list and l = 0 gives a maximal diversity ranking.
Returns:

List of indices of scores chosen.

Return type:

Sequence[int]
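Maximal marginal relevance builds the list greedily. At each step it picks the remaining index that maximises l × score minus (1 − l) × its highest similarity to anything already chosen. The documentation does not specify the similarity measure, so this sketch assumes cosine similarity between feature vectors. choose_mmr_sketch is a hypothetical name.

```python
from typing import List, Sequence

import numpy as np


def choose_mmr_sketch(features: np.ndarray, scores: Sequence[float], n: int,
                      l: float = 0.5) -> List[int]:
    """Greedy maximal marginal relevance with cosine similarity (an assumption)."""
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)   # leave zero vectors as zeros
    similarity = unit @ unit.T
    chosen: List[int] = []
    remaining = list(range(len(scores)))
    while remaining and len(chosen) < n:
        if chosen:
            # Highest similarity of each remaining item to anything already chosen.
            redundancy = similarity[np.ix_(remaining, chosen)].max(axis=1)
        else:
            redundancy = np.zeros(len(remaining))
        mmr = l * scores[remaining] - (1 - l) * redundancy
        best = remaining[int(np.argmax(mmr))]
        chosen.append(best)
        remaining.remove(best)
    return chosen


feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # items 0 and 1 are duplicates
print(choose_mmr_sketch(feats, [1.0, 0.9, 0.5], 3, l=1.0))  # [0, 1, 2]
print(choose_mmr_sketch(feats, [1.0, 0.9, 0.5], 3, l=0.5))  # [0, 2, 1]
```

With l = 1 the result is the relevance ranking. With l = 0.5 the near-duplicate item 1 drops behind the dissimilar item 2.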

Module contents