acton package

Subpackages

acton.proto package

Submodules

acton.acton module

Main processing script for Acton.

acton.acton.draw(n: int, lst: typing.List[T], replace: bool = True) → typing.List[T]

Draws n random elements from a list.

Parameters:	n – Number of elements to draw. lst – List of elements to draw from. replace – Draw with replacement.
Returns:	n random elements.
Return type:	List[T]
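To show what the `replace` flag controls, here is a minimal sketch built on Python's standard `random` module. It is an illustrative re-implementation under that assumption, not acton's own source.

```python
import random
from typing import List, TypeVar

T = TypeVar("T")


def draw(n: int, lst: List[T], replace: bool = True) -> List[T]:
    """Draw n random elements from lst (illustrative sketch)."""
    if replace:
        # With replacement: an element may appear more than once, and n may exceed len(lst).
        return random.choices(lst, k=n)
    # Without replacement: each element appears at most once; raises ValueError if n > len(lst).
    return random.sample(lst, n)
```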

acton.acton.get_DB(data_path: str, pandas_key: str = None) → (acton.database.Database, dict)

Gets a Database that will handle the given data table.

Parameters:

data_path – Path to file.
pandas_key – Key for pandas HDF5. Specify iff using pandas.

Returns:

Database – Database that will handle the given data table.
dict – Keyword arguments for the Database constructor.

acton.acton.label(recommendations: acton.proto.wrappers.Recommendations) → acton.proto.wrappers.LabelPool

Simulates a labelling task.

Parameters:	recommendations – Recommendations of instances to label.
Returns:	Labels for the recommended instances.
Return type:	acton.proto.wrappers.LabelPool

acton.acton.main(data_path: str, feature_cols: typing.List[str], label_col: str, output_path: str, n_epochs: int = 10, initial_count: int = 10, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', pandas_key: str = '', n_recommendations: int = 1)

Simulate an active learning experiment.

Parameters:

data_path – Path to data file.
feature_cols – List of column names of the features. If empty, all non-label and non-ID columns will be used.
label_col – Column name of the labels.
output_path – Path to output file. Will be overwritten.
n_epochs – Number of epochs to run.
initial_count – Number of random instances to label initially.
recommender – Name of recommender to make recommendations.
predictor – Name of predictor to make predictions.
pandas_key – Key for pandas HDF5. Specify iff using pandas.
n_recommendations – Number of recommendations to make at once.

acton.acton.predict(labels: acton.proto.wrappers.LabelPool, predictor: str) → acton.proto.wrappers.Predictions

Train a predictor and predict labels.

Parameters:	labels – IDs of labelled instances. predictor – Name of predictor to make predictions.

acton.acton.recommend(predictions: acton.proto.wrappers.Predictions, recommender: str = 'RandomRecommender', n_recommendations: int = 1) → acton.proto.wrappers.Recommendations

Recommends instances to label based on predictions.

Parameters:	predictions – Predictions to base the recommendations on. recommender – Name of recommender to make recommendations. n_recommendations – Number of recommendations to make at once. Default 1.
Returns:	Recommendations of instances to label.
Return type:	acton.proto.wrappers.Recommendations

acton.acton.simulate_active_learning(ids: typing.Iterable[int], db: acton.database.Database, db_kwargs: dict, output_path: str, n_initial_labels: int = 10, n_epochs: int = 10, test_size: int = 0.2, recommender: str = 'RandomRecommender', predictor: str = 'LogisticRegression', n_recommendations: int = 1)

Simulates an active learning task.

Parameters:

ids – IDs of instances in the unlabelled pool.
db – Database with features and labels.
db_kwargs – Keyword arguments for the database constructor.
output_path – Path to output intermediate predictions to. Will be overwritten.
n_initial_labels – Number of initial labels to draw.
n_epochs – Number of epochs.
test_size – Fraction of instances to hold out as the testing set, e.g. 0.2 (a float, despite the int annotation).
recommender – Name of recommender to make recommendations.
predictor – Name of predictor to make predictions.
n_recommendations – Number of recommendations to make at once.
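The parameters above describe a pool-based active learning loop: hold out a test set, label a random initial subset, then repeatedly fit, evaluate, recommend and label. The sketch below shows that loop in plain Python under several assumptions: a one-dimensional "nearest class mean" toy predictor stands in for `predictor`, a random choice stands in for `recommender`, a dict of known labels plays the labeller, and test accuracy is returned instead of writing predictions to `output_path`. All names here (`toy_active_learning`, `fit_class_means`, `predict_label`) are hypothetical; this is not acton's implementation.

```python
import random
from typing import Dict, List, Sequence


def fit_class_means(ids: Sequence[int], features: Dict[int, float],
                    labels: Dict[int, int]) -> Dict[int, float]:
    """Toy predictor: mean feature value of each class among the labelled IDs."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for i in ids:
        c = labels[i]
        sums[c] = sums.get(c, 0.0) + features[i]
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def predict_label(means: Dict[int, float], x: float) -> int:
    # Assign x to the class whose mean is closest.
    return min(means, key=lambda c: abs(x - means[c]))


def toy_active_learning(features: Dict[int, float], labels: Dict[int, int],
                        n_initial_labels: int = 2, n_epochs: int = 3,
                        test_size: float = 0.25, n_recommendations: int = 1,
                        seed: int = 0) -> List[float]:
    """Return the test accuracy measured at the start of each epoch."""
    rng = random.Random(seed)
    ids = list(features)
    rng.shuffle(ids)
    n_test = int(len(ids) * test_size)  # test_size is a fraction of all instances
    test_ids, pool = ids[:n_test], ids[n_test:]
    # The pool is already shuffled, so its first entries are a random initial draw.
    labelled, unlabelled = pool[:n_initial_labels], pool[n_initial_labels:]
    accuracies = []
    for _ in range(n_epochs):
        means = fit_class_means(labelled, features, labels)
        correct = sum(predict_label(means, features[i]) == labels[i] for i in test_ids)
        accuracies.append(correct / len(test_ids))
        # Random recommender; "labelling" just moves IDs whose labels we already know.
        picks = rng.sample(unlabelled, min(n_recommendations, len(unlabelled)))
        labelled = labelled + picks
        unlabelled = [i for i in unlabelled if i not in picks]
    return accuracies
```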

acton.acton.try_pandas(data_path: str) → bool

Guesses if a file is a pandas file.

Parameters:	data_path – Path to file.
Returns:	True if the file appears to be a pandas file.
Return type:	bool

acton.acton.validate_predictor(predictor: str)

Raises an exception if the predictor is not valid.

Parameters:	predictor – Name of predictor.
Raises:	`ValueError`

acton.acton.validate_recommender(recommender: str)

Raises an exception if the recommender is not valid.

Parameters:	recommender – Name of recommender.
Raises:	`ValueError`

acton.cli module

Command-line interface for Acton.

acton.cli.lines_from_stdin() → typing.Iterable[str]

Yields lines from stdin.

acton.cli.read_binary() → bytes

Reads binary data from stdin.

Notes

The first eight bytes are expected to be the length of the input data as an unsigned long long.

Returns:	Binary data.
Return type:	bytes

acton.cli.read_bytes_from_buffer(n: int, buffer: typing.BinaryIO) → bytes

Reads n bytes from a buffer, blocking until all bytes are received.

Parameters:	n – How many bytes to read. buffer – Which buffer to read from.
Returns:	Exactly n bytes.
Return type:	bytes
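A single `read()` on a pipe may return fewer bytes than requested, which is why a "read exactly n bytes" helper has to loop. A minimal sketch with the documented `(n, buffer)` parameters; raising `EOFError` when the stream ends early is an assumption, since the documentation only says the call blocks.

```python
import io
from typing import BinaryIO


def read_bytes_from_buffer(n: int, buffer: BinaryIO) -> bytes:
    """Read exactly n bytes from buffer, looping over short reads (sketch)."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = buffer.read(remaining)
        if not chunk:
            # Assumption: treat end-of-stream as an error instead of waiting forever.
            raise EOFError(f"expected {n} bytes, received {n - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


first_three = read_bytes_from_buffer(3, io.BytesIO(b"abcdef"))  # b"abc"
```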

acton.cli.write_binary(string: bytes)

Writes binary data to stdout.

Notes

The output will be preceded by the length as an unsigned long long.
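The Notes for `read_binary` and `write_binary` describe a length-prefixed framing: an 8-byte unsigned long long giving the payload length, followed by the payload. The sketch below reproduces that framing with the standard `struct` module against arbitrary streams rather than stdin/stdout. The byte order is not stated in the documentation, so native order with standard sizes (`"=Q"`) is an assumption, and `write_framed`/`read_framed` are hypothetical names.

```python
import io
import struct
from typing import BinaryIO

# Assumption: native byte order, standard 8-byte unsigned long long.
LENGTH_FORMAT = "=Q"
LENGTH_SIZE = struct.calcsize(LENGTH_FORMAT)  # 8 bytes


def write_framed(data: bytes, out: BinaryIO) -> None:
    """Write an 8-byte length prefix followed by the payload."""
    out.write(struct.pack(LENGTH_FORMAT, len(data)))
    out.write(data)


def read_framed(inp: BinaryIO) -> bytes:
    """Read one length-prefixed payload written by write_framed."""
    header = inp.read(LENGTH_SIZE)
    if len(header) != LENGTH_SIZE:
        raise EOFError("stream ended before the length prefix")
    (length,) = struct.unpack(LENGTH_FORMAT, header)
    payload = inp.read(length)  # a real pipe may need a loop over short reads here
    if len(payload) != length:
        raise EOFError("stream ended before the full payload")
    return payload


# Round trip through an in-memory stream instead of stdin/stdout.
buf = io.BytesIO()
write_framed(b"protobuf bytes", buf)
buf.seek(0)
round_trip = read_framed(buf)  # b"protobuf bytes"
```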

acton.database module

Wrapper class for databases.

class acton.database.ASCIIReader(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)

Bases: acton.database.Database

Reads ASCII databases.

feature_cols: List[str] – List of feature columns.

label_col: str – Name of label column.

max_id_length: int – Maximum length of IDs.

n_features: int – Number of features.

n_instances: int – Number of instances.

n_labels: int – Number of labels per instance.

path: str – Path to ASCII file.

encode_labels: bool – Whether to encode labels as integers.

label_encoder: sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_db: Database – Underlying ManagedHDF5Database.

_db_filepath: str – Path of underlying HDF5 database.

_tempdir: str – Temporary directory where the underlying HDF5 database is stored.
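When `encode_labels` is true, labels are mapped to integers by `label_encoder`. To show what that mapping looks like without a scikit-learn dependency, here is a small hypothetical stand-in that mimics `LabelEncoder`'s behaviour: the classes are the sorted unique labels, and each label is replaced by its index among them. Unlike scikit-learn, this sketch raises `KeyError` rather than `ValueError` for unseen labels.

```python
from typing import List, Sequence


class SimpleLabelEncoder:
    """Mimics fit/transform/inverse_transform of sklearn's LabelEncoder (sketch)."""

    def fit(self, labels: Sequence[str]) -> "SimpleLabelEncoder":
        # Like LabelEncoder.classes_, the known classes are the sorted unique labels.
        self.classes_: List[str] = sorted(set(labels))
        self._index = {c: i for i, c in enumerate(self.classes_)}
        return self

    def transform(self, labels: Sequence[str]) -> List[int]:
        # Each label becomes its position in classes_; unseen labels raise KeyError.
        return [self._index[c] for c in labels]

    def inverse_transform(self, codes: Sequence[int]) -> List[str]:
        # Map integer codes back to the original labels.
        return [self.classes_[i] for i in codes]
```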

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x F array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

class acton.database.Database

Bases: abc.ABC

Base class for database wrappers.

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x F array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

Writes feature vectors to the database.

Parameters:	ids – Iterable of IDs. features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

Writes label vectors to the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs. labels – T x N x F array of label vectors. The ith row corresponds to the ith labeller ID in labeller_ids and the jth column corresponds to the jth instance ID in instance_ids.
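The base class fixes the array conventions: features are N x D with row i belonging to the ith ID, and labels are T x N x F, indexed first by labeller and then by instance. The in-memory sketch below implements a reduced version of that interface with nested lists in place of numpy arrays, to make the indexing concrete. `DatabaseSketch` and `InMemoryDatabase` are hypothetical names, and the sketch omits `to_proto` and `get_known_labeller_ids`.

```python
import abc
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


class DatabaseSketch(abc.ABC):
    """Reduced stand-in for the documented Database interface."""

    @abc.abstractmethod
    def write_features(self, ids: Sequence[int], features: Matrix) -> None: ...

    @abc.abstractmethod
    def read_features(self, ids: Sequence[int]) -> Matrix: ...

    @abc.abstractmethod
    def write_labels(self, labeller_ids: Sequence[int], instance_ids: Sequence[int],
                     labels: List[Matrix]) -> None: ...

    @abc.abstractmethod
    def read_labels(self, labeller_ids: Sequence[int],
                    instance_ids: Sequence[int]) -> List[Matrix]: ...

    @abc.abstractmethod
    def get_known_instance_ids(self) -> List[int]: ...


class InMemoryDatabase(DatabaseSketch):
    def __init__(self) -> None:
        self._features: Dict[int, List[float]] = {}
        self._labels: Dict[Tuple[int, int], List[float]] = {}

    def write_features(self, ids, features):
        # Row i of `features` belongs to ids[i].
        for id_, row in zip(ids, features):
            self._features[id_] = list(row)

    def read_features(self, ids):
        return [self._features[i] for i in ids]  # N x D

    def write_labels(self, labeller_ids, instance_ids, labels):
        # labels[t][n] is the label vector from labeller_ids[t] for instance_ids[n].
        for t, labeller in enumerate(labeller_ids):
            for n, instance in enumerate(instance_ids):
                self._labels[labeller, instance] = list(labels[t][n])

    def read_labels(self, labeller_ids, instance_ids):
        # T x N x F: one matrix per labeller, one row per instance.
        return [[self._labels[t_id, i] for i in instance_ids] for t_id in labeller_ids]

    def get_known_instance_ids(self):
        return sorted(self._features)
```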

class acton.database.FITSReader(path: str, feature_cols: typing.List[str], label_col: str, hdu_index: int = 1, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)

Bases: acton.database.Database

Reads FITS databases.

hdu_index: int – Index of HDU in the FITS file.

feature_cols: List[str] – List of feature columns.

label_col: str – Name of label column.

n_features: int – Number of features.

n_instances: int – Number of instances.

n_labels: int – Number of labels per instance.

path: str – Path to FITS file.

encode_labels: bool – Whether to encode labels as integers.

label_encoder: sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_hdulist: astropy.io.fits.HDUList – FITS HDUList.

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x 1 array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

class acton.database.HDF5Database(path: str)

Bases: acton.database.Database

Database wrapping an HDF5 file as a context manager.

path: str – Path to HDF5 file.

_h5_file: h5py.File – HDF5 file object.

class acton.database.HDF5Reader(path: str, feature_cols: typing.List[str], label_col: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)

Bases: acton.database.HDF5Database

Reads HDF5 databases.

feature_cols: List[str] – List of feature datasets.

label_col: str – Name of label dataset.

n_features: int – Number of features.

n_instances: int – Number of instances.

n_labels: int – Number of labels per instance.

path: str – Path to HDF5 file.

encode_labels: bool – Whether to encode labels as integers.

label_encoder: sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_h5_file: h5py.File – HDF5 file object.

_is_multidimensional: bool – Whether the features are in a multidimensional dataset.

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x F array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

class acton.database.ManagedHDF5Database(path: str, label_dtype: str = None, feature_dtype: str = None)

Bases: acton.database.HDF5Database

Database using an HDF5 file.

Notes

This database uses an internal schema. For reading files from disk, use another Database.

path: str – Path to HDF5 file.

label_dtype: str – Data type of labels.

feature_dtype: str – Data type of features.

_h5_file: h5py.File – Opened HDF5 file.

_sync_attrs: List[str] – List of instance attributes to sync with the HDF5 file’s attributes.

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x F array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

Writes feature vectors to the database.

Parameters:	ids – Iterable of IDs. features – N x D array of feature vectors. The ith row corresponds to the ith ID in ids.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

Writes label vectors to the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs. labels – T x N x F array of label vectors. The ith row corresponds to the ith labeller ID in labeller_ids and the jth column corresponds to the jth instance ID in instance_ids.

class acton.database.PandasReader(path: str, feature_cols: typing.List[str], label_col: str, key: str, encode_labels: bool = True, label_encoder: sklearn.preprocessing.LabelEncoder = None)

Bases: acton.database.Database

Reads pandas DataFrames stored in HDF5 files.

feature_cols: List[str] – List of feature columns.

label_col: str – Name of label column.

n_features: int – Number of features.

n_instances: int – Number of instances.

n_labels: int – Number of labels per instance.

path: str – Path to HDF5 file.

encode_labels: bool – Whether to encode labels as integers.

label_encoder: sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

_df: pandas.DataFrame – Pandas dataframe.

get_known_instance_ids() → typing.List[int]

Returns a list of known instance IDs.

Returns:	A list of known instance IDs.
Return type:	List[str]

get_known_labeller_ids() → typing.List[int]

Returns a list of known labeller IDs.

Returns:	A list of known labeller IDs.
Return type:	List[str]

read_features(ids: typing.Sequence[int]) → numpy.ndarray

Reads feature vectors from the database.

Parameters:	ids – Iterable of IDs.
Returns:	N x D array of feature vectors.
Return type:	numpy.ndarray

read_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int]) → numpy.ndarray

Reads label vectors from the database.

Parameters:	labeller_ids – Iterable of labeller IDs. instance_ids – Iterable of instance IDs.
Returns:	T x N x 1 array of label vectors.
Return type:	numpy.ndarray

to_proto() → DatabasePB

Serialises this database as a protobuf.

Returns:	Protobuf representing this database.
Return type:	DatabasePB

write_features(ids: typing.Sequence[int], features: numpy.ndarray)

write_labels(labeller_ids: typing.Sequence[int], instance_ids: typing.Sequence[int], labels: numpy.ndarray)

acton.database.product(seq: typing.Iterable[int])

Finds the product of a list of ints.

Parameters:	seq – List of ints.
Returns:	Product.
Return type:	int
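The same product can be computed as a fold with `functools.reduce`; on Python 3.8+ `math.prod` gives the same result. This is a sketch, and returning 1 for an empty sequence is an assumption about the intended behaviour (it matches `math.prod`).

```python
import functools
import operator
from typing import Iterable


def product(seq: Iterable[int]) -> int:
    # Multiply left to right, starting from the multiplicative identity 1.
    return functools.reduce(operator.mul, seq, 1)


# For example, the number of entries in an array of shape (2, 3, 4).
n_entries = product((2, 3, 4))  # 24
```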

acton.database.serialise_encoder(encoder: sklearn.preprocessing.LabelEncoder) → LabelEncoderPB

Serialises a LabelEncoder as a protobuf.

Parameters:	encoder – LabelEncoder.
Returns:	Protobuf representing the LabelEncoder.
Return type:	LabelEncoderPB

acton.kde_predictor module

A predictor that uses KDE to classify instances.

class acton.kde_predictor.KDEClassifier(bandwidth=1.0)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

A classifier using kernel density estimation to classify instances.

fit(X, y)

Fits kernel density models to the data.

Parameters:	X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point. y (array-like, shape (n_samples,)) – Target vector relative to X.

predict(X)

Predicts class labels.

Parameters:	X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.

predict_proba(X)

Predicts class probabilities.

Class probabilities are normalised log densities of the kernel density estimates.

Parameters:	X (array_like, shape (n_samples, n_features)) – List of n_features-dimensional data points. Each row corresponds to a single data point.
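A minimal sketch of the idea: fit one Gaussian kernel density estimate per class, then turn the per-class log densities into probabilities by exponentiating and normalising them (a softmax). It uses plain lists instead of numpy arrays, adds no class priors, and drops the Gaussian normalising constant because it is the same for every class at a shared bandwidth. `KDEClassifierSketch` is hypothetical; the documented class builds on scikit-learn instead.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


class KDEClassifierSketch:
    def __init__(self, bandwidth: float = 1.0) -> None:
        self.bandwidth = bandwidth

    def fit(self, X: Sequence[Point], y: Sequence[int]) -> "KDEClassifierSketch":
        # Store the training points of each class; densities are evaluated on demand.
        self.classes_ = sorted(set(y))
        self.points_ = {c: [x for x, label in zip(X, y) if label == c] for c in self.classes_}
        return self

    def _log_density(self, x: Point, points: List[Point]) -> float:
        # Log of the mean Gaussian kernel value, up to a constant shared by all classes.
        h2 = self.bandwidth ** 2
        logs = [-sum((a - b) ** 2 for a, b in zip(x, p)) / (2 * h2) for p in points]
        m = max(logs)  # log-sum-exp trick for numerical stability
        return m + math.log(sum(math.exp(v - m) for v in logs) / len(points))

    def predict_proba(self, X: Sequence[Point]) -> List[List[float]]:
        probs = []
        for x in X:
            log_d = [self._log_density(x, self.points_[c]) for c in self.classes_]
            m = max(log_d)
            weights = [math.exp(v - m) for v in log_d]
            total = sum(weights)
            probs.append([w / total for w in weights])  # normalise to sum to 1
        return probs

    def predict(self, X: Sequence[Point]) -> List[int]:
        # Pick the most probable class for each point.
        return [self.classes_[max(range(len(p)), key=p.__getitem__)]
                for p in self.predict_proba(X)]
```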

acton.labellers module

Labeller classes.

class acton.labellers.ASCIITableLabeller(path: str, id_col: str, label_col: str)

Bases: acton.labellers.Labeller

Labeller that obtains labels from an ASCII table.

path: str – Path to table.

id_col: str – Name of the column where IDs are stored.

label_col: str – Name of the column where binary labels are stored.

_table: astropy.table.Table – Table object.

query(id_: int) → numpy.ndarray

Queries the labeller.

Parameters:	id_ – ID of instance to label.
Returns:	1 x 1 label array.
Return type:	numpy.ndarray

class acton.labellers.DatabaseLabeller(db: acton.database.Database)

Bases: acton.labellers.Labeller

Labeller that obtains labels from a Database.

_db: acton.database.Database – Database with labels.

query(id_: int) → numpy.ndarray

Queries the labeller.

Parameters:	id_ – ID of instance to label.
Returns:	1 x 1 label array.
Return type:	numpy.ndarray

class acton.labellers.Labeller

Bases: abc.ABC

Base class for labellers.

query(id_: int) → numpy.ndarray

Queries the labeller.

Parameters:	id_ – ID of instance to label.
Returns:	T x F label array.
Return type:	numpy.ndarray

acton.plot module

Script to plot a dump of predictions.

acton.plot.plot(predictions: typing.Iterable[typing.BinaryIO])

Plots predictions from a file.

Parameters:	predictions – Files containing predictions.

acton.predictors module

Predictor classes.

acton.predictors.AveragePredictions(predictor: acton.predictors.Predictor) → acton.predictors.Predictor

Wrapper for a predictor that averages predicted probabilities.

Notes

This effectively reduces the number of predictors to 1.

Parameters:	predictor – Predictor to wrap.
Returns:	Predictor with averaged predictions.
Return type:	Predictor
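Given the N x T x C prediction layout used by `Predictor.predict`, averaging over the T axis yields an N x 1 x C layout, which is why the wrapper effectively leaves a single predictor. The sketch below shows only that reduction on nested lists; `average_over_predictors` is a hypothetical helper that operates on arrays, whereas `AveragePredictions` itself wraps a predictor.

```python
from typing import List

Array3D = List[List[List[float]]]


def average_over_predictors(predictions: Array3D) -> Array3D:
    """Collapse an N x T x C prediction array to N x 1 x C by averaging over T."""
    averaged = []
    for per_instance in predictions:  # T x C: one row per predictor
        n_predictors = len(per_instance)
        # zip(*rows) walks the C class columns; average each across the T predictors.
        mean_row = [sum(column) / n_predictors for column in zip(*per_instance)]
        averaged.append([mean_row])  # keep a length-1 T axis
    return averaged
```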

class acton.predictors.Committee(Predictor: type, db: acton.database.Database, n_classifiers: int = 10, subset_size: float = 0.6, **kwargs: dict)

Bases: acton.predictors.Predictor

A predictor using a committee of other predictors.

n_classifiers: int – Number of classifiers in the committee.

subset_size: float – Fraction of known labels to sample for each classifier's training subset. Lower numbers increase variety.

_db: acton.database.Database – Database storing features and labels.

_committee: List[Predictor] – Underlying committee of predictors.

_reference_predictor: Predictor – Reference predictor trained on all known labels.

fit(ids: typing.Iterable[int])

Fits the predictor to labelled data.

Parameters:	ids – List of IDs of instances to train from.

predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x T x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).

reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels using the best possible method.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x 1 x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).
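`subset_size` controls how much of the labelled data each committee member sees. A sketch of drawing one random training subset per member, assuming each subset is sampled without replacement and that `subset_size` is a fraction of the labelled IDs. `committee_training_subsets` is a hypothetical helper, and fitting each member on its subset is left out.

```python
import random
from typing import List, Sequence


def committee_training_subsets(ids: Sequence[int], n_classifiers: int = 10,
                               subset_size: float = 0.6, seed: int = 0) -> List[List[int]]:
    """Return one random subset of `ids` per committee member."""
    rng = random.Random(seed)
    # At least one ID per member, even for tiny pools.
    k = max(1, int(len(ids) * subset_size))
    # Different random subsets make the members disagree, which QBC relies on.
    return [rng.sample(list(ids), k) for _ in range(n_classifiers)]
```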

class acton.predictors.GPClassifier(db: acton.database.Database, max_iters: int = 50000, n_jobs: int = 1)

Bases: acton.predictors.Predictor

Classifier using Gaussian processes.

max_iters: int – Maximum optimisation iterations.

label_encoder: sklearn.preprocessing.LabelEncoder – Encodes labels as integers.

model_: GPy.models.GPClassification – GP model.

_db: acton.database.Database – Database storing features and labels.

fit(ids: typing.Iterable[int])

Fits the predictor to labelled data.

Parameters:	ids – List of IDs of instances to train from.

predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x 1 x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).

reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels using the best possible method.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x 1 x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).

class acton.predictors.Predictor

Bases: abc.ABC

Base class for predictors.

prediction_type: str – What kind of predictions this class generates, e.g. classification.

fit(ids: typing.Iterable[int])

Fits the predictor to labelled data.

Parameters:	ids – List of IDs of instances to train from.

predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels of instances.

Notes

Unlike in scikit-learn, predictions are always real-valued. Predicted labels for a classification problem are represented by predicted probabilities of each class.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x T x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).

prediction_type = 'classification'

reference_predict(ids: typing.Sequence[int]) → (numpy.ndarray, numpy.ndarray)

Predicts labels using the best possible method.

Parameters:	ids – List of IDs of instances to predict labels for.
Returns:	numpy.ndarray – An N x 1 x C array of corresponding predictions. numpy.ndarray – An N array of confidences (or None if not applicable).

acton.predictors.from_class(Predictor: type, regression: bool = False) → type

Converts a scikit-learn predictor class into a Predictor class.

Parameters:	Predictor – scikit-learn predictor class. regression – Whether this predictor does regression (as opposed to classification).
Returns:	Predictor class wrapping the scikit-learn class.
Return type:	type

acton.predictors.from_instance(predictor: BaseEstimator, db: acton.database.Database, regression: bool = False) → acton.predictors.Predictor

Converts a scikit-learn predictor instance into a Predictor instance.

Parameters:	predictor – scikit-learn predictor. db – Database storing features and labels. regression – Whether this predictor does regression (as opposed to classification).
Returns:	Predictor instance wrapping the scikit-learn predictor.
Return type:	Predictor

acton.recommenders module

Recommender classes.

class acton.recommenders.EntropyRecommender(db: acton.database.Database)

Bases: acton.recommenders.Recommender

Recommends instances by entropy-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]
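Entropy-based uncertainty sampling scores each instance by the Shannon entropy of its predicted class distribution and recommends the highest-scoring instances. A sketch on the documented N x 1 x C layout using nested lists; it ignores `diversity`, and `recommend_by_entropy` is a hypothetical name rather than the class's actual code.

```python
import math
from typing import List, Sequence

Predictions = Sequence[Sequence[Sequence[float]]]  # N x 1 x C


def entropy_scores(predictions: Predictions) -> List[float]:
    """Shannon entropy (in nats) of each instance's class probabilities."""
    scores = []
    for (probs,) in predictions:  # unpack the length-1 T axis
        scores.append(-sum(p * math.log(p) for p in probs if p > 0))
    return scores


def recommend_by_entropy(ids: Sequence[int], predictions: Predictions, n: int = 1) -> List[int]:
    scores = entropy_scores(predictions)
    # Highest entropy first: the instances the model is least sure about.
    order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
    return [ids[i] for i in order[:n]]
```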

class acton.recommenders.MarginRecommender(db: acton.database.Database)

Bases: acton.recommenders.Recommender

Recommends instances by margin-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]
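Margin-based uncertainty sampling looks at the gap between the two most probable classes: a small gap means the model is torn between them. For a binary problem with positive-class probability p, the margin is |p − (1 − p)|. A sketch on the N x 1 x C layout with nested lists, assuming C ≥ 2 and ignoring `diversity`; `recommend_by_margin` is a hypothetical name.

```python
from typing import List, Sequence

Predictions = Sequence[Sequence[Sequence[float]]]  # N x 1 x C


def margins(predictions: Predictions) -> List[float]:
    """Difference between the largest and second-largest class probability (needs C >= 2)."""
    result = []
    for (probs,) in predictions:  # the T axis has length 1
        top_two = sorted(probs, reverse=True)[:2]
        result.append(top_two[0] - top_two[1])
    return result


def recommend_by_margin(ids: Sequence[int], predictions: Predictions, n: int = 1) -> List[int]:
    scores = margins(predictions)
    # Smallest margin first: the most ambiguous instances.
    order = sorted(range(len(ids)), key=lambda i: scores[i])
    return [ids[i] for i in order[:n]]
```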

class acton.recommenders.QBCRecommender(db: acton.database.Database)

Bases: acton.recommenders.Recommender

Recommends instances by committee disagreement.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x T x C array of predictions. The ith row must correspond with the ith ID in the sequence. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]
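Query by committee recommends the instances on which the committee members disagree most. The documentation does not say which disagreement measure is used; this sketch assumes the variance of each class probability across the T members, summed over classes, on the N x T x C layout. It ignores `diversity`, and `recommend_by_disagreement` is a hypothetical name.

```python
from typing import List, Sequence

Predictions = Sequence[Sequence[Sequence[float]]]  # N x T x C


def disagreement(predictions: Predictions) -> List[float]:
    """Sum over classes of the variance across committee members (the T axis)."""
    scores = []
    for per_instance in predictions:  # T x C
        t = len(per_instance)
        total = 0.0
        for column in zip(*per_instance):  # one class, as predicted by every member
            mean = sum(column) / t
            total += sum((v - mean) ** 2 for v in column) / t
        scores.append(total)
    return scores


def recommend_by_disagreement(ids: Sequence[int], predictions: Predictions,
                              n: int = 1) -> List[int]:
    scores = disagreement(predictions)
    # Largest disagreement first.
    order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
    return [ids[i] for i in order[:n]]
```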

class acton.recommenders.RandomRecommender(db: acton.database.Database)

Bases: acton.recommenders.Recommender

Recommends instances at random.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x T x C array of predictions. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]

class acton.recommenders.Recommender

Bases: abc.ABC

Base class for recommenders.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x T x C array of predictions. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]

class acton.recommenders.UncertaintyRecommender(db: acton.database.Database)

Bases: acton.recommenders.Recommender

Recommends instances by confidence-based uncertainty sampling.

recommend(ids: typing.Sequence[int], predictions: numpy.ndarray, n: int = 1, diversity: float = 0.5) → typing.Sequence[int]

Recommends an instance to label.

Notes

Assumes predictions are probabilities of positive binary label.

Parameters:	ids – Sequence of IDs in the unlabelled data pool. predictions – N x 1 x C array of predictions. The ith row must correspond with the ith ID in the sequence. n – Number of recommendations to make. diversity – Recommendation diversity in [0, 1].
Returns:	IDs of the instances to label.
Return type:	Sequence[int]
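Confidence-based (least-confidence) uncertainty sampling scores each instance by one minus its highest class probability, so confidently predicted instances score near 0. A sketch on the N x 1 x C layout; `diversity` is ignored, and `recommend_least_confident` is a hypothetical name.

```python
from typing import List, Sequence

Predictions = Sequence[Sequence[Sequence[float]]]  # N x 1 x C


def least_confidence(predictions: Predictions) -> List[float]:
    # 1 - max probability: 0 for a certain prediction, larger when the model is unsure.
    return [1.0 - max(probs) for (probs,) in predictions]


def recommend_least_confident(ids: Sequence[int], predictions: Predictions,
                              n: int = 1) -> List[int]:
    scores = least_confidence(predictions)
    # Least confident first.
    order = sorted(range(len(ids)), key=lambda i: scores[i], reverse=True)
    return [ids[i] for i in order[:n]]
```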

acton.recommenders.choose_boltzmann(features: numpy.ndarray, scores: numpy.ndarray, n: int, temperature: float = 1.0) → typing.Sequence[int]

Chooses n scores using a Boltzmann distribution.

Notes

Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.

Parameters:	scores – 1D array of scores. n – Number of scores to choose. temperature – Temperature parameter for sampling. Higher temperatures give more diversity.
Returns:	List of indices of scores chosen.
Return type:	Sequence[int]
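One way to read "choose n scores using a Boltzmann distribution": draw indices one at a time without replacement, each with probability proportional to exp(score / temperature), so high scores are favoured and higher temperatures flatten the distribution. Following the Notes, the sketch returns every index in descending score order when n is not smaller than the number of scores. It drops the `features` argument, whose role is not documented, and `choose_boltzmann_sketch` is a hypothetical name.

```python
import math
import random
from typing import List, Optional, Sequence


def choose_boltzmann_sketch(scores: Sequence[float], n: int, temperature: float = 1.0,
                            rng: Optional[random.Random] = None) -> List[int]:
    if rng is None:
        rng = random.Random()
    indices = list(range(len(scores)))
    if n >= len(scores):
        # Not enough to choose from: return everything, best score first.
        return sorted(indices, key=lambda i: scores[i], reverse=True)
    chosen = []
    for _ in range(n):
        # Subtract the max before exp() to avoid overflow; weights stay proportional.
        top = max(scores[i] for i in indices)
        weights = [math.exp((scores[i] - top) / temperature) for i in indices]
        pick = rng.choices(indices, weights=weights, k=1)[0]
        chosen.append(pick)
        indices.remove(pick)  # sample without replacement
    return chosen
```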

acton.recommenders.choose_mmr(features: numpy.ndarray, scores: numpy.ndarray, n: int, l: float = 0.5) → typing.Sequence[int]

Chooses n scores using maximal marginal relevance.

Notes

Scores are chosen from highest to lowest. If there are fewer scores to choose from than requested, all scores will be returned in order of preference.

Parameters:	scores – 1D array of scores. n – Number of scores to choose. l – Lambda parameter for MMR. l = 1 gives a relevance-ranked list and l = 0 gives a maximal diversity ranking.
Returns:	List of indices of scores chosen.
Return type:	Sequence[int]
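Maximal marginal relevance builds the ranking greedily: at each step it picks the candidate maximising l · score − (1 − l) · (its highest similarity to anything already chosen). With l = 1 this reduces to a plain score ranking, matching the description above. The similarity measure is not documented; this sketch assumes cosine similarity between feature vectors, and `choose_mmr_sketch` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def choose_mmr_sketch(features: Sequence[Sequence[float]], scores: Sequence[float],
                      n: int, l: float = 0.5) -> List[int]:
    remaining = list(range(len(scores)))
    chosen: List[int] = []
    while remaining and len(chosen) < n:
        def mmr(i: int) -> float:
            # Redundancy is the highest similarity to an already chosen item (0 at first).
            redundancy = max((cosine(features[i], features[j]) for j in chosen), default=0.0)
            return l * scores[i] - (1 - l) * redundancy
        # Break ties (e.g. l = 0 on the first pick) by the raw score.
        best = max(remaining, key=lambda i: (mmr(i), scores[i]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```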
