How to train a model to predict curation labels
A full tutorial for model-based curation can be found here.
Here, we assume that you have:
- Two SortingAnalyzers called analyzer_1 and analyzer_2, with some template and quality metrics calculated for both
- Manually curated labels for the units in each analyzer, in lists called analyzer_1_labels and analyzer_2_labels. If you have used phy, the lists can be accessed using curated_labels = analyzer.sorting.get_property("quality").
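Before training, it can help to confirm that each label list holds exactly one label per unit. The sketch below is a hypothetical helper (check_label_lengths is not part of SpikeInterface) and works on plain lists, so it assumes you have already collected the unit ids of each analyzer, for example from analyzer.unit_ids. The unit ids and labels shown are made up for illustration.

```python
def check_label_lengths(unit_ids_per_analyzer, labels_list):
    """Raise ValueError if any label list does not match its unit count."""
    if len(unit_ids_per_analyzer) != len(labels_list):
        raise ValueError("Need exactly one label list per analyzer")
    for index, (unit_ids, labels) in enumerate(zip(unit_ids_per_analyzer, labels_list)):
        if len(unit_ids) != len(labels):
            raise ValueError(
                f"Analyzer {index}: {len(unit_ids)} units but {len(labels)} labels"
            )
    return True


# Made-up unit ids and labels, for illustration only.
example_unit_ids = [[0, 1, 2], [0, 1]]
example_labels = [["good", "noise", "mua"], ["good", "good"]]
labels_ok = check_label_lengths(example_unit_ids, example_labels)
```

A mismatch here usually means the labels were exported from a different sorting than the one the analyzer was built from.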
With these objects calculated, you can train a model as follows:
from spikeinterface.curation import train_model

analyzer_list = [analyzer_1, analyzer_2]
labels_list = [analyzer_1_labels, analyzer_2_labels]
output_folder = "/path/to/output_folder"

trainer = train_model(
    mode="analyzers",
    labels=labels_list,
    analyzers=analyzer_list,
    output_folder=output_folder,
    metric_names=None,  # Set if you want to use a subset of metrics, defaults to all calculated quality and template metrics
    imputation_strategies=None,  # Default is all available imputation strategies
    scaling_techniques=None,  # Default is all available scaling techniques
    classifiers=None,  # Defaults to Random Forest classifier only - we usually find this gives the best results, but a range of classifiers is available
    seed=None,  # Set a seed for reproducibility
)
The trainer tries several models and chooses the most accurate one. This model and some metadata are stored in the output_folder, which can later be loaded using the load_model function (more details).
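To illustrate what "tries several models" can mean, here is a minimal sketch of an exhaustive search over imputation, scaling and classifier choices. It is not the internal implementation of train_model: pick_best_combination and toy_score are hypothetical names, the option strings are placeholders, and the scoring function is a toy stand-in for a real accuracy estimate.

```python
from itertools import product


def pick_best_combination(imputers, scalers, classifiers, score_fn):
    """Score every (imputer, scaler, classifier) triple and return the best."""
    best_combo, best_score = None, float("-inf")
    for combo in product(imputers, scalers, classifiers):
        score = score_fn(*combo)
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


def toy_score(imputer, scaler, classifier):
    # Toy stand-in for an accuracy estimate; NOT a real result.
    return len(imputer) + len(scaler) + len(classifier)


# Placeholder option names, for illustration only.
imputers = ["median", "knn"]
scalers = ["standard_scaler"]
classifiers = ["RandomForestClassifier"]

candidates = list(product(imputers, scalers, classifiers))
best_combo, best_score = pick_best_combination(imputers, scalers, classifiers, toy_score)
```

The number of candidates is the product of the three list lengths, so restricting any of the three arguments shortens the search accordingly.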
We can also access the model, which is an sklearn Pipeline, from the trainer object:
best_model = trainer.best_pipeline
The training function can also be run in “csv” mode, if you prefer to store metrics as .csv files. If the target labels are stored as a column in the file, you can point to these with the target_label parameter:
trainer = train_model(
    mode="csv",
    metrics_paths=["/path/to/csv_file_1", "/path/to/csv_file_2"],
    target_label="my_label",
    output_folder=output_folder,
)
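As a rough illustration of a metrics file with a label column, the sketch below writes a small .csv using only the standard library. The layout is an assumption: one row per unit, a leading unit_id column, a couple of metric columns, and the labels in a column named my_label to match target_label above. All values are made up.

```python
import csv
import os
import tempfile

# Made-up rows: unit id, two metrics, and the manual curation label.
rows = [
    {"unit_id": 0, "snr": 8.2, "firing_rate": 5.1, "my_label": "good"},
    {"unit_id": 1, "snr": 1.4, "firing_rate": 0.3, "my_label": "noise"},
    {"unit_id": 2, "snr": 4.7, "firing_rate": 12.9, "my_label": "mua"},
]

# Write to a temporary folder; replace with your own path in practice.
csv_path = os.path.join(tempfile.mkdtemp(), "metrics_1.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

# Read the header back to confirm the label column is present.
with open(csv_path, newline="") as f:
    header = next(csv.reader(f))
```

Each file passed in metrics_paths would need the same label column name for target_label to apply to all of them.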