API Reference

This page provides detailed documentation for the SPURS package. SPURS offers APIs for protein stability prediction and functional site identification.

Getting Started

Model Loading and Inference

spurs.inference.get_SPURS(ckpt_path: str, device: str = 'cpu')[source]

Load a SPURS model from a local checkpoint file.

This function loads a SPURS model and its configuration from a local checkpoint directory. The checkpoint should contain both the model weights and the configuration file.

Parameters
  • ckpt_path (str) – Path to the checkpoint directory containing .hydra/config.yaml and checkpoints/best.ckpt

  • device (str, optional) – Device to load the model on. Defaults to ‘cuda’ if available, else ‘cpu’.

Returns

A tuple containing: - model (torch.nn.Module): The loaded SPURS model - cfg (OmegaConf): The model configuration

Return type

tuple

Example

>>> model, cfg = get_SPURS("path/to/checkpoint")
spurs.inference.get_SPURS_from_hub(repo_id: str = 'cyclization9/SPURS', device: str = 'cpu')[source]

Load a pre-trained SPURS model directly from the Hugging Face model hub.

This function downloads and loads a pre-trained SPURS model and its configuration from the Hugging Face model hub. This is the recommended way to get started with SPURS.

Parameters
  • repo_id (str, optional) – Hugging Face model repository ID. Defaults to “cyclization9/SPURS”.

  • device (str, optional) – Device to load the model on. Defaults to ‘cuda’ if available, else ‘cpu’.

Returns

A tuple containing:
  • model (torch.nn.Module): The loaded SPURS model

  • cfg (OmegaConf): The model configuration

Return type

tuple

Example

>>> model, cfg = get_SPURS_from_hub()
spurs.inference.get_SPURS_multi_from_hub(repo_id: str = 'cyclization9/SPURS', device: str = 'cpu')[source]

Load a pre-trained SPURS multi-mutation model from the Hugging Face model hub.

This function downloads and loads a pre-trained SPURS model specialized for predicting the effects of multiple mutations simultaneously.

Parameters
  • repo_id (str, optional) – Hugging Face model repository ID. Defaults to “cyclization9/SPURS”.

  • device (str, optional) – Device to load the model on. Defaults to ‘cuda’ if available, else ‘cpu’.

Returns

A tuple containing:
  • model (torch.nn.Module): The loaded SPURS multi-mutation model

  • cfg (OmegaConf): The model configuration

Return type

tuple

Example

>>> model, cfg = get_SPURS_multi_from_hub()
spurs.inference.parse_pdb(pdb_path: str, pdb_name: str, chain: str, cfg, device: str = 'cpu')[source]

Parse a PDB file and prepare it for SPURS model input.

This function processes a PDB file and converts it into the format required by SPURS models. It handles coordinate extraction, feature generation, and device placement.

Parameters
  • pdb_path (str) – Path to the PDB file

  • pdb_name (str) – Name of the protein

  • chain (str) – Chain identifier to analyze

  • cfg – Model configuration containing alphabet settings

  • device (str, optional) – Device to place the tensors on. Defaults to ‘cuda’ if available, else ‘cpu’.

Returns

A dictionary containing processed PDB data including:
  • coordinates

  • sequence information

  • features required by the model

All tensors are placed on the specified device.

Return type

dict

spurs.inference.parse_pdb_for_mutation(mut_info_list)[source]

Parse mutation information into tensor format required by model.

This function converts a list of mutation specifications into the tensor format required by SPURS models for mutation analysis.

Parameters

mut_info_list (List[List[str]]) – List of lists containing mutation strings. Each inner list represents a set of mutations to analyze together. Format: [[‘V2C’,’P3T’], [‘W1A’,’V2Y’]] where each mutation string is formatted as ‘OriginalAAPositionNewAA’.

Returns

A tuple containing:
  • mut_ids (torch.Tensor): Tensor of mutation positions (0-indexed)

  • append_tensors (torch.Tensor): Tensor of amino acid indices for wild-type and mutant residues

Return type

tuple

Example

>>> mut_ids, append_tensors = parse_pdb_for_mutation([['V2C', 'P3T']])
>>> print(mut_ids)  # Shows positions 1,2 (0-indexed)
>>> print(append_tensors)  # Shows amino acid indices for V->C, P->T

Basic usage:

from spurs.inference import get_SPURS_from_hub, parse_pdb

# Load pre-trained model
model, cfg = get_SPURS_from_hub()

# Parse PDB file
pdb = parse_pdb(
    pdb_path='path/to/protein.pdb',
    pdb_name='PROTEIN_NAME',
    chain='A',
    cfg=cfg
)

# Make predictions
ddg = model(pdb, return_logist=True)

Functional Site Identification

This module implements functionality for identifying and analyzing functional sites in proteins using SPURS model predictions and sigmoid-based regression analysis. It provides tools for normalizing mutation effect scores, fitting sigmoid functions to the data, and visualizing the results to identify functionally important positions in proteins.

spurs.functional_site_annotation.normalize(data, lower=- 4.36, upper=1.21, new_min=0, new_max=1)[source]

Normalize mutation effect scores to a specified range.

Parameters
  • data – Raw mutation effect scores

  • lower – Lower bound for clipping (default: -4.36, 0.1 percentile of ddG values)

  • upper – Upper bound for clipping (default: 1.21, 99.9 percentile of ddG values)

  • new_min – Minimum value in normalized range (default: 0)

  • new_max – Maximum value in normalized range (default: 1)

Returns

Normalized and clipped data scaled to [new_min, new_max]

spurs.functional_site_annotation.sigmoid_function(x, xmid, scal)[source]

Compute sigmoid function values for given parameters.

Parameters
  • x – Input values

  • xmid – Midpoint parameter of the sigmoid

  • scal – Scale parameter controlling steepness

Returns

Sigmoid function values

class spurs.functional_site_annotation.SigmoidRegressor(*args: Any, **kwargs: Any)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.RegressorMixin

Custom scikit-learn compatible estimator for fitting sigmoid functions to mutation effect data. Used to identify functional sites by finding positions where mutations have non-linear effects.

The sigmoid function is fit using weighted least squares optimization with bounds derived from empirical protein stability data.

__init__(xmid_initial=0.0, scal_initial=1.0)[source]

Initialize sigmoid regressor with starting parameters.

Parameters
  • xmid_initial – Initial guess for sigmoid midpoint

  • scal_initial – Initial guess for sigmoid scale factor

fit(X, y, sample_weight=None)[source]

Fit sigmoid function to data using weighted optimization.

Parameters
  • X – Input features

  • y – Target values

  • sample_weight – Sample weights for weighted least squares

Returns

Fitted estimator

Return type

self

predict(X)[source]

Predict using fitted sigmoid function.

Parameters

X – Input features

Returns

Predicted values

spurs.functional_site_annotation.get_sigmoid_results(mask_results, ddg)[source]

Fit sigmoid function to mutation effect data for functional site identification.

Parameters
  • mask_results – Mutation effect predictions from ESM

  • ddg – Predicted stability measurements

Returns

(normalized_ddg, mask_results, None, sigmoid_parameters)

Return type

tuple

spurs.functional_site_annotation.inference_wt_seq(sequence: str, indices: list, batch_converter, model, device: torch.device, alphabet)[source]

Perform inference on wild-type sequence using the ESM model.

Parameters
  • sequence – Protein sequence to analyze

  • indices – List of positions to analyze

  • batch_converter – ESM batch converter

  • model – ESM model

  • device – Computation device

  • alphabet – ESM alphabet for token mapping

Returns

Logits for each position and possible amino acid substitution

spurs.functional_site_annotation.get_wt_aa_logit_differences(sequence: str, mut_indices: list, batch_converter, model, device: torch.device, alphabet, shift: int = 1)[source]

Calculate differences between wild-type and mutant amino acid logits.

This function helps identify positions where mutations would have the strongest effect by comparing model predictions for all possible amino acid substitutions against the wild-type amino acid at each position.

Parameters
  • sequence – Protein sequence

  • mut_indices – Positions to analyze

  • batch_converter – ESM batch converter

  • model – ESM model

  • device – Computation device

  • alphabet – ESM alphabet

  • shift – Index adjustment (default: 1)

Returns

Tensor of logit differences between wild-type and all possible mutations

spurs.functional_site_annotation.plot_sigmoid_results(result, shift=1, vcenter=0, highlight_positions=[])[source]

Visualize functional site prediction results using a scatter plot.

Creates a plot showing normalized mutation effects across protein positions, with optional highlighting of specific positions of interest. The color scheme uses a diverging colormap centered at vcenter.

Parameters
  • result – Tuple containing (normalized_ddg, mask_results, None, sigmoid_parameters)

  • shift – Position numbering offset (default: 1)

  • vcenter – Center value for color normalization (default: 0)

  • highlight_positions – List of positions to highlight with markers

Returns

Z-score normalized mutation effects

Return type

normalized_data

Basic usage:

from spurs.functional_site_annotation import get_wt_aa_logit_differences
import torch

# Setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sequence = "YOUR_PROTEIN_SEQUENCE"
mut_indices = list(range(1, len(sequence) + 1))

# Get predictions
results = get_wt_aa_logit_differences(
    sequence,
    mut_indices,
    batch_converter,
    model,
    device,
    alphabet
)

Model Architecture

SPURS Models

class spurs.models.stability.spurs.SPURSConfig(encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False), adapter_layer_indices: List = <factory>, separate_loss: bool = True, name: str = 'esm2_t33_650M_UR50D', dropout: float = 0.1, mlp: spurs.models.stability.mlp.MLPConfig = MLPConfig(num_layers=3, input_dim=2602, hidden_dim=512, output_dim=1, dropout=0.1, ckpt_path='', append_tensors=True, flat_dim=-1))[source]

Bases: object

encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False)
adapter_layer_indices: List
separate_loss: bool = True
name: str = 'esm2_t33_650M_UR50D'
dropout: float = 0.1
mlp: spurs.models.stability.mlp.MLPConfig = MLPConfig(num_layers=3, input_dim=2602, hidden_dim=512, output_dim=1, dropout=0.1, ckpt_path='', append_tensors=True, flat_dim=-1)
__init__(encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False), adapter_layer_indices: List = <factory>, separate_loss: bool = True, name: str = 'esm2_t33_650M_UR50D', dropout: float = 0.1, mlp: spurs.models.stability.mlp.MLPConfig = MLPConfig(num_layers=3, input_dim=2602, hidden_dim=512, output_dim=1, dropout=0.1, ckpt_path='', append_tensors=True, flat_dim=-1)) None
class spurs.models.stability.spurs.SPURS(cfg)[source]

Bases: spurs.models.stability.basemodel.BaseModel

SPURS (Structure-based Protein Understanding and Recognition System) model for protein stability prediction.

This model combines protein structure information (from ProteinMPNN) and sequence information (from ESM2) to predict protein stability changes. The architecture consists of three main components:

  1. Encoder (ProteinMPNN): Processes protein structure information

  2. Decoder (ESM2): Processes sequence information with structural prior

  3. MLP: Final stability prediction layer

The model uses a structural adapter to effectively combine structural and sequence information, allowing for more accurate stability predictions.

Parameters

cfg (SPURSConfig) – Configuration object containing model parameters - encoder: ProteinMPNN configuration - adapter_layer_indices: List of ESM2 layer indices to adapt - name: ESM2 model name - dropout: Dropout rate - mlp: MLP configuration

__init__(cfg) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(batch, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_encoder(batch)[source]

Forward pass through the encoder (ProteinMPNN) component of the SPURS model.

This function processes the input protein structure data (X, S, mask, chain_M, residue_idx, chain_encoding_all, randn_1) and returns the encoded features from the ProteinMPNN encoder.

Parameters

batch (dict) – Input batch containing protein structure data - X: Protein structure coordinates - S: Protein structure mask - mask: Mask indicating valid positions - chain_M: Chain mask - residue_idx: Residue indices - chain_encoding_all: Chain encoding

Returns

Encoded features from the ProteinMPNN encoder

Return type

torch.Tensor

training: bool

ProteinMPNN Model

class spurs.models.stability.mpnn.MPNN(cfg)[source]

Bases: spurs.models.stability.basemodel.BaseModel

ProteinMPNN-based model for protein stability prediction.

This model uses the ProteinMPNN architecture to process protein structure information and predict stability changes. Key features:

  1. Structure-aware message passing neural network

  2. Support for both fixed and trainable encoders

  3. MLP-based regression head for stability predictions

Parameters

cfg (MPNN) – Configuration object containing: - encoder: ProteinMPNN configuration - name: Model name - dropout: Dropout rate - mlp: MLP configuration

__init__(cfg) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

training: bool
forward(batch, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_encoder(batch)[source]

ESM Model

class spurs.models.stability.esm.ESM(cfg)[source]

Bases: spurs.models.stability.basemodel.BaseModel

ESM-based model for protein stability prediction.

This model leverages the ESM2 language model to understand protein sequences and predict stability changes. Features include:

  1. Pre-trained ESM2 language model for sequence encoding

  2. MLP-based regression head for stability predictions

Parameters

cfg (ESM) – Configuration object containing: - name: ESM2 model name (e.g. ‘esm2_t33_650M_UR50D’) - dropout: Dropout rate - mlp: MLP configuration

__init__(cfg) None[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

training: bool
forward(batch, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_encoder(batch)[source]

Transfer Model

class spurs.models.stability.org_transfer_model.TransferModelConfig(encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False))[source]

Bases: object

encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False)
__init__(encoder: spurs.models.stability.protein_mpnn.ProteinMPNNConfig = ProteinMPNNConfig(d_model=128, d_node_feats=128, d_edge_feats=128, k_neighbors=48, augment_eps=0.0, n_enc_layers=3, dropout=0.1, tune=False, use_input_decoding_order=False, n_vocab=22, n_dec_layers=3, random_decoding_order=True, nar=True, crf=False, use_esm_alphabet=False)) None
class spurs.models.stability.org_transfer_model.TransferModel(cfg)[source]

Bases: spurs.models.stability.basemodel.BaseModel

ThermoMPNN https://github.com/Kuhlman-Lab/ThermoMPNN

This model combines ProteinMPNN’s structure embeddings with a light attention mechanism and MLP layers for stability prediction. Features include:

  1. Pre-trained ProteinMPNN encoder for structure understanding

  2. Light attention mechanism for sequence context

  3. MLP layers for final prediction

  4. Support for mutation effect prediction

Parameters

cfg (TransferModelConfig) – Configuration object containing: - encoder: ProteinMPNN configuration

__init__(cfg)[source]

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(batch, tied_feat=True)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

Data Modules

class spurs.datamodules.datasets.megascale.MegaScaleDataset(reduce: str = '', split: str = 'train', single_mut: bool = False, mut_seq: bool = False, std_ratio: float = 0.75, loss_ratio: float = 1.0, train_ratio: float = 0.05)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset class for handling protein stability data from Tsuboyama et al. 2023.

This dataset contains a large collection of protein stability measurements, including single-site mutations and their effects on protein stability (ddG values). The dataset supports train/val/test splits and includes structure information from AlphaFold.

Parameters
  • reduce (str, optional) – Reduction strategy for the dataset. Defaults to ‘’.

  • split (str, optional) – Dataset split (‘train’, ‘val’, ‘test’). Defaults to ‘train’.

  • single_mut (bool, optional) – Whether to handle only single mutations. Defaults to False.

  • mut_seq (bool, optional) – Whether to include mutated sequences. Defaults to False.

  • std_ratio (float, optional) – Will be removed in the future

  • loss_ratio (float, optional) – Will be removed in the future

  • train_ratio (float, optional) – Will be removed in the future

__init__(reduce: str = '', split: str = 'train', single_mut: bool = False, mut_seq: bool = False, std_ratio: float = 0.75, loss_ratio: float = 1.0, train_ratio: float = 0.05)[source]
cal_index2mt(index)[source]
collect_func(batch)[source]
class spurs.datamodules.datasets.megascale.Featurizer(alphabet: spurs.datamodules.datasets.data_utils.Alphabet, to_pifold_format=False, coord_nan_to_zero=True, atoms=('N', 'CA', 'C', 'O'), single_mut=False, mut_seq=False)[source]

Bases: object

__init__(alphabet: spurs.datamodules.datasets.data_utils.Alphabet, to_pifold_format=False, coord_nan_to_zero=True, atoms=('N', 'CA', 'C', 'O'), single_mut=False, mut_seq=False)[source]
class spurs.datamodules.datasets.megascale.MegaScaleTestDatasets[source]

Bases: torch.utils.data.dataset.Dataset

A comprehensive test dataset combining multiple protein stability benchmarks.

This dataset aggregates various test sets including: - MegaScale test set - FireProt homologue-free set - SSYM direct and inverse sets - S669 dataset - ddg datasets (S461, S783, S8754, S2648) - dtm datasets (S571, S4346)

__init__()[source]
class spurs.datamodules.datasets.megascale_multi.MegaScaleDoubleDataset(reduce: str = '', split: str = 'train')[source]

Bases: torch.utils.data.dataset.Dataset

A dataset class for handling double mutations from the MegaScale dataset.

This dataset specifically handles double mutations from Tsuboyama et al. 2023, where each data point contains two mutations and their combined effect on protein stability. The dataset supports train/val/test splits and includes structure information from AlphaFold.

Parameters
  • reduce (str, optional) – Reduction strategy for the dataset. Defaults to ‘’.

  • split (str, optional) – Dataset split (‘train’, ‘val’, ‘test’). Defaults to ‘train’.

__init__(reduce: str = '', split: str = 'train')[source]
class spurs.datamodules.datasets.megascale_multi.Featurizer(alphabet: spurs.datamodules.datasets.data_utils.Alphabet, to_pifold_format=False, coord_nan_to_zero=True, atoms=('N', 'CA', 'C', 'O'), single_mut=False, mut_seq=False)[source]

Bases: object

__init__(alphabet: spurs.datamodules.datasets.data_utils.Alphabet, to_pifold_format=False, coord_nan_to_zero=True, atoms=('N', 'CA', 'C', 'O'), single_mut=False, mut_seq=False)[source]
class spurs.datamodules.datasets.domainome.domainome(pdb_dir, csv_fname, dataset_name, stage='full', mut_seq=False, train_size=1)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset class for handling domain-level protein stability data.

This dataset provides domain-specific stability measurements and is used for analyzing and predicting stability changes at the protein domain level. It includes domain-level annotations and corresponding stability measurements.

Parameters
  • pdb_dir (str) – Directory containing PDB structure files.

  • csv_fname (str) – Path to the CSV file containing domain mutation data.

  • dataset_name (str) – Name of the dataset.

  • stage (str, optional) – Dataset stage (‘full’, ‘train’, ‘test’). Defaults to ‘full’.

  • mut_seq (bool, optional) – Whether to include mutated sequences. Defaults to False.

  • train_size (float, optional) – Training data size ratio. Defaults to 1.

__init__(pdb_dir, csv_fname, dataset_name, stage='full', mut_seq=False, train_size=1)[source]
class spurs.datamodules.datasets.ddgbench.ddgBenchDataset(pdb_dir, csv_fname, dataset_name)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset class for handling standard protein stability benchmark datasets.

This class is used for loading and processing several benchmark datasets: - SSYM-dir: Direct SSYM dataset for stability measurements - SSYM-inv: Inverse SSYM dataset for stability measurements - S669: A curated set of 669 stability measurements

Parameters
  • pdb_dir (str) – Directory containing PDB structure files.

  • csv_fname (str) – Path to the CSV file containing mutation data.

  • dataset_name (str) – Name of the dataset (‘ssym_dir’, ‘ssym_inv’, or ‘S669’).

__init__(pdb_dir, csv_fname, dataset_name)[source]
__getitem__(index)[source]

Batch retrieval fxn - each batch is a single protein

class spurs.datamodules.datasets.ddggeo.ddgGeo(pdb_dir, csv_fname, dataset_name, stage='full', mut_seq=False, train_size=1)[source]

Bases: torch.utils.data.dataset.Dataset

A dataset class for handling geometric-aware protein stability datasets. https://github.com/Gonglab-THU/SPIRED-Fitness

This class handles multiple datasets that incorporate geometric features: - S461: 461 mutations with structural information - S783: 783 mutations with geometric features - S8754: Large-scale dataset with 8,754 mutations - S2648: Test set with 2,648 mutations - S571: Temperature-based stability dataset - S4346: Extended temperature mutation dataset

Parameters
  • pdb_dir (str) – Directory containing PDB structure files.

  • csv_fname (str) – Path to the CSV file containing mutation data.

  • dataset_name (str) – Name of the dataset (e.g., ‘S461’, ‘S783’, etc.).

  • stage (str, optional) – Dataset stage (‘full’, ‘train’, ‘test’). Defaults to ‘full’.

  • mut_seq (bool, optional) – Whether to include mutated sequences. Defaults to False.

  • train_size (float, optional) – Training data size ratio. Defaults to 1.

__init__(pdb_dir, csv_fname, dataset_name, stage='full', mut_seq=False, train_size=1)[source]
class spurs.datamodules.datasets.fireport.FireProtDataset(split)[source]

Bases: torch.utils.data.dataset.Dataset

__init__(split)[source]

Note

  • ddgBenchDataset is used for ssym-dir, ssym-inv, and S669 datasets

  • ddgGeo is used for S461, S783, S8754, S2648, S571, and S4346 datasets