OverlapScore

class geoanalytics.scoreCalculator.OverlapScore.OverlapScore(TrainDF, TopkDF, startBandTrainDF=2, startBandTopkDF=2)[source]

Bases: object

About this algorithm

Description:

OverlapScore uses KMeans clustering to quantify how closely a retrieved top-k set aligns with the training dataset in the embedding space. It compares cluster assignments across the two datasets: the score is the proportion of top-k samples assigned to the same cluster as the first training sample.

Parameters:
  • TrainDF (pd.DataFrame): The original training dataset.

  • TopkDF (pd.DataFrame): The retrieved top-k dataset.

  • startBandTrainDF (int): Column index from which to start using features in TrainDF (default: 2).

  • startBandTopkDF (int): Column index from which to start using features in TopkDF (default: 2).

Attributes:
  • TrainDF (np.ndarray) – Sliced feature matrix from the training dataset.

  • TopkDF (np.ndarray) – Sliced feature matrix from the top-k dataset.
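
The relationship between the startBand parameters and the stored feature matrices can be illustrated with a short sketch. This is not the library's code: the helper name slice_features, the positional (zero-based) slicing with iloc, and the demo column names are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def slice_features(df: pd.DataFrame, start_band: int = 2) -> np.ndarray:
    """Hypothetical helper: keep the columns from position start_band onward."""
    # Assumption: the start index is positional and zero-based, as with iloc.
    return df.iloc[:, start_band:].to_numpy(dtype=float)


# Hypothetical frame: two leading non-feature columns, then three band columns.
demo = pd.DataFrame({
    "x": [0.0, 1.0],
    "y": [0.0, 1.0],
    "band1": [0.1, 0.4],
    "band2": [0.2, 0.5],
    "band3": [0.3, 0.6],
})

features = slice_features(demo, start_band=2)
print(features.shape)  # (2, 3)
```

Under these assumptions, the default value of 2 skips the first two columns (for example, coordinates or identifiers) and treats every remaining column as a feature.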

Execution methods

Calling from a Python program

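import pandas as pd
# Import the class from its module; the module and the class share the name OverlapScore.
from geoanalytics.scoreCalculator.OverlapScore import OverlapScore

# Load the training set and the retrieved top-k set.
train_df = pd.read_csv("train.csv")
topk_df = pd.read_csv("topk.csv")

# Use the columns from index 2 onward as features in both datasets.
overlap = OverlapScore(train_df, topk_df, startBandTrainDF=2, startBandTopkDF=2)

# Compute the overlap score with three KMeans clusters.
score = overlap.run(n_clusters=3)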

Credits

This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.

run(n_clusters=2)[source]

Computes the cluster overlap score using KMeans clustering.

Parameters:

n_clusters (int) – Number of clusters to use for KMeans (default: 2).

Returns:

Proportion of top-k samples assigned to the same cluster as the first training sample, a value between 0 and 1.

Return type:

float
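
The return value described above can be reproduced with a minimal sketch built on scikit-learn. The function name overlap_score_sketch and the toy arrays are hypothetical, and the choice to fit KMeans on the training features only and then predict labels for the top-k samples is an assumption; the library may fit the model differently (for example, on both datasets combined).

```python
import numpy as np
from sklearn.cluster import KMeans


def overlap_score_sketch(train: np.ndarray, topk: np.ndarray,
                         n_clusters: int = 2, seed: int = 0) -> float:
    """Hypothetical sketch of the score; not the library's implementation."""
    # Assumption: the clustering is fitted on the training features only.
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(train)
    # Cluster label of the first training sample serves as the reference.
    reference = model.labels_[0]
    # Assign each top-k sample to its nearest learned centroid.
    topk_labels = model.predict(topk)
    # Fraction of top-k samples that share the reference cluster.
    return float(np.mean(topk_labels == reference))


# Toy data: two well-separated groups of training points.
train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
# Three of the four top-k points lie near the first training sample.
topk = np.array([[0.05, 0.05], [0.2, 0.1], [9.9, 10.0], [0.0, 0.2]])

score = overlap_score_sketch(train, topk, n_clusters=2)
print(score)  # 0.75
```

Because the reference is a single training sample, the score in this sketch depends on which row comes first in the training data: reordering the rows can change the result even when the clustering itself is unchanged.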