SilhouetteScore

class geoanalytics.scoreCalculator.SilhouetteScore.SilhouetteScore(TrainDF, TopkDF, startBandTrainDF=2, startBandTopkDF=2)

Bases: object

About this algorithm

Description:

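SilhouetteScore uses the silhouette coefficient to measure how well separated the top-k retrieved data points are from the training data points. The score lies in [-1, 1]. Values near 1 indicate that the two groups are well separated, values near 0 indicate that they overlap, and negative values indicate that many points lie closer to the other group than to their own. This makes the score useful for validating retrieval performance and for checking cluster consistency between the two groups.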
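For reference, the standard silhouette coefficient, specialized to two groups, is given below. Here d(i, j) is the distance between two feature vectors (Euclidean is assumed; the source does not state the metric), C_i is the group (training or top-k) that point i belongs to, \bar{C}_i is the other group, and N is the total number of rows in both datasets. With only two groups, the "nearest other cluster" of the general definition is simply the other group. By the usual convention, s(i) = 0 when point i is the only member of its group.

```latex
a(i) = \frac{1}{|C_i| - 1} \sum_{j \in C_i,\; j \neq i} d(i, j),
\qquad
b(i) = \frac{1}{|\bar{C}_i|} \sum_{j \in \bar{C}_i} d(i, j)

s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}},
\qquad
S = \frac{1}{N} \sum_{i=1}^{N} s(i)
```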

Parameters:
  • TrainDF (pd.DataFrame): The original training dataset.

  • TopkDF (pd.DataFrame): The retrieved top-k dataset.

  • startBandTrainDF (int): Index of the first feature column in TrainDF; this column and all columns after it are used as features (default: 2).

  • startBandTopkDF (int): Index of the first feature column in TopkDF; this column and all columns after it are used as features (default: 2).

Attributes:
  • TrainDF (np.ndarray) – Extracted features from the training dataset.

  • TopkDF (np.ndarray) – Extracted features from the top-k dataset.
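The startBandTrainDF and startBandTopkDF parameters suggest that features are taken positionally, from a given column onward. The sketch below shows that slicing with pandas. extract_features is a hypothetical helper, the column names are made up, and the real class may handle dtypes or validation differently.

```python
import numpy as np
import pandas as pd


def extract_features(df: pd.DataFrame, start_band: int = 2) -> np.ndarray:
    """Return every column from position `start_band` onward as a float array.

    Hypothetical helper mirroring the documented startBand* parameters;
    it is not part of the geoanalytics API.
    """
    if not 0 <= start_band < df.shape[1]:
        raise ValueError(f"start_band={start_band} is outside 0..{df.shape[1] - 1}")
    # Positional slice: columns start_band, start_band + 1, ..., last.
    return df.iloc[:, start_band:].to_numpy(dtype=float)


# Two leading non-feature columns (assumed here to be coordinates), then bands.
train_df = pd.DataFrame(
    {"x": [0.0, 1.0], "y": [5.0, 6.0], "b1": [0.1, 0.2], "b2": [0.3, 0.4]}
)
features = extract_features(train_df, start_band=2)
print(features.shape)  # (2, 2)
```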

Execution methods

Calling from a Python program

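import pandas as pd
# The class lives in the SilhouetteScore module of geoanalytics.scoreCalculator.
from geoanalytics.scoreCalculator.SilhouetteScore import SilhouetteScore

# Load the training data and the retrieved top-k data.
train_df = pd.read_csv("train.csv")
topk_df = pd.read_csv("topk.csv")

# startBand* give the index of the first feature column (2 is the default).
scorer = SilhouetteScore(train_df, topk_df, startBandTrainDF=2, startBandTopkDF=2)
score = scorer.run()
print(f"Silhouette score: {score:.4f}")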
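To try the example without real data, the following sketch writes two small synthetic CSV files. The layout is an assumption: two leading non-feature columns (named x and y here) followed by band columns, so that features start at column index 2 as the defaults expect. write_synthetic_inputs and all column names are hypothetical, and the values are random numbers, not real observations.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_synthetic_inputs(directory, n_train=50, n_topk=10, n_bands=4, seed=0):
    """Write hypothetical train.csv and topk.csv files for trying out the scorer.

    Assumed layout: columns 0-1 are non-feature columns (x, y) and band
    values start at column index 2, matching the default startBand* values.
    """
    rng = np.random.default_rng(seed)
    paths = {}
    # Shift the top-k band values slightly so the two groups are not identical.
    for name, n_rows, offset in (("train", n_train, 0.0), ("topk", n_topk, 0.5)):
        coords = rng.uniform(0, 100, size=(n_rows, 2))
        bands = rng.normal(loc=offset, scale=1.0, size=(n_rows, n_bands))
        df = pd.DataFrame(coords, columns=["x", "y"])
        for j in range(n_bands):
            df[f"band_{j + 1}"] = bands[:, j]
        path = os.path.join(directory, f"{name}.csv")
        df.to_csv(path, index=False)
        paths[name] = path
    return paths


out_dir = tempfile.mkdtemp()
paths = write_synthetic_inputs(out_dir)
train_df = pd.read_csv(paths["train"])
print(train_df.columns.tolist())
```

Pass paths["train"] and paths["topk"] to pd.read_csv in place of "train.csv" and "topk.csv".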

Credits

This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.

run()

Computes the silhouette score of the combined training and top-k datasets, treating each dataset as one group.

Returns:

Silhouette score indicating the separation between training and top-k data points.

Return type:

float
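A minimal NumPy sketch of what run() is assumed to compute: stack both feature arrays, label training rows 0 and top-k rows 1, and average the per-point silhouette coefficients under Euclidean distance. The labelling scheme and the metric are assumptions not confirmed by the source, and two_group_silhouette is a hypothetical name. The library may instead delegate to scikit-learn's sklearn.metrics.silhouette_score, which computes the same quantity from a feature matrix and a label vector.

```python
import numpy as np


def two_group_silhouette(train: np.ndarray, topk: np.ndarray) -> float:
    """Mean silhouette coefficient with each row labelled by its source dataset.

    Hypothetical sketch: assumes training rows get label 0, top-k rows get
    label 1, and distances are Euclidean.
    """
    if len(train) == 0 or len(topk) == 0:
        raise ValueError("both feature arrays must contain at least one row")

    X = np.vstack([train, topk]).astype(float)
    labels = np.concatenate([np.zeros(len(train), dtype=int),
                             np.ones(len(topk), dtype=int)])

    # Full pairwise Euclidean distance matrix, shape (N, N).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same_others = same.sum() - 1  # group size excluding point i
        if n_same_others == 0:
            continue  # singleton group: s(i) stays 0 by convention
        # dist[i, i] is 0, so including i in the sum does not change a(i).
        a = dist[i, same].sum() / n_same_others
        b = dist[i, ~same].mean()  # mean distance to the other group
        denom = max(a, b)
        scores[i] = 0.0 if denom == 0 else (b - a) / denom
    return float(scores.mean())


train = np.array([[0.0], [1.0]])
topk = np.array([[10.0], [11.0]])
print(round(two_group_silhouette(train, topk), 4))  # 0.8997
```

The full distance matrix needs O(N²) memory. For large datasets, scikit-learn's chunked computation or the sample_size parameter of silhouette_score is a better fit.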