MEANshift

class geoanalytics.clustering.MEANshift.MEANshift(dataframe)

Bases: object

About this algorithm

Description:

MeanShift is a centroid-based clustering algorithm that seeks modes (i.e., high-density areas) in the feature space. It does not require predefining the number of clusters. This wrapper performs MeanShift clustering on input data, tracks memory and execution time, and supports exporting results.
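To make the mode-seeking idea concrete, here is a minimal pure-Python sketch of mean shift with a flat (uniform) kernel. It is an illustration only and is not taken from this wrapper's source; the function name `mean_shift`, the `tol` parameter, and the simple merge step are assumptions made for the example.

```python
import math

def mean_shift(points, bandwidth, max_iter=300, tol=1e-6):
    """Flat-kernel mean shift sketch: returns (labels, centers)."""
    def shift(p):
        # Mean of all input points that lie within `bandwidth` of p.
        near = [q for q in points if math.dist(p, q) <= bandwidth]
        if not near:  # defensive guard against floating-point edge cases
            return p
        return tuple(sum(c) / len(near) for c in zip(*near))

    # Move every point uphill until it stops moving (a density mode).
    modes = []
    for p in points:
        p = tuple(p)
        for _ in range(max_iter):
            nxt = shift(p)
            moved = math.dist(p, nxt)
            p = nxt
            if moved < tol:
                break
        modes.append(p)

    # Merge converged positions that lie within one bandwidth of each other.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels, centers

points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
labels, centers = mean_shift(points, bandwidth=2.0)
```

Note that the number of clusters is never passed in: it falls out of how many distinct modes the points converge to, which in turn depends on the bandwidth.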

Parameters:

dataframe (pd.DataFrame) – The input dataset; it must be provided when the object is created.
Clustering hyperparameters (bandwidth, max_iter) are passed to the run method.

Attributes:

df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and features.
labelsDF (pd.DataFrame) – DataFrame containing ‘x’, ‘y’, and assigned cluster labels.
centers (ndarray) – The cluster centers estimated by MeanShift.
startTime, endTime (float) – Timestamps recorded at the start and end of clustering, used to compute the runtime.
memoryUSS, memoryRSS (float) – Memory usage of the clustering process in kilobytes: unique set size (USS) and resident set size (RSS).

Execution methods

Calling from a Python program

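import pandas as pd

# The class lives in the MEANshift module of geoanalytics.clustering.
from geoanalytics.clustering.MEANshift import MEANshift

# The input must contain 'x' and 'y' coordinate columns plus feature columns.
df = pd.read_csv("input.csv")

ms = MEANshift(df)

# bandwidth=None lets the bandwidth be estimated automatically.
labels_df, centers = ms.run(bandwidth=None, max_iter=300)

# Each of these calls prints its measurement.
ms.getRuntime()
ms.getMemoryUSS()
ms.getMemoryRSS()

ms.save('MeanShiftLabels.csv', 'MeanShiftCenters.csv')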

Credits

This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.

getMemoryRSS(): Prints the resident set size (RSS) of the process in kilobytes.

getMemoryUSS(): Prints the unique set size (USS) of the process in kilobytes, i.e. the memory that would be freed if the process ended.

getRuntime(): Prints the total runtime of the clustering algorithm.
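The figures reported by these methods could be gathered along the following lines. This is a hedged sketch, not the wrapper's actual code: the helper name `measure` is hypothetical, and the use of the third-party psutil package for the memory readings is an assumption (the sketch falls back to None when psutil is not installed).

```python
import time

try:
    import psutil  # third-party; assumed here, not confirmed by this page
except ImportError:
    psutil = None

def measure(func, *args, **kwargs):
    """Run func and return (result, runtime_seconds, uss_kb, rss_kb)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    end = time.perf_counter()

    uss_kb = rss_kb = None
    if psutil is not None:
        # memory_full_info() reports sizes in bytes for the current process.
        info = psutil.Process().memory_full_info()
        uss_kb = info.uss / 1024
        rss_kb = info.rss / 1024
    return result, end - start, uss_kb, rss_kb

result, runtime, uss_kb, rss_kb = measure(sum, [1, 2, 3])
```

RSS counts every page resident in physical memory, including pages shared with other processes, while USS counts only pages private to this process, so USS is usually the smaller of the two.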

run(bandwidth=None, max_iter=300)

Executes the MeanShift clustering algorithm.

Parameters:

bandwidth (float or None) – Bandwidth of the kernel, i.e. the size of the search window. If None, it is estimated automatically (default: None).
max_iter (int) – Maximum number of iterations (default: 300).

Returns:

(labelsDF, centers)
labelsDF (pd.DataFrame) – DataFrame with ‘x’, ‘y’, and cluster labels.
centers (np.ndarray) – Coordinates of the cluster centers.
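When bandwidth is None, the value has to be derived from the data. How this wrapper does so is not stated here; the sketch below shows one common heuristic, similar in spirit to scikit-learn's estimate_bandwidth helper: average, over all points, the distance to the k-th nearest neighbor, with k set by a quantile of the sample size. The function name `estimate_bandwidth_sketch` and the `quantile` default are assumptions for illustration.

```python
import math

def estimate_bandwidth_sketch(points, quantile=0.3):
    """Average distance from each point to its k-th nearest neighbor."""
    n = len(points)
    k = max(1, int(n * quantile))  # neighbor count, at least 1
    total = 0.0
    for p in points:
        # Sorted distances include the point itself (distance 0.0) first.
        dists = sorted(math.dist(p, q) for q in points)
        total += dists[k - 1]
    return total / n

square = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
bw = estimate_bandwidth_sketch(square, quantile=0.5)
```

A larger quantile yields a larger bandwidth and therefore fewer, broader clusters; a smaller one yields more, tighter clusters.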

save(outputFileLabels='MeanShiftLabels.csv', outputFileCenters='MeanShiftCenters.csv')

Saves the clustering result and cluster centers to CSV files.

Parameters:

outputFileLabels (str) – Filename for the label results (default: ‘MeanShiftLabels.csv’).
outputFileCenters (str) – Filename for the cluster centers (default: ‘MeanShiftCenters.csv’).