MEANshift
- class geoanalytics.clustering.MEANshift.MEANshift(dataframe)[source]
Bases: object
About this algorithm
- Description:
MeanShift is a centroid-based clustering algorithm that seeks modes (i.e., high-density areas) in the feature space. It does not require predefining the number of clusters. This wrapper performs MeanShift clustering on input data, tracks memory and execution time, and supports exporting results.
- Parameters:
Dataset (pandas DataFrame) must be provided during object initialization.
Clustering hyperparameters can be passed to the run method.
- Attributes:
df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and features.
labelsDF (pd.DataFrame) – DataFrame containing ‘x’, ‘y’, and assigned cluster labels.
centers (ndarray) – The cluster centers estimated by MeanShift.
startTime, endTime (float) – Variables to track clustering execution time.
memoryUSS, memoryRSS (float) – Memory usage of the clustering process in kilobytes.
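The mode-seeking idea in the description can be sketched in a few lines of standard-library Python. This is an illustration of the general technique with a flat kernel, not the code this class runs; the function name mean_shift, the tol parameter and the merge rule are assumptions made for the example.

```python
import math


def mean_shift(points, bandwidth, max_iter=300, tol=1e-6):
    """Flat-kernel mean shift over a list of (x, y) tuples (illustrative only)."""
    modes = []
    for start in points:
        current = start
        for _ in range(max_iter):
            # Neighbours within one bandwidth of the current estimate.
            near = [p for p in points if math.dist(p, current) <= bandwidth]
            # Shift the estimate to the mean of those neighbours.
            shifted = tuple(sum(c) / len(near) for c in zip(*near))
            moved = math.dist(shifted, current)
            current = shifted
            if moved < tol:
                break
        modes.append(current)

    # Merge converged positions that lie within one bandwidth of each other
    # (a hypothetical merge rule chosen for this sketch).
    centers, labels = [], []
    for mode in modes:
        for idx, center in enumerate(centers):
            if math.dist(mode, center) < bandwidth:
                labels.append(idx)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers


points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.1)]
labels, centers = mean_shift(points, bandwidth=1.0)
print(labels)        # [0, 0, 0, 1, 1, 1]
print(len(centers))  # 2
```

The number of clusters is never passed in: it falls out of how many distinct modes the points converge to, which is why only the bandwidth has to be chosen.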
Execution methods
Calling from a Python program
    import pandas as pd
    from geoanalytics.clustering.MEANshift import MEANshift

    df = pd.read_csv("input.csv")  # input needs 'x' and 'y' columns plus features
    ms = MEANshift(df)
    # bandwidth=None lets the bandwidth be estimated automatically
    labels_df, centers = ms.run(bandwidth=None, max_iter=300)
    ms.getRuntime()
    ms.getMemoryUSS()
    ms.getMemoryRSS()
    ms.save('MeanShiftLabels.csv', 'MeanShiftCenters.csv')
Credits
This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.
- getMemoryRSS()[source]
Prints the memory usage (RSS) of the process in kilobytes.
- getMemoryUSS()[source]
Prints the memory usage (USS) of the process in kilobytes.
- getRuntime()[source]
Prints the total runtime of the clustering algorithm.
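As a rough illustration of what these three getters report, the sketch below times a call and reads the process's USS and RSS. It assumes the third-party psutil package for the memory figures and divides bytes by 1024 to get kilobytes; whether this class uses psutil or that divisor is not stated here, and the helper name measure is hypothetical.

```python
import time

try:
    import psutil  # third-party; optional in this sketch
except ImportError:
    psutil = None


def measure(func, *args, **kwargs):
    """Run func and return (result, runtime_seconds, uss_kb, rss_kb)."""
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()

    uss_kb = rss_kb = None
    if psutil is not None:
        process = psutil.Process()  # the current process
        # RSS: resident memory, including pages shared with other processes.
        rss_kb = process.memory_info().rss / 1024
        try:
            # USS: memory unique to this process.
            uss_kb = process.memory_full_info().uss / 1024
        except psutil.AccessDenied:
            uss_kb = None
    return result, end_time - start_time, uss_kb, rss_kb


result, runtime, uss_kb, rss_kb = measure(sum, range(1_000_000))
print(f"Runtime: {runtime:.4f} s")
print(f"USS: {uss_kb} KB, RSS: {rss_kb} KB")
```

USS is usually the more telling figure for a single algorithm run, since RSS also counts shared library pages.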
- run(bandwidth=None, max_iter=300)[source]
Executes the MeanShift clustering algorithm.
- Parameters:
bandwidth – float or None, the kernel bandwidth (window size). If None, it is estimated automatically from the data (default: None)
max_iter – int, maximum number of iterations (default: 300)
- Returns:
(labelsDF, centers)
labelsDF (pd.DataFrame) – DataFrame with ‘x’, ‘y’, and cluster labels
centers (np.ndarray) – Coordinates of cluster centers
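When bandwidth is None, some estimate has to be derived from the data. One common heuristic, similar in spirit to scikit-learn's estimate_bandwidth, averages each point's distance to its farthest neighbour among the nearest quantile fraction of the data. The sketch below illustrates that heuristic in plain Python; that this class uses the same rule, and the 0.5 quantile chosen for the example, are assumptions.

```python
import math


def estimate_bandwidth(points, quantile=0.3):
    """Average distance to the farthest of each point's k nearest neighbours.

    k is the given fraction (quantile) of the data set, and each point counts
    as its own nearest neighbour at distance 0.
    """
    n_neighbors = max(1, int(len(points) * quantile))
    total = 0.0
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        total += dists[n_neighbors - 1]
    return total / len(points)


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
bandwidth = estimate_bandwidth(points, quantile=0.5)
print(bandwidth)  # 2.75
```

A larger quantile gives a wider bandwidth and therefore fewer, broader clusters; passing an explicit float to run bypasses any such estimate.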
- save(outputFileLabels='MeanShiftLabels.csv', outputFileCenters='MeanShiftCenters.csv')[source]
Saves the clustering result and cluster centers to CSV files.
- Parameters:
outputFileLabels – str, filename to save label results (default: ‘MeanShiftLabels.csv’)
outputFileCenters – str, filename to save cluster centers (default: ‘MeanShiftCenters.csv’)
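To make the two output files concrete, here is a minimal standard-library sketch that writes one CSV of labelled points and one CSV of centers. The column names ("x", "y", "labels"), the helper name save_results and the sample values are assumptions for illustration; the actual layout written by save may differ, for example if the centers include extra feature columns.

```python
import csv
import os
import tempfile


def save_results(points, labels, centers, labels_path, centers_path):
    """Write labelled points and cluster centers to two CSV files."""
    with open(labels_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y", "labels"])  # hypothetical column names
        for (x, y), label in zip(points, labels):
            writer.writerow([x, y, label])
    with open(centers_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x", "y"])
        writer.writerows(centers)


out_dir = tempfile.mkdtemp()
labels_path = os.path.join(out_dir, "MeanShiftLabels.csv")
centers_path = os.path.join(out_dir, "MeanShiftCenters.csv")

save_results(
    points=[(0.0, 0.0), (5.0, 5.0)],
    labels=[0, 1],
    centers=[(0.0, 0.0), (5.0, 5.0)],
    labels_path=labels_path,
    centers_path=centers_path,
)

# Read the labels file back to check what was written.
with open(labels_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows[0])  # ['x', 'y', 'labels']
```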