HDBScan

class geoanalytics.clustering.HDBScan.HDBScan(dataframe)

Bases: object

About this algorithm

Description:

HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm. It extends DBSCAN into a hierarchical clustering algorithm and then extracts a flat clustering by selecting the most stable clusters from that hierarchy. This wrapper also tracks runtime and memory usage and can export the resulting labels to CSV.
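The hierarchy mentioned above is built from distances between points. The sketch below is a minimal pure-Python illustration of HDBSCAN's first stage, not this wrapper's code. It computes each point's core distance, the mutual reachability distance max(core(a), core(b), d(a, b)), and a minimum spanning tree over those distances using Prim's algorithm. The function names, the toy points, and the choice to exclude a point from its own neighbor count are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Edge = Tuple[int, int, float]


def core_distances(points: List[Point], k: int) -> List[float]:
    """Distance from each point to its k-th nearest *other* point.

    Implementations differ on whether a point counts as its own
    neighbor; this sketch excludes it.
    """
    core = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        core.append(dists[k - 1])
    return core


def mutual_reachability(points: List[Point], core: List[float]) -> List[List[float]]:
    """d_mreach(a, b) = max(core(a), core(b), d(a, b)); the diagonal is 0."""
    n = len(points)
    return [
        [0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]


def minimum_spanning_tree(dist: List[List[float]]) -> List[Edge]:
    """Prim's algorithm on a dense matrix; returns edges sorted by weight."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight connecting each point to the tree
    parent = [-1] * n       # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return sorted(edges, key=lambda e: e[2])


# Two well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
core = core_distances(points, k=2)
mst = minimum_spanning_tree(mutual_reachability(points, core))
```

Deleting MST edges from the heaviest down produces the cluster hierarchy. In this toy data the heaviest edge (weight √181 ≈ 13.45) is the one joining the two groups. HDBSCAN then condenses that hierarchy and keeps the most stable clusters, a step this sketch does not attempt.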

Parameters:
  • dataframe (pandas DataFrame) – the input dataset, which must be passed to the constructor.

  • Clustering hyperparameters (min_sample and min_cluster_size) are passed to the run() method.

Attributes:
  • df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and features.

  • labelsDF (pd.DataFrame) – DataFrame containing ‘x’, ‘y’, and assigned cluster labels.

  • startTime, endTime (float) – Timestamps recorded at the start and end of clustering, used to compute the runtime.

  • memoryUSS, memoryRSS (float) – Memory usage of the clustering process in kilobytes.
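This page does not show how startTime, endTime and the memory attributes are filled in. A common approach in Python, assumed here rather than confirmed, is psutil's Process.memory_full_info().uss and Process.memory_info().rss, both of which report bytes. The standard-library sketch below uses a hypothetical TrackedRun helper instead. It measures runtime with time.perf_counter() and peak RSS with resource.getrusage(). The resource module is Unix-only and has no USS counterpart.

```python
import resource  # Unix-only standard-library module
import time


class TrackedRun:
    """Hypothetical helper mirroring the startTime/endTime/memoryRSS attributes."""

    def __init__(self):
        self.startTime = 0.0
        self.endTime = 0.0
        self.memoryRSS = 0.0

    def run(self, work, *args, **kwargs):
        """Call work(*args, **kwargs), recording elapsed time and peak RSS."""
        self.startTime = time.perf_counter()
        result = work(*args, **kwargs)
        self.endTime = time.perf_counter()
        # ru_maxrss is the *peak* RSS of the whole process so far:
        # kilobytes on Linux, bytes on macOS.
        self.memoryRSS = float(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        return result

    def getRuntime(self):
        print("Total runtime (s):", self.endTime - self.startTime)

    def getMemoryRSS(self):
        print("Peak RSS:", self.memoryRSS)
```

For example, `TrackedRun().run(sum, range(1000))` returns 499500 and stores the elapsed time and peak RSS on the object. Because ru_maxrss is a process-wide peak, it can include memory used before clustering started, which a per-call psutil reading would not.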

Execution methods

Calling from a Python program

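import pandas as pd
from geoanalytics.clustering.HDBScan import HDBScan

# The input must contain 'x' and 'y' columns
df = pd.read_csv("input.csv")

hdb = HDBScan(df)
labels_df = hdb.run(min_sample=5, min_cluster_size=10)

# Each getter prints its measurement
hdb.getRuntime()
hdb.getMemoryUSS()
hdb.getMemoryRSS()

# Write x, y and cluster labels to CSV
hdb.save('HDBScanLabels.csv')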

Credits

This implementation was created by Raashika and revised by M.Charan Teja under the guidance of Professor Rage Uday Kiran.

getMemoryRSS()

Prints the resident set size (RSS) memory usage of the clustering process in kilobytes.

getMemoryUSS()

Prints the unique set size (USS) memory usage of the clustering process in kilobytes.

getRuntime()

Prints the total runtime of the clustering algorithm.

run(min_sample=5, min_cluster_size=10)

Executes the HDBSCAN clustering algorithm on the input DataFrame.

Parameters:
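  • min_sample – int, number of samples in a point's neighborhood required for it to be treated as a core point of a dense region; larger values make the clustering more conservative (default: 5)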

  • min_cluster_size – int, minimum size of clusters to extract (default: 10)
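To see what min_cluster_size controls, consider a single-level cut of a mutual-reachability minimum spanning tree. Drop every edge heavier than a threshold eps, then treat any connected component with fewer than min_cluster_size points as noise. This is the DBSCAN*-style flat clustering at one eps, a simplification rather than HDBSCAN's stability-based selection across all levels. The flat_cut function, the eps argument and the -1 noise label are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (point_a, point_b, mutual reachability distance)


def flat_cut(n: int, mst: List[Edge], eps: float, min_cluster_size: int) -> List[int]:
    """Label n points by cutting an MST at eps; small components become -1 (noise)."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only edges no heavier than eps
    for a, b, w in mst:
        if w <= eps:
            parent[find(a)] = find(b)

    # Group point indices by the root of their component
    components: Dict[int, List[int]] = {}
    for i in range(n):
        components.setdefault(find(i), []).append(i)

    labels = [-1] * n
    next_label = 0
    for members in components.values():
        if len(members) >= min_cluster_size:
            for i in members:
                labels[i] = next_label
            next_label += 1
    return labels


edges = [(0, 1, 1.0), (1, 2, 1.0), (3, 4, 1.0), (4, 5, 1.0), (2, 3, 5.0), (5, 6, 9.0)]
print(flat_cut(7, edges, eps=2.0, min_cluster_size=3))  # [0, 0, 0, 1, 1, 1, -1]
```

With these edges, raising min_cluster_size to 4 turns every point into noise, while raising eps to 6.0 merges the first six points into a single cluster. HDBSCAN avoids choosing eps by evaluating clusters across every level of the hierarchy, but min_cluster_size plays the same role of deciding which groups are too small to count as clusters.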

Returns:

labelsDF (pd.DataFrame) with columns [‘x’, ‘y’, ‘labels’]
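A typical next step after run() is summarizing each cluster. The sketch below computes per-cluster centroids from (x, y, label) rows using only the standard library. It assumes that noise points carry the label -1, the convention of common HDBSCAN implementations such as scikit-learn's HDBSCAN and the hdbscan package; this page does not state whether the wrapper keeps that convention. The cluster_centroids name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[float, float, int]  # (x, y, label), one row of labelsDF


def cluster_centroids(rows: Iterable[Row], noise_label: int = -1) -> Dict[int, Tuple[float, float]]:
    """Mean (x, y) of each cluster, skipping rows with the noise label."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # label -> [sum_x, sum_y, count]
    for x, y, label in rows:
        if label == noise_label:
            continue
        acc = sums[label]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


rows = [(0.0, 0.0, 0), (2.0, 0.0, 0), (10.0, 10.0, 1), (10.0, 12.0, 1), (50.0, 50.0, -1)]
print(cluster_centroids(rows))  # {0: (1.0, 0.0), 1: (10.0, 11.0)}
```

Rows can be taken from the returned DataFrame with `labels_df[['x', 'y', 'labels']].itertuples(index=False, name=None)`. In pandas alone, `labels_df[labels_df['labels'] != -1].groupby('labels')[['x', 'y']].mean()` gives the same centroids.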

save(outputFileLabels='HDBScanLabels.csv')

Saves the clustering result with labels to a CSV file.

Parameters:

outputFileLabels – str, filename for saving labels (default: ‘HDBScanLabels.csv’)
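To inspect a saved file without pandas, the sketch below counts points per label with the csv module. It assumes the file has a header row with a labels column, matching the columns returned by run(). Any extra column, such as a saved DataFrame index, is ignored. The label_counts name is hypothetical, and labels are read back as strings.

```python
import csv
from collections import Counter
from typing import Iterable


def label_counts(lines: Iterable[str], label_column: str = "labels") -> Counter:
    """Count rows per label in CSV text that has a header row."""
    return Counter(row[label_column] for row in csv.DictReader(lines))
```

For example: `with open('HDBScanLabels.csv', newline='') as f: print(label_counts(f))`. Accepting any iterable of lines, rather than a path, keeps the function easy to test with in-memory text.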