Gaussianmixture

class geoanalytics.clustering.Gaussianmixture.Gaussianmixture(dataframe)[source]

Bases: object

About this algorithm

Description:

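Gaussian Mixture Model (GMM) is a probabilistic clustering algorithm that models the data as a mixture of Gaussian distributions and assigns each point to the component most likely to have generated it. This implementation clusters on the feature columns, tracks runtime and memory usage, and can export the results to CSV files.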
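
To make the "mixture of Gaussians" idea concrete, the following minimal sketch computes, for a one-dimensional two-component mixture, the posterior probability that each component generated a given point. The function names and the parameter values are illustrative assumptions and are not part of geoanalytics; the class itself delegates this work to scikit-learn.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, stds):
    """Posterior probability of each component given x (the E-step quantity)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # mixture density p(x)
    return [j / total for j in joint]


# Illustrative parameters only; not taken from any real dataset.
weights = [0.5, 0.5]
means = [0.0, 4.0]
stds = [1.0, 1.0]

resp_mid = responsibilities(2.0, weights, means, stds)   # equidistant from both means
resp_left = responsibilities(0.0, weights, means, stds)  # sits on the first mean

# A hard cluster label is the index of the most probable component.
hard_label = max(range(len(resp_left)), key=resp_left.__getitem__)
```

A point halfway between two equally weighted, equally wide components is shared evenly between them, while a point on one mean is assigned almost entirely to that component.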

Parameters:
  • dataframe (pd.DataFrame) – The input dataset. It must be provided when the object is created.

  • No other parameters are required during instantiation.

Attributes:
  • df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and features.

  • labelsDF (pd.DataFrame) – DataFrame containing ‘x’, ‘y’, and assigned cluster labels.

  • model (GaussianMixture) – The trained scikit-learn GaussianMixture model instance for reuse or further analysis.

Execution methods

Calling from a Python program

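import pandas as pd

# Import the class from its module, geoanalytics.clustering.Gaussianmixture
from geoanalytics.clustering.Gaussianmixture import Gaussianmixture

# Load the input data ('x', 'y' coordinates plus feature columns)
df = pd.read_csv("input.csv")

gm = Gaussianmixture(df)

# run() returns a tuple: (labels DataFrame, component weights, component means)
output = gm.run(n_components=3)

labels_df = output[0]

weights = output[1]

centers = output[2]

# Write the results to CSV; the first argument is the labels file name
gm.save('GaussianMixtureLabels.csv')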

Credits

This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.

getMemoryRSS()[source]

Prints the memory usage (RSS) of the process in kilobytes.

getMemoryUSS()[source]

Prints the memory usage (USS) of the process in kilobytes.

getRuntime()[source]

Prints the total runtime of the clustering algorithm.

run(n_components=4, max_iters=100, covariance_type='full', init_params='kmeans', random_state=0)[source]

Performs Gaussian Mixture Model clustering on the feature columns.

Parameters:
  • n_components – Number of Gaussian components (clusters) to use.

  • max_iters – Maximum number of iterations of the EM algorithm.

  • covariance_type – Type of covariance parameters (‘full’, ‘tied’, ‘diag’, ‘spherical’).

  • init_params – Initialization method (‘kmeans’ or ‘random’).

  • random_state – Random seed to ensure reproducibility.

Returns:

Tuple of (DataFrame with labels, array of component weights, array of component means).
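
The parameters above correspond closely to those of scikit-learn's GaussianMixture. The sketch below shows one way such a method could be written; it is an assumption-laden illustration, not the library's source. In particular, run_gmm_sketch is a hypothetical name, the choice to treat every column other than 'x' and 'y' as a feature is an assumption, and so is the 'labels' column name.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def run_gmm_sketch(df, n_components=4, max_iters=100, covariance_type="full",
                   init_params="kmeans", random_state=0):
    """Hypothetical stand-in for run(); not the library's actual code."""
    # Assumption: every column other than 'x' and 'y' is a feature.
    features = df.drop(columns=["x", "y"]).to_numpy()
    model = GaussianMixture(
        n_components=n_components,
        max_iter=max_iters,  # scikit-learn spells this parameter max_iter
        covariance_type=covariance_type,
        init_params=init_params,
        random_state=random_state,
    )
    labels = model.fit_predict(features)  # fit with EM, then assign hard labels
    labels_df = df[["x", "y"]].copy()
    labels_df["labels"] = labels  # assumed column name
    return labels_df, model.weights_, model.means_


# Synthetic data: two well-separated groups of 50 points, two feature columns.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
                   rng.normal(10.0, 0.5, (50, 2))])
demo = pd.DataFrame({"x": np.arange(100), "y": np.arange(100),
                     "f1": feats[:, 0], "f2": feats[:, 1]})

labels_df, weights, means = run_gmm_sketch(demo, n_components=2)
```

The weights sum to 1, and the means array has one row per component and one column per feature.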

save(outputFileLabels='GaussianMixtureLabels.csv', outputFileWeights='GaussianMixtureWeights.csv', outputFileMeans='GaussianMixtureMeans.csv')[source]

Saves labels, weights, and means to separate CSV files.
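
As a rough illustration of what writing the three files might look like, the sketch below uses pandas to write labels, weights, and means to separate CSV files. The save_sketch function, its out_dir argument, and the column layout of the weights and means files are assumptions made for this example; the actual file layout produced by save() may differ.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_sketch(labels_df, weights, means, out_dir,
                outputFileLabels="GaussianMixtureLabels.csv",
                outputFileWeights="GaussianMixtureWeights.csv",
                outputFileMeans="GaussianMixtureMeans.csv"):
    """Hypothetical stand-in for save(); column names are assumptions."""
    paths = [os.path.join(out_dir, name)
             for name in (outputFileLabels, outputFileWeights, outputFileMeans)]
    labels_df.to_csv(paths[0], index=False)                       # one row per point
    pd.DataFrame({"weight": weights}).to_csv(paths[1], index=False)  # one row per component
    pd.DataFrame(means).to_csv(paths[2], index=False)             # rows: components, columns: features
    return paths


# Tiny made-up results for two components and two features.
labels_df = pd.DataFrame({"x": [0, 1, 2], "y": [0, 1, 2], "labels": [0, 1, 0]})
weights = np.array([0.6, 0.4])
means = np.array([[0.1, 0.2], [9.8, 10.1]])

# Write into a temporary directory and read the files back to check their shapes.
with tempfile.TemporaryDirectory() as tmp:
    written = save_sketch(labels_df, weights, means, tmp)
    reloaded_shapes = [pd.read_csv(p).shape for p in written]
```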