MedianImputation

class geoanalytics.imputation.MedianImputation.MedianImputation(dataframe)[source]

Bases: object

About this algorithm

Description:

MedianImputation replaces each missing value in a feature column with the median of that column. The ‘x’ and ‘y’ spatial columns are excluded from imputation. After execution, the runtime and memory usage of the run can be reported.

Parameters:
  • dataframe (pandas.DataFrame) – Input dataset with ‘x’, ‘y’ spatial columns followed by numerical features with potential missing values.

Attributes:
  • df (pandas.DataFrame) – Internal copy of the original input DataFrame with reordered columns.

  • imputedDF (pandas.DataFrame) – Final DataFrame after applying median imputation.

  • startTime (float) – Start time of the imputation.

  • endTime (float) – End time of the imputation.

  • memoryUSS (float) – Memory usage (USS in KB) during the run.

  • memoryRSS (float) – Memory usage (RSS in KB) during the run.
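
The df attribute is described as a copy of the input with reordered columns. The exact reordering is not documented here; the sketch below assumes it means moving ‘x’ and ‘y’ to the front. The helper name spatial_first is hypothetical and not part of geoanalytics.

```python
import pandas as pd


def spatial_first(df: pd.DataFrame, spatial_cols=("x", "y")) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the spatial columns moved to the front."""
    missing = [c for c in spatial_cols if c not in df.columns]
    if missing:
        raise ValueError(f"missing spatial columns: {missing}")
    # Keep the remaining columns in their original relative order.
    others = [c for c in df.columns if c not in spatial_cols]
    # copy() keeps the caller's DataFrame independent of the result.
    return df[list(spatial_cols) + others].copy()


raw = pd.DataFrame({"band1": [1.0, 2.0], "y": [5.0, 6.0], "x": [0.0, 1.0]})
ordered = spatial_first(raw)
print(list(ordered.columns))
```

Working on a copy means later imputation steps would not modify the DataFrame passed in by the caller.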

Execution methods

Calling from a Python program

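# Import the module that contains the MedianImputation class
from geoanalytics.imputation import MedianImputation as alg
import pandas as pd

# Load a dataset with 'x' and 'y' columns followed by numerical features
df = pd.read_csv('dataset.csv')

# Run the imputation and keep the returned DataFrame
obj = alg.MedianImputation(df)
imputed_df = obj.run()

# Write the imputed data to disk
obj.save('MedianImputation.csv')

# Print the runtime and the memory usage (USS and RSS)
obj.getRuntime()
obj.getMemoryUSS()
obj.getMemoryRSS()

print("Data after Median Imputation:", imputed_df)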

Credits

The complete program was written and revised under the supervision of Professor Rage Uday Kiran.

getMemoryRSS()[source]

Prints the memory usage (RSS) of the process in kilobytes.

getMemoryUSS()[source]

Prints the memory usage (USS) of the process in kilobytes.

getRuntime()[source]

Prints the total runtime of the imputation.
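
The runtime and memory figures reported by the three methods above could be collected roughly as follows. This is only a sketch: it assumes the third-party psutil package, assumes the runtime is wall-clock seconds, and samples memory once after the call finishes. The helper name measure is hypothetical, and the library's own bookkeeping may differ.

```python
import os
import time

import psutil  # third-party package; assumed to be installed


def measure(func, *args, **kwargs):
    """Hypothetical helper: run func, then report runtime and process memory in KB."""
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()

    process = psutil.Process(os.getpid())
    stats = {
        "runtime_s": end_time - start_time,
        # USS: memory unique to this process (freed if the process exits).
        "memory_uss_kb": process.memory_full_info().uss / 1024,
        # RSS: resident memory, including pages shared with other processes.
        "memory_rss_kb": process.memory_info().rss / 1024,
    }
    return result, stats


total, stats = measure(sum, range(1_000_000))
print(f"Runtime: {stats['runtime_s']:.4f} s")
print(f"USS: {stats['memory_uss_kb']:.0f} KB, RSS: {stats['memory_rss_kb']:.0f} KB")
```

USS is usually the more informative of the two for a single process, since RSS also counts shared libraries that other processes may be using.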

run()[source]

Executes median imputation on the dataset (excluding ‘x’ and ‘y’ columns), and returns the imputed DataFrame with original coordinates.

Returns:

imputedDF (pandas.DataFrame) – The DataFrame with missing values imputed using column medians.
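
The behaviour described for run() can be approximated in plain pandas. The following is a minimal sketch, not the library's implementation: the function name median_impute and the sample column names band1 and band2 are hypothetical, and it assumes missing values are encoded as NaN.

```python
import numpy as np
import pandas as pd


def median_impute(df: pd.DataFrame, spatial_cols=("x", "y")) -> pd.DataFrame:
    """Hypothetical helper: fill NaNs in non-spatial columns with column medians."""
    spatial = [c for c in spatial_cols if c in df.columns]
    features = df.drop(columns=spatial)
    # DataFrame.median skips NaN by default, so each median uses observed values only.
    medians = features.median(numeric_only=True)
    # Passing a Series to fillna fills each column with the value under its own label.
    imputed = features.fillna(medians)
    # Re-attach the untouched coordinates in front of the imputed features.
    return pd.concat([df[spatial], imputed], axis=1)


sample = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0],
    "y": [5.0, 6.0, 7.0, 8.0],
    "band1": [1.0, np.nan, 3.0, 10.0],
    "band2": [np.nan, 4.0, 4.0, 8.0],
})
result = median_impute(sample)
print(result)
```

In this sample the gap in band1 is filled with 3.0 (the median of 1, 3 and 10) and the gap in band2 with 4.0, while the coordinate columns pass through unchanged. The median is less sensitive to outliers such as the 10.0 in band1 than the mean would be.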

save(outputFile='MedianImputation.csv')[source]

Saves the imputed DataFrame to a CSV file.