MultipleImputation

class geoanalytics.imputation.MultipleImputation.MultipleImputation(dataframe)

Bases: object

About this algorithm

Description:

MultipleImputation performs iterative multivariate imputation using chained equations with linear regression to estimate missing values based on relationships among features.
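As a rough illustration of the chained-equations idea described above, the following NumPy-only sketch cycles through the columns, fitting a least-squares linear model on the rows where a column is observed and using it to predict that column's missing entries. It is not the library's implementation: the function name chained_linear_imputation, the column-mean starting guess, and the fixed number of passes are assumptions made for this example.

```python
import numpy as np


def chained_linear_imputation(X, max_iter=10):
    """Hypothetical helper: fill NaNs column by column with linear fits."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Starting guess (an assumption): replace each gap with its column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    for _ in range(max_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Predict column j from all other columns plus an intercept.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = A[rows] @ coef
    return filled


# Second column follows 2 * first + 1, with one value missing.
data = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, np.nan], [4.0, 9.0]])
result = chained_linear_imputation(data)
print(result)
```

In this toy input the second column is an exact linear function of the first, so the gap should be filled with a value close to 7.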

Parameters:

Dataset (pandas DataFrame) must be provided during object initialization.
Additional tuning parameters can be provided during the run() call.

Attributes:

df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and feature columns.
imputedDF (pd.DataFrame) – DataFrame containing ‘x’, ‘y’, and imputed values.
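To make the expected input layout concrete, here is a small sketch of a DataFrame with 'x', 'y' coordinates followed by feature columns. The feature names band1 and band2 are hypothetical, and the sketch assumes that missing values are marked with NaN.

```python
import numpy as np
import pandas as pd

# 'x' and 'y' hold coordinates; the remaining columns are features.
# band1 and band2 are placeholder names chosen for this example.
df = pd.DataFrame(
    {
        "x": [0.0, 0.0, 1.0, 1.0],
        "y": [0.0, 1.0, 0.0, 1.0],
        "band1": [0.2, np.nan, 0.4, 0.5],
        "band2": [1.1, 1.3, np.nan, 1.7],
    }
)

# Count the gaps per feature column before imputation.
feature_columns = [c for c in df.columns if c not in ("x", "y")]
missing_per_feature = df[feature_columns].isna().sum()
print(missing_per_feature)
```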

Execution methods

Calling from a Python program

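import pandas as pd

from geoanalytics.imputation.MultipleImputation import MultipleImputation

# Load the input data ('x', 'y' coordinates plus feature columns)
df = pd.read_csv("input.csv")

mi = MultipleImputation(df)

# Impute missing values with the default settings
output = mi.run()

# Write the imputed DataFrame to a CSV file
mi.save('MultipleImputation.csv')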

Credits

This implementation was created and revised under the guidance of Professor Rage Uday Kiran.

getMemoryRSS(): Prints the memory usage (RSS) of the process in kilobytes.

getMemoryUSS(): Prints the memory usage (USS) of the process in kilobytes.

getRuntime(): Prints the total runtime of the imputation algorithm.

run(n_nearest_features=None, max_iter=10, random_state=0)

Executes iterative multivariate imputation using linear regression.

Parameters:

n_nearest_features (int or None, optional) – Number of features to use when estimating missing values. If None, all features are used.
max_iter (int, default=10) – Maximum number of imputation iterations.
random_state (int, default=0) – Seed for reproducibility.

Returns:

pandas DataFrame with imputed values and original ‘x’, ‘y’ columns.
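The three parameters of run() have the same names as arguments of scikit-learn's IterativeImputer, so a comparable workflow can be sketched with that class and a LinearRegression estimator. This is an assumption about equivalent behaviour, not a statement of how run() is implemented. The helper name impute_features and the column names band1 and band2 are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression


def impute_features(df, n_nearest_features=None, max_iter=10, random_state=0):
    """Hypothetical helper: impute feature columns, keep 'x' and 'y' as is."""
    coords = df[["x", "y"]]
    features = df.drop(columns=["x", "y"])

    imputer = IterativeImputer(
        estimator=LinearRegression(),
        n_nearest_features=n_nearest_features,
        max_iter=max_iter,
        random_state=random_state,
    )
    # Assumes no feature column is entirely NaN; scikit-learn drops such
    # columns by default, which would break the column alignment below.
    values = imputer.fit_transform(features)

    imputed = pd.DataFrame(values, columns=features.columns, index=df.index)
    return pd.concat([coords, imputed], axis=1)


df = pd.DataFrame(
    {
        "x": [0.0, 0.0, 1.0, 1.0],
        "y": [0.0, 1.0, 0.0, 1.0],
        "band1": [1.0, 2.0, 3.0, 4.0],
        "band2": [3.0, 5.0, np.nan, 9.0],
    }
)
result = impute_features(df)
print(result)
```

Lowering n_nearest_features limits how many other columns are used as predictors for each feature, which can reduce runtime on wide tables at some cost in accuracy.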

save(outputFile='MultipleImputation.csv')

Saves the imputed DataFrame to a CSV file.

Parameters:

outputFile – Filename to save the resulting DataFrame.