KNNImputation

class geoanalytics.imputation.KNNImputation.KNNImputation(dataframe)

Bases: object

About this algorithm

Description:

K-Nearest Neighbors (KNN) imputation fills each missing value by finding the k nearest samples (rows) in the dataset and taking the average (or weighted average) of those neighbors’ values for the same feature.
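The idea described above can be sketched in plain Python. This is a minimal illustration, not the implementation used by this class: the function name knn_impute is hypothetical, missing values are represented as None, and the distance (Euclidean over the columns both rows have observed, rescaled by the fraction of shared columns) is an assumption, since the text above does not say how neighbors are measured.

```python
import math

def knn_impute(rows, k=5):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper for illustration only.
    """
    def distance(a, b):
        # Euclidean distance over the columns both rows have observed,
        # rescaled so that rows sharing few columns are not favoured.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        squared = sum((x - y) ** 2 for x, y in shared)
        return math.sqrt(squared * len(a) / len(shared))

    imputed = [list(row) for row in rows]  # copy; the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that have a value in column j.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if math.isfinite(d[0])]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:
                imputed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed

data = [
    [1.0, 2.0, 10.0],
    [1.1, 2.1, None],   # third feature is missing
    [0.9, 1.9, 12.0],
    [8.0, 9.0, 50.0],
]
filled = knn_impute(data, k=2)
print(filled[1])  # the two closest rows hold 10.0 and 12.0, so the gap becomes 11.0
```

With k=2 the distant fourth row is ignored; a larger k would pull its value of 50.0 into the average, which is why the choice of n_neighbors matters.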

Parameters:

Dataset (pandas DataFrame) must be provided during object initialization.
Number of neighbors n_neighbors is specified during the run() call.

Attributes:

df (pd.DataFrame) – The input data with ‘x’, ‘y’ coordinates and features.
imputedDF (pd.DataFrame) – DataFrame after filling in missing values.
startTime, endTime (float) – Variables to track execution time.
memoryUSS, memoryRSS (float) – Memory usage of the imputation process in kilobytes.
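How the class fills the timing and memory attributes is not shown here. One common way to collect such figures is sketched below, assuming the third-party psutil package; the helper name measure and the dictionary keys are hypothetical. psutil reports bytes, so the values are divided by 1024 to obtain kilobytes.

```python
import os
import time

try:
    import psutil  # third-party; an assumption, not confirmed by this page
except ImportError:
    psutil = None

def measure(func, *args, **kwargs):
    """Run func and return (result, stats) with runtime and memory figures.

    Hypothetical helper for illustration only.
    """
    start_time = time.time()
    result = func(*args, **kwargs)
    end_time = time.time()

    stats = {
        "runtime_s": end_time - start_time,
        "memory_uss_kb": None,
        "memory_rss_kb": None,
    }
    if psutil is not None:
        process = psutil.Process(os.getpid())
        # USS: memory unique to this process; RSS: all resident memory.
        stats["memory_uss_kb"] = process.memory_full_info().uss / 1024
        stats["memory_rss_kb"] = process.memory_info().rss / 1024
    return result, stats

result, stats = measure(sum, range(1_000_000))
print(stats)
```

USS counts only pages private to the process, so it is usually the smaller and more telling of the two numbers when shared libraries are loaded.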

Execution methods

Calling from a Python program

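import pandas as pd
from geoanalytics.imputation.KNNImputation import KNNImputation

# Load the input data ('x', 'y' coordinates and features)
df = pd.read_csv("input.csv")

# Create the imputer and fill missing values using 5 neighbors
knn_imputer = KNNImputation(df)
imputed_df = knn_imputer.run(n_neighbors=5)

# Print runtime and memory statistics
knn_imputer.getRuntime()
knn_imputer.getMemoryUSS()
knn_imputer.getMemoryRSS()

# Write the imputed DataFrame to a CSV file
knn_imputer.save('KNN.csv')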

Credits

This implementation was created by Raashika and revised by M. Charan Teja under the guidance of Professor Rage Uday Kiran.

getMemoryRSS(): Prints the memory usage (RSS) of the process in kilobytes.

getMemoryUSS(): Prints the memory usage (USS) of the process in kilobytes.

getRuntime(): Prints the total runtime of the imputation algorithm.

run(n_neighbors=5)

Executes the KNN Imputation algorithm by replacing missing values based on nearest neighbor averages.

Parameters: n_neighbors – int, number of neighbors to use for imputation (default: 5)
Returns: imputedDF (pd.DataFrame) – DataFrame with missing values filled
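The description mentions both a plain and a weighted average of the neighbor values. The difference can be sketched as follows, assuming inverse-distance weights; this weighting scheme and both function names are assumptions for illustration, and the class may combine neighbors differently.

```python
def plain_mean(values):
    """Every neighbor contributes equally."""
    return sum(values) / len(values)

def inverse_distance_mean(values, distances):
    """Closer neighbors contribute more (weight = 1 / distance)."""
    # A neighbor at distance 0 is an exact match; use such matches directly
    # to avoid dividing by zero.
    exact = [v for v, d in zip(values, distances) if d == 0]
    if exact:
        return sum(exact) / len(exact)
    weights = [1.0 / d for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

neighbor_values = [10.0, 12.0]
neighbor_distances = [0.5, 2.0]

print(plain_mean(neighbor_values))
print(inverse_distance_mean(neighbor_values, neighbor_distances))
```

Here the plain mean is 11.0, while the weighted mean lands nearer 10.0 because the neighbor holding that value is four times closer.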

save(outputFile='KNN.csv')

Saves the imputed DataFrame to a CSV file.

Parameters: outputFile – str, filename to save the imputed data (default: ‘KNN.csv’)