MatrixFactorizationKNN

class geoanalytics.imputation.MatrixFactorizationKNN.MatrixFactorizationKNN(dataframe)[source]

Bases: object

About this algorithm

Description:

MatrixFactorizationKNN uses fancyimpute.KNN to estimate each missing value from the rows most similar to the row that contains it. The ‘x’ and ‘y’ coordinate columns are excluded from imputation and re-attached to the result. The class also records the runtime and memory usage of the run.
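The row-similarity idea can be sketched with NumPy alone. This is an illustrative sketch, not the fancyimpute implementation: the function name knn_impute, the distance (root mean squared difference over the features two rows both observe), and the inverse-distance weighting are all assumptions made for the example.

```python
import numpy as np


def knn_impute(values, k=5):
    """Fill NaNs in a 2-D array using the k most similar rows (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    filled = values.copy()
    missing = np.isnan(values)

    for i in np.where(missing.any(axis=1))[0]:
        # Compare row i with every row over the features both rows observe.
        shared = ~missing[i] & ~missing
        diff = np.where(shared, values - values[i], 0.0)
        counts = shared.sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / counts)
        dist[counts == 0] = np.inf  # no shared features: not comparable
        dist[i] = np.inf            # a row is never its own neighbor

        for j in np.where(missing[i])[0]:
            # Only rows that observe column j and are comparable can donate.
            candidates = np.where(~missing[:, j] & np.isfinite(dist))[0]
            if candidates.size == 0:
                continue  # nothing to borrow from; the value stays NaN
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            # Assumed weighting: closer neighbors count more.
            weights = 1.0 / (dist[nearest] + 1e-9)
            filled[i, j] = np.average(values[nearest, j], weights=weights)

    return filled


rows = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, np.nan],
    [10.0, 20.0, 30.0],
])
result = knn_impute(rows, k=1)
```

With k=1, the missing value in the second row is copied from the first row, which matches it exactly on the two features they share.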

Parameters:
  • dataframe (pandas.DataFrame) – Input dataset with ‘x’, ‘y’ spatial columns followed by features with possible missing values.

  • k (int) – Number of nearest neighbors used in the imputation (default is 5). Passed to run(), not to the constructor.

Attributes:
  • df (pandas.DataFrame) – Internal copy of the input DataFrame with reordered columns.

  • imputedDF (pandas.DataFrame) – Final DataFrame after applying KNN Imputation.

  • startTime (float) – Start time of the imputation.

  • endTime (float) – End time of the imputation.

  • memoryUSS (float) – Memory usage (USS in KB) during the run.

  • memoryRSS (float) – Memory usage (RSS in KB) during the run.
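The timing and memory attributes above could be collected along the following lines. This is a minimal sketch that assumes the psutil package is used for memory figures. The helper name measure and the use of time.perf_counter are choices made for this example and are not confirmed by the source.

```python
import os
import time

import psutil


def measure(func, *args, **kwargs):
    """Run func and report its runtime plus the process memory (hypothetical helper)."""
    start_time = time.perf_counter()
    result = func(*args, **kwargs)
    end_time = time.perf_counter()

    process = psutil.Process(os.getpid())
    stats = {
        "runtime_s": end_time - start_time,
        # USS: memory unique to this process. RSS: all resident memory.
        # psutil reports bytes, so divide by 1024 to get kilobytes.
        "memory_uss_kb": process.memory_full_info().uss / 1024,
        "memory_rss_kb": process.memory_info().rss / 1024,
    }
    return result, stats


total, stats = measure(sum, [1, 2, 3])
```

USS is usually the more useful figure when comparing algorithms, because RSS also counts pages shared with other processes.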

Execution methods

Calling from a Python program

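from geoanalytics.imputation import MatrixFactorizationKNN as alg
import pandas as pd

# Load a dataset with 'x' and 'y' columns followed by feature columns
df = pd.read_csv('dataset.csv')

# Run KNN imputation with 5 neighbors
obj = alg.MatrixFactorizationKNN(df)
imputed_df = obj.run(k=5)

# Save the result, then print runtime and memory statistics
obj.save('MatrixFactorizationKNN.csv')
obj.getRuntime()
obj.getMemoryUSS()
obj.getMemoryRSS()

print("Data after KNN Imputation:", imputed_df)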

Credits

The complete program was written and revised under the supervision of Professor Rage Uday Kiran.

getMemoryRSS()[source]

Prints the memory usage (RSS) of the process in kilobytes.

getMemoryUSS()[source]

Prints the memory usage (USS) of the process in kilobytes.

getRuntime()[source]

Prints the total runtime of the algorithm.

run(k=5)[source]

Executes KNN imputation on the dataset (excluding the ‘x’ and ‘y’ columns) and returns the imputed DataFrame with the original coordinates re-attached.

Parameters:

  • k (int) – Number of neighbors to use (default: 5).

Returns:

  • imputedDF (pandas.DataFrame) – The DataFrame with missing values imputed.
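The column handling described for run() can be sketched as follows. The helper impute_keeping_coordinates and its imputer argument are hypothetical names for this example. A simple column-mean fill stands in for the KNN step so the sketch runs without fancyimpute. In practice a callable such as lambda a: KNN(k=5).fit_transform(a), with KNN imported from fancyimpute, could presumably be passed instead.

```python
import numpy as np
import pandas as pd


def impute_keeping_coordinates(df, imputer, coord_cols=("x", "y")):
    """Impute only the feature columns, then re-attach the coordinates."""
    coord_cols = list(coord_cols)
    feature_cols = [c for c in df.columns if c not in coord_cols]
    # The imputer sees a plain float array with the coordinates removed.
    filled = imputer(df[feature_cols].to_numpy(dtype=float))
    features = pd.DataFrame(filled, columns=feature_cols, index=df.index)
    return pd.concat([df[coord_cols], features], axis=1)


def column_mean_fill(values):
    """Stand-in imputer (not KNN): replace each NaN with its column mean."""
    means = np.nanmean(values, axis=0)
    return np.where(np.isnan(values), means, values)


df = pd.DataFrame({
    "x": [0, 1, 2],
    "y": [5, 6, 7],
    "b1": [1.0, np.nan, 3.0],
    "b2": [4.0, 5.0, np.nan],
})
imputed = impute_keeping_coordinates(df, column_mean_fill)
```

Keeping the coordinates out of the imputer means spatial position never influences which rows count as similar. Only the feature values do.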

save(outputFile='MatrixFactorizationKNN.csv')[source]

Saves the imputed DataFrame to a CSV file.