CSV2Parquet

class geoanalytics.conversion.CSV2Parquet.CSV2Parquet(inputFile, outputFile, sep)[source]

Bases: object

About this algorithm

Description:

This class converts CSV files into the Parquet format.

Parameters:
  • inputFile (str) – Path to the input CSV file.

  • outputFile (str) – Path to the output Parquet file.

  • sep (str) – The separator that distinguishes items from one another in the input file. The default separator is a tab; users can override it by passing a different separator.
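To illustrate what the sep parameter controls, here is a minimal sketch using only Python's standard csv module. It is not the class's own parsing code; the helper name split_rows and the sample strings are hypothetical and exist only for this example.

```python
import csv
import io


def split_rows(text, sep="\t"):
    """Split each line of `text` into items, using `sep` as the delimiter.

    Hypothetical helper for illustration; the tab default mirrors the
    default separator described in the parameter list above.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    return [row for row in reader]


# Default separator (tab): each tab-delimited token becomes one item.
rows_tab = split_rows("a\tb\tc\nd\te\n")

# Overridden separator (space): the same items, delimited by spaces.
rows_space = split_rows("a b c\nd e\n", sep=" ")

# Mismatched separator: a space-delimited line read with the tab default
# is not split at all and comes back as a single item.
rows_mismatch = split_rows("a b c\n")
```

If the separator passed to the converter does not match the one used in the file, each line is likely to be read as a single item, as the last call shows.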

Attributes:
  • getMemoryUSS() (int) – Returns the memory used exclusively by the process (USS), in bytes.

  • getMemoryRSS() (int) – Returns the memory used by the process in RAM (RSS), in bytes.

  • getRuntime() (float) – Returns the time taken to execute the conversion.

  • printStats() – Prints statistics about memory usage and runtime.

Methods:
  • convert() – Reads the input file, converts it to a Parquet file, and tracks memory usage and runtime.

Execution methods

Terminal command

Format:

(.venv) $ python3 CSV2Parquet.py <inputFile> <outputFile> <sep>

Example Usage:

(.venv) $ python3 CSV2Parquet.py sampleDB.csv output.parquet

Calling from a Python program

import geoanalytics.conversion.CSV2Parquet as cp

inputFile = 'sampleDB.csv'

sep = " "

outputFile = 'output.parquet'

obj = cp.CSV2Parquet(inputFile, outputFile, sep)

obj.convert()

obj.printStats()

Credits

The complete program was written by P. Likhitha and revised by Tarun Sreepada under the supervision of Professor Rage Uday Kiran.

convert()[source]

This function converts the input CSV file to a data frame, which is then transformed into a Parquet file.

getMemoryRSS()[source]

Returns the memory used by the process in RSS (Resident Set Size).

Returns:

The total memory (in bytes) used by the process in RAM.

Return type:

int

getMemoryUSS()[source]

Returns the memory used by the process in USS (Unique Set Size).

Returns:

The amount of memory (in bytes) used exclusively by the process.

Return type:

int
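For readers unfamiliar with the two memory figures, the sketch below shows one way to read RSS and USS for the current process. It assumes the third-party psutil package is installed; whether this class obtains its numbers the same way is an assumption, not something this page states.

```python
import os

import psutil  # third-party package; assumed to be installed

# Handle to the current Python process.
process = psutil.Process(os.getpid())

# RSS (Resident Set Size): memory the process currently holds in RAM,
# including pages shared with other processes. Reported in bytes.
rss_bytes = process.memory_info().rss

# USS (Unique Set Size): memory used only by this process, i.e. what
# would be freed if the process ended right now. Reported in bytes.
uss_bytes = process.memory_full_info().uss

print("RSS (bytes):", rss_bytes)
print("USS (bytes):", uss_bytes)
```

USS excludes shared pages, so it is usually the smaller of the two and a closer estimate of what the process alone costs.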

getRuntime()[source]

Returns the time taken to complete the CSV to Parquet conversion.

Returns:

The runtime of the conversion process in seconds.

Return type:

float
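A runtime in seconds like the one described above is typically the difference between two clock readings taken around the work. The following is a minimal sketch with the standard time module; the helper name timed is hypothetical, and the clock the class itself uses is not stated on this page.

```python
import time


def timed(func, *args, **kwargs):
    """Run `func` and return (result, elapsed_seconds).

    Hypothetical helper for illustration; perf_counter is a monotonic
    clock, so the elapsed value cannot be negative.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Example: time a simple computation standing in for the conversion.
result, runtime = timed(sum, range(1_000_000))
print("Runtime (seconds):", runtime)
```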

printStats()[source]

Prints the resource usage statistics including memory consumption (USS and RSS) and the runtime.

Returns:

Prints memory usage and runtime to the console.