Parquet2CSV

class geoanalytics.conversion.Parquet2CSV.Parquet2CSV(inputFile, outputFile, sep)

Bases: object

About this algorithm

Description:

This class converts a Parquet file into a CSV file.

Reference:

Parameters:
  • inputFile (str) – Path to the input Parquet file.

  • outputFile (str) – Path to the output CSV file.

  • sep (str) – Separator placed between the items of each output row. The default separator is a tab; users can override it with any other separator.
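As a minimal sketch of what `sep` controls, assuming each row is written as its values converted to strings and joined by `sep` (the helper `format_row` is hypothetical and not part of the class):

```python
def format_row(values, sep="\t"):
    """Join one row's values into a single output line (hypothetical helper)."""
    # Convert every value to text, then place `sep` between neighbouring items.
    return sep.join(str(value) for value in values)


print(format_row([1, "a", 2.5]))        # tab-separated (the documented default)
print(format_row([1, "a", 2.5], " "))   # space-separated override
```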

Attributes:
  • getMemoryUSS() (int) – Returns the memory used by the process in USS (Unique Set Size).

  • getMemoryRSS() (int) – Returns the memory used by the process in RSS (Resident Set Size).

  • getRuntime() (float) – Returns the time, in seconds, taken to execute the conversion.

  • printStats() – Prints statistics about memory usage and runtime.

Methods:
  • convert() – Reads the Parquet file, writes its contents to a CSV file, and tracks memory usage and runtime.

Execution methods

Terminal command

Format:

(.venv) $ python3 Parquet2CSV.py <inputFile> <outputFile> <sep>

Example Usage:

(.venv) $ python3 Parquet2CSV.py output.parquet sampleDB.csv
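The example usage omits `<sep>`. Assuming the script then falls back to the tab default described for the `sep` parameter, the mapping from command-line arguments to constructor arguments could look like this hypothetical `parse_args` sketch (not the script's actual code):

```python
def parse_args(argv):
    """Map arguments to (inputFile, outputFile, sep).

    Hypothetical sketch: when <sep> is omitted it falls back to a tab,
    the default separator documented for the `sep` parameter.
    """
    if len(argv) not in (2, 3):
        raise SystemExit("usage: <inputFile> <outputFile> [<sep>]")
    inputFile, outputFile = argv[0], argv[1]
    sep = argv[2] if len(argv) == 3 else "\t"
    return inputFile, outputFile, sep


# Arguments from the example usage (no <sep> given).
print(parse_args(["output.parquet", "sampleDB.csv"]))
```

In a script, `parse_args(sys.argv[1:])` would supply the three values passed to `Parquet2CSV(inputFile, outputFile, sep)`.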

Calling from a python program

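import geoanalytics.conversion.Parquet2CSV as pc

inputFile = 'output.parquet'   # Parquet file to read
outputFile = 'sampleDB.csv'    # CSV file to write
sep = " "                      # separator between the values of each row

obj = pc.Parquet2CSV(inputFile, outputFile, sep)
obj.convert()      # perform the conversion
obj.printStats()   # print memory usage and runtime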

Credits

The complete program was written by P. Likhitha and revised by Tarun Sreepada under the supervision of Professor Rage Uday Kiran.

convert()

Converts the input Parquet file into a CSV file: the values of each row are joined by the specified separator and written as one line of the output file.

getMemoryRSS()

Returns the memory used by the process in RSS (Resident Set Size).

Returns:

The memory (in bytes) of the process currently resident in RAM, including pages shared with other processes.

Return type:

int

getMemoryUSS()

Returns the memory used by the process in USS (Unique Set Size).

Returns:

The amount of memory (in bytes) used exclusively by the process.

Return type:

int
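How the class obtains these two values is not stated here. One common way to read both USS and RSS for the current process, assuming the third-party `psutil` package (the helper `memory_usage_bytes` is hypothetical), is:

```python
import os

import psutil


def memory_usage_bytes():
    """Return (uss, rss) in bytes for the current process.

    memory_full_info() is used because plain memory_info() does not report USS.
    """
    info = psutil.Process(os.getpid()).memory_full_info()
    return info.uss, info.rss


uss, rss = memory_usage_bytes()
print(f"USS: {uss} bytes, RSS: {rss} bytes")
```

USS counts only pages private to the process, so it is usually the better measure of what the conversion itself costs; RSS also includes shared libraries.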

getRuntime()

Returns the time taken to complete the Parquet to CSV conversion.

Returns:

The runtime of the conversion process in seconds.

Return type:

float
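A runtime in seconds like the one `getRuntime()` reports can be measured by wrapping the work with a clock. This sketch uses the monotonic `time.perf_counter()`; the helper `timed` is hypothetical, and the class may use a different clock such as `time.time()`.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds) measured with a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


result, runtime = timed(sum, range(1_000_000))
print(f"result={result}, runtime={runtime:.4f} s")
```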

printStats()

Prints the resource usage statistics including memory consumption (USS and RSS) and the runtime.

Returns:

None; the memory usage and runtime are printed to the console.
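The exact text that `printStats()` prints is not documented here. A hypothetical sketch of a report combining the three values, using bytes for memory and seconds for runtime to match the getters' documented units (the function `format_stats` and its labels are assumptions):

```python
def format_stats(uss, rss, runtime):
    """Build a resource report (hypothetical labels and layout)."""
    return "\n".join([
        f"Memory (USS): {uss} bytes",
        f"Memory (RSS): {rss} bytes",
        f"Runtime: {runtime:.3f} s",
    ])


print(format_stats(1_048_576, 2_097_152, 0.25))
```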