CSV2Parquet
- class geoanalytics.conversion.CSV2Parquet.CSV2Parquet(inputFile, outputFile, sep)[source]
Bases: object
About this algorithm
- Description:
This class converts CSV files into Parquet format.
- Reference:
- Parameters:
inputFile (str) – Path to the input CSV file.
outputFile (str) – Path to the output Parquet file.
sep (str) – Separator used to distinguish items from one another in the input file. The default separator is a tab, but users can supply a different separator.
- Attributes:
getMemoryUSS() (int) – Returns the memory (in bytes) used by the process in USS (Unique Set Size).
getMemoryRSS() (int) – Returns the memory (in bytes) used by the process in RSS (Resident Set Size).
getRuntime() (float) – Returns the time taken to execute the conversion.
printStats() – Prints statistics about memory usage and runtime.
- Methods:
convert() – Reads the input file, converts it to a Parquet file, and tracks memory usage and runtime.
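The role of the sep parameter described above can be illustrated with the standard library alone. This is a minimal sketch, not the class's own code: split_rows is a hypothetical helper, and the sample strings are made up for illustration.

```python
import csv
import io


def split_rows(text, sep="\t"):
    """Split each line of `text` into items, using `sep` as the delimiter.

    Hypothetical helper for illustration; the tab default mirrors the
    default documented for the `sep` parameter.
    """
    return [row for row in csv.reader(io.StringIO(text), delimiter=sep)]


# Tab-separated input, read with the default separator.
print(split_rows("a\tb\tc\nd\te\n"))

# Comma-separated input, read by overriding the separator.
print(split_rows("a,b,c\n", sep=","))
```

If the separator does not match the file, each line is returned as a single item, which is usually the first symptom of a wrong sep value.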
Execution methods
Terminal command
Format:
(.venv) $ python3 CSV2Parquet.py <inputFile> <outputFile> <sep>
Example Usage:
(.venv) $ python3 CSV2Parquet.py sampleDB.csv output.parquet
Calling from a python program
import PAMI.extras.convert.CSV2Parquet as cp
inputFile = 'sampleDB.csv'
sep = " "
outputFile = 'output.parquet'
obj = cp.CSV2Parquet(inputFile, outputFile, sep)
obj.convert()
obj.printStats()
Credits
The complete program was written by P. Likhitha and revised by Tarun Sreepada under the supervision of Professor Rage Uday Kiran.
- convert()[source]
This function converts the input CSV file to a data frame, which is then transformed into a Parquet file.
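The two steps named here (CSV to data frame, data frame to Parquet) could look roughly like the following with pandas. This is a sketch under assumptions, not the library's actual implementation: csv_to_parquet is a hypothetical function, the input is assumed to be a rectangular table with a header row, and pandas needs a Parquet engine such as pyarrow to be installed.

```python
import time


def csv_to_parquet(input_file, output_file, sep="\t"):
    """Convert a delimited text file to Parquet (hypothetical helper).

    Returns the number of rows written and the elapsed time in seconds.
    """
    # Imported here so this module can be loaded without pandas installed.
    import pandas as pd

    start = time.time()
    # Step 1: read the delimited file into a data frame.
    frame = pd.read_csv(input_file, sep=sep)
    # Step 2: write the data frame as Parquet (requires pyarrow or fastparquet).
    frame.to_parquet(output_file, index=False)
    return len(frame), time.time() - start
```

Calling csv_to_parquet('sampleDB.csv', 'output.parquet', sep='\t') would then mirror the usage shown earlier, with the elapsed time playing the role of the runtime statistic.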
- getMemoryRSS()[source]
Returns the memory used by the process in RSS (Resident Set Size).
- Returns:
The total memory (in bytes) used by the process in RAM.
- Return type:
int
- getMemoryUSS()[source]
Returns the memory used by the process in USS (Unique Set Size).
- Returns:
The amount of memory (in bytes) used exclusively by the process.
- Return type:
int
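For readers who want to see what USS and RSS values look like, both can be read for the current process with the third-party psutil package. This is a sketch assuming psutil is installed; memory_usage is a hypothetical helper, and the source does not state how the class itself obtains these numbers.

```python
import os


def memory_usage():
    """Return (uss, rss) in bytes for the current process (hypothetical helper)."""
    # psutil is a third-party package; imported here so that its absence
    # only matters when the function is called.
    import psutil

    process = psutil.Process(os.getpid())
    # USS: memory unique to this process, i.e. what would be freed if it exited.
    uss = process.memory_full_info().uss
    # RSS: memory held in RAM by this process, including shared pages.
    rss = process.memory_info().rss
    return uss, rss
```

Both values are integers in bytes, which matches the return types documented for getMemoryUSS() and getMemoryRSS().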