Load
- class optimus.engines.base.io.load.BaseLoad(op)[source]
- avro(filepath_or_buffer, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from an Avro file.
- Parameters
filepath_or_buffer – path or location of the file. Must be a string.
n_rows – number of rows to load; None loads all rows.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
args – custom arguments to be passed to the Spark avro function.
kwargs – custom keyword arguments to be passed to the Spark avro function.
- csv(filepath_or_buffer, sep=',', header=True, infer_schema=True, encoding='UTF-8', n_rows=None, null_value='None', quoting=3, lineterminator='\r\n', on_bad_lines='warn', cache=False, na_filter=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from a CSV file. It is the same as Spark's read.csv function, with some predefined parameters.
- Parameters
filepath_or_buffer – path or location of the file.
sep – column delimiter; usually ',' or ';'.
header – whether the dataset has a header row. Defaults to True.
infer_schema – infers the input schema automatically from the data. It requires one extra pass over the data. Defaults to True.
encoding – character encoding of the file. Defaults to 'UTF-8'.
n_rows – number of rows to load; None loads all rows.
null_value – string to be interpreted as a null value.
quoting – quoting behavior; the default 3 corresponds to csv.QUOTE_NONE.
lineterminator – character sequence that marks the end of a line.
on_bad_lines – what to do when a malformed line is encountered; 'warn' by default.
cache – whether to cache the loaded dataframe.
na_filter – whether to detect missing value markers.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
- Returns
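The predefined defaults above map onto Python's csv module constants: quoting=3 is csv.QUOTE_NONE, which treats quote characters as ordinary data. A minimal stdlib sketch (not the Optimus call itself) of what those settings mean when parsing a semicolon-separated file:

```python
import csv
import io

# quoting=3 is csv.QUOTE_NONE: quote characters are kept as ordinary data.
assert csv.QUOTE_NONE == 3

# A tiny in-memory "file" with a header row and ';' as the separator.
raw = 'name;comment\nalice;said "hi"\nbob;fine\n'
reader = csv.reader(io.StringIO(raw), delimiter=";", quoting=csv.QUOTE_NONE)

rows = list(reader)
header, data = rows[0], rows[1:]
print(header)  # ['name', 'comment']
print(data)    # [['alice', 'said "hi"'], ['bob', 'fine']]
```

With QUOTE_NONE the embedded double quotes survive verbatim instead of being interpreted as field delimiters.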
- excel(filepath_or_buffer, header=0, sheet_name=0, merge_sheets=False, skip_rows=0, n_rows=None, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from an Excel file.
- Parameters
filepath_or_buffer – path or location of the file. Must be a string.
header – row number to use as column names. Defaults to 0.
sheet_name – Excel sheet name or index. Defaults to 0.
merge_sheets – whether to merge all sheets into a single dataframe.
skip_rows – number of rows to skip at the start of the file.
n_rows – number of rows to load; None loads all rows.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
n_partitions – number of partitions for the resulting dataframe.
args – custom arguments to be passed to the Excel function.
kwargs – custom keyword arguments to be passed to the Excel function.
- Returns
- file(path, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Tries to infer the file's data format and encoding, then loads the data into a dataframe.
- Parameters
path – Path to the file you want to load.
args – custom arguments to be passed to the underlying load function.
kwargs – custom keyword arguments to be passed to the underlying load function.
- Returns
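The detection logic is internal to Optimus, but the idea can be illustrated with a minimal extension-based dispatcher. This is a hypothetical sketch, not the library's implementation: the name guess_format and the mapping below are illustrative only, and Optimus also inspects file contents and encoding rather than relying on the extension alone.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the loader that would handle it.
_LOADERS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".avro": "avro",
    ".parquet": "parquet",
    ".orc": "orc",
    ".xls": "excel",
    ".xlsx": "excel",
}

def guess_format(path: str) -> str:
    """Return the loader name for a path, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return _LOADERS[ext]
    except KeyError:
        raise ValueError(f"cannot infer format for {path!r}") from None

print(guess_format("data/sales.parquet"))  # parquet
```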
- hdf5(path, columns=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from an HDF5 file.
- Parameters
path – path or location of the file. Must be a string.
columns – specific column names to be loaded from the file.
n_partitions – number of partitions for the resulting dataframe.
args – custom arguments to be passed to the HDF5 function.
kwargs – custom keyword arguments to be passed to the HDF5 function.
- Returns
- json(filepath_or_buffer, multiline=False, n_rows=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from a JSON file.
- Parameters
filepath_or_buffer – path or location of the file.
multiline – whether records span multiple lines; set to True for a regular (multiline) JSON document, False for line-delimited JSON.
n_rows – number of rows to load.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
args – custom arguments to be passed to the JSON function.
kwargs – custom keyword arguments to be passed to the JSON function.
- Returns
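The multiline flag distinguishes two common JSON layouts: a single multiline document versus line-delimited JSON with one complete record per line. A stdlib sketch of the two layouts (illustrative only, not the Optimus implementation):

```python
import json

# multiline=True case: the whole file is one JSON document
# (here, an array of records spread over several lines).
multiline_doc = """[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob"}
]"""
records_a = json.loads(multiline_doc)

# multiline=False case: line-delimited JSON, one complete record per line.
jsonlines_doc = '{"id": 1, "name": "alice"}\n{"id": 2, "name": "bob"}\n'
records_b = [json.loads(line) for line in jsonlines_doc.splitlines()]

# Both layouts describe the same data.
assert records_a == records_b
print(records_a)  # [{'id': 1, 'name': 'alice'}, {'id': 2, 'name': 'bob'}]
```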
- static model(path)[source]
Load a machine learning model from a file.
- Parameters
path – Path to the file we want to load.
- Returns
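The documentation does not specify the serialization format this method uses. Purely as an illustration of the general save/load round trip such a method follows, here is a pickle-based sketch; the format, paths, and the assumption that the model object is picklable are all this example's, not Optimus's:

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object round-trips the same way.
model = {"weights": [0.1, 0.2, 0.7], "bias": 0.05}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# The load side: read the file back into an equivalent object.
with open(path, "rb") as f:
    restored = pickle.load(f)

assert restored == model
```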
- orc(path, columns, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from an ORC file.
- Parameters
path – path or location of the file. Must be a string.
columns – specific column names to be loaded from the file.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
n_partitions – number of partitions for the resulting dataframe.
args – custom arguments to be passed to the ORC function.
kwargs – custom keyword arguments to be passed to the ORC function.
- parquet(filepath_or_buffer, columns=None, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType [source]
Loads a dataframe from a parquet file.
- Parameters
filepath_or_buffer – path or location of the file. Must be a string.
columns – select the columns to load, so you do not need to load the whole dataframe.
n_rows – number of rows to load; None loads all rows.
storage_options – extra options passed to the storage backend (e.g., credentials for a remote filesystem).
conn – connection object for remote sources.
args – custom arguments to be passed to the Spark parquet function.
kwargs – custom keyword arguments to be passed to the Spark parquet function.
- tsv(filepath_or_buffer, header=True, infer_schema=True, *args, **kwargs)[source]
Loads a dataframe from a TSV (tab-separated values) file.
- Parameters
filepath_or_buffer – path or location of the file. Must be a string.
header – whether the dataset has a header row. Defaults to True.
infer_schema – infers the input schema automatically from the data. Defaults to True.
args – custom arguments to be passed to the TSV function.
kwargs – custom keyword arguments to be passed to the TSV function.
- Returns
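A TSV file is simply a delimited file whose separator is a tab character. A stdlib sketch (not the Optimus call itself) of parsing tab-separated data with a header row:

```python
import csv
import io

# A tiny in-memory TSV "file" with a header row.
raw = "id\tname\n1\talice\n2\tbob\n"
reader = csv.reader(io.StringIO(raw), delimiter="\t")

rows = list(reader)
header, data = rows[0], rows[1:]
print(header)  # ['id', 'name']
print(data)    # [['1', 'alice'], ['2', 'bob']]
```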