Load

class optimus.engines.base.io.load.BaseLoad(op)[source]
avro(filepath_or_buffer, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an Avro file.

Parameters
  • filepath_or_buffer – path or location of the file. Must be a string.

  • n_rows

  • storage_options

  • conn

  • args – custom arguments to be passed to the Spark Avro function

  • kwargs – custom keyword arguments to be passed to the Spark Avro function
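
Example (not from the original docs): a minimal sketch assuming the loader is reached through an Optimus engine instance as op.load, and that users.avro is a hypothetical file path.

>>> from optimus import Optimus
>>> op = Optimus("pandas")
>>> df = op.load.avro("users.avro", n_rows=1000)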

csv(filepath_or_buffer, sep=',', header=True, infer_schema=True, encoding='UTF-8', n_rows=None, null_value='None', quoting=3, lineterminator='\r\n', on_bad_lines='warn', cache=False, na_filter=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a CSV file. It wraps Spark's read.csv function with some predefined parameters.

Parameters
  • encoding

  • storage_options

  • quoting

  • filepath_or_buffer – path or location of the file.

  • sep – field delimiter; usually ‘,’ or ‘;’.

  • header – whether the dataset has a header row. Defaults to True.

  • infer_schema – infer the input schema automatically from the data. This requires one extra pass over the data. Defaults to True.

  • n_rows

  • null_value

  • cache

  • na_filter

  • lineterminator

  • on_bad_lines

  • conn

Returns
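
Example: a hedged sketch overriding some of the predefined parameters; the path and separator are illustrative only, and op is the engine instance created as in the avro example above.

>>> df = op.load.csv("sales.csv", sep=";", header=True, infer_schema=True, n_rows=500)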

excel(filepath_or_buffer, header=0, sheet_name=0, merge_sheets=False, skip_rows=0, n_rows=None, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an Excel file.

Parameters
  • filepath_or_buffer – Path or location of the file. Must be a string.

  • header

  • sheet_name – excel sheet name

  • merge_sheets

  • skip_rows

  • n_rows

  • storage_options

  • conn

  • n_partitions

  • args – custom arguments to be passed to the Excel function

  • kwargs – custom keyword arguments to be passed to the Excel function

Returns
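
Example: a sketch selecting a single sheet; the file name and sheet name are hypothetical, with op created as in the avro example.

>>> df = op.load.excel("report.xlsx", sheet_name="Q1", skip_rows=1)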

file(path, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Tries to infer the file's data format and encoding and loads the data into a dataframe.

Parameters
  • path – Path to the file you want to load.

  • args – custom arguments to be passed to the underlying load function.

  • kwargs – custom keyword arguments to be passed to the underlying load function.

Returns
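
Example: since the format is inferred, only a path is required. The path below is hypothetical, with op created as in the avro example.

>>> df = op.load.file("dataset_of_unknown_format")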

hdf5(path, columns=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a HDF5 file.

Parameters
  • path – path or location of the file. Must be a string.

  • columns – Specific column names to be loaded from the file.

  • n_partitions

  • args – custom arguments to be passed to the HDF5 reader.

  • kwargs – custom keyword arguments to be passed to the HDF5 reader.

Returns
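
Example: a sketch that loads only two columns from an HDF5 file; the path and column names are illustrative, with op created as in the avro example.

>>> df = op.load.hdf5("measurements.h5", columns=["id", "value"])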

json(filepath_or_buffer, multiline=False, n_rows=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a JSON file.

Parameters
  • filepath_or_buffer – path or location of the file.

  • multiline

  • n_rows

  • storage_options

  • conn

  • args

  • kwargs

Returns
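
Example: a hedged sketch for a JSON file whose records span multiple lines; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.json("events.json", multiline=True)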

static model(path)[source]

Load a machine learning model from a file.

Parameters

path – Path to the file we want to load.

Returns
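
Example: a sketch reloading a previously saved model; the path is hypothetical and assumes a model was saved earlier, with op created as in the avro example.

>>> clf = op.load.model("churn_model")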

orc(path, columns, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an ORC file.

Parameters
  • path – path or location of the file. Must be a string.

  • columns – Specific column names to be loaded from the file.

  • storage_options

  • conn

  • args – custom arguments to be passed to the ORC reader.

  • kwargs – custom keyword arguments to be passed to the ORC reader.
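
Example: a sketch selecting specific columns from an ORC file; the path and column names are illustrative, with op created as in the avro example.

>>> df = op.load.orc("logs.orc", columns=["timestamp", "status"])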

parquet(filepath_or_buffer, columns=None, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a Parquet file.

Parameters
  • filepath_or_buffer – path or location of the file. Must be a string.

  • columns – select the columns to be loaded, so that you do not need to load the whole dataframe.

  • storage_options

  • conn

  • args – custom arguments to be passed to the Spark Parquet function

  • kwargs – custom keyword arguments to be passed to the Spark Parquet function
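
Example: a hedged sketch that loads only two columns from a Parquet file; the path and column names are hypothetical, with op created as in the avro example.

>>> df = op.load.parquet("transactions.parquet", columns=["user_id", "amount"])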

tsv(filepath_or_buffer, header=True, infer_schema=True, *args, **kwargs)[source]

Loads a dataframe from a TSV (tab-separated values) file.

Parameters
  • filepath_or_buffer – Path or location of the file. Must be a string.

  • header

  • infer_schema

  • args – custom arguments to be passed to the underlying TSV reader.

  • kwargs – custom keyword arguments to be passed to the underlying TSV reader.

Returns
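
Example: a sketch for a tab-separated file; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.tsv("variants.tsv", header=True, infer_schema=True)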

xml(path, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an XML file.

Parameters
  • path

  • n_rows

  • storage_options

  • conn

  • args

  • kwargs

Returns
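
Example: a sketch limiting the number of rows read from an XML file; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.xml("catalog.xml", n_rows=100)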