Load

class optimus.engines.base.io.load.BaseLoad(op)[source]
avro(filepath_or_buffer, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an Avro file.

Parameters
  • filepath_or_buffer – path or location of the file. Must be a string.

  • n_rows

  • storage_options

  • conn

  • args – custom arguments to be passed to the Spark Avro function

  • kwargs – custom keyword arguments to be passed to the Spark Avro function
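
Example (not from the original docs): a minimal sketch assuming the loader is reached through an Optimus engine instance as op.load, and that users.avro is a hypothetical file path.

>>> from optimus import Optimus
>>> op = Optimus("pandas")
>>> df = op.load.avro("users.avro", n_rows=1000)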

csv(filepath_or_buffer, sep=',', header=True, infer_schema=True, encoding='UTF-8', n_rows=None, null_value='None', quoting=3, lineterminator='\r\n', on_bad_lines='warn', cache=False, na_filter=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a CSV file. It wraps Spark's read.csv function with some predefined parameters.

Parameters
  • encoding

  • storage_options

  • quoting

  • filepath_or_buffer – path or location of the file.

  • sep – field delimiter; usually ‘,’ or ‘;’.

  • header – whether the dataset has a header row. Defaults to True.

  • infer_schema – infer the input schema automatically from the data. This requires one extra pass over the data. Defaults to True.

  • n_rows

  • null_value

  • cache

  • na_filter

  • lineterminator

  • on_bad_lines

  • conn

Returns
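
Example: a hedged sketch overriding some of the predefined parameters; the path and separator are illustrative only, and op is the engine instance created as in the avro example above.

>>> df = op.load.csv("sales.csv", sep=";", header=True, infer_schema=True, n_rows=500)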

excel(filepath_or_buffer, header=0, sheet_name=0, merge_sheets=False, skip_rows=0, n_rows=None, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an Excel file.

Parameters
  • filepath_or_buffer – Path or location of the file. Must be a string.

  • header

  • sheet_name – excel sheet name

  • merge_sheets

  • skip_rows

  • n_rows

  • storage_options

  • conn

  • n_partitions

  • args – custom arguments to be passed to the Excel function

  • kwargs – custom keyword arguments to be passed to the Excel function

Returns
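
Example: a sketch selecting a single sheet; the file name and sheet name are hypothetical, with op created as in the avro example.

>>> df = op.load.excel("report.xlsx", sheet_name="Q1", skip_rows=1)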

file(path, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Tries to infer the file's data format and encoding and loads the data into a dataframe.

Parameters
  • path – Path to the file you want to load.

  • args – custom arguments to be passed to the underlying load function.

  • kwargs – custom keyword arguments to be passed to the underlying load function.

Returns
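
Example: since the format is inferred, only a path is required. The path below is hypothetical, with op created as in the avro example.

>>> df = op.load.file("dataset_of_unknown_format")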

hdf5(path, columns=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a HDF5 file.

Parameters
  • path – path or location of the file. Must be a string.

  • columns – Specific column names to be loaded from the file.

  • n_partitions

  • args – custom arguments to be passed to the HDF5 reader.

  • kwargs – custom keyword arguments to be passed to the HDF5 reader.

Returns
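
Example: a sketch that loads only two columns from an HDF5 file; the path and column names are illustrative, with op created as in the avro example.

>>> df = op.load.hdf5("measurements.h5", columns=["id", "value"])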

json(filepath_or_buffer, multiline=False, n_rows=False, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a JSON file.

Parameters
  • filepath_or_buffer – path or location of the file.

  • multiline

  • n_rows

  • storage_options

  • conn

  • args

  • kwargs

Returns
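
Example: a hedged sketch for a JSON file whose records span multiple lines; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.json("events.json", multiline=True)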

static model(path)[source]

Load a machine learning model from a file.

Parameters

path – Path to the file we want to load.

Returns
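
Example: a sketch reloading a previously saved model; the path is hypothetical and assumes a model was saved earlier, with op created as in the avro example.

>>> clf = op.load.model("churn_model")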

orc(path, columns, storage_options=None, conn=None, n_partitions=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an ORC file.

Parameters
  • path – path or location of the file. Must be a string.

  • columns – Specific column names to be loaded from the file.

  • storage_options

  • conn

  • args – custom arguments to be passed to the ORC reader.

  • kwargs – custom keyword arguments to be passed to the ORC reader.
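
Example: a sketch selecting specific columns from an ORC file; the path and column names are illustrative, with op created as in the avro example.

>>> df = op.load.orc("logs.orc", columns=["timestamp", "status"])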

parquet(filepath_or_buffer, columns=None, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from a Parquet file.

Parameters
  • filepath_or_buffer – path or location of the file. Must be a string.

  • columns – select the columns to be loaded, so that you do not need to load the whole dataframe.

  • storage_options

  • conn

  • args – custom arguments to be passed to the Spark Parquet function

  • kwargs – custom keyword arguments to be passed to the Spark Parquet function
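
Example: a hedged sketch that loads only two columns from a Parquet file; the path and column names are hypothetical, with op created as in the avro example.

>>> df = op.load.parquet("transactions.parquet", columns=["user_id", "amount"])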

tsv(filepath_or_buffer, header=True, infer_schema=True, *args, **kwargs)[source]

Loads a dataframe from a TSV (tab-separated values) file.

Parameters
  • filepath_or_buffer – Path or location of the file. Must be a string.

  • header

  • infer_schema

  • args – custom arguments to be passed to the underlying TSV reader.

  • kwargs – custom keyword arguments to be passed to the underlying TSV reader.

Returns
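
Example: a sketch for a tab-separated file; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.tsv("variants.tsv", header=True, infer_schema=True)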

xml(path, n_rows=None, storage_options=None, conn=None, *args, **kwargs) optimus.helpers.types.DataFrameType[source]

Loads a dataframe from an XML file.

Parameters
  • path

  • n_rows

  • storage_options

  • conn

  • args

  • kwargs

Returns
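
Example: a sketch limiting the number of rows read from an XML file; the path is hypothetical, with op created as in the avro example.

>>> df = op.load.xml("catalog.xml", n_rows=100)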