Plugins: Usage & Development

Plugins: Registry

The plugin registry holds functions intended for pre- and post-processing of data. It comes in handy:

During ETL: cleaning up data, adding columns, enforcing data types, removing/encrypting PHI ...

In a collaborative environment (Jupyter-x; Zeppelin; AWS Service Workbench)

Plugins: Architecture & Design

Plugins follow a plugin architecture built on the Iterator design pattern. They function as a pipeline, i.e. they are executed sequentially in the order in which they are listed in the "plugins" parameter. Effectively, the output of one function is the input to the next.
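The chaining described above can be sketched in a few lines. This is only an illustration of the idea, not data-transport's implementation: run_pipeline and both step functions are hypothetical names, and plain lists of dicts stand in for the pandas.DataFrame that real plugins receive.

```python
from functools import reduce


def run_pipeline(data, steps):
    """Apply each step in order; the output of one step is the input of the next."""
    return reduce(lambda current, step: step(current), steps, data)


# Two hypothetical steps; rows are plain dicts standing in for DataFrame rows.
def drop_empty_names(rows):
    return [row for row in rows if row.get("name")]


def add_row_number(rows):
    return [{**row, "_id": i} for i, row in enumerate(rows)]


rows = [{"name": "a"}, {"name": ""}, {"name": "b"}]
result = run_pipeline(rows, [drop_empty_names, add_row_number])
print(result)  # [{'name': 'a', '_id': 0}, {'name': 'b', '_id': 1}]
```

Swapping the two steps changes the result (the surviving rows would be numbered 0 and 2), which is why the order in which plugins are listed matters.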

[Figure: Data Transport UML Plugin Component View]

Quick Start

1. Make Plugin
2. Register Plugin
3. Use The Plugin

The code here shows a function that will be registered as "autoincrement".

The data will always be a pandas.DataFrame

For the sake of this example the file will be my-plugin.py

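import transport
import numpy as np

# Running offset shared across calls, so ids keep increasing from batch to batch
_index = 0

@transport.Plugin(name='autoincrement')
def _incr (_data):
    global _index
    # Number the rows of this batch, starting at the current offset
    _data['_id'] = _index + np.arange(_data.shape[0])
    # Advance the offset by the number of rows just numbered
    _index += _data.shape[0]
    return _data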
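Before registering, the numbering logic can be tried on its own, without the registry. The sketch below rests on several assumptions. It drops the decorator and keeps the counter in a closure instead of a module-level global. make_autoincrement is a hypothetical helper, not part of data-transport. It assumes ids should keep growing from one batch to the next. The city names are made-up sample data.

```python
import numpy as np
import pandas as pd


def make_autoincrement(start=0):
    """Return a function that adds a running `_id` column to each DataFrame it receives."""
    state = {"next_id": start}

    def _incr(_data):
        n = _data.shape[0]
        _data["_id"] = state["next_id"] + np.arange(n)
        state["next_id"] += n  # advance by the number of rows just numbered
        return _data

    return _incr


incr = make_autoincrement()
first = incr(pd.DataFrame({"city": ["Oakland", "Fresno", "Chico"]}))
second = incr(pd.DataFrame({"city": ["Davis", "Napa"]}))
print(first["_id"].tolist())   # [0, 1, 2]
print(second["_id"].tolist())  # [3, 4]
```

Calling the function on two consecutive batches is a quick way to confirm that ids do not restart or overlap when the data arrives in pieces.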

data-transport comes with a built-in command line interface (CLI). It allows plugins to be registered and reused.

Registered functions are stored in $HOME/.data-transport/plugins/code

Any updates to my-plugin.py will require re-registering the file

Additional plugin registry functions (list, test) are available

$ transport plugin-add demo ./my-plugin.py

The following command shows what data-transport knows about the function, i.e. its real name and the name to be used in code.

$ transport plugin-test demo.autoincrement
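Since registered code lives under $HOME/.data-transport/plugins/code, the folder can also be inspected directly. The helper below is a hypothetical convenience, not part of data-transport, and it assumes registered files keep a .py extension. The CLI remains the supported way to list and test plugins.

```python
from pathlib import Path


def list_registered_files(code_dir=None):
    """Return the names of Python files found in the plugin code folder, sorted."""
    # Default taken from the documented location; pass code_dir to look elsewhere.
    if code_dir is None:
        code_dir = Path.home() / ".data-transport" / "plugins" / "code"
    code_dir = Path(code_dir)
    if not code_dir.is_dir():
        return []  # nothing registered yet, or a different layout
    return sorted(p.name for p in code_dir.glob("*.py"))


print(list_registered_files())
```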

Once registered, plugins are ready for use in code or in a configuration file (auth-file).

import transport
from transport import providers

_args = {
    "provider": providers.HTTP,
    "url": "https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/addresses.csv",
    "plugins": ["demo@autoincrement"]
}

reader = transport.get.reader(**_args)
_data = reader.read()
print(_data.head())