data-transport/notebooks/plugins.ipynb


<html lang="en"> <head> </head>

Writing data-transport plugins

data-transport plugins are designed to automate pre- and post-processing, i.e.:

- Read -> Post-processing
- Write -> Pre-processing

In this example we will read and write data, with pre/post-processing applied, against any supported infrastructure. We will also show how to specify the plugins within a configuration file.

In [1]:
#
# Writing to a SQLite database
#
import transport
from transport import providers
import pandas as pd
import os

DATABASE = '/home/steve/tmp/demo.db3'
#
# Start from a clean file so the example can be re-run
if os.path.exists(DATABASE) :
    os.remove(DATABASE)
#
# Create a small data frame and write it to the table 'friends'
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
litew = transport.get.writer(provider=providers.SQLITE,database=DATABASE)
litew.write(_data,table='friends')
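
To check a written table independently of data-transport, Python's standard-library sqlite3 module can list the tables in a file and count the rows. The sketch below is only an illustration: it does not call data-transport, and so that it runs on its own it uses a temporary file and inserts the same three rows itself. Pointing `path` at the notebook's DATABASE and skipping the CREATE/INSERT statements is the assumed adaptation for real use.

```python
import os
import sqlite3
import tempfile

# Assumption: a temporary file stands in for the notebook's DATABASE path
path = os.path.join(tempfile.mkdtemp(), 'demo.db3')

conn = sqlite3.connect(path)
try:
    # Stand-in for the write performed above (same rows, written with sqlite3 only)
    conn.execute('CREATE TABLE friends (name TEXT, age INTEGER)')
    conn.executemany(
        'INSERT INTO friends VALUES (?, ?)',
        [('James Bond', 55), ('Steve Rogers', 150), ('Steve Nyemba', 44)],
    )
    conn.commit()
    # List the tables stored in the file and count the rows in 'friends'
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    row_count = conn.execute('SELECT COUNT(*) FROM friends').fetchone()[0]
finally:
    conn.close()

print(tables, row_count)
```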

Reading from SQLite

The cell below reads the data that has been written by the cell above and applies plugin functions we will write: one adds an auto-increment column and the other divides the ages by 10.

  • Basic read of the designated table (friends) created above
  • Read with pipeline functions defined in code

NOTE

transport.factory.instance, transport.instance and transport.get.<[reader|writer]> can be used interchangeably; they are the same. The different names exist so that maintainers can tell that a factory design pattern was used.
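
For readers unfamiliar with the factory design pattern mentioned in the note, here is a minimal, generic sketch of the idea: one function builds the object, and the reader/writer entry points are thin aliases over it. Every name in it (`SQLiteReader`, `SQLiteWriter`, `_REGISTRY`, `instance`, `get`) is a hypothetical stand-in written for this illustration; it is not data-transport's actual implementation.

```python
# Hypothetical stand-ins; these are NOT data-transport's real classes
class SQLiteReader:
    def __init__(self, **kwargs):
        self.settings = kwargs

class SQLiteWriter:
    def __init__(self, **kwargs):
        self.settings = kwargs

# Maps a (provider, context) pair to the class that should be built
_REGISTRY = {
    ('sqlite', 'read'): SQLiteReader,
    ('sqlite', 'write'): SQLiteWriter,
}

def instance(provider, context='read', **kwargs):
    """Factory: pick a class from the provider/context pair and build it."""
    return _REGISTRY[(provider, context)](**kwargs)

class get:
    """Convenience aliases that delegate to the same factory."""
    @staticmethod
    def reader(**kwargs):
        return instance(context='read', **kwargs)

    @staticmethod
    def writer(**kwargs):
        return instance(context='write', **kwargs)

reader = get.reader(provider='sqlite', database='demo.db3', table='friends')
same_kind = instance(provider='sqlite', context='read',
                     database='demo.db3', table='friends')
print(type(reader) is type(same_kind))
```

Both calls go through the same `instance` function, which is the sense in which such entry points can be "the same".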

In [4]:
import transport
from transport import providers
import numpy as np
def _autoincrement (_data,**kwargs) :
    """
    This function will add an autoincrement field to the table
    """
    _data['autoinc'] = np.arange(_data.shape[0])
    return _data
def reduce(_data,**_args) :
    """
    This function will divide the age column of the data frame by 10
    """
    _data.age /= 10
    return _data
reader = transport.get.reader(provider=providers.SQLITE,database=DATABASE,table='friends')
#
# basic read of the data created in the first cell
_df = reader.read()
print (_df)
print ()
print ()
#
# read of the data with pipeline functions provided to alter the data that was read
print (reader.read(pipeline=[_autoincrement,reduce]))
           name  age
0    James Bond   55
1  Steve Rogers  150
2  Steve Nyemba   44


           name   age  autoinc
0    James Bond   5.5        0
1  Steve Rogers  15.0        1
2  Steve Nyemba   4.4        2
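
The second table above is consistent with the pipeline functions being applied one after another to the data that was read. Below is a minimal sketch of that idea using pandas only. `apply_pipeline` is a hypothetical helper written for this illustration (it is not part of data-transport and makes no claim about how the library works internally), and the two functions are simplified variants of the ones in the cell above.

```python
import pandas as pd

def apply_pipeline(frame, pipeline):
    """Hypothetical helper: pass a copy of the frame through each function in order."""
    result = frame.copy()
    for step in pipeline:
        result = step(result)
    return result

def add_autoincrement(_data, **kwargs):
    # Adds a 0-based counter column
    _data['autoinc'] = range(len(_data))
    return _data

def scale_age(_data, **kwargs):
    # Divides the age column by 10
    _data['age'] = _data['age'] / 10
    return _data

friends = pd.DataFrame({'name': ['James Bond', 'Steve Rogers', 'Steve Nyemba'],
                        'age': [55, 150, 44]})
processed = apply_pipeline(friends, [add_autoincrement, scale_age])
print(processed)
```

Working on a copy keeps the original frame untouched, so the same data can be run through different pipelines.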

The parameters for instantiating a transport object (reader or writer) can be found at the data-transport home.


</html>