data-transport

<html lang="en"> <head> </head>

Writing data-transport plugins

Data-transport plugins are designed to automate pre- and post-processing, i.e.:

- Read -> post-processing
- Write -> pre-processing

In this example we will read and write data, applying pre/post-processing functions along the way, against any supported infrastructure (SQLite is used below). We will also show how to specify the plugins within a configuration file.

In [1]:

#
# Writing to a SQLite database
#
import transport
from transport import providers
import pandas as pd
import os

# Start from a clean database file so the example is repeatable
DATABASE = '/home/steve/tmp/demo.db3'
if os.path.exists(DATABASE) :
    os.remove(DATABASE)

# Build a small data frame and write it to the 'friends' table
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
litew = transport.get.writer(provider=providers.SQLITE,database=DATABASE)
litew.write(_data,table='friends')

Reading from SQLite

The cell below reads the data written by the cell above in two ways:

- Basic read of the designated table (friends) created above
- Read with pipeline functions defined in code: one adds an auto-increment column, the other divides the age by 10

NOTE

transport.factory.instance, transport.instance and transport.get.<[reader|writer]> are equivalent and can be used interchangeably. The naming signals to maintainers that a factory design pattern is used.

In [4]:

import transport
from transport import providers
import numpy as np

def _autoincrement (_data,**kwargs) :
    """
    This function will add an autoincrement field to the data frame
    """
    _data['autoinc'] = np.arange(_data.shape[0])
    return _data

def reduce(_data,**_args) :
    """
    This function will divide the age column of the data frame by 10
    """
    _data['age'] = _data['age'] / 10
    return _data

# DATABASE is defined in the first cell
reader = transport.get.reader(provider=providers.SQLITE,database=DATABASE,table='friends')
#
# basic read of the data created in the first cell
_df = reader.read()
print (_df)
print ()
print ()
#
# read of the data with pipeline functions that alter the returned data frame (not the database)
print (reader.read(pipeline=[_autoincrement,reduce]))

           name  age
0    James Bond   55
1  Steve Rogers  150
2  Steve Nyemba   44


           name   age  autoinc
0    James Bond   5.5        0
1  Steve Rogers  15.0        1
2  Steve Nyemba   4.4        2

The parameters for instantiating a transport object (reader or writer) can be found on the data-transport home page.

</html>
