Writing data-transport plugins
The data-transport plugins are designed to automate pre- and post-processing, i.e.
- Read -> Post-processing
- Write -> Pre-processing
In this example we will write data and read it back, with the processing step attached to the read. SQLite is used here, but the same approach applies to any supported infrastructure. We will also show how to specify the plugins within a configuration file.
In [1]:
#
# Writing to a SQLite database
#
import transport
from transport import providers
import pandas as pd
import os
import shutil
#
#
DATABASE = '/home/steve/tmp/demo.db3'
if os.path.exists(DATABASE):
    os.remove(DATABASE)
#
#
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
litew = transport.get.writer(provider=providers.SQLITE,database=DATABASE)
litew.write(_data,table='friends')
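To check what a write like the one above leaves behind, the table can be inspected with Python's built-in sqlite3 module. The sketch below is self-contained: it uses an in-memory database and inserts the same three rows by hand. That is an assumption standing in for litew.write, whose exact schema is not shown here.

```python
import sqlite3

# Assumption: an in-memory database stands in for DATABASE, and the rows are
# inserted by hand instead of through litew.write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (name TEXT, age INTEGER)")
rows = [("James Bond", 55), ("Steve Rogers", 150), ("Steve Nyemba", 44)]
conn.executemany("INSERT INTO friends (name, age) VALUES (?, ?)", rows)
conn.commit()

# Read the table back, as a plain SQL query would see it.
stored = conn.execute("SELECT name, age FROM friends ORDER BY rowid").fetchall()
conn.close()
print(stored)
```

Against the real file, the same SELECT can be run after replacing ":memory:" with the DATABASE path.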
Reading from SQLite
The cell below reads the data written by the cell above and transforms it with plugin functions we will write: one adds an auto-increment column, the other divides the ages by 10.
- Basic read of the designated table (friends) created above
- Read with pipeline functions defined in code
NOTE
transport.factory.instance, transport.instance and transport.get.<[reader|writer]> can be used interchangeably; they are the same. The different names simply signal to maintainers that a factory design pattern is in use.
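As a rough illustration of the factory pattern mentioned in the note, the sketch below shows how several entry points can forward to a single factory function. Every name in it (Reader, Writer, instance, get, the context argument) is hypothetical and chosen for illustration; it is not the data-transport implementation.

```python
# Hypothetical classes and names, for illustration only; they are not the
# data-transport implementation.
class Reader:
    def __init__(self, **kwargs):
        self.options = kwargs


class Writer:
    def __init__(self, **kwargs):
        self.options = kwargs


def instance(context="read", **kwargs):
    """Factory: pick the class from the arguments and build the object."""
    cls = Reader if context == "read" else Writer
    return cls(**kwargs)


class get:
    """Convenience namespace that forwards to the same factory."""

    @staticmethod
    def reader(**kwargs):
        return instance(context="read", **kwargs)

    @staticmethod
    def writer(**kwargs):
        return instance(context="write", **kwargs)


r = get.reader(provider="sqlite", database="demo.db3")
w = instance(context="write", provider="sqlite", database="demo.db3")
print(type(r).__name__, type(w).__name__)
```

Whichever entry point is used, the caller ends up with an object built by the same function, which is why the names can be treated as equivalent.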
In [4]:
import transport
from transport import providers
import os
import numpy as np
def _autoincrement(_data, **kwargs):
    """
    This function will add an autoincrement field to the table
    """
    _data['autoinc'] = np.arange(_data.shape[0])
    return _data
def reduce(_data, **_args):
    """
    This function will reduce the ages in the data frame by dividing them by 10
    """
    _data.age /= 10
    return _data
reader = transport.get.reader(provider=providers.SQLITE,database=DATABASE,table='friends')
#
# basic read of the data created in the first cell
_df = reader.read()
print (_df)
print ()
print()
#
# read of the data with pipeline functions provided to alter the data that is returned
print (reader.read(pipeline=[_autoincrement,reduce]))
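The pipeline passed to reader.read above is a list of functions, each taking the data and returning it. A minimal sketch of how such a list can be applied follows, assuming the functions run in the order given (the library's internals are not shown here, so this is an assumption). To stay dependency-free it uses a list of dicts in place of a data frame, and apply_pipeline, autoincrement and reduce_age are hypothetical names.

```python
def apply_pipeline(rows, pipeline, **kwargs):
    """Hypothetical helper: pass the data through each function in order."""
    for step in pipeline:
        rows = step(rows, **kwargs)
    return rows


def autoincrement(rows, **kwargs):
    # Add a running index to each row, like _autoincrement above.
    return [dict(row, autoinc=i) for i, row in enumerate(rows)]


def reduce_age(rows, **kwargs):
    # Divide each age by 10, like reduce above.
    return [dict(row, age=row["age"] / 10) for row in rows]


friends = [
    {"name": "James Bond", "age": 55},
    {"name": "Steve Rogers", "age": 150},
    {"name": "Steve Nyemba", "age": 44},
]
result = apply_pipeline(friends, [autoincrement, reduce_age])
print(result)
```

Each step returns new rows rather than modifying its input, so the original list is left untouched. The functions in the cell above modify the data frame in place instead, which is also fine as long as they return it.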
The parameters for instantiating a transport object (reader or writer) can be found on the data-transport home page.