data-transport/notebooks/etl.ipynb


<html lang="en"> <head> </head>

Extract Transform Load (ETL) from Code

The example below reads data from an HTTP source (a CSV file hosted on GitHub) and copies it to two targets: a local CSV file and a SQLite database table. It illustrates the one-to-many ETL feature, where a single source feeds several targets.

In [2]:
#
# Copying data from an HTTP source to a CSV file and a SQLite database
#
import transport
from transport import providers
import pandas as pd
import os

#
# One source, two targets: a comma-delimited file and a SQLite table
#
source = {"provider": "http", "url": "https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/addresses.csv"}
target = [{"provider": "files", "path": "addresses.csv", "delimiter": ","}, {"provider": "sqlite", "database": "sample.db3", "table": "addresses"}]

_handler = transport.get.etl(source=source, target=target)
_data = _handler.read()  # -- all ETL begins with data being read
_data.head()
Out[2]:
   id  location_id              address_1  address_2          city  state_province  postal_code  country
0   1            1  2600 Middlefield Road        NaN  Redwood City              CA        94063       US
1   2            2       24 Second Avenue        NaN     San Mateo              CA        94401       US
2   3            3       24 Second Avenue        NaN     San Mateo              CA        94403       US
3   4            4       24 Second Avenue        NaN     San Mateo              CA        94401       US
4   5            5       24 Second Avenue        NaN     San Mateo              CA        94401       US
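
The one-to-many idea shown above (one read, several writes) can be sketched with the Python standard library alone. This is only an illustration of the pattern and not how data-transport is implemented: the functions `write_csv`, `write_sqlite`, `fan_out` and `demo` are hypothetical names introduced here, the target dictionaries merely reuse the keys from the cell above, and the two sample rows are copied from the output above. The demo writes to a temporary directory so that it does not touch `addresses.csv` or `sample.db3`.

```python
import csv
import os
import sqlite3
import tempfile
from contextlib import closing

# Two sample rows copied from the output above; address_2 is empty (NaN in pandas).
ROWS = [
    {"id": 1, "location_id": 1, "address_1": "2600 Middlefield Road", "address_2": None,
     "city": "Redwood City", "state_province": "CA", "postal_code": "94063", "country": "US"},
    {"id": 2, "location_id": 2, "address_1": "24 Second Avenue", "address_2": None,
     "city": "San Mateo", "state_province": "CA", "postal_code": "94401", "country": "US"},
]


def write_csv(rows, path, delimiter=","):
    """Write rows (dicts sharing the same keys) to a delimited text file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)  # None values are written as empty fields


def write_sqlite(rows, database, table):
    """Append rows to a SQLite table, creating the table if it is missing."""
    columns = list(rows[0])
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    with closing(sqlite3.connect(database)) as conn:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
            conn.executemany(
                f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
                [tuple(row[name] for name in columns) for row in rows],
            )


def fan_out(rows, targets):
    """Send the same rows to every target: one read, many writes."""
    for target in targets:
        if target["provider"] == "files":
            write_csv(rows, target["path"], target.get("delimiter", ","))
        elif target["provider"] == "sqlite":
            write_sqlite(rows, target["database"], target["table"])
        else:
            raise ValueError(f"unsupported provider: {target['provider']}")


def demo():
    """Write ROWS to both targets in a temporary directory and read them back."""
    with tempfile.TemporaryDirectory() as folder:
        csv_path = os.path.join(folder, "addresses.csv")
        db_path = os.path.join(folder, "sample.db3")
        fan_out(ROWS, [
            {"provider": "files", "path": csv_path, "delimiter": ","},
            {"provider": "sqlite", "database": db_path, "table": "addresses"},
        ])
        with open(csv_path, newline="", encoding="utf-8") as handle:
            csv_lines = len(list(csv.reader(handle)))  # header plus data rows
        with closing(sqlite3.connect(db_path)) as conn:
            db_rows = conn.execute('SELECT COUNT(*) FROM "addresses"').fetchone()[0]
    return csv_lines, db_rows
```

Calling `demo()` returns the number of lines found in the CSV file (header included) and the number of rows found in the SQLite table, which is a quick way to check that both targets received the same data.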

Extract Transform Load (ETL) from CLI

The documentation for this is available at https://healthcareio.the-phi.com/data-transport under "Docs" -> "Terminal CLI".

The entire process is documented there, including how to generate an ETL configuration file.

</html>