ETL: Introduction

Extract, Load & Transform (ETL) consists of copying data from one database to one or many others. The ETL process takes advantage of registries for plugins and labeled database connectivity to perform pre- and post-processing tasks.


ETL: Command Line Interface

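The configuration file needed to run the ETL is a JSON-formatted file: a list where each entry contains a "source" (where the data is read from) and a "target" (one or more destinations the data is written to), as in the examples further down.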
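A quick structural check can catch a malformed configuration before the ETL is run. The sketch below relies only on the shape used by the examples in this document: a JSON list of entries, each with a "source" object and a "target" list. The function name check_config and the rule that every source and target carries a "provider" key are assumptions made for illustration; they are not part of data-transport.

```python
import json


def check_config(text):
    """Parse a configuration string and return a list of problems (empty if none).

    Assumption: the layout mirrors the examples in this document, i.e. a JSON
    list of entries, each with a "source" object and a "target" list.
    """
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["top level must be a JSON list of entries"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a JSON object")
            continue

        source = entry.get("source")
        if not isinstance(source, dict) or "provider" not in source:
            problems.append(f"entry {i}: 'source' must be an object with a 'provider'")

        targets = entry.get("target")
        if not isinstance(targets, list) or not targets:
            problems.append(f"entry {i}: 'target' must be a non-empty list")
            continue
        for j, target in enumerate(targets):
            if not isinstance(target, dict) or "provider" not in target:
                problems.append(f"entry {i}, target {j}: must be an object with a 'provider'")
    return problems
```

Calling check_config(open("demo-etl.json").read()) returns an empty list when the file follows the layout above.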

The CLI (transport) can generate a demo ETL configuration that:
    reads CSV data from GitHub (source)
    writes the data to a CSV file and a SQLite3 database (target)

$ transport generate ./demo-etl.json




Figure: Data-transport UML Extract-Load-Transform (ETL) workflow

To run the ETL, call the CLI's apply command with the configuration file:

$ transport apply ./demo-etl.json

Additional parameters are listed by the --help switch:

$ transport apply --help
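When the ETL is launched from a script rather than a terminal, the two commands shown above can be wrapped with the standard library. This is a minimal sketch: transport_command and run_transport are hypothetical helper names, and only the generate and apply invocations that appear in this document are used.

```python
import shutil
import subprocess


def transport_command(action, config_path):
    """Build the argument list for the CLI calls shown in this document."""
    if action not in ("generate", "apply"):
        raise ValueError(f"unsupported action: {action!r}")
    return ["transport", action, config_path]


def run_transport(action, config_path):
    """Run the command and return its exit code, or None if the CLI is not on PATH."""
    if shutil.which("transport") is None:
        return None
    completed = subprocess.run(transport_command(action, config_path), check=False)
    return completed.returncode
```

For example, run_transport("generate", "./demo-etl.json") followed by run_transport("apply", "./demo-etl.json") mirrors the two shell commands above; a None result means the transport executable was not found.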

The following examples show simple configuration files that do NOT require any database to be installed. Feel free to change and edit them at your own discretion.

Example # 1: Basic ETL

data-transport comes with an integrated CLI that will generate an ETL configuration file:

    $ transport generate ./demo-etl.json

NOTE: The configuration file supports labels and/or plugins, but these have to be added manually.

Copy the content below and save it to a file named "demo-etl.json":

[{
"source": {
    "provider": "http", 
    "url": "https://github.com/codeforamerica/ohana-api/blob/master/data/sample-csv/addresses.csv"
}, 
"target": [
    {"provider": "files", "path": "addresses.csv", "delimiter": ","}, 
    {"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
]}]
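After the ETL has run, the SQLite3 target can be inspected with Python's built-in sqlite3 module. The sketch below assumes the run produced the file sample.db3 with a table named addresses, as configured above; preview_table is a hypothetical helper name, and the columns returned depend on the source CSV.

```python
import sqlite3


def preview_table(database, table, limit=5):
    """Return (row_count, first_rows) for one table in a SQLite database file."""
    conn = sqlite3.connect(database)
    try:
        # Table names cannot be bound as SQL parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        count = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        rows = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,)).fetchall()
        return count, rows
    finally:
        conn.close()
```

With the configuration above, preview_table("sample.db3", "addresses") gives the number of rows written and a small sample of them.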


Example # 2: ETL With Plugins

Copy the content below and save it to a file named "demo-etl.json":

[{
"source": {
    "provider": "http",
    "plugins": ["demo@autoincrement"],
    "url": "https://github.com/codeforamerica/ohana-api/blob/master/data/sample-csv/addresses.csv"
},
"target": [
    {"provider": "files", "path": "addresses.csv", "delimiter": ","},
    {"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
]}]
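The plugin interface itself is not described in this section, so the following is only an illustration of the kind of transformation a plugin named autoincrement might apply. It assumes that the part of "demo@autoincrement" before the @ names a registered plugin group and the part after it names a function; it also works on plain lists of dictionaries rather than on whatever data structure data-transport actually passes to plugins. The names split_plugin_name, the "_id" column, and the function signature are all hypothetical.

```python
def split_plugin_name(name):
    """Split a reference such as 'demo@autoincrement' into its two parts.

    Assumption: the text before '@' is a plugin group and the text after it
    is a function name; this convention is inferred, not documented here.
    """
    group, sep, function = name.partition("@")
    if not sep or not group or not function:
        raise ValueError(f"expected '<group>@<function>', got {name!r}")
    return group, function


def autoincrement(rows, column="_id", start=1):
    """Hypothetical transformation: add a running counter to every row.

    Returns new dictionaries and leaves the input rows unmodified.
    """
    return [{**row, column: index} for index, row in enumerate(rows, start=start)]
```

Applied to two rows, autoincrement([{"city": "A"}, {"city": "B"}]) yields the same rows with "_id" set to 1 and 2.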