<style>
.terminal {
display:grid; grid-template-columns: 55% 45%; gap:8px; font-weight: lighter;
}
hr{border:0px; border-top:3px dotted #CAD5E0; width:50%; margin-left:25%}
.source-code pre {font-size:12px; font-weight:lighter; font-family: courier;}
</style>
<script>
layout = {on:{load:{'install':['www/html/_notes/install.html'],'documentation':['www/html/_notes/documentation.html']}}}
// bootup.init('{{system.context}}',layout)
var label = $('.terminal .tabs label')[0]
label.click()
// menu.events._openTabs('.terminal .tab-content','etl-conf-tab')
// menu.events._openTabs('.make-config','.manual')
label = $('.terminal .etl-conf-tab .tabs label')[0]
label.click()
</script>
<div class="terminal">
<div>
<p>
<h3>ETL: Introduction</h3>
Extract, Transform &amp; Load (ETL) consists of copying data from one database to one or many others. This can be done in two ways:
<ul>
<div><i class="fa-solid fa-minus"></i> Command Line Interface (CLI), driven by JSON configuration</div>
<div><i class="fa-solid fa-minus"></i> or within custom Python code</div>
</ul>
The ETL process will take advantage of registries for <b>plugins</b> and <b>labeled database connectivity</b> to perform <b>pre</b>/<b>post</b> processing tasks.
</p>
<hr>
<p>
<h3>ETL: Command Line Interface</h3>
<p>
The ETL is driven by a JSON configuration file. The file holds a list of entries, and each entry contains:
<ul>
<div><i class="fa-solid fa-minus"></i> <b>source</b> with the content of an <b>auth-file</b></div>
<div><i class="fa-solid fa-minus"></i> <b>target</b> with a <b>list</b> of elements, each with the content of an <b>auth-file</b></div>
</ul>
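<p>
The structure described above can be checked before the file is handed to the CLI. The following is a minimal Python sketch, not part of data-transport: the helper name <b>check_config</b> and the placeholder URL are hypothetical, and the check assumes only what is stated here, namely a list of entries, each with a <b>source</b> object and a <b>target</b> list.
</p>
<div class="source-code">
<pre>
```python
import json

def check_config(entries):
    """Return a list of problems found in an ETL configuration (hypothetical helper)."""
    if not isinstance(entries, list):
        return ["configuration must be a list of entries"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: must be an object")
            continue
        # Each entry is expected to carry one source and a list of targets
        if not isinstance(entry.get("source"), dict):
            problems.append(f"entry {index}: 'source' must be an object")
        if not isinstance(entry.get("target"), list):
            problems.append(f"entry {index}: 'target' must be a list")
    return problems

# Placeholder configuration; the URL is illustrative only
demo = json.loads("""
[{"source": {"provider": "http", "url": "https://example.com/addresses.csv"},
  "target": [{"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}]}]
""")
print(check_config(demo))  # prints []
```
</pre>
</div>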
<!-- The auth-file<span class="active bold" onclick="menu.events._dialog({type:'dialog',uri:'www/html/wizard/wizard.html',title:'Auth-file Generator'},'{{system.context}}')">generator</span>, shows how the auth-file entry is structured. -->
<div>
The <b>CLI</b> (transport) can generate a demo ETL configuration:
<ul>
<div><i class="fa-solid fa-minus"></i> with <b>source</b>: reads CSV data from GitHub</div>
<div><i class="fa-solid fa-minus"></i> and <b>target</b>: writes the data to a CSV file &amp; a SQLite3 database</div>
</ul>
<div class="source-code"><i class="fa-solid fa-copy" onclick="_plugins.copy(this)"></i>
$ transport generate ./demo-etl.json
</div>
</div>
</p>
</p>
<br>
<hr>
<br><div align="center" class="border figure"><img src="www/html/_images/uml-activity.png">
<div class="small border-top" style="margin-top:4px; padding-top:4px">
Data-transport UML activity diagram: Extract-Transform-Load (ETL) workflow
</div>
</div>
</div>
<div>
<div>
<div class="tabs" >
<input type="radio" name="etl" id="etl-conf"/>
<label for="etl-conf" onclick="menu.events._openTabs('.terminal .tab-content','etl-conf-tab')">1. Configuration</label>
<input type="radio" name="etl" id="etl-exe"/>
<label for="etl-exe" onclick="menu.events._openTabs('.terminal .tab-content','etl-exe-tab')">2. Run ETL CLI</label>
<input type="radio" name="etl" id="etl-code"/>
<label for="etl-code" onclick="menu.events._openTabs('.terminal .tab-content','etl-code-tab')">ETL: Custom Code</label>
<input type="radio" name="etl" id="none" disabled>
<label for="none" style="grid-column:4">&nbsp;</label>
</div>
</div>
<p>
<div class="tab-content">
<div class="etl-exe-tab">
<p>
To run the ETL, call the CLI's <b>apply</b> command with the path to the configuration file.
</p>
<p>
<div class="source-code">
$ transport apply ./demo-etl.json
</div>
</p>
<p>
Additional parameters are listed when the <b>--help</b> switch is passed.
</p>
<p>
<div class="source-code">
$ transport apply --help
</div>
</p>
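<p>
When the ETL has to be started from a script (a scheduled job, for example), the same command can be launched from Python. This is a minimal sketch using only the standard library: the helper names <b>apply_command</b> and <b>run_etl</b> are hypothetical, and the only thing assumed about the CLI is the <b>transport apply</b> call shown above.
</p>
<div class="source-code">
<pre>
```python
import shutil
import subprocess

def apply_command(config_path):
    """Build the argument list for the apply command shown above."""
    return ["transport", "apply", config_path]

def run_etl(config_path):
    """Run the CLI if it is installed; return its exit code, or None if it is missing."""
    if shutil.which("transport") is None:
        return None
    return subprocess.run(apply_command(config_path), check=False).returncode

print(apply_command("./demo-etl.json"))
```
</pre>
</div>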
</div>
<div class="etl-code-tab"></div>
<div class="etl-conf-tab">
The following examples show simple configuration files that do NOT require any database to be installed. Feel free to change and edit them at your own discretion.
<br>
<p>
<h3>Example # 1: Basic ETL</h3>
<div class="tabs" style="margin:0px; padding:0px; background-color:#ffffff;grid-template-columns: 50% 50%; display:grid;" align="center">
<input type="radio" name="mk-conf" id="mk-man">
<label for="mk-man" onclick="menu.events._openTabs('.make-config','.manual')" style="background-color:#ffffff; border-radius:0px;">Manual</label>
<input type="radio" name="mk-conf" id="mk-gen">
<label for="mk-gen" onclick="menu.events._openTabs('.make-config','.generated')" style="background-color:#ffffff; border-radius:0px;">Generate</label>
</div>
<div style="border-top:0px; min-height:400px;">
<div class="make-config">
<div class="generated">
<p>
<b>data-transport</b> comes with an integrated CLI that can
<ul>
<div><i class="fa-solid fa-minus"></i> <b>generate</b> an ETL configuration file</div>
<div class=" source-code"><i class="fa-solid fa-copy" onclick="_plugins.copy(this)"></i>
<span>$ transport generate ./demo-etl.json</span>
</div>
</ul>
<div><i class="fa-solid fa-minus"></i> <b>NOTE:</b> The configuration file supports <b>labels</b> and/or <b>plugins</b>, but these have to be added manually</div>
</p>
</div>
<div class="manual">
<p>Copy the content and save it to a file <b>"demo-etl.json"</b></p>
<div class="source-code" style="text-overflow: ellipsis;">
<i class="fa-solid fa-copy" onclick="code.copy(this)"></i>
<pre>[{
"source": {
"provider": "http",
"url": "https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/addresses.csv"
},
"target": [
{"provider": "files", "path": "addresses.csv", "delimiter": ","},
{"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
]}]</pre>
</div>
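<p>
To make the one-source, many-targets flow of this configuration concrete, here is a plain-Python sketch of the same idea using only the standard library. It is not how data-transport is implemented and does not use its API: the function names are hypothetical, the sample rows are made up, and the files are written to a temporary directory.
</p>
<div class="source-code">
<pre>
```python
import csv
import io
import os
import sqlite3
import tempfile

def load_rows(csv_text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def write_csv(rows, path, delimiter=","):
    """Load, target 1: write the rows to a delimited file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)

def write_sqlite(rows, database, table):
    """Load, target 2: write the rows to a SQLite table with untyped columns."""
    columns = list(rows[0])
    quoted = ", ".join(f'"{name}"' for name in columns)
    marks = ", ".join("?" for _ in columns)
    connection = sqlite3.connect(database)
    try:
        with connection:  # commits when the block succeeds
            connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
            connection.executemany(
                f'INSERT INTO "{table}" ({quoted}) VALUES ({marks})',
                [tuple(row[name] for name in columns) for row in rows],
            )
    finally:
        connection.close()

# Made-up sample standing in for the downloaded CSV
sample = "id,city\n1,Oakland\n2,Berkeley\n"
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "addresses.csv")
database = os.path.join(workdir, "sample.db3")

rows = load_rows(sample)
write_csv(rows, csv_path)
write_sqlite(rows, database, "addresses")

connection = sqlite3.connect(database)
count = connection.execute('SELECT COUNT(*) FROM "addresses"').fetchone()[0]
connection.close()
print(count)  # prints 2
```
</pre>
</div>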
</div>
</div>
</div>
</p>
<hr>
<p>
<h3>Example # 2: ETL With Plugins</h3>
<p>Copy the content and save it to a file <b>"demo-etl.json"</b></p>
<div class="source-code" style="text-overflow: ellipsis;">
<i class="fa-solid fa-copy" onclick="code.copy(this)"></i>
<pre >[{
"source": {
"provider": "http",
"plugins":["demo@autoincrement"],
"url": "https://raw.githubusercontent.com/codeforamerica/ohana-api/master/data/sample-csv/addresses.csv"
},
"target": [
{"provider": "files", "path": "addresses.csv", "delimiter": ","},
{"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
]}]</pre>
</div>
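<p>
The kind of processing step a plugin performs can be illustrated with a small function. This is a hypothetical sketch only: it assumes, from the name alone, that <b>autoincrement</b> adds an incrementing identifier to each row. The actual demo plugin, its signature and the way data-transport registers plugins may differ (it may, for instance, operate on a data frame rather than a list of rows).
</p>
<div class="source-code">
<pre>
```python
def autoincrement(rows, column="_id", start=1):
    """Hypothetical plugin: return copies of the rows with an incrementing identifier."""
    return [{column: start + offset, **row} for offset, row in enumerate(rows)]

# Made-up rows standing in for the data read from the source
rows = [{"city": "Oakland"}, {"city": "Berkeley"}]
numbered = autoincrement(rows)
print(numbered)
```
</pre>
</div>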
</p>
</div>
</div>
</p>
<!-- <div class="border-round border">
<h3 style="border-color: transparent;">UML Activity Diagram - ETL</h3>
<br><div class="border-top">
<ul>
<i class="fa-solid fa-minus"> </i> The diagram shows <b>1-to-many</b> database support
<br><i class="fa-solid fa-minus"> </i> The ETL job is specified by JSON configuration file
</ul>
</div>
</div>
<br><div id="documentation" class="border-round border" style="min-height:250px"></div> -->
</div>
</div>