<style>
    .terminal {
        display:grid; grid-template-columns: 55% 45%; gap:8px; font-weight: lighter;
    }
    hr{border:0px; border-top:3px dotted #CAD5E0; width:50%; margin-left:25%}
    .source-code pre {font-size:12px; font-weight:lighter; font-family: courier;}
</style>
<script>
    layout = {on:{load:{'install':['www/html/_notes/install.html'],'documentation':['www/html/_notes/documentation.html']}}}
    // bootup.init('{{system.context}}',layout)
    var label = $('.terminal .tabs label')[0]
    label.click()
    // menu.events._openTabs('.terminal .tab-content','etl-conf-tab')
    // menu.events._openTabs('.make-config','.manual')
    label = $('.terminal .etl-conf-tab .tabs label')[0]
    label.click()
</script>
<div class="terminal">
<div>
<p>
<h3>ETL: Introduction</h3>
Extract, Transform &amp; Load (ETL) consists of copying data from one database to one or more others. This can be done in two different ways:
<ul>
<div><i class="fa-solid fa-minus"></i> Command Line Interface (CLI), driven by JSON configuration</div>
<div><i class="fa-solid fa-minus"></i> Or within custom Python code</div>
</ul>
The ETL process takes advantage of registries for <b>plugins</b> and <b>labeled database connectivity</b> to perform <b>pre</b>/<b>post</b> processing tasks.

</p>
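<p>As a rough illustration of the one-source, many-targets copy described above, the sketch below writes the same rows to a CSV file and to a SQLite3 table using only the Python standard library. It is not the data-transport API: <b>copy_rows</b> and its parameters are hypothetical names chosen for this example, and rows are assumed to be a list of dictionaries sharing the same keys.</p>

```python
import csv
import sqlite3


def copy_rows(rows, csv_path, db_path, table):
    """Copy rows (a list of dicts sharing the same keys) to a CSV file and a SQLite3 table.

    Hypothetical helper for illustration only; it is not part of data-transport.
    Returns the number of rows written to each target.
    """
    if not rows:
        return 0
    columns = list(rows[0].keys())

    # Target 1: a delimited text file with a header row
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)

    # Target 2: a SQLite3 table; identifiers are quoted, values are bound as parameters
    quoted = ", ".join('"{}"'.format(c.replace('"', '""')) for c in columns)
    marks = ", ".join("?" for _ in columns)
    table_name = '"{}"'.format(table.replace('"', '""'))
    connection = sqlite3.connect(db_path)
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute(
                "CREATE TABLE IF NOT EXISTS {} ({})".format(table_name, quoted)
            )
            connection.executemany(
                "INSERT INTO {} ({}) VALUES ({})".format(table_name, quoted, marks),
                [tuple(row[c] for c in columns) for row in rows],
            )
    finally:
        connection.close()
    return len(rows)
```

<p>With rows already extracted from a source, a call such as <b>copy_rows(rows, "addresses.csv", "sample.db3", "addresses")</b> would fill both targets in one pass.</p>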
|
|
<hr>
|
|
<p>
|
|
<h3>ETL: Command Line Interface</h3>
|
|
<p>
|
|
The configuration file needed to run the ETL is a JSON formatted file where each entry contains:
|
|
<ul>
|
|
<div><i class="fa-solid fa-minus"></i> <b>source</b> with the content of an <b>auth-file</b></div>
|
|
<div><i class="fa-solid fa-minus"></i> <b>target</b> with <b>list</b> of elements of an <b>auth-file</b></div>
|
|
</ul>
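<div>The structure above can be checked before running the ETL. The following is a minimal sketch, assuming the configuration is a JSON list whose entries each carry a <b>source</b> object and a non-empty <b>target</b> list of objects; <b>check_etl_config</b> is a hypothetical helper, not part of the transport CLI.</div>

```python
import json


def check_etl_config(text):
    """Return a list of problems found in an ETL configuration given as JSON text.

    Hypothetical helper: it only checks the source/target layout described above.
    """
    entries = json.loads(text)
    if not isinstance(entries, list):
        return ["the top level must be a list of entries"]

    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append("entry {}: must be an object".format(index))
            continue
        # source: a single object (the content of an auth-file)
        if not isinstance(entry.get("source"), dict):
            problems.append("entry {}: 'source' must be an object".format(index))
        # target: a non-empty list of objects
        targets = entry.get("target")
        if not isinstance(targets, list) or not targets:
            problems.append("entry {}: 'target' must be a non-empty list".format(index))
        elif not all(isinstance(target, dict) for target in targets):
            problems.append("entry {}: every 'target' item must be an object".format(index))
    return problems
```

<div>An empty list means the layout looks right; it says nothing about whether the providers or credentials inside each entry are valid.</div>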
<!-- The auth-file<span class="active bold" onclick="menu.events._dialog({type:'dialog',uri:'www/html/wizard/wizard.html',title:'Auth-file Generator'},'{{system.context}}')">generator</span>, shows how the auth-file entry is structured. -->
<div>
The <b>CLI</b> (transport) can generate a demo ETL configuration:
<ul>
<div><i class="fa-solid fa-minus"></i> with <b>source</b>: reads CSV data from GitHub</div>
<div><i class="fa-solid fa-minus"></i> and <b>target</b>: writes the data to a CSV file &amp; a SQLite3 database</div>
</ul>
<div class="source-code"><i class="fa-solid fa-copy" onclick="_plugins.copy(this)"></i>
$ transport generate ./demo-etl.json
</div>
</div>
</p>
</p>
<br>
<hr>
<br><div align="center" class="border figure"><img src="www/html/_images/uml-activity.png">
<div class="small border-top" style="margin-top:4px; padding-top:4px">
Data-transport UML Extract-Transform-Load (ETL) Workflow
</div>
</div>
</div>

<div>
<div>
<div class="tabs" >
<input type="radio" name="etl" id="etl-conf"/>
<label for="etl-conf" onclick="menu.events._openTabs('.terminal .tab-content','etl-conf-tab')">1. Configuration</label>

<input type="radio" name="etl" id="etl-exe"/>
<label for="etl-exe" onclick="menu.events._openTabs('.terminal .tab-content','etl-exe-tab')">2. Run ETL CLI</label>

<input type="radio" name="etl" id="etl-code"/>
<label for="etl-code" onclick="menu.events._openTabs('.terminal .tab-content','etl-code-tab')">ETL: Custom Code</label>
<input type="radio" name="etl" id="none" disabled>
<label for="none" style="grid-column:4"> </label>
</div>
</div>
<p>

<div class="tab-content">
<div class="etl-exe-tab">
<p>
Run the ETL from the command-line interface by calling the <b>apply</b> command.
</p>
<p>
<div class="source-code">
$ transport apply ./demo-etl.json
</div>
</p>
<p>
Additional parameters are listed by passing the <b>--help</b> switch.
</p>

<p>
<div class="source-code">
$ transport apply --help
</div>
</p>

</div>
<div class="etl-code-tab"></div>
<div class="etl-conf-tab">
The following examples show simple configuration files that do NOT require any database to be installed. Feel free to change and edit them at your own discretion.
<br>
<p>
<h3>Example # 1: Basic ETL</h3>
<div class="tabs" style="margin:0px; padding:0px; background-color:#ffffff;grid-template-columns: 50% 50%; display:grid;" align="center">
<input type="radio" name="mk-conf" id="mk-man">
<label for="mk-man" onclick="menu.events._openTabs('.make-config','.manual')" style="background-color:#ffffff; border-radius:0px;">Manual</label>
<input type="radio" name="mk-conf" id="mk-gen">
<label for="mk-gen" onclick="menu.events._openTabs('.make-config','.generated')" style="background-color:#ffffff; border-radius:0px;">Generate</label>
</div>

<div style="border-top:0px; min-height:400px;">
<div class="make-config">
<div class="generated">
<p>
<b>data-transport</b> comes with an integrated CLI that will
<ul>
<div><i class="fa-solid fa-minus"></i> <b>generate</b> an ETL configuration file</div>
<div class=" source-code"><i class="fa-solid fa-copy" onclick="_plugins.copy(this)"></i>
<span>$ transport generate ./demo-etl.json</span>
</div>

</ul>
<div><i class="fa-solid fa-minus"></i> <b>NOTE:</b> The configuration file supports <b>labels</b> and/or <b>plugins</b>; these have to be added manually.</div>
</p>
</div>
<div class="manual">
<p>Copy the content and save it to a file <b>"demo-etl.json"</b></p>
<div class="source-code" style="text-overflow: ellipsis;">
<i class="fa-solid fa-copy" onclick="code.copy(this)"></i>
<pre>[{
    "source": {
        "provider": "http",
        "url": "https://github.com/codeforamerica/ohana-api/blob/master/data/sample-csv/addresses.csv"
    },
    "target": [
        {"provider": "files", "path": "addresses.csv", "delimiter": ","},
        {"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
    ]}]</pre>

</div>
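<p>Instead of copying by hand, the same configuration can be written from Python. This is a small sketch using only the standard <b>json</b> module; <b>save_config</b> is a hypothetical helper name, and the content mirrors the example above.</p>

```python
import json

# Same entries as the manual example: one http source, two targets
DEMO_CONFIG = [{
    "source": {
        "provider": "http",
        "url": "https://github.com/codeforamerica/ohana-api/blob/master/data/sample-csv/addresses.csv",
    },
    "target": [
        {"provider": "files", "path": "addresses.csv", "delimiter": ","},
        {"provider": "sqlite3", "database": "sample.db3", "table": "addresses"},
    ],
}]


def save_config(entries, path="demo-etl.json"):
    """Write the ETL entries to `path` as indented JSON and return the path."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(entries, handle, indent=4)
    return path
```

<p>Calling <b>save_config(DEMO_CONFIG)</b> creates <b>demo-etl.json</b> in the current directory.</p>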
</div>

</div>
</div>

</p>
<hr>

<p>
<h3>Example # 2: ETL With Plugins</h3>
<p>Copy the content and save it to a file <b>"demo-etl.json"</b></p>
<div class="source-code" style="text-overflow: ellipsis;">
<i class="fa-solid fa-copy" onclick="code.copy(this)"></i>
<pre >[{
    "source": {
        "provider": "http",
        "plugins":["demo@autoincrement"],
        "url": "https://github.com/codeforamerica/ohana-api/blob/master/data/sample-csv/addresses.csv"
    },
    "target": [
        {"provider": "files", "path": "addresses.csv", "delimiter": ","},
        {"provider": "sqlite3", "database": "sample.db3", "table": "addresses"}
    ]}]</pre>

</div>
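<p>The <b>plugins</b> entry above names <b>demo@autoincrement</b>. The sketch below only illustrates the kind of processing a function with that name might perform: adding an incrementing number to each row. Everything in it is an assumption made for illustration (rows as a list of dictionaries, the <b>_id</b> column name, the function signature); how data-transport registers and calls plugins is not shown here.</p>

```python
def autoincrement(rows, column="_id", start=1):
    """Return copies of the rows with an incrementing integer stored under `column`.

    Hypothetical transformation for illustration; the input rows are left unchanged.
    """
    return [
        dict(row, **{column: number})
        for number, row in enumerate(rows, start=start)
    ]
```

<p>A transformation of this kind would sit between reading from the source and writing to the targets.</p>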

</p>

</div>
</div>
</p>
<!-- <div class="border-round border">
<h3 style="border-color: transparent;">UML Activity Diagram - ETL</h3>

<br><div class="border-top">
<ul>
<i class="fa-solid fa-minus"> </i> The diagram shows <b>1-to-many</b> database support
<br><i class="fa-solid fa-minus"> </i> The ETL job is specified by JSON configuration file
</ul>
</div>
</div>

<br><div id="documentation" class="border-round border" style="min-height:250px"></div> -->
</div>
</div>