Compare commits


4 Commits
master ... v2.4

@@ -24,7 +24,8 @@ from multiprocessing import Process
 import os
 import transport
-from transport import etl
+# from transport import etl
+from transport.iowrapper import IETL
 # from transport import providers
 import typer
 from typing_extensions import Annotated
@@ -60,10 +61,13 @@ def apply (path:Annotated[str,typer.Argument(help="path of the configuration fil
         _config = [_config[ int(index)]]
     jobs = []
     for _args in _config :
-        pthread = etl.instance(**_args) #-- automatically starts the process
+        # pthread = etl.instance(**_args) #-- automatically starts the process
+        _worker = IETL(**_args)
+        pthread = Process(target=_worker.run)
+        pthread.start()
         jobs.append(pthread)
     #
     # @TODO: Log the number of processes started and estimated time
     while jobs :
         jobs = [pthread for pthread in jobs if pthread.is_alive()]
         time.sleep(1)
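The change above swaps the self-starting etl.instance for an IETL worker whose run() method is handed to multiprocessing.Process, one process per job, with a polling loop that waits for them to finish. A minimal, runnable sketch of that pattern, using a placeholder FakeWorker in place of IETL (illustrative only):

# Sketch of the one-process-per-job pattern used in apply().
# FakeWorker stands in for transport.iowrapper.IETL; it is illustrative only.
import time
from multiprocessing import Process

class FakeWorker:
    def __init__(self, **_args):
        self._args = _args
    def run(self):
        print('processing', self._args)

if __name__ == '__main__':
    _config = [{'id': 1}, {'id': 2}]
    jobs = []
    for _args in _config:
        pthread = Process(target=FakeWorker(**_args).run)
        pthread.start()
        jobs.append(pthread)
    while jobs:                       # wait until every job has finished
        jobs = [p for p in jobs if p.is_alive()]
        time.sleep(1)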
@@ -88,6 +92,7 @@ def version():
     """
     print (transport.__app_name__,'version ',transport.__version__)
+    print ()
     print (transport.__license__)
 @app.command()

@@ -1,6 +1,6 @@
 __app_name__ = 'data-transport'
 __author__ = 'The Phi Technology'
-__version__= '2.2.6'
+__version__= '2.4.0'
 __email__ = "info@the-phi.com"
 __license__=f"""
 Copyright 2010 - 2024, Steve L. Nyemba
@@ -15,7 +15,9 @@ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR I
 __whatsnew__=f"""version {__version__}, focuses on collaborative environments like jupyter-base servers (apache zeppelin; jupyter notebook, jupyterlab, jupyterhub)
-1. simpler syntax to create readers/writers
-2. auth-file registry that can be referenced using a label
-3. duckdb support
+1. support for apache iceberg data warehouse using spark
+2. Improved ETL & performance
+3. bug fixes: mongodb
 """

@@ -19,7 +19,7 @@ args = {
     "packages": find_packages(include=['info','transport', 'transport.*'])}
 args["keywords"]=['mongodb','duckdb','couchdb','rabbitmq','file','read','write','s3','sqlite']
-args["install_requires"] = ['pyncclient','duckdb-engine','pymongo','sqlalchemy','pandas','typer','pandas-gbq','numpy','cloudant','pika','nzpy','termcolor','boto3','boto','pyarrow','google-cloud-bigquery','google-cloud-bigquery-storage','flask-session','smart_open','botocore','psycopg2-binary','mysql-connector-python','numpy','pymssql']
+args["install_requires"] = ['pyncclient','duckdb-engine','pymongo','sqlalchemy','pandas','typer','pandas-gbq','numpy','cloudant','pika','nzpy','termcolor','boto3','boto','pyarrow','google-cloud-bigquery','google-cloud-bigquery-storage','flask-session','smart_open','botocore','psycopg2-binary','mysql-connector-python','numpy','pymssql','pyspark']
 args["url"] = "https://healthcareio.the-phi.com/git/code/transport.git"
 args['scripts'] = ['bin/transport']
 # if sys.version_info[0] == 2 :

@@ -18,7 +18,7 @@ Source Code is available under MIT License:
 """
 import numpy as np
-from transport import sql, nosql, cloud, other
+from transport import sql, nosql, cloud, other, warehouse
 import pandas as pd
 import json
 import os
@@ -33,7 +33,7 @@ PROVIDERS = {}
 def init():
     global PROVIDERS
-    for _module in [cloud,sql,nosql,other] :
+    for _module in [cloud,sql,nosql,other,warehouse] :
         for _provider_name in dir(_module) :
             if _provider_name.startswith('__') or _provider_name == 'common':
                 continue
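Adding warehouse to the module list is all it takes for the new providers to be picked up, because init() discovers providers by walking dir() on each submodule. A minimal sketch of that discovery pattern, assuming each submodule simply exposes provider submodules by name (the registry layout shown here is illustrative, not the library's exact structure):

# Sketch of dir()-based provider discovery.
import types

def discover(_module):
    _providers = {}
    for _name in dir(_module):
        if _name.startswith('__') or _name == 'common':
            continue                      # skip dunders and shared helpers
        _obj = getattr(_module, _name)
        if isinstance(_obj, types.ModuleType):
            _providers[_name] = _obj      # e.g. warehouse.iceberg, warehouse.drill
    return _providers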

@@ -1,19 +0,0 @@
-"""
-This file will be intended to handle duckdb database
-"""
-import duckdb
-from transport.common import Reader,Writer
-class Duck(Reader):
-    def __init__(self,**_args):
-        super().__init__(**_args)
-        self._path = None if 'path' not in _args else _args['path']
-        self._handler = duckdb.connect() if not self._path else duckdb.connect(self._path)
-class DuckReader(Duck) :
-    def __init__(self,**_args):
-        super().__init__(**_args)
-    def read(self,**_args) :
-        pass

@@ -103,6 +103,14 @@ class IETL(IReader) :
         #
         # If the parent is already multiprocessing
         self._hasParentProcess = False if 'hasParentProcess' not in _args else _args['hasParentProcess']
+    def run(self) :
+        """
+        We should apply the etl here, if we are in multiprocessing mode
+        """
+        _data = super().read()
+        for _kwargs in self._targets :
+            self.post(_data,**_kwargs)
     def read(self,**_args):
         _data = super().read(**_args)
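The new run() method reads the single source once and then posts the result to every configured target, which is what makes it suitable as a Process target. A small sketch of that read-once / post-to-many fan-out, with stand-in read and post functions (illustrative, not the IReader internals):

# Sketch of the fan-out that run() performs.
import pandas as pd

def _read():
    # stand-in for the source reader
    return pd.DataFrame({'id': [1, 2, 3]})

def _post(_data, **_kwargs):
    # stand-in for writing to one target (database, file, queue, ...)
    print('writing', len(_data), 'rows to', _kwargs.get('label'))

_targets = [{'label': 'warehouse'}, {'label': 'backup-file'}]
_data = _read()
for _kwargs in _targets:
    _post(_data, **_kwargs)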

@@ -44,7 +44,9 @@ PGSQL = POSTGRESQL
 AWS_S3 = 's3'
 RABBIT = RABBITMQ
+ICEBERG='iceberg'
+APACHE_ICEBERG = 'iceberg'
+DRILL = 'drill'
+APACHE_DRILL = 'drill'
 # QLISTENER = 'qlistener'
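The new constants are plain string labels, so either alias can be used wherever a provider name is expected. A small sketch; the job-dictionary shape below is illustrative only, not the library's exact ETL configuration schema:

# The new constants are interchangeable string aliases.
from transport import providers

assert providers.ICEBERG == providers.APACHE_ICEBERG == 'iceberg'
assert providers.DRILL == providers.APACHE_DRILL == 'drill'

# Hypothetical job entry referencing the new warehouse provider
# (field names are illustrative only).
_job = {
    'source': {'provider': providers.POSTGRESQL, 'table': 'events'},
    'target': [{'provider': providers.ICEBERG, 'table': 'events'}],
}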

@@ -9,11 +9,13 @@ import pandas as pd
 class Base:
     def __init__(self,**_args):
+        # print ([' ## ',_args])
         self._host = _args['host'] if 'host' in _args else 'localhost'
-        self._port = None
+        self._port = None if 'port' not in _args else _args['port']
         self._database = _args['database']
         self._table = _args['table'] if 'table' in _args else None
         self._engine= sqa.create_engine(self._get_uri(**_args),future=True)
+        self._chunksize = 0 if 'chunksize' not in _args else _args['chunksize']
     def _set_uri(self,**_args) :
         """
         :provider   provider
@@ -59,8 +61,10 @@ class Base:
         @TODO: Execution of stored procedures
         """
         if sql.lower().startswith('select') or sql.lower().startswith('with') :
-            return pd.read_sql(sql,self._engine)
+            if self._chunksize :
+                return pd.read_sql(sql,self._engine,chunksize=self._chunksize)
+            else:
+                return pd.read_sql(sql,self._engine)
         else:
             _handler = self._engine.connect()
             _handler.execute(text(sql))
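Note that when chunksize is set, pd.read_sql returns an iterator of DataFrames instead of a single frame, so callers have to loop over the result. A minimal sketch against an in-memory SQLite engine (table and column names are made up for illustration):

# Sketch: pd.read_sql with chunksize yields DataFrames in batches.
import pandas as pd
import sqlalchemy as sqa

_engine = sqa.create_engine('sqlite:///:memory:')
pd.DataFrame({'x': range(10)}).to_sql('demo', _engine, index=False)

for _chunk in pd.read_sql('SELECT * FROM demo', _engine, chunksize=4):
    print(len(_chunk))   # 4, 4, 2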
@@ -105,6 +109,7 @@ class BaseReader(SQLBase):
         if 'sql' in _args :
             sql = _args['sql']
         else:
+            # print (dir (self))
             _table = _args['table'] if 'table' in _args else self._table
             sql = f'SELECT * FROM {_table}'
         return self.apply(sql)

@ -0,0 +1,7 @@
"""
This namespace/package is intended to handle read/writes against data warehouse solutions like :
- apache iceberg
- clickhouse (...)
"""
from . import iceberg, drill

@ -0,0 +1,115 @@
"""
dependency:
- spark and SPARK_HOME environment variable must be set
"""
from pyspark.sql import SparkSession
from pyspark import SparkContext
import copy
class Iceberg :
def __init__(self,**_args):
"""
providing catalog meta information (you must get this from apache iceberg)
"""
#
# Turning off logging (it's annoying & un-professional)
#
# _spconf = SparkContext()
# _spconf.setLogLevel("ERROR")
#
# @TODO:
# Make arrangements for additional configuration elements
#
self._session = SparkSession.builder.getOrCreate()
# self._session.sparkContext.setLogLevel("ERROR")
self._catalog = self._session.catalog
self._table = _args['table'] if 'table' in _args else None
if 'catalog' in _args :
#
# Let us set the default catalog
self._catalog.setCurrentCatalog(_args['catalog'])
else:
# No current catalog has been set ...
pass
if 'database' in _args :
self._database = _args['database']
self._catalog.setCurrentDatabase(self._database)
else:
#
# Should we set the default as the first one if available ?
#
pass
self._catalogName = self._catalog.currentCatalog()
self._databaseName = self._catalog.currentDatabase()
def meta (self,**_args) :
"""
This function should return the schema of a table (only)
"""
_schema = []
try:
_tableName = self._getPrefix(**_args) + f".{_args['table']}"
print (_tableName)
_tmp = self._session.table(_tableName).schema
_schema = _tmp.jsonValue()['fields']
for _item in _schema :
del _item['nullable'],_item['metadata']
except Exception as e:
pass
return _schema
def _getPrefix (self,**_args):
_catName = self._catalogName if 'catalog' not in _args else _args['catalog']
_datName = self._databaseName if 'database' not in _args else _args['database']
return '.'.join([_catName,_datName])
def has (self,**_args):
try:
_prefix = self._getPrefix(**_args)
if _prefix.endswith('.') :
return False
return _args['table'] in [_item.name for _item in self._catalog.listTables(_prefix)]
except Exception as e:
print (e)
return False
def apply(self,sql):
pass
class Reader(Iceberg) :
def __init__(self,**_args):
super().__init__(**_args)
def read(self,**_args):
_table = self._table
_prefix = self._getPrefix(**_args)
if 'table' in _args or _table:
_table = _args['table'] if 'table' in _args else _table
_table = _prefix + f'.{_table}'
return self._session.table(_table).toPandas()
else:
sql = _args['sql']
return self._session.sql(sql).toPandas()
pass
class Writer (Iceberg):
"""
Writing data to an Apache Iceberg data warehouse (using pyspark)
"""
def __init__(self,**_args):
super().__init__(**_args)
self._mode = 'append' if 'mode' not in _args else _args['mode']
self._table = None if 'table' not in _args else _args['table']
def write(self,_data,**_args):
_prefix = self._getPrefix(**_args)
if 'table' not in _args and not self._table :
raise Exception (f"Table Name should be specified for catalog/database {_prefix}")
rdd = self._session.createDataFrame(_data)
_mode = self._mode if 'mode' not in _args else _args['mode']
_table = self._table if 'table' not in _args else _args['table']
if not self.has(table=_table) :
_mode = 'overwrite'
rdd.write.format('iceberg').mode(_mode).saveAsTable(_table)
else:
_table = f'{_prefix}.{_table}'
rdd.write.format('iceberg').mode(_mode).save(_table)
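Because the classes call SparkSession.builder.getOrCreate(), any Iceberg catalog settings must be in place before they are instantiated. A hypothetical usage sketch; the catalog, database, table names and the Spark configuration values below are illustrative assumptions, and the Iceberg Spark runtime jar is assumed to already be on the Spark classpath:

# Hypothetical usage of the new warehouse classes (illustrative values only).
import pandas as pd
from pyspark.sql import SparkSession
from transport.warehouse import iceberg

# Configure an Iceberg catalog before the classes grab the shared session;
# 'demo' and the warehouse path are placeholders.
SparkSession.builder \
    .config('spark.sql.catalog.demo', 'org.apache.iceberg.spark.SparkCatalog') \
    .config('spark.sql.catalog.demo.type', 'hadoop') \
    .config('spark.sql.catalog.demo.warehouse', '/tmp/iceberg-warehouse') \
    .getOrCreate()

# Assumes the database 'db' already exists in the 'demo' catalog.
_writer = iceberg.Writer(catalog='demo', database='db', table='events')
_writer.write(pd.DataFrame({'id': [1, 2], 'value': ['a', 'b']}))

_reader = iceberg.Reader(catalog='demo', database='db', table='events')
print(_reader.read())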