# Introduction
This project provides an abstraction over a variety of data stores, exposing read/write operations through a simple and expressive interface. The abstraction works with **NoSQL**, **SQL** and **Cloud** data stores and leverages **pandas**.
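
To make the idea of a uniform read/write interface concrete, here is a minimal sketch using only the Python standard library. It does not use the **data-transport** API: the names `Store`, `MemoryStore`, `SQLiteStore` and `transfer` are hypothetical and exist only for illustration.

```python
from __future__ import annotations

import sqlite3
from typing import Protocol


class Store(Protocol):
    """Hypothetical interface: every backend exposes the same read/write pair."""

    def write(self, rows: list[dict]) -> None: ...

    def read(self) -> list[dict]: ...


class MemoryStore:
    """Keeps rows in a Python list (stand-in for a document store)."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self._rows.extend(dict(r) for r in rows)

    def read(self) -> list[dict]:
        return [dict(r) for r in self._rows]


class SQLiteStore:
    """Keeps rows in an in-memory SQLite table (stand-in for a SQL database)."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.row_factory = sqlite3.Row
        self._conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")

    def write(self, rows: list[dict]) -> None:
        # Named placeholders let each dict map directly onto the columns.
        self._conn.executemany(
            "INSERT INTO people (name, age) VALUES (:name, :age)", rows
        )
        self._conn.commit()

    def read(self) -> list[dict]:
        cursor = self._conn.execute("SELECT name, age FROM people ORDER BY rowid")
        return [dict(r) for r in cursor]


def transfer(source: Store, target: Store) -> None:
    """Move data between any two backends without knowing their type."""
    target.write(source.read())
```

Because both classes expose the same `read` and `write` methods, `transfer(MemoryStore(), SQLiteStore())` works without any backend-specific code in the caller.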
# Why Use Data-Transport?
**data-transport** is a simple framework for reading from and writing to multiple databases and other technologies that hold data. With **data-transport**, you can:

- Enjoy its simplicity, because **data-transport** leverages SQLAlchemy and pandas data frames.
- Share notebooks and code without disclosing database credentials.
- Access multiple database technologies seamlessly and consistently, at no cost.
- Stop worrying about accidental writes to a database that lead to inconsistent data.
- Implement consistent pre- and post-processing as a pipeline, i.e. an aggregation of functions.
- Use it freely: **data-transport** is open source under the MIT License, https://github.com/lnyemba/data-transport
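
The idea of a pipeline as an aggregation of functions can be sketched as follows. This is an illustration in plain Python rather than the **data-transport** API: `pipeline`, `drop_missing` and `add_age_group` are hypothetical names, and rows are plain dictionaries instead of pandas data frames to keep the example dependency-free.

```python
from functools import reduce
from typing import Callable, Dict, List

Rows = List[Dict]
Step = Callable[[Rows], Rows]


def pipeline(*steps: Step) -> Step:
    """Chain steps left to right into a single function."""

    def run(rows: Rows) -> Rows:
        # Feed the output of each step into the next one.
        return reduce(lambda data, step: step(data), steps, rows)

    return run


def drop_missing(rows: Rows) -> Rows:
    """Example pre-processing step: discard rows with a missing age."""
    return [r for r in rows if r.get("age") is not None]


def add_age_group(rows: Rows) -> Rows:
    """Example post-processing step: derive a new field from an existing one."""
    return [{**r, "group": "adult" if r["age"] >= 18 else "minor"} for r in rows]


# The same composed function can be applied after every read or before every write.
clean = pipeline(drop_missing, add_age_group)
```

Composing the steps once and reusing `clean` is what keeps the processing consistent across notebooks and scripts.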
## Installation
Within the virtual environment, install the package with the components you need. The installation options are:

**sql** - installed by default: postgresql, mysql, sqlserver, sqlite3+, duckdb

    pip install data-transport[cloud,nosql,other,all]@git+https://github.com/lnyemba/data-transport.git

Optional components are listed in square brackets. These components are:

**warehouse** - Apache Iceberg, Apache Drill

**cloud** - support for nextcloud, s3

**nosql** - support for mongodb, couchdb

**other** - support for files, rabbitmq, http

    pip install data-transport[nosql,cloud,warehouse,all]@git+https://github.com/lnyemba/data-transport.git
## Additional features
- In addition to read/write, functions for pre/post-processing are supported.
- A CLI to add entries to the registry and run ETL.
- Scales and integrates into shared environments such as Apache Zeppelin, JupyterHub, SageMaker, ...
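
A registry is one way to keep credentials out of shared notebooks: code refers to a label, and the connection details live in a file on each user's machine. The sketch below illustrates that idea under stated assumptions. The JSON layout, the `load_entry` function and the file path are hypothetical and do not describe the actual **data-transport** registry format.

```python
import json
from pathlib import Path


def load_entry(registry_path, label):
    """Return the connection parameters stored under `label` in a JSON registry file."""
    registry = json.loads(Path(registry_path).read_text())
    if label not in registry:
        raise KeyError(f"no registry entry named {label!r}")
    return registry[label]
```

A notebook would then call something like `load_entry("registry.json", "warehouse")` and never contain a password itself.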
## Learn More
Notebooks with sample code are available for reading from and writing to mongodb, couchdb, Netezza, PostgreSQL, Google BigQuery, Databricks, Microsoft SQL Server, MySQL ... Visit the [data-transport homepage](https://healthcareio.the-phi.com/data-transport).