This framework supports read/write and ETL against many SQL, cloud, and NoSQL databases, as well as other persistent data stores. Additional features include support for user-defined plugin functions for pre- and post-processing.
Latest commit: 037019c1d7 by Steve Nyemba, "bug fix", 10 months ago

Introduction

This project implements an abstraction over objects that access a variety of data stores, providing read/write through a simple and expressive interface. The abstraction works with NoSQL, SQL, and cloud data stores and leverages pandas.
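To make the idea of one read/write interface over different stores concrete, here is a minimal sketch. It is not data-transport's API: the names `SQLiteStore`, `JSONLinesStore`, and `get_store` are hypothetical, and rows are plain lists of dictionaries (rather than pandas data frames) so the example runs with the standard library only.

```python
import json
import os
import sqlite3
import tempfile
from contextlib import closing


class SQLiteStore:
    """Hypothetical store backed by one SQLite table."""

    def __init__(self, path, table):
        self.path, self.table = path, table

    def write(self, rows):
        if not rows:
            return
        columns = list(rows[0])
        names = ", ".join(columns)
        marks = ", ".join("?" for _ in columns)
        with closing(sqlite3.connect(self.path)) as conn, conn:
            # Untyped columns are valid in SQLite and keep the sketch short.
            conn.execute(f"CREATE TABLE IF NOT EXISTS {self.table} ({names})")
            conn.executemany(
                f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
                [tuple(row[c] for c in columns) for row in rows],
            )

    def read(self):
        with closing(sqlite3.connect(self.path)) as conn:
            conn.row_factory = sqlite3.Row
            return [dict(r) for r in conn.execute(f"SELECT * FROM {self.table}")]


class JSONLinesStore:
    """Hypothetical store backed by a file with one JSON document per line."""

    def __init__(self, path):
        self.path = path

    def write(self, rows):
        with open(self.path, "a", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")

    def read(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]


# Registry mapping a provider name to its store class (illustrative only).
_PROVIDERS = {"sqlite": SQLiteStore, "jsonl": JSONLinesStore}


def get_store(provider, **options):
    """Return a store object; callers only ever use read() and write()."""
    return _PROVIDERS[provider](**options)


def demo():
    rows = [{"name": "ada", "age": 36}, {"name": "alan", "age": 41}]
    with tempfile.TemporaryDirectory() as folder:
        stores = [
            get_store("sqlite", path=os.path.join(folder, "demo.db"), table="people"),
            get_store("jsonl", path=os.path.join(folder, "demo.jsonl")),
        ]
        results = []
        for store in stores:
            store.write(rows)             # same call regardless of backend
            results.append(store.read())  # same call regardless of backend
        return results


if __name__ == "__main__":
    print(demo())
```

The calling code in `demo` never branches on the backend, which is the property the abstraction is meant to provide.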

Why Use Data-Transport?

Data-transport mainly serves data scientists who don't care about the underlying database and want a simple, consistent way to read, write, and move data. It also provides a lightweight Extract-Transform-Load (ETL) API and a command-line (CLI) tool. Finally, pre/post-processing pipeline functions can be added to read/write operations.

  1. Familiarity with pandas data-frames
  2. Connectivity drivers are included
  3. Reading/Writing data from various sources
  4. Useful for data migrations or ETL
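The ETL and pre/post-processing ideas above can be sketched as follows. This is an illustration under stated assumptions, not the project's ETL API or CLI: the function `etl` and its `pipeline` argument are hypothetical, and two in-memory SQLite databases stand in for the source and target stores. Each pipeline function runs after the read and before the write.

```python
import sqlite3
from contextlib import closing


def etl(extract, load, pipeline=()):
    """Read rows, pass them through each pipeline function in order, write them."""
    rows = extract()
    for step in pipeline:
        rows = step(rows)
    load(rows)
    return len(rows)


def normalise_names(rows):
    # Processing step: trim whitespace and lower-case the name column.
    return [(name.strip().lower(), amount) for name, amount in rows]


def drop_negative(rows):
    # Processing step: keep only rows with a non-negative amount.
    return [row for row in rows if row[1] >= 0]


def demo():
    with closing(sqlite3.connect(":memory:")) as source, \
         closing(sqlite3.connect(":memory:")) as target:
        source.execute("CREATE TABLE raw (name TEXT, amount REAL)")
        source.executemany(
            "INSERT INTO raw VALUES (?, ?)",
            [(" Ada ", 10.0), ("alan", -5.0), ("grace", 7.5)],
        )
        target.execute("CREATE TABLE clean (name TEXT, amount REAL)")

        def extract():
            return source.execute("SELECT name, amount FROM raw").fetchall()

        def load(rows):
            target.executemany("INSERT INTO clean VALUES (?, ?)", rows)

        moved = etl(extract, load, pipeline=[normalise_names, drop_negative])
        stored = target.execute(
            "SELECT name, amount FROM clean ORDER BY name"
        ).fetchall()
        return moved, stored


if __name__ == "__main__":
    print(demo())
```

Keeping the processing steps as plain functions means they can be reused for a standalone read or write as well as for a full source-to-target move.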

Installation

Within a virtual environment, run the following:

pip install git+https://github.com/lnyemba/data-transport.git

Learn More

Notebooks with sample code are available for reading from and writing to MongoDB, CouchDB, Netezza, PostgreSQL, Google BigQuery, Databricks, Microsoft SQL Server, MySQL ... Visit the data-transport homepage.