diff --git a/README.md b/README.md
index 0add2c7..c641952 100644
--- a/README.md
+++ b/README.md
@@ -25,7 +25,8 @@ Mostly data scientists that don't really care about the underlying database and
 
 1. Familiarity with **pandas data-frames**
 2. Connectivity **drivers** are included
-3. Useful for data migrations or ETL
+3. Mining data from various sources
+4. Useful for data migrations or ETL
 
 # Usage
 
@@ -35,7 +36,8 @@ Within the virtual environment perform the following :
 
     pip install git+https://dev.the-phi.com/git/steve/data-transport.git
 
-Once installed **data-transport** can be used as a library in code or a command line interface (CLI)
+Once installed, **data-transport** can be used as a library in code or as a command line interface (CLI). As a CLI it is used for ETL and requires a configuration file.
+
 
 ## Data Transport as a Library (in code)
 ---
@@ -112,12 +114,71 @@ df = reader.read(mongo=_command)
 print (df.head())
 reader.close()
 ```
-**Writing to Mongodb**
+**Reading/Writing to Mongodb**
 ---
+
+Scenario 1: Mongodb with security in place
+
+1. Define an authentication file on disk
+
+    The semantics of the attributes are defined by mongodb; see the [mongodb documentation](https://mongodb.org/docs). In this example the file is located at _/transport/mongo.json_
+
+
+configuration file
+
+```
+{
+    "username":"me","password":"changeme",
+    "mechanism":"SCRAM-SHA-1",
+    "authSource":"admin"
+}
+```
+Connecting to Mongodb
+
+```
+import transport
+PIPELINE = ... #-- do this yourself
+MONGO_KEY = '/transport/mongo.json'
+mreader = transport.factory.instance(provider=transport.providers.MONGODB,auth_file=MONGO_KEY,context='read',db='mydb',doc='logs')
+_aggregateDF = mreader.read(mongo=PIPELINE) #--results of an aggregate pipeline
+_collectionDF= mreader.read()
+
+
+```
+
+In order to enable write, change the **context** attribute to **'write'**.
+
+
+- The configuration file is in JSON format
+- The commands passed to mongodb are the same as those you would pass to runCommand in mongodb
+- The output is a pandas data-frame
+- By default the transport reads; to enable write operations use **context='write'**
+
+|parameters|description |
+| --- | --- |
+|db| Name of the database|
+|port| Port number to connect to|
+|doc| Name of the collection of documents|
+|username|Username |
+|password|Password|
+|authSource|User database that holds the authentication info|
+|mechanism|Mechanism used for authentication|
+
+**NOTE**
+
+Arguments like **db** or **doc** can be placed in the authentication file
+
+
+
+**Limitations**
+
+Reads and writes aren't encapsulated in the same object. This lets the calling code perform each action deliberately and helps minimize accidents associated with data wrangling.
+
+
 ```
 import transport
 improt pandas as pd
-writer = factory.instance(provider='mongodb',context='write',host='localhost',port='27018',db='example',doc='logs')
+writer = transport.factory.instance(provider=transport.providers.MONGODB,context='write',host='localhost',port='27018',db='example',doc='logs')
 df = pd.DataFrame({"names":["steve","nico"],"age":[40,30]})
 writer.write(df)