From 6918f80eb480c73103aebf3ce78e44098762bbbe Mon Sep 17 00:00:00 2001
From: "Steve L. Nyemba -- The Architect"
Date: Thu, 27 Sep 2018 10:35:35 -0500
Subject: [PATCH] documentation

---
 README.md | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index aab19b9..a5bc4c4 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,31 @@
 
 This project is intended to compute an estimated value of risk for a given database.
 
-    1. Pull meta data of the database and create a dataset via joins
+    1. Pull meta data of the database and create a dataset via joins
     2. Generate the dataset with random selection of features
-    3. Compute risk via SQL using group by
\ No newline at end of file
+    3. Compute risk via SQL using group by
+## Python environment
+
+    The following are the dependencies needed to run the code:
+
+    pandas
+    numpy
+    pandas-gbq
+    google-cloud-bigquery
+
+
+## Usage
+
+    * Generate the merged dataset
+
+    python risk.py create --i_dataset --o_dataset --table --path --key [--file ]
+
+    * Compute risk
+
+    python risk.py compute --i_dataset --table --path --key
+## Limitations
+    - It works against BigQuery for now
+    @TODO:
+    - Need to write a transport layer (database interface)
+    - Support for referential integrity, so one table can be selected and a dataset derived given referential integrity
+    - Add support for journalist risk
\ No newline at end of file