deid-risk
This project computes an estimate of the re-identification risk of a given database. It works in three steps:
1. Pull the metadata of the database and create a dataset by joining its tables
2. Generate the dataset with a random selection of features
3. Compute the risk in SQL using GROUP BY
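Step 2 can be illustrated with a minimal sketch. This is not the code in risk.py; the function name `pick_features`, its parameters, and the example column names are hypothetical, and it only assumes that the patient identifier must never be chosen as a feature.

```python
import random


def pick_features(columns, key, k, seed=None):
    """Return k randomly chosen column names, never including the key column.

    Hypothetical helper for illustration only; risk.py may select features differently.
    """
    candidates = [c for c in columns if c != key]
    if k > len(candidates):
        raise ValueError("k exceeds the number of available columns")
    rng = random.Random(seed)  # a seed makes the selection reproducible
    return rng.sample(candidates, k)


# Example column names are made up for this sketch.
columns = ["person_id", "gender", "race", "birth_year", "zip"]
features = pick_features(columns, key="person_id", k=3, seed=0)
```

Passing a seed keeps repeated runs comparable, which helps when risk estimates for different feature sets are being compared.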
Python environment
The following are the dependencies needed to run the code:
pandas
numpy
pandas-gbq
google-cloud-bigquery
Usage
* Generate the merged dataset
python risk.py create --i_dataset <in dataset|schema> --o_dataset <out dataset|schema> --table <name> --path <bigquery-key-file> --key <patient-id-field-name> [--file ]
* Compute risk (marketer, prosecutor)
python risk.py compute --i_dataset <dataset> --table <name> --path <bigquery-key-file> --key <patient-id-field-name>
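The idea behind the compute step can be sketched as follows. This is an illustration, not the query that risk.py runs: it uses the standard-library `sqlite3` module instead of BigQuery so that it runs anywhere, and it assumes the commonly used definitions of marketer risk (number of distinct groups divided by number of records) and prosecutor risk (one divided by the size of the smallest group). The function name `compute_risk`, the table, and the columns are hypothetical.

```python
import sqlite3


def compute_risk(conn, table, features):
    """Estimate marketer and prosecutor risk with a single GROUP BY query.

    Assumed definitions (not confirmed by risk.py):
      marketer   = number of groups / number of records
      prosecutor = 1 / size of the smallest group
    Identifiers are interpolated directly, so pass trusted names only.
    """
    cols = ", ".join(f'"{c}"' for c in features)
    sql = f"""
        SELECT COUNT(*) AS group_count,
               SUM(n)   AS record_count,
               MIN(n)   AS smallest_group
        FROM (SELECT COUNT(*) AS n FROM "{table}" GROUP BY {cols}) AS g
    """
    group_count, record_count, smallest = conn.execute(sql).fetchone()
    return {
        "marketer": group_count / record_count,
        "prosecutor": 1.0 / smallest,
    }


# Tiny in-memory table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER, gender TEXT, birth_year INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [(1, "F", 1980), (2, "F", 1980), (3, "M", 1975), (4, "M", 1990)],
)
risk = compute_risk(conn, "people", ["gender", "birth_year"])
```

With these four rows there are three groups, the smallest of size one, so the marketer risk is 0.75 and the prosecutor risk is 1.0. The sketch does not handle an empty table.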
Limitations
- It currently works only against BigQuery
@TODO:
- Need to write a transport layer (database interface)
- Support referential integrity, so that a single table can be selected and the dataset derived by following its references
- Add support for journalist risk