{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "df = pd.read_csv('../src/out/risk_xoi.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>Unnamed: 0</th>\n",
       " <th>group_count</th>\n",
       " <th>row_count</th>\n",
       " <th>marketer</th>\n",
       " <th>prosecutor</th>\n",
       " <th>field_count</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>0</td>\n",
       " <td>432512</td>\n",
       " <td>79080802</td>\n",
       " <td>0.005469</td>\n",
       " <td>1</td>\n",
       " <td>10</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>0</td>\n",
       " <td>17824004</td>\n",
       " <td>79080802</td>\n",
       " <td>0.225390</td>\n",
       " <td>1</td>\n",
       " <td>28</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>0</td>\n",
       " <td>43538084</td>\n",
       " <td>79080802</td>\n",
       " <td>0.550552</td>\n",
       " <td>1</td>\n",
       " <td>38</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>0</td>\n",
       " <td>64042788</td>\n",
       " <td>79080802</td>\n",
       " <td>0.809840</td>\n",
       " <td>1</td>\n",
       " <td>46</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>0</td>\n",
       " <td>6866070</td>\n",
       " <td>79080802</td>\n",
       " <td>0.086823</td>\n",
       " <td>1</td>\n",
       " <td>17</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  group_count  row_count  marketer  prosecutor  field_count\n",
       "0           0       432512   79080802  0.005469           1           10\n",
       "1           0     17824004   79080802  0.225390           1           28\n",
       "2           0     43538084   79080802  0.550552           1           38\n",
       "3           0     64042788   79080802  0.809840           1           46\n",
       "4           0      6866070   79080802  0.086823           1           17"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Average each risk measure per attribute count, then plot both as lines\n",
    "compiled = df.groupby('field_count')[['field_count','marketer','prosecutor']].mean()\n",
    "figure = compiled[['marketer','prosecutor']].plot.line().get_figure()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "variables": {
" figure ": "<img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAicAAAG0CAYAAADpSoetAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAPYQAAD2EBqD+naQAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJzs3XmYXHWdP/r3qb2r9z3dnU46K0mALCQQwqbORDKI8efMMDIuZIyKI5L5qRmvGhVyR2dA5wLj3PmhGaMMuIJycQMMYDQiEghJSMKSPeklve/V3dW1nnP/OPU9VdVdy6l96ffrefoRuk91fYOQ/uSzfSVFURQQERER5QlDrg9AREREFIrBCREREeUVBidERESUVxicEBERUV5hcEJERER5hcEJERER5RUGJ0RERJRXGJwQERFRXmFwQkRERHmFwQkRERHlFQYnRERElFdMuT6AHrIso6enB+Xl5ZAkKdfHISIiIh0URcHExASam5thMOjPhxREcNLT04PW1tZcH4OIiIiS0NXVhfnz5+t+viCCk/LycgDqL66ioiLHpyEiIiI9HA4HWltbtZ/jehVEcCJKORUVFQxOiIiICkyiLRlsiCUiIqK8wuCEiIiI8gqDEyIiIsorDE6IiIgorzA4ISIiorzC4ISIiIjyCoMTIiIiyisMToiIiCivMDghIiKivMLghIiIiPJKwsHJiy++iK1bt6K5uRmSJOGXv/xl3NccOHAAV111FaxWK5YuXYpHH300mbMSERHRHJBwcDI1NYU1a9bg4Ycf1vX8xYsXceutt+Jd73oXjh07hs9+9rP4xCc+geeeey7hwxIREVHxS/jiv1tuuQW33HKL7uf37NmDRYsW4cEHHwQArFy5Ei+99BL+4z/+A1u2bEn07dNHUQCvM3fvT0RElE/MdiDBC/oyJeO3Eh88eBCbN28O+9yWLVvw2c9+Nupr3G433G639vcOhyP9B/M6gfua0/99iYiICtGXewBLaa5PASALDbF9fX1obGwM+1xjYyMcDgemp6cjvub+++9HZWWl9tHa2prpYxIREVGeyHjmJBm7du3Czp07tb93OBzpD1DMdjVKJCIiIvXnYp7IeHAyb9489Pf3h32uv78fFRUVKCkpifgaq9UKq9Wa2YNJUt6kr4iIiCgo42WdTZs2Yf/+/WGfe+GFF7Bp06ZMvzUREREVoISDk8nJSRw7dgzHjh0DoI4KHzt2DJ2dnQDUksy2bdu05z/1qU/hwoUL+MIXvoBTp07h29/+Nn72s5/hc5/7XJp+CURERFRMEg5ODh8+jHXr1mHdunUAgJ07d2LdunW49957AQC9vb1aoAIAixYtwjPPPIMXXngBa9aswYMPPojvfe97uR0jJiIiorwlKYqi5PoQ8TgcDlRWVmJ8fBwVFRW5Pg4RERHpkOzPb96tQ0RERHmFwQkREVERUxQFvzrWjbP9E7k+im4MToiIiIrYkY5RfObxY/jb77yMjuGpXB9HFwYnRERERezN7nEAgMPlw6d+dBTTHn+OTxQfgxMiIqIidnZgUvvrk70OfOUXbyDfZ2EYnBARERUxEZz8/dWtMBokPPV6N370SkeOTxUbgxMiIqIidi4QnHzk2oX40l+tAAB87em3caRjNJfHionBCRERUZEannRjZMoDSQKW1JfhEzcuwq1XNsHrV/DpHx/B4IQ710eMiMEJERFRkRIlnfnVJSixGCFJEr5522osbShDv8ONHT85Cp9fzvEpZ2NwQkREVKREcLKsoVz7XJnVhD0fWY9SixGvXhzBN/edytXxomJwQkREVKTOBRavLWsoC/v80oYyPPB3awAAe/90EU+f6Mn62WJhcEJERFSkROZk6YzgBABuubIJ//iOxQCALzx5Iq82yDI4ISIiKlJaWaexPOLX/6+bL8OmxbUwGSQM5FFzrCnXBy
AiIqL0G3d6tWmcSJkTADAZDfivD63DlNuHhbWl2TxeTAxOiIiIitC5QbVM01xpQ5k1+o/7ujIr6sqs2TqWLizrEBERFaGz/YF+kyglnXzG4ISIiKgIBceII5d08hmDEyIioiIUa1In3zE4ISIiygM+v4wnj1xK20r5aDtOCgGDEyIiojyw54/n8fmfH8eDz59O+XtNuLzoGXcBYOaEiIiIkiDLCh5/rQtAsByTivODUwCA+nIrquyWlL9ftjE4ISIiyrGDF4ZxaXQaANAzNp3y9ztbwCUdgMEJERFRzj0RyJoAQL/DBW+KNwWfK+BJHYDBCRERUU6NO73Y91af9veyAvQF+kWSpU3qFOCOE4DBCRERpYGiKJhy+3J9jIL0q+Pd8PhkrGyqwIIaO4DUSztnB1jWISKiOe5ffvM21n7teZzsdeT6KAVHlHQ+sGE+WqpKAAA948kHJ9Mev9a/wuCEiIjmrJfPD8HrV3C8ayzXRykob3aP460eByxGA96/tgUt1Wpw0j2afHByfnASigLUlFpQm2d35ujF4ISIiFKiKIr2J/XhKU+OT1NYfnZYzZrcfHkjqkstaA5kTrrHku85OVfAm2EFBidERJSSMacXTo8fADA8mV/BiaIouT5CVC6vH798vRsA8IENrQCAliobgNR6TkS/CYMTIiKasy6FlCBGptKzej0d/u2Zt7Hu6y/gXOCHdb557q0+OFw+tFSV4IaldQCgZU5SCk76C3uMGGBwQkREKbo06tT+Op/KOs+c6MWY04u9L17M9VEiEiWd29bPh8EgAYDWENs9Np101ie446Qwx4gBBidERJSi7pA/5edLWcfp8Wl3y/zqeDfGnd4cnyhc14gTfz43DElSgxNBZE6cHj/GpxM/s9vnR/uwurp+WSMzJ0RENEeFl3XyIzi5ELhbBgBcXhk/P9IV4+ns+/mRSwCA65fUoTWw2wQAbGYjakvVu3C6kyjtXByagqwA5TYTGsoLc1IHYHBCREQpmhmc5EMT6oUhNTgJVEvwo1c6IMu5PxcA+GUFTwZKOh+4unXW18U4cU8SEzuh/SaSJKVwytxicEJERCkJ7Tnx+GVM5MGm2AuD6g/p91zZhHKrCe3DTrx0bijHp1L9+dwQesZdqCwx4+ZVjbO+3lwpdp04Z30tnrNF0G8CMDghIqIUzSw/jORB38n5QFnnipZK/G2gp+MHBztyeSTNE4GsyfvXNsNmNs76ujaxk8T9OmIyqZD7TQAGJ0RElILxaS8mXGqmpK5M7ZXIh4kdkTlZUl+Gj1y7EADw+1P9YVmeXBid8uCFt/oBRC7pAEBzYNdJMj0noqxTyDtOAAYnRESUArFmvabUgpZqtbEz102xsqxoDbGL60uxtKEM1y+thawAP3m1M6dn+8Xr3fD4ZVzeXIHLmysjPjM/yRX2Xr+Mi0NiUodlHSIimqNEJqKlqkSbMhmezO0itj6HC9NeP0wGSbvl945r2wCol+y5ff6cne3Xx3sAALdHyZoAyS9i6xiegk9WYLcY0VxpS/6QeYDBCRERJU2UHuZXl6CmND/KOiJrsqDWDrNR/TG3eWUDmiptGJ7y4Nk3enNyLllWcLpP7QkRG2EjEcHJwIQ7oUAq9E6dQp7UARicEBFRCsQYcUtVCWoDPSe5LutcGFJ/SC+uC/ZdmIwGfOiaBQCAH+aoMbY3kNExG6Ww3SYz1ZZaYDWpP577x/VnoYql3wRgcEJERCkQfRHzq/OnrHN+QDTDloZ9/u+vWQCzUcLRzjG82T2es3MtrC3VMjqRSJKkrbG/NKa/gbdYxogBBidERJQC8cNzfrUdNaXqRtKcl3UCTaFL6sMzCPXlVtxyRROA3GRPzg+KjE5pnCdD+070jxMHgxNmToiIaA7TyjrVeVTWCZnUmemOTepYcS7u2xHByRIdwYMYJ9bbFOuXFe37F/qOE4DBCRERJWnS7cNY4Ad8S1hZJ3fBidPj05p0F9fP/iG9YWE1Vswrz8l9Oy
JompnRiSTRiZ2uESc8PhlWkwHzq6P3sxQKBidERJQU0W9SYTOhwmbWpnVyeb+O2PNRbQ+eJ5QkSdi2qQ1A9u/b0TIn
    }
   },
   "source": [
    "# Dataset Used\n",
    "---\n",
    "\n",
    "We joined all the tables from All of Us and truncated the result, randomly selecting one record per join. The resulting dataset has roughly **80 million** records and about **5000** distinct patients.\n",
    "\n",
    "## Experiment Design\n",
    "---\n",
    "\n",
    "We compute both marketer and prosecutor risk over a randomly selected subset of the **111** attributes. Each subset contains between **2** and **111** attributes, although at most **64** attributes can be used in a single computation because of a Google BigQuery limitation. We performed **500** runs.\n",
    "\n",
    "## Results\n",
    "---\n",
    "\n",
    "The prosecutor risk does not change across runs, perhaps as an artifact of the number of runs (**500**) or of the dataset curation, that is, the joins we performed. A prosecutor risk of 1 means there is at least one record that is vulnerable.\n",
    "\n",
    "As a general trend, the marketer risk increases with the number of randomly selected attributes.\n",
    "\n",
    "{{ figure }} \n",
    "\n",
    "\n"
   ]
  },
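  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next cell is a minimal sketch of the experiment design described above, not the code that produced `risk_xoi.csv`. It assumes marketer risk is the number of distinct groups divided by the number of rows, which is consistent with `group_count / row_count` in the `df.head()` output (432512 / 79080802 is about 0.005469), and that prosecutor risk is one over the size of the smallest group. The names `compute_risk` and `MAX_FIELDS`, and the `toy` table with its columns, are hypothetical and introduced only for illustration.\n",
    "\n",
    "Each dictionary returned by `compute_risk` mirrors one row of the results table loaded in the first cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import random\n",
    "import pandas as pd\n",
    "\n",
    "MAX_FIELDS = 64  # hypothetical name for the attribute cap quoted in the text\n",
    "\n",
    "def compute_risk(frame, fields):\n",
    "    # Assumed definitions, not confirmed by the original code:\n",
    "    # rows sharing the same values on `fields` form one group.\n",
    "    sizes = frame.groupby(list(fields)).size()\n",
    "    return {\n",
    "        'group_count': len(sizes),\n",
    "        'row_count': len(frame),\n",
    "        'marketer': len(sizes) / float(len(frame)),\n",
    "        'prosecutor': 1.0 / sizes.min(),\n",
    "        'field_count': len(fields),\n",
    "    }\n",
    "\n",
    "# Toy stand-in for the real table; the columns are made up.\n",
    "toy = pd.DataFrame({\n",
    "    'zip': ['37203', '37203', '37212', '37212'],\n",
    "    'birth_year': [1980, 1980, 1975, 1990],\n",
    "    'sex': ['F', 'F', 'M', 'M'],\n",
    "})\n",
    "\n",
    "rng = random.Random(0)\n",
    "runs = []\n",
    "for _ in range(5):\n",
    "    # Draw how many attributes to use (at least 2, at most the cap),\n",
    "    # then draw that many attributes at random.\n",
    "    k = rng.randint(2, min(len(toy.columns), MAX_FIELDS))\n",
    "    runs.append(compute_risk(toy, rng.sample(list(toy.columns), k)))\n",
    "\n",
    "pd.DataFrame(runs)"
   ]
  },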
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",