<html lang="en"> <head> </head>

Writing to Google BigQuery

  1. Ensure you have a Google BigQuery service account key on disk
  2. Store the path to that key in the environment variable BQ_KEY
  3. The dataset is created automatically within the project associated with the service key
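Steps 1 and 2 can be checked before running any cell. The sketch below assumes the key is a standard Google service-account JSON key, which carries "type": "service_account" and a "project_id" field. It reads BQ_KEY, confirms the file exists, and returns the project in which the dataset would be created. The helper name check_service_key is hypothetical and not part of transport.

```python
import json
import os
from pathlib import Path


def check_service_key(env_var: str = "BQ_KEY") -> str:
    """Hypothetical helper: validate the key referenced by env_var, return its project id."""
    key_path = os.environ.get(env_var)
    if not key_path:
        raise RuntimeError(f"environment variable {env_var} is not set")
    path = Path(key_path)
    if not path.is_file():
        raise FileNotFoundError(f"service key not found at {path}")
    with path.open() as handle:
        key = json.load(handle)
    # Service-account keys are JSON documents tagged with "type": "service_account"
    if key.get("type") != "service_account":
        raise ValueError(f"{path} does not look like a service-account key")
    # The dataset ends up in the project this key belongs to
    return key["project_id"]
```

Calling check_service_key() at the top of the first cell would fail fast with a clear message if step 1 or 2 was skipped.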

The cell below creates a pandas DataFrame and writes it to the friends table of the demo dataset in Google BigQuery, replacing the table if it already exists.

In [1]:
#
# Writing to Google Bigquery database
#
import transport
from transport import providers
import pandas as pd
import os

PRIVATE_KEY = os.environ['BQ_KEY'] #-- path to the service key
DATASET = 'demo'
_data = pd.DataFrame({"name":['James Bond','Steve Rogers','Steve Nyemba'],'age':[55,150,44]})
# Ask the factory for a BigQuery writer bound to demo.friends
bqw = transport.factory.instance(
    provider=providers.BIGQUERY,
    dataset=DATASET,
    table='friends',
    context='write',
    private_key=PRIVATE_KEY,
)
bqw.write(_data, if_exists='replace') #-- overwrite the table; the default is append
print (['data transport version ', transport.__version__])
100%|██████████| 1/1 [00:00<00:00, 5440.08it/s]
['data transport version ', '2.0.0']
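The if_exists argument decides what happens when the target table already exists. The in-memory sketch below mimics the two modes named in the cell, 'append' (the default) and 'replace', using a plain dictionary as a stand-in for the dataset. It only illustrates the semantics; it is not how transport or BigQuery implement them, and write_table is a hypothetical name.

```python
from typing import Dict, List

Row = Dict[str, object]


def write_table(dataset: Dict[str, List[Row]], table: str, rows: List[Row],
                if_exists: str = "append") -> None:
    """Simulate writing rows into dataset[table] (illustration only)."""
    if if_exists not in ("append", "replace"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")
    if if_exists == "replace" or table not in dataset:
        # 'replace' discards whatever was stored before; a missing table starts empty
        dataset[table] = []
    # In both modes the new rows end up in the table
    dataset[table].extend(rows)


friends = [{"name": "James Bond", "age": 55}, {"name": "Steve Rogers", "age": 150},
           {"name": "Steve Nyemba", "age": 44}]
demo: Dict[str, List[Row]] = {}
write_table(demo, "friends", friends)                        # first write creates the table
write_table(demo, "friends", friends)                        # default 'append' doubles the rows
print(len(demo["friends"]))                                  # 6
write_table(demo, "friends", friends, if_exists="replace")   # back to a single copy
print(len(demo["friends"]))                                  # 3
```

Re-running the notebook cell with the default mode would therefore duplicate the rows, which is why the cell passes if_exists='replace'.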

Reading from Google BigQuery

The cell below reads back the data written by the cell above, then runs a simple aggregate query in Google BigQuery that counts the rows and computes the average age.

  • Basic read of the designated table (friends) created above
  • Execute an aggregate SQL query (row count and average age) against the table
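To see what the aggregate step computes without a BigQuery connection, the sketch below runs an equivalent query against an in-memory SQLite table holding the same three rows. SQLite is only a local stand-in here, not part of the notebook's workflow, and the alias avg_age is an illustrative addition.

```python
import sqlite3

# Same rows as the DataFrame written to demo.friends
rows = [("James Bond", 55), ("Steve Rogers", 150), ("Steve Nyemba", 44)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO friends (name, age) VALUES (?, ?)", rows)

# Aliasing AVG(age) gives the column a readable name; in BigQuery an
# unaliased expression gets an auto-generated name such as f0_.
cursor = conn.execute("SELECT COUNT(*) AS _counts, AVG(age) AS avg_age FROM friends")
columns = [description[0] for description in cursor.description]
counts, avg_age = cursor.fetchone()
conn.close()
print(columns, counts, avg_age)  # ['_counts', 'avg_age'] 3 83.0
```

The average is (55 + 150 + 44) / 3 = 83.0. Adding the same AS avg_age alias to the BigQuery query would replace the f0_ column name in its result.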

NOTE

transport.factory.instance and transport.instance are interchangeable. The longer form simply makes it explicit to maintainers that a factory design pattern is used.
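In the factory design pattern, callers request an object by provider name and the factory decides which class to construct. The sketch below is a hypothetical, heavily simplified illustration of that idea and of one way a package can expose the same function under two names. The registry, the BigQueryReader class and the "bigquery" string are invented for illustration and do not reflect transport's internals.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict


class BigQueryReader:
    """Hypothetical stand-in for a provider-specific reader class."""

    def __init__(self, **params: Any) -> None:
        self.params = params


# Registry mapping provider names to the classes that handle them (hypothetical)
_REGISTRY: Dict[str, Callable[..., Any]] = {"bigquery": BigQueryReader}


def _instance(provider: str, **params: Any) -> Any:
    """Factory: look up the class registered for provider and build it."""
    try:
        cls = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return cls(**params)


# One function, two names, mirroring transport.factory.instance and transport.instance
factory = SimpleNamespace(instance=_instance)
instance = factory.instance

reader = instance(provider="bigquery", dataset="demo", table="friends")
print(type(reader).__name__, reader.params["table"])  # BigQueryReader friends
```

Because both names point at the same function object, choosing one over the other changes nothing at runtime.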

In [2]:
import transport
from transport import providers
import os
PRIVATE_KEY=os.environ['BQ_KEY']
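# No context is given here; reading works without it
bqr = transport.instance(provider=providers.BIGQUERY,dataset='demo',table='friends',private_key=PRIVATE_KEY)
_df = bqr.read() #-- read the whole table
_query = 'SELECT COUNT(*) _counts, AVG(age) from demo.friends'
_sdf = bqr.read(sql=_query) #-- run an aggregate query instead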
print (_df)
print ('--------- STATISTICS ------------')
print (_sdf)
Downloading: 100%|██████████|
Downloading: 100%|██████████|
           name  age
0    James Bond   55
1  Steve Rogers  150
2  Steve Nyemba   44
--------- STATISTICS ------------
   _counts   f0_
0        3  83.0

The cell below shows the contents of an auth_file. If you would rather not expose the dataset and table names in the notebook, you can supply these parameters through an auth_file instead.

NOTE:

The auth_file is expected to be JSON-formatted.

In [3]:
{
    "dataset": "demo",
    "table": "friends"
}
Out[3]:
{'dataset': 'demo', 'table': 'friends'}
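How transport consumes the auth_file is not shown here, so the following is only a sketch under an assumption: the file holds a JSON object whose keys match the keyword arguments used in the earlier cells (dataset, table, ...), and any argument passed explicitly takes precedence over the file. The helper load_auth_file is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def load_auth_file(path: str, overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: read connection parameters from a JSON auth_file."""
    with Path(path).open() as handle:
        params = json.load(handle)
    if not isinstance(params, dict):
        raise ValueError("auth_file must contain a JSON object")
    # Explicit keyword arguments win over values stored in the file
    params.update(overrides or {})
    return params
```

For example, transport.instance(provider=providers.BIGQUERY, private_key=PRIVATE_KEY, **load_auth_file('auth.json')) would keep the dataset and table out of the notebook itself; the file name auth.json is illustrative.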

</html>