DocTable
Overview¶A DocTable
acts as an object-oriented interface to a single database table. It will maintain a connection to the database and sqlalchemy metadata about the table schema. Schemas are provided as classes that act much like dataclasses
(see the doctable schema guide for more explanation), and indices and constraints are often specified in a definition of a subclass of DocTable
.
You may also want to see the vignettes for more examples, the DocTable
docs for more information about the class, or the schema guide for more information about creating schemas. I also recommend looking examples for insert, delete, select, and update methods.
Here I'll just provide some examples to give a sense of how DocTable
works.
import random
random.seed(0)
import pandas as pd
import numpy as np
from dataclasses import dataclass
import sys
sys.path.append('..')
import doctable
DocTable
schemas are created using the doctable.schema
decorator on a class that uses doctable.Col
for defaulted parameters. Check out the schema guide for more detail about schema classes. Our demonstration class will include three columns: id
, name
, and age
, with an additional .is_old
property derived from age
for example.
@doctable.schema
class Record:
__slots__ = []
id: int = doctable.IDCol()
name: str = doctable.Col(nullable=False)
age: int = doctable.Col()
@property
def is_old(self):
return self.age >= 30 # lol
We can instantiate a DocTable
by passing a target
and schema
(Record
in our example) parameters, and I show the resulting schema using .schema_table()
. Note that the type hints were used to describe column types, and id
was used as the auto-incremented primary key.
table = doctable.DocTable(target=':memory:', schema=Record)
table.schema_table()
name | type | nullable | default | autoincrement | primary_key | |
---|---|---|---|---|---|---|
0 | id | INTEGER | False | None | auto | 1 |
1 | name | VARCHAR | False | None | auto | 0 |
2 | age | INTEGER | True | None | auto | 0 |
Probably a more common use case will be to subclass DocTable
to provide some basic definitions.
class RecordTable(doctable.DocTable):
_tabname_ = 'records'
_schema_ = Record
_indices_ = (
doctable.Index('ind_age', 'age'),
)
_constraints_ = (
doctable.Constraint('check', 'age > 0'),
)
table = RecordTable(target=':memory:')
table
<__main__.RecordTable at 0x7f89eefd8ca0>
The main goal of doctable was to create an object-oriented interface to working with database tables. To that end, I'll show some of the more common use cases in the following examples.
First, note that subscripting the table object allows you to access sqlalchemy Column
objects, which, as I will show a bit later, can be used to create where conditionals for select and update queries.
table['id']
Column('id', Integer(), table=<records>, primary_key=True, nullable=False)
As we'll show later, these column objects also have some operators defined such that they can be used to construct complex queries and functions.
table['id'] > 3
<sqlalchemy.sql.elements.BinaryExpression object at 0x7f89eef91100>
We use the .insert()
method to insert a row passed as a dictionary of column name -> value entries.
for i in range(5):
age = random.random() # number in [0,1]
is_old = age > 0.5
#row = {'name':'user_'+str(i), 'age':age, 'is_old':is_old}
record = Record(name='user_'+str(i), age=age)
table.insert(record)
table.head()
id | name | age | |
---|---|---|---|
0 | 1 | user_0 | 0.844422 |
1 | 2 | user_1 | 0.757954 |
2 | 3 | user_2 | 0.420572 |
3 | 4 | user_3 | 0.258917 |
4 | 5 | user_4 | 0.511275 |
Now we show how to select data from the table. Use the .count()
method to check the number of rows. It also accepts some column conditionals to count entries that satisfy a given criteria
table.count(), table.count(table['age']>=30)
(5, 0)
Use the .select()
method with no arguments to retrieve all rows of the table. You can also choose to select one or more columns to select.
table.select()
[Record(id=1, name='user_0', age=0.8444218515250481), Record(id=2, name='user_1', age=0.7579544029403025), Record(id=3, name='user_2', age=0.420571580830845), Record(id=4, name='user_3', age=0.25891675029296335), Record(id=5, name='user_4', age=0.5112747213686085)]
table.select('name')
['user_0', 'user_1', 'user_2', 'user_3', 'user_4']
table.select(['id','name'])
[Record(id=1, name='user_0'), Record(id=2, name='user_1'), Record(id=3, name='user_2'), Record(id=4, name='user_3'), Record(id=5, name='user_4')]
table.select(table['age'].sum)
[2.7931393069577677]
The SUM() and COUNT() SQL functions have been mapped to .sum
and .count
attributes of columns.
table.select([table['age'].sum,table['age'].count], as_dataclass=False)
[(2.7931393069577677, 5)]
Alternatively, to see the results as a pandas dataframe, we can use .select_df()
.
table.select_df()
id | name | age | |
---|---|---|---|
0 | 1 | user_0 | 0.844422 |
1 | 2 | user_1 | 0.757954 |
2 | 3 | user_2 | 0.420572 |
3 | 4 | user_3 | 0.258917 |
4 | 5 | user_4 | 0.511275 |
Now we can select specific elements of the db using the where
argument of the .select()
method.
table.select(where=table['age'] >= 1)
[]
table.select(where=table['id']==3)
[Record(id=3, name='user_2', age=0.420571580830845)]
We can update the results in a similar way, using the where
argument.
table.update({'name':'smartypants'}, where=table['id']==3)
table.select()
[Record(id=1, name='user_0', age=0.8444218515250481), Record(id=2, name='user_1', age=0.7579544029403025), Record(id=3, name='smartypants', age=0.420571580830845), Record(id=4, name='user_3', age=0.25891675029296335), Record(id=5, name='user_4', age=0.5112747213686085)]
table.update({'age':table['age']*100})
table.select()
[Record(id=1, name='user_0', age=84.4421851525048), Record(id=2, name='user_1', age=75.79544029403024), Record(id=3, name='smartypants', age=42.0571580830845), Record(id=4, name='user_3', age=25.891675029296334), Record(id=5, name='user_4', age=51.12747213686085)]
And we can delete elements using the .delete()
method.
table.delete(where=table['id']==3)
table.select()
[Record(id=1, name='user_0', age=84.4421851525048), Record(id=2, name='user_1', age=75.79544029403024), Record(id=4, name='user_3', age=25.891675029296334), Record(id=5, name='user_4', age=51.12747213686085)]
DocTable2 allows you to access columns through direct subscripting, then relies on the power of sqlalchemy column objects to do most of the work of constructing queries. Here are a few notes on their use. For more demonstration, see the example in examples/dt2_select.ipynb
# subscript is used to access underlying sqlalchemy column reference (without querying data)
table['id']
Column('id', Integer(), table=<records>, primary_key=True, nullable=False)
# conditionals are applied directly to the column objects (as we'll see with "where" clause)
table['id'] < 3
<sqlalchemy.sql.elements.BinaryExpression object at 0x7f89eeee5460>
# can also access using .col() method
table.col('id')
Column('id', Integer(), table=<records>, primary_key=True, nullable=False)
# to access all column objects (only useful for working directly with sql info)
table.columns
<sqlalchemy.sql.base.ImmutableColumnCollection at 0x7f89eef47680>
# to access more detailed schema information
table.schema_table()
name | type | nullable | default | autoincrement | primary_key | |
---|---|---|---|---|---|---|
0 | id | INTEGER | False | None | auto | 1 |
1 | name | VARCHAR | False | None | auto | 0 |
2 | age | INTEGER | True | None | auto | 0 |
# If needed, you can also access the sqlalchemy table object using the .table property.
table.table
Table('records', MetaData(bind=Engine(sqlite:///:memory:)), Column('id', Integer(), table=<records>, primary_key=True, nullable=False), Column('name', String(), table=<records>, nullable=False), Column('age', Integer(), table=<records>), schema=None)
# the count method is also an easy way to count rows in the database
table.count()
4
# the print method makes it easy to see the table name and total row count
print(table)
<DocTable (3 cols)::sqlite:///:memory::records>