Skip to content

Table Schemas

In this document, I give some examples for defining single and multi-table database schemas in Python.

import sys
sys.path.append('../')
import doctable
import pprint

Containers and the table_schema decorator

The first step in using doctable is to define a container object. Container objects are defined using the table_schema decorator, and are used both to define the schema of a database table and to wrap the data for insertion and selection. Container objects act very similar to normal dataclasses - in fact, they actually are dataclasses with additional information needed to create the database table attached. This informaiton is collected at the time when the decorator is used, and thus the decorator serves only to parse the database schema from the class definition, attach that information to the container class, and return the container type as a dataclass.

@doctable.table_schema # equivalent to @doctable.table_schema()
class Container1:
    name: str

ins = doctable.inspect_schema(Container1)
print(ins.table_name())
ins.column_info_df()
Container1
Col Name Col Type Attr Name Hint Order Primary Key Foreign Key Index Default
0 name String name str (inf, 0) False False None None
@doctable.table_schema(table_name='container2')
class Container2:
    name: str
    age: int
ins = doctable.inspect_schema(Container2)
print(ins.table_name())
ins.column_info_df()
container2
Col Name Col Type Attr Name Hint Order Primary Key Foreign Key Index Default
0 name String name str (inf, 0) False False None None
1 age Integer age int (inf, 1) False False None None

Specifying Column Properties

There are two sets of parameters you may adjust to change the behavior of a column:

  • ColumnArgs: adjust the behavior of the generated column. This does not affect the container object, but does affect the database column.

  • FieldArgs: adjust the behavior of container attribute by passing arguments to dataclasses.field(). This does not affect the database column, but does affect the way the container object can be used.

Both are passed directly to the Column function, which, as you can see, simply returns a dataclasses.field object with column arguments passed to the metadata attribute. Note that by default, the default argument is set to doctable.MISSING, so the parameter is optional and will be populated with that value. Missing values will be ignored when inserting the object into the database.

doctable.Column(
    column_args=doctable.ColumnArgs(),
    field_args=doctable.FieldArgs(),
)
Field(name=None,type=None,default=MISSING,default_factory=<dataclasses._MISSING_TYPE object at 0x7fc0f9afe590>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({'_column_args': ColumnArgs(order=inf, column_name=None, type_kwargs={}, use_type=None, sqlalchemy_type=None, autoincrement=False, nullable=True, unique=None, primary_key=False, index=None, foreign_key=None, default=None, onupdate=None, server_default=None, server_onupdate=None, comment=None, other_kwargs={})}),kw_only=<dataclasses._MISSING_TYPE object at 0x7fc0f9afe590>,_field_type=None)
import datetime

class PhoneNumber(str):
    pass

class Address(str):
    pass

@doctable.table_schema(table_name='container3')
class Container3:
    name: str
    age: int = doctable.Column(field_args=doctable.FieldArgs(init_required=True))
    address: Address = doctable.Column()
    phone: PhoneNumber = doctable.Column()

    # this column will appear first in the database, even though this attribute is later
    _id: int = doctable.Column(
        column_args=doctable.ColumnArgs(
            column_name='id', # name of the column in the db (might not want to have an attr called 'id')
            order = 0, # affects the ordering of the columns in the db
            primary_key=True,
            autoincrement=True,
        ),
    )

    # doctable will define default and onupdate when inserting into database
    added: datetime.datetime = doctable.Column(
        column_args=doctable.ColumnArgs(
            default=datetime.datetime.now, 
            onupdate=datetime.datetime.now
        ),
        field_args = doctable.FieldArgs(
            repr=False, # don't show this field when printing
        )
    )    

doctable.inspect_schema(Container3).column_info_df()
Col Name Col Type Attr Name Hint Order Primary Key Foreign Key Index Default
0 id Integer _id int (0, 4) True False None None
1 name String name str (inf, 0) False False None None
2 age Integer age int (inf, 1) False False None None
3 address String address Address (inf, 2) False False None None
4 phone String phone PhoneNumber (inf, 3) False False None None
5 added DateTime added datetime (inf, 5) False False None now

Notice that the string representation does not show the added attribute, as specified via FieldAargs(repr=False).

Container3('Devin J. Cornell', 30)
Container3(name='Devin J. Cornell', age=30, address=MISSING, phone=MISSING, _id=MISSING)

Indices

Indices may be added to a table by passing a dictionary of name, Index pairs to the indices parameter of the table_schema decorator. The arguments are the columns, and any additional keyword arguments may be passed after.

@doctable.table_schema(
    table_name='container4',
    indices = {
        'ind_name': doctable.Index('name'),
        'ind_name_age': doctable.Index('name', 'age', unique=True),
    }
)
class Container4:
    name: str
    age: int

ins = doctable.inspect_schema(Container4)
ins.index_info_df()
name columns kwargs
0 ind_name name
1 ind_name_age name, age unique: True

Constraints

You may pass constraints through the constraint parameter of the table_schema decorator.

There are several types of constraints you may want to use in your schema. The following methods are thin wrappers over the SQLAlchemy objects of the same name.

docs Constraint Description
link ForeignKey(local_columns, foreign_columns, optional[onupdate], optional[ondelete]) A foreign key constraint.
link CheckConstraint(text, optional[Name]) A unique constraint.
link UniqueConstraint(*column_names, optional[name]) A unique constraint.
link PrimaryKeyConstraint(*column_names, optional[name]) A unique constraint.
@doctable.table_schema(
    table_name='container5',
    constraints = [
        #doctable.ForeignKey(..), # see multi-table schemas below
        doctable.CheckConstraint('age >= 0', name='check_age'),
        doctable.UniqueConstraint('age', 'name', name='unique_age_name'),
        doctable.PrimaryKeyConstraint('id'),
    ]
)
class Container5:
    id: int # this is the primary key now
    name: str
    age: int

Column Types

The column type resolution works according to the following steps:

  1. Check ColumnArgs.sqlalchemy_type and use this if it is not None.
  2. Check if column is foreign key - if it is, ask sqlalchemy to resolve the type
  3. Check ColumnArgs.use_type and use this if it is provided.
  4. Use the provided type hint to resolve the type.

The valid type hints and their sqlalchemy equivalents are listed below.

Type Hint SQLAlchemy Type
int sqlalchemy.Integer
float sqlalchemy.Float
bool sqlalchemy.Boolean
str sqlalchemy.String
bytes sqlalchemy.LargeBinary
datetime.datetime sqlalchemy.DateTime
datetime.time sqlalchemy.Time
datetime.date sqlalchemy.Date
typing.Any sqlalchemy.PickleType
'datetime.datetime' sqlalchemy.DateTime
'datetime.time' sqlalchemy.Time
'datetime.date' sqlalchemy.Date
'Any' sqlalchemy.PickleType

You can get the mappings programatically if needed as well:

doctable.type_mappings()
{int: sqlalchemy.sql.sqltypes.Integer,
 float: sqlalchemy.sql.sqltypes.Float,
 bool: sqlalchemy.sql.sqltypes.Boolean,
 str: sqlalchemy.sql.sqltypes.String,
 bytes: sqlalchemy.sql.sqltypes.LargeBinary,
 datetime.datetime: sqlalchemy.sql.sqltypes.DateTime,
 datetime.time: sqlalchemy.sql.sqltypes.Time,
 datetime.date: sqlalchemy.sql.sqltypes.Date,
 doctable.schema.column.column_types.PickleType: sqlalchemy.sql.sqltypes.PickleType,
 'datetime.datetime': sqlalchemy.sql.sqltypes.DateTime,
 'datetime.time': sqlalchemy.sql.sqltypes.Time,
 'datetime.date': sqlalchemy.sql.sqltypes.Date,
 doctable.schema.column.column_types.JSON: sqlalchemy.sql.sqltypes.JSON}

Special Column Types

There are several special column types that can be used in your schemas.

Type Hint SQLAlchemy Type Description
doctable.JSON sqlalchemy.JSON Calls json.dumps on write, json.loads on read.
doctable.PickleType sqlalchemy.PickleType Calls pickle.dumps on write, pickle.loads on read.
import dataclasses
import typing

@dataclasses.dataclass
class Address:
    street: str
    city: str
    state: str
    zip: str

@doctable.table_schema
class Container6:
    name: str

    # NOTE: will be serialized as a JSON string in the database
    # notice how we can use a more accurate type hint and still specify
    # the column type using use_type
    other_info: typing.Dict[str, typing.Union[str,int,float]] = doctable.Column(
        column_args=doctable.ColumnArgs(
            use_type=doctable.JSON,
        ),
        field_args=doctable.FieldArgs(default_factory=dict),
    )

    # NOTE: will be pickled in the database
    address: Address = doctable.Column(
        column_args=doctable.ColumnArgs(
            use_type=doctable.PickleType,
        )
    )

doctable.inspect_schema(Container6).column_info_df()
Col Name Col Type Attr Name Hint Order Primary Key Foreign Key Index Default
0 name String name str (inf, 0) False False None None
1 other_info JSON other_info Dict (inf, 1) False False None None
2 address PickleType address Address (inf, 2) False False None None

Now create a new container object that contains an address for insertion.

new_obj = Container6(
    name = 'Devin J. Cornell', 
    other_info = {'favorite_color': 'blue'},
    address = Address('123 Main St.', 'San Francisco', 'CA', '94122'), 
)
new_obj
Container6(name='Devin J. Cornell', other_info={'favorite_color': 'blue'}, address=Address(street='123 Main St.', city='San Francisco', state='CA', zip='94122'))

Now we open a new database, insert the row, and query it back - you can see that the dict data was converted to json and back again, and the address was converted to pickle data and back again.

core = doctable.ConnectCore.open(':memory:', 'sqlite')
with core.begin_ddl() as ddl:
    tab = ddl.create_table(Container6)

with tab.query() as q:
    q.insert_single(new_obj)

with core.query() as q:
    result = q.select(tab.all_cols())
result.first()
('Devin J. Cornell', {'favorite_color': 'blue'}, Address(street='123 Main St.', city='San Francisco', state='CA', zip='94122'))

Multi-table Schemas

The example below shows two linked tables: one for colors, and the other for people. Each person has a favorite color that is constrained by a foriegn key to the colors table. The colors table also has a unique constraint on the color name. I demonstrate use of the Column function to describe behavior of columns - specifically the use of ColumnArgs to specify additional column features that are not conveyed through type annotations or attribute names. I also show use of the Index object for creating indexes, the UniqueConstraint object for creating unique constraints, and the ForeignKey object for creating foreign key constraints.

Note that the container object representing the database schema is also a usable dataclass that can used like any other container object. In fact, tables created according to this schema can insert these objects directly and will wrap return values issued via select queries.

import datetime

@doctable.table_schema(
    table_name='color',
    constraints = [
        doctable.UniqueConstraint('name'),
    ]
)
class Color:
    name: str
    id: int = doctable.Column(
        column_args=doctable.ColumnArgs(
            primary_key=True,
            autoincrement=True,
        )
    )

# lets say we use this instead of an int
class PersonID(int):
    pass

# add table-level parameters to this decorator
@doctable.table_schema(
    table_name='person',
    indices = {
        'ind_name_birthday': doctable.Index('name', 'birthday', unique=True),
    },
    constraints = [ # these constraints are set on the database
        doctable.CheckConstraint('length(address) > 0'), # cannot have a blank address
        doctable.UniqueConstraint('birthday', 'fav_color'),
        doctable.ForeignKey(['fav_color'], ['color.name'], onupdate='CASCADE', ondelete='CASCADE'),
    ],
    frozen = True, # parameter passed to dataclasses.dataclass
)
class Person:
    name: str

    # default value will be "not provided" - good standardization
    address: str = doctable.Column(
        column_args=doctable.ColumnArgs(
            server_default='not provided',
        )
    )

    # provided as datetime, set to be indexed
    birthday: datetime.datetime = doctable.Column(
        column_args=doctable.ColumnArgs(
            index = True,
        )
    )

    # note that this has a foreign key constraint above
    fav_color: str = doctable.Column(
        column_args=doctable.ColumnArgs(
            nullable=False,
        )
    )

    id: PersonID = doctable.Column( # standard id column
        column_args=doctable.ColumnArgs(
            order=0, # will be the first column
            primary_key=True,
            autoincrement=True
        ),
    )

    # doctable will define default and onupdate when inserting into database
    added: datetime.datetime = doctable.Column(
        column_args=doctable.ColumnArgs(
            index=True,
            default=datetime.datetime.utcnow, 
            onupdate=datetime.datetime.utcnow
        )
    )

    # this property will not be stored in the database 
    #   - it acts like any other property
    @property
    def age(self):
        return datetime.datetime.now() - self.birthday


core = doctable.ConnectCore.open(
    target=':memory:', 
    dialect='sqlite'
)
# NOTE: weird error when trying to run this twice after defining containers
with core.begin_ddl() as emitter:
    core.enable_foreign_keys() # NOTE: NEEDED TO ENABLE FOREIGN KEYS
    color_tab = emitter.create_table_if_not_exists(container_type=Color)
    person_tab = emitter.create_table_if_not_exists(container_type=Person)
for col_info in person_tab.inspect_columns():
    print(f'{col_info["name"]}: {col_info["type"]}')
id: INTEGER
name: VARCHAR
address: VARCHAR
birthday: DATETIME
fav_color: VARCHAR
added: DATETIME

Insertion into the color table is fairly straightforward.

color_names = ['red', 'green', 'blue']
colors = [Color(name=name) for name in color_names]
with color_tab.query() as q:
    q.insert_multi(colors)
    for c in q.select():
        print(c)
    #print(q.select())
Color(name='red', id=1)
Color(name='green', id=2)
Color(name='blue', id=3)

Insertion into the person table is similar, and note that we see an exception if we try to insert a person with a favorite color that is not in the color table.

persons = [
    Person(name='John', birthday=datetime.datetime(1990, 1, 1), fav_color='red'),
    Person(name='Sue', birthday=datetime.datetime(1991, 1, 1), fav_color='green'),
    Person(name='Ren', birthday=datetime.datetime(1995, 1, 1), fav_color='blue'),
]
other_person = Person(
    name='Bob', 
    address='123 Main St', 
    birthday=datetime.datetime(1990, 1, 1), 
    fav_color='other', # NOTE: THIS WILL CAUSE AN ERROR (NOT IN COLOR TABLE)
)

import sqlalchemy.exc

sec_in_one_year = 24*60*60*365
with person_tab.query() as q:
    q.insert_multi(persons, ifnotunique='replace')

    try:
        q.insert_single(other_person, ifnotunique='replace')
        print(f'THIS SHOULD NOT APPEAR')
    except sqlalchemy.exc.IntegrityError as e:
        print(f'successfully threw exception: {e}')

    for p in q.select():
        print(f'{p.name} ({p.fav_color}): {p.age.total_seconds()//sec_in_one_year:0.0f} y/o')
successfully threw exception: (sqlite3.IntegrityError) FOREIGN KEY constraint failed
[SQL: INSERT OR REPLACE INTO person (name, address, birthday, fav_color, added) VALUES (?, ?, ?, ?, ?)]
[parameters: ('Bob', '123 Main St', '1990-01-01 00:00:00.000000', 'other', '2023-11-14 22:17:40.402308')]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
John (red): 33 y/o
Sue (green): 32 y/o
Ren (blue): 28 y/o

The foreign key works as expected because we set onupdate: changing that value in the parent table will update the value in the child table.

with color_tab.query() as q:
    q.update_single(dict(name='reddish'), where=color_tab['name']=='red')
    for c in q.select():
        print(c)

with person_tab.query() as q:
    for p in q.select():
        print(f'{p.name} ({p.fav_color}): {p.age.total_seconds()//sec_in_one_year:0.0f} y/o')
Color(name='reddish', id=1)
Color(name='green', id=2)
Color(name='blue', id=3)
John (reddish): 33 y/o
Sue (green): 32 y/o
Ren (blue): 28 y/o