`DocTable` Overview

A DocTable acts as an object-oriented interface to a single database table. It combines the utility of dataclasses to create schemas from simple object definitions and sqlalchemy to create connections and execute queries to a database. It should be easy to convert existing data-oriented objects to database schemas, and use those objects when inserting/retrieving data.

In this document I'll cover these topics:

Creating Schemas
Managing Connections
Inserting, Deleting, and Selecting
Select Queries

You may also want to see the vignettes for more examples, the DocTable docs for more information about the class, or the schema guide for more information about creating schemas. I also recommend looking examples for insert, delete, select, and update methods.

import random
random.seed(0)
import pandas as pd
import numpy as np
from dataclasses import dataclass

import sys
sys.path.append('..')
import doctable

1. Creating a Database Schema

DocTable schemas are created using the doctable.schema decorator on a class that uses doctable.Col for defaulted parameters. Check out the schema guide for more detail about schema classes. Our demonstration class will include three columns: id, name, and age, with an additional .is_old property derived from age for example.

Note that the id column uses the default value IDCol() which sets the variable to be the primary key and to auto-increment. Arguments passed to the generic Col() function are passed directly to the sqlalchemy metadata to direct column creation. See more in the schema guide.

@doctable.schema
class Record:
    __slots__ = []
    id: int = doctable.IDCol()
    name: str = doctable.Col(nullable=False)
    age: int = doctable.Col()

    @property
    def is_old(self):
        return self.age >= 30 # lol

We can instantiate a DocTable by passing a target and schema (Record in our example) parameters, and I show the resulting schema using .schema_table(). Note that the type hints were used to describe column types, and id was used as the auto-incremented primary key.

table = doctable.DocTable(target=':memory:', schema=Record)
table.schema_table()

	name	type	nullable	default	autoincrement	primary_key
0	id	INTEGER	False	None	auto	1
1	name	VARCHAR	False	None	auto	0
2	age	INTEGER	True	None	auto	0

Probably a more common use case will be to subclass DocTable to provide some basic definitions.

class RecordTable(doctable.DocTable):
    _tabname_ = 'records'
    _schema_ = Record
    _indices_ = (
        doctable.Index('ind_age', 'age'),
    )
    _constraints_ = (
        doctable.Constraint('check', 'age > 0'),
    )

table = RecordTable(target=':memory:')
table

<__main__.RecordTable at 0x7f5b40325940>

2. Maintaining Database Connections

Obviously a big part of working with databases involves managing connections with the database. By default, DocTable instances DO NOT maintain persistent connections to the database - instead, they open a connection as-needed when executing a query. Benchmark comparisons show that the cost of creating a connection is so low relative to an actual insertion that this probably the approach for most applications.

Alternatively, there are several ways of working with connections: as a context manager, using the persistent_conn constructor parameter, manually calling open_conn() and close_conn() (not recommended), and manually requesting a connection to execute using your own sqlalchemy or raw sql library queries.

As a context manager. Note that the __enter__ method returns the doctable instance itself, so you can access it using with or without the "as" keyword.

tab = doctable.DocTable(target=':memory:', schema=Record)
print(tab._conn)
with tab as t:
    r = Record(name = 'Devin Cornell', age = 32)
    print(dir(r))
    t.insert_single(Record(name = 'Devin Cornell', age = 32))
    print(t._conn)

# alternatively, no need to use "as"
with tab:
    tab.insert_single(Record(name = 'Devin Cornell', age = 32))
    print(tab._conn)
tab.head()

None
['__annotations__', '__class__', '__dataclass_fields__', '__dataclass_params__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', '__wrapped__', '_doctable__age', '_doctable__id', '_doctable__name', '_doctable_as_dict', '_doctable_from_db', '_doctable_get_val', 'age', 'as_dict', 'get_val', 'id', 'is_old', 'name']
<sqlalchemy.engine.base.Connection object at 0x7f5b40304550>
<sqlalchemy.engine.base.Connection object at 0x7f5b403141c0>


/DataDrive/code/doctable/examples/../doctable/doctable.py:391: UserWarning: .insert_single() is depricated: please use .q.insert_single() or .q.insert_single_raw()
  warnings.warn(f'.insert_single() is depricated: please use .q.insert_single() or '
/DataDrive/code/doctable/examples/../doctable/doctable.py:408: UserWarning: Method .head() is depricated. Please use .q.select_head() instead.
  warnings.warn('Method .head() is depricated. Please use .q.select_head() instead.')

	id	name	age
0	1	Devin Cornell	32
1	2	Devin Cornell	32

Using the constructor argument persistent_conn

tab = doctable.DocTable(target=':memory:', schema=Record, persistent_conn=False)
print(tab._conn)
tab = doctable.DocTable(target=':memory:', schema=Record, persistent_conn=True)
tab._conn

None





<sqlalchemy.engine.base.Connection at 0x7f5b402cd280>

Manually calling .open_conn() and .close_conn(). I recommend using a context manager if you go this route.

tab = doctable.DocTable(target=':memory:', schema=Record)
print(tab._conn)
tab.open_conn()
print(tab._conn)
tab.close_conn()
print(tab._conn)

None
<sqlalchemy.engine.base.Connection object at 0x7f5b403097c0>
None

grabbing a connection object to execute your own sqlalchemy queries

conn = tab.connect()
conn

<sqlalchemy.engine.base.Connection at 0x7f5b40309550>

3. Insert, Delete, and Select

The nature of doctable schema definitions means the easiest way to work with database data is often to use the schema class as a normal dataclass. I recommend the schema guide for more detail about the relationship between dataclasses, schema classes, and behavior of the actual database. While the intent behind using dataclasses for database schemas is intuitive and valuable, it can be tricky.

NOTE!!!: Unlike ORM-based applications, DocTable instances do not have any connection to instances of the schema class - they are simply used to encapsulate data to be stored and retrieved in the table. This is why the same object can be inserted multiple times in this example.

Lets start off by creating some record objects and inserting them into the database with .insert_single() and .insert_many(). In the Record constructor here we do not specify the id value - this is because our database schema dictated that it will be automatically incremented by the database - if we omit the value in the constructor, by default it will simply not pass any value to the database at all (this can be changed later though). See that the results of our call to .head() shows that the rows were given id values upon insertion.

table = doctable.DocTable(target=':memory:', schema=Record)

o = Record(name='Devin Cornell', age=35)
table.insert_single(o, verbose=True)
table.insert_single(o)
table.insert_many([o, o], verbose=True)
table.head()

DocTable: INSERT OR FAIL INTO _documents_ (id, name, age) VALUES (?, ?, ?)
DocTable: INSERT OR FAIL INTO _documents_ (id, name, age) VALUES (?, ?, ?)


/DataDrive/code/doctable/examples/../doctable/doctable.py:391: UserWarning: .insert_single() is depricated: please use .q.insert_single() or .q.insert_single_raw()
  warnings.warn(f'.insert_single() is depricated: please use .q.insert_single() or '
/DataDrive/code/doctable/examples/../doctable/doctable.py:379: UserWarning: .insert_many() is depricated: please use .q.insert_multi() or .q.insert_multi_raw()
  warnings.warn(f'.insert_many() is depricated: please use .q.insert_multi() or '
/DataDrive/code/doctable/examples/../doctable/doctable.py:408: UserWarning: Method .head() is depricated. Please use .q.select_head() instead.
  warnings.warn('Method .head() is depricated. Please use .q.select_head() instead.')

	id	name	age
0	1	Devin Cornell	35
1	2	Devin Cornell	35
2	3	Devin Cornell	35
3	4	Devin Cornell	35

Now we use .select() to retrieve data from the database. Here we call it with no parameters to simply get all the objects we previously inserted, this time with the id values that the database provided.

results = table.select(verbose=True)
results

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_


/DataDrive/code/doctable/examples/../doctable/doctable.py:452: UserWarning: Method .select() is depricated. Please use .q.select() instead.
  warnings.warn('Method .select() is depricated. Please use .q.select() instead.')





[Record(id=1, name='Devin Cornell', age=35),
 Record(id=2, name='Devin Cornell', age=35),
 Record(id=3, name='Devin Cornell', age=35),
 Record(id=4, name='Devin Cornell', age=35)]

4. More Complicated Queries

And, of course, the most important part of any database library is to execute queries. To do this, DocTable objects keep track of sqlalchemy core MetaData and Table objects and build queries using the select(), delete(), insert(), and update() methods from sqlalchemy core.

First, note that subscripting the table object allows you to access the underlying sqlalchemy Column objects, which, as I will show a bit later, can be used to create where conditionals for select and update queries. You can also access specific column data using the .c property of the doctable.

table = doctable.DocTable(target=':memory:', schema=Record)

table['id'], table.c.id

(Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False),
 Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False))

As we'll show later, these column objects also have some operators defined such that they can be used to construct complex queries and functions. You can read more about this in the sqlalchemy operators documentation.

table.c.id > 3, table.c.id.in_([1,2]), table.c.age == 4

(<sqlalchemy.sql.elements.BinaryExpression object at 0x7f5b4025cd00>,
 <sqlalchemy.sql.elements.BinaryExpression object at 0x7f5b4025cc70>,
 <sqlalchemy.sql.elements.BinaryExpression object at 0x7f5b4025c8e0>)

You can use these expressions as part of select(), update(), and delete() operations by passing them to the where argument.

table.insert_single(Record(name='Devin Cornell', age=35))
table.insert_single(Record(name='Sam Adams', age=250))
table.insert_single(Record(name='Rando', age=500))

table.select(where=table.c.id >= 3, verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_ 
WHERE _documents_.id >= ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:391: UserWarning: .insert_single() is depricated: please use .q.insert_single() or .q.insert_single_raw()
  warnings.warn(f'.insert_single() is depricated: please use .q.insert_single() or '





[Record(id=3, name='Rando', age=500)]

table.select_first(where=table.c.name=='Devin Cornell', verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_ 
WHERE _documents_.name = ?
 LIMIT ? OFFSET ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:427: UserWarning: Method .select_first() is depricated. Please use .q.select_first() instead.
  warnings.warn('Method .select_first() is depricated. Please use .q.select_first() instead.')





Record(id=1, name='Devin Cornell', age=35)

Select Statements

Now we show how to select data from the table. Use the .count() method to check the number of rows. It also accepts some column conditionals to count entries that satisfy a given criteria

table.count(verbose=True), table.count(table['age']>=30, verbose=True)

DocTable: SELECT count(_documents_.id) AS count_1 
FROM _documents_
 LIMIT ? OFFSET ?
DocTable: SELECT count(_documents_.id) AS count_1 
FROM _documents_ 
WHERE _documents_.age >= ?
 LIMIT ? OFFSET ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:403: UserWarning: Method .count() is depricated. Please use .q.count() instead.
  warnings.warn('Method .count() is depricated. Please use .q.count() instead.')





(3, 3)

Use the .select() method with no arguments to retrieve all rows of the table. You can also choose to select one or more columns to select.

table.select(verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_





[Record(id=1, name='Devin Cornell', age=35),
 Record(id=2, name='Sam Adams', age=250),
 Record(id=3, name='Rando', age=500)]

By specifying a column name, you can retrieve a list of column values, or by offering a list of data, you can request only those datas.

table.select('name', verbose=True)

DocTable: SELECT _documents_.name 
FROM _documents_





['Devin Cornell', 'Sam Adams', 'Rando']

# note we have no access to the ID column - just name, but still part of Record type.
table.select(['name'], verbose=True)

DocTable: SELECT _documents_.name 
FROM _documents_





[Record(name='Devin Cornell'), Record(name='Sam Adams'), Record(name='Rando')]

Accessing a property which was not retrieved from the database will raise an exception.

rec = table.select_first(['name'])
try:
    rec.id
except doctable.RowDataNotAvailableError as e:
    print('This exception was raised:', e)

This exception was raised: The "id" property is not available. This might happen if you did not retrieve the information from a database or if you did not provide a value in the class constructor.

You may also use aggregation functions like .sum.

table.select_first(table['age'].sum(), verbose=True)

DocTable: SELECT sum(_documents_.age) AS sum_1 
FROM _documents_
 LIMIT ? OFFSET ?





785

The SUM() and COUNT() SQL functions have been mapped to .sum and .count attributes of columns. Use as_dataclass=False if you do retrieve data which does not fit into a Record object.

table.select_first([table['age'].sum(),table['age'].count()], verbose=True)

DocTable: SELECT sum(_documents_.age) AS sum_1, count(_documents_.age) AS count_1 
FROM _documents_
 LIMIT ? OFFSET ?
DocTable: SELECT sum(_documents_.age) AS sum_1, count(_documents_.age) AS count_1 
FROM _documents_
 LIMIT ? OFFSET ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:443: UserWarning: Conversion from row to object failed according to the following error. Please use .q.select_first(..,raw_result=True) next time in the future to avoid this issue. e=RowDataConversionFailed("Conversion from <class 'sqlalchemy.engine.row.LegacyRow'> to <class '__main__.Record'> failed.")
  warnings.warn(f'Conversion from row to object failed according to the following '





(785, 3)

Alternatively, to see the results as a pandas dataframe, we can use .select_df().

table.select_df(verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_


/DataDrive/code/doctable/examples/../doctable/doctable.py:420: UserWarning: Method .select_df() is depricated. Please use .q.select_df() instead.
  warnings.warn('Method .select_df() is depricated. Please use .q.select_df() instead.')

	id	name	age
0	1	Devin Cornell	35
1	2	Sam Adams	250
2	3	Rando	500

Now we can select specific elements of the db using the where argument of the .select() method.

table.select(where=table['age'] >= 1, verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_ 
WHERE _documents_.age >= ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:452: UserWarning: Method .select() is depricated. Please use .q.select() instead.
  warnings.warn('Method .select() is depricated. Please use .q.select() instead.')





[Record(id=1, name='Devin Cornell', age=35),
 Record(id=2, name='Sam Adams', age=250),
 Record(id=3, name='Rando', age=500)]

table.select(where=table['id']==3, verbose=True)

DocTable: SELECT _documents_.id, _documents_.name, _documents_.age 
FROM _documents_ 
WHERE _documents_.id = ?





[Record(id=3, name='Rando', age=500)]

We can update the results in a similar way, using the where argument.

table.update({'name':'smartypants'}, where=table['id']==3, verbose=True)
table.select()

DocTable: UPDATE _documents_ SET name=? WHERE _documents_.id = ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:504: UserWarning: Method .update() is depricated. Please use .q.update() instead.
  warnings.warn('Method .update() is depricated. Please use .q.update() instead.')





[Record(id=1, name='Devin Cornell', age=35),
 Record(id=2, name='Sam Adams', age=250),
 Record(id=3, name='smartypants', age=500)]

print(table['age']*100)
table.update({'age':table['age']*100}, verbose=True)
table.select()

_documents_.age * :age_1
DocTable: UPDATE _documents_ SET age=(_documents_.age * ?)





[Record(id=1, name='Devin Cornell', age=3500),
 Record(id=2, name='Sam Adams', age=25000),
 Record(id=3, name='smartypants', age=50000)]

And we can delete elements using the .delete() method.

table.delete(where=table['id']==3, verbose=True)
table.select()

DocTable: DELETE FROM _documents_ WHERE _documents_.id = ?


/DataDrive/code/doctable/examples/../doctable/doctable.py:509: UserWarning: Method .delete() is depricated. Please use .q.delete() instead.
  warnings.warn('Method .delete() is depricated. Please use .q.delete() instead.')





[Record(id=1, name='Devin Cornell', age=3500),
 Record(id=2, name='Sam Adams', age=25000)]

Notes on DB Interface

DocTable2 allows you to access columns through direct subscripting, then relies on the power of sqlalchemy column objects to do most of the work of constructing queries. Here are a few notes on their use. For more demonstration, see the example in examples/dt2_select.ipynb

# subscript is used to access underlying sqlalchemy column reference (without querying data)
table['id']

Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

# conditionals are applied directly to the column objects (as we'll see with "where" clause)
table['id'] < 3

<sqlalchemy.sql.elements.BinaryExpression object at 0x7f5b4022fc10>

# can also access using .col() method
table.col('id')

Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False)

# to access all column objects (only useful for working directly with sql info)
table.columns

<sqlalchemy.sql.base.ImmutableColumnCollection at 0x7f5b40388db0>

# to access more detailed schema information
table.schema_table()

	name	type	nullable	default	autoincrement	primary_key
0	id	INTEGER	False	None	auto	1
1	name	VARCHAR	False	None	auto	0
2	age	INTEGER	True	None	auto	0

# If needed, you can also access the sqlalchemy table object using the .table property.
table.table

Table('_documents_', MetaData(bind=Engine(sqlite:///:memory:)), Column('id', Integer(), table=<_documents_>, primary_key=True, nullable=False), Column('name', String(), table=<_documents_>, nullable=False), Column('age', Integer(), table=<_documents_>), schema=None)

# the count method is also an easy way to count rows in the database
table.count()

/DataDrive/code/doctable/examples/../doctable/doctable.py:403: UserWarning: Method .count() is depricated. Please use .q.count() instead.
  warnings.warn('Method .count() is depricated. Please use .q.count() instead.')





2

# the print method makes it easy to see the table name and total row count
print(table)

<DocTable (3 cols)::sqlite:///:memory::_documents_>

DocTable Overview