Select Queries

In this document I will describe the interface for performing select queries with doctable.


import pandas as pd
import numpy as np
import typing

import sys
sys.path.append('..')
import doctable

Define a demonstration schema

The very first step is to define a table schema that will be appropriate for our examples. This table includes the typical id column (the first column, specified by order=0), as well as string, integer, and boolean attributes. The object used to specify the schema is called a container, and I will use that terminology as we go.

@doctable.table_schema
class Record:
    name: str = doctable.Column(column_args=doctable.ColumnArgs(nullable=False, unique=True))
    age: int = doctable.Column()
    is_old: bool = doctable.Column()

    id: int = doctable.Column(
        column_args=doctable.ColumnArgs(
            order = 0, 
            primary_key=True, 
            autoincrement=True
        ),
    )

core = doctable.ConnectCore.open(target=':memory:', dialect='sqlite', echo=True)

with core.begin_ddl() as ddl:
    rtab = ddl.create_table_if_not_exists(container_type=Record)

2023-11-13 08:38:32,059 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,060 INFO sqlalchemy.engine.Engine PRAGMA main.table_info("Record")
2023-11-13 08:38:32,060 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-11-13 08:38:32,061 INFO sqlalchemy.engine.Engine PRAGMA temp.table_info("Record")
2023-11-13 08:38:32,062 INFO sqlalchemy.engine.Engine [raw sql] ()
2023-11-13 08:38:32,063 INFO sqlalchemy.engine.Engine 
CREATE TABLE "Record" (
    id INTEGER, 
    age INTEGER, 
    is_old INTEGER, 
    name VARCHAR NOT NULL, 
    PRIMARY KEY (id), 
    UNIQUE (name)
)


2023-11-13 08:38:32,063 INFO sqlalchemy.engine.Engine [no key 0.00039s] ()
2023-11-13 08:38:32,064 INFO sqlalchemy.engine.Engine COMMIT

Insert test data

We insert the test data using the TableQuery interface. Because this document is about select queries, feel free to look over this for now. I show the contents of the table as a dataframe below. The interface for doing this will be covered later in this document.

import random
random.seed(0)

new_records: typing.List[Record] = list()
for i in range(10):
    age = int(random.random()*100) # number in [0,1]
    is_old = age > 50
    new_records.append(Record(name='user_'+str(i), age=age, is_old=is_old))

# insert new records
with rtab.query() as q:
    print(q.insert_multi(new_records))

# dataframe select (for example purposes - .df() will be covered later)
with core.query() as q:
    r = q.select(rtab.all_cols()).df()
r

2023-11-13 08:38:32,104 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,104 INFO sqlalchemy.engine.Engine INSERT OR FAIL INTO "Record" (age, is_old, name) VALUES (?, ?, ?)
2023-11-13 08:38:32,105 INFO sqlalchemy.engine.Engine [generated in 0.00140s] [(84, True, 'user_0'), (75, True, 'user_1'), (42, False, 'user_2'), (25, False, 'user_3'), (51, True, 'user_4'), (40, False, 'user_5'), (78, True, 'user_6'), (30, False, 'user_7'), (47, False, 'user_8'), (58, True, 'user_9')]
<sqlalchemy.engine.cursor.CursorResult object at 0x7feef3d133f0>
2023-11-13 08:38:32,106 INFO sqlalchemy.engine.Engine COMMIT
2023-11-13 08:38:32,109 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,109 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 08:38:32,109 INFO sqlalchemy.engine.Engine [generated in 0.00102s] ()
2023-11-13 08:38:32,112 INFO sqlalchemy.engine.Engine COMMIT

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1
2	3	42	0	user_2
3	4	25	0	user_3
4	5	51	1	user_4
5	6	40	0	user_5
6	7	78	1	user_6
7	8	30	0	user_7
8	9	47	0	user_8
9	10	58	1	user_9

Two Interfaces: `ConnectQuery` and `TableQuery`

There are two interfaces for performing queries: ConnectQuery and TableQuery.

ConnectQuery table-agnostic interface for querying any table in any result format.
TableQuery table-specific interface for querying a specific table. Insert and select from container objects used to define the schema.

# ConnectQuery - table agnostic
with core.query() as q:
    print(type(q))

# TableQuery - queries are relative to specific 
# table, results appear as container objects
with rtab.query() as q:
    print(type(q))

<class 'doctable.query.connectquery.ConnectQuery'>
<class 'doctable.query.tablequery.TableQuery'>

`ConnectQuery` Basics

First I will discuss the ConnectQuery interface, which is created via the ConnectCore.query() method. This object maintains a database connection, and, when used as a context manager, will commit all changes upon exit. It is fine to use the ConnectQuery object without a context manager for queries that do not require commits.

This example is the most basic select query we can execute. Note that ConnectQuery methods are table-agnostic, so we must specify columns to be selected - in this case, we provide rtab.all_cols() to specify that we want to query all columns from the Record table. It returns a sqlalchemy.CursorResult object that we will discuss later.

core.query().select(rtab.all_cols())

2023-11-13 08:38:32,204 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,205 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 08:38:32,205 INFO sqlalchemy.engine.Engine [cached since 0.09654s ago] ()





<sqlalchemy.engine.cursor.CursorResult at 0x7feef3d88130>

Selecting Specific Columns

In many cases, you would not want to select all columns from a given table - for this reason, there are several methods you can use to specify the desired columns. In addition to .all_cols() in the above snippet, you may use any of these methods.

Method	Description
`.all_cols()`	specify that we want all columns
`.cols('col1', 'col2')`	specify set of columns
`table['col1']`	specify single column
`table[['col1', 'col2']]`	specify multiple columns
`table['col1':'col3']`	specify sequential range of columns

# select all columns
rtab.all_cols()

[Column('id', Integer(), table=<Record>, primary_key=True),
 Column('age', Integer(), table=<Record>),
 Column('is_old', Integer(), table=<Record>),
 Column('name', String(), table=<Record>, nullable=False)]

# .cols method
rtab.cols('id', 'name')

[Column('id', Integer(), table=<Record>, primary_key=True),
 Column('name', String(), table=<Record>, nullable=False)]

# single-column subscript
rtab['age']

Column('age', Integer(), table=<Record>)

# list of columns
rtab[['id','is_old']]

[Column('id', Integer(), table=<Record>, primary_key=True),
 Column('is_old', Integer(), table=<Record>)]

# slice select
rtab['id':'is_old']

[Column('id', Integer(), table=<Record>, primary_key=True),
 Column('age', Integer(), table=<Record>),
 Column('is_old', Integer(), table=<Record>)]

Note that the .select() method requires a list of columns, so we can combine these methods by combining the lists they return. Obviously, the order matters for the returned values.

rtab.cols('id','is_old') + [rtab['name']]

[Column('id', Integer(), table=<Record>, primary_key=True),
 Column('is_old', Integer(), table=<Record>),
 Column('name', String(), table=<Record>, nullable=False)]

The .select() method always accepts a list of columns, so be sure to wrap single-column selections in a list.

core.query().select([rtab['name']])

2023-11-13 08:38:32,540 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,541 INFO sqlalchemy.engine.Engine SELECT "Record".name 
FROM "Record"
2023-11-13 08:38:32,542 INFO sqlalchemy.engine.Engine [generated in 0.00130s] ()





<sqlalchemy.engine.cursor.CursorResult at 0x7feef3d12f90>

core.query().select(rtab.cols('id','is_old') + [rtab['name']])

2023-11-13 08:38:32,589 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,590 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 08:38:32,590 INFO sqlalchemy.engine.Engine [generated in 0.00142s] ()





<sqlalchemy.engine.cursor.CursorResult at 0x7feef3d89010>

Working with Query Results

Now we turn to working with the results objects. So far I have demonstrated values for returning sqlalchemy.CursorResult objects, but additional methods are required to return the results in a usable format. The following methods are available for various purposes:

Method	Description
`result.all()`	return all results in query
`result.df()`	return multiple results as a dataframe
`result.first()`	return first result in query
`result.one()`	return exactly one result in query. NOTE: raises exception if not exactly one result.
`result.scalar_one()`	return single result, end query. NOTE: raises exception if not exactly one result.
`result.scalars().all()`	return single column of results

with core.query() as q:
    r = q.select(rtab.all_cols(), limit=3)
r.all()

2023-11-13 08:38:32,637 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,638 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:32,639 INFO sqlalchemy.engine.Engine [generated in 0.00185s] (3, 0)
2023-11-13 08:38:32,640 INFO sqlalchemy.engine.Engine COMMIT





[(1, 84, 1, 'user_0'), (2, 75, 1, 'user_1'), (3, 42, 0, 'user_2')]

core.query().select(rtab.all_cols(), limit=3).df()

2023-11-13 08:38:32,689 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,690 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:32,690 INFO sqlalchemy.engine.Engine [cached since 0.05336s ago] (3, 0)

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1
2	3	42	0	user_2

# raises exception without limit=1
core.query().select(rtab.all_cols()).first()

2023-11-13 08:38:32,740 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,741 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
2023-11-13 08:38:32,742 INFO sqlalchemy.engine.Engine [cached since 0.6331s ago] ()





(1, 84, 1, 'user_0')

# raises exception if more than one result is returned 
# (here I forced this with limit=1)
core.query().select(rtab.all_cols(), limit=1).one()

2023-11-13 08:38:32,788 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,789 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:32,790 INFO sqlalchemy.engine.Engine [cached since 0.1527s ago] (1, 0)





(1, 84, 1, 'user_0')

# this returns the first column from the first row, then closes the cursor
core.query().select(rtab.all_cols(), limit=1).scalar_one()

2023-11-13 08:38:32,837 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,838 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:32,838 INFO sqlalchemy.engine.Engine [cached since 0.2012s ago] (1, 0)





1

# it makes more sense to query a single column
core.query().select(rtab.cols('is_old'), limit=1).scalar_one()

2023-11-13 08:38:32,889 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,890 INFO sqlalchemy.engine.Engine SELECT "Record".is_old 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:32,890 INFO sqlalchemy.engine.Engine [generated in 0.00136s] (1, 0)





1

# and when when you need a single column, use .scalars() instead of .all()
core.query().select(rtab.cols('is_old')).scalars().all()

2023-11-13 08:38:32,940 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,941 INFO sqlalchemy.engine.Engine SELECT "Record".is_old 
FROM "Record"
2023-11-13 08:38:32,941 INFO sqlalchemy.engine.Engine [generated in 0.00138s] ()





[1, 1, 0, 0, 1, 0, 1, 0, 0, 1]

Conditional Select Statements

operator	description
`&`, `doctable.exp.and_()`	and
`\\|`, `doctable.exp.or_()`	or
`==`	equals
`!=`, `doctable.exp.not_()`	not equals
`>`	greater than
`>=`	greater than or equal to
`<`	less than
`<=`	less than or equal to
`in_()`	in list
`contains()`	check if item is substring
`like()`	like string
`ilike()`	case-insensitive like string
`between()`, `doctable.exp.between()`	between two values
`is_()`	is value
`isnot()`	is not value
`startswith()`	starts with string

core.query().select(rtab.all_cols(), where=rtab['id']==2).df()

2023-11-13 08:38:32,989 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:32,990 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id = ?
2023-11-13 08:38:32,990 INFO sqlalchemy.engine.Engine [generated in 0.00131s] (2,)

	id	age	is_old	name
0	2	75	1	user_1

core.query().select(rtab.all_cols(), where=rtab['id']<rtab['id']).df()

2023-11-13 08:38:33,037 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,038 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id < "Record".id
2023-11-13 08:38:33,038 INFO sqlalchemy.engine.Engine [generated in 0.00128s] ()

core.query().select(rtab.all_cols(), where=(rtab['id']%2)==0).df()

2023-11-13 08:38:33,089 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,090 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id % ? = ?
2023-11-13 08:38:33,091 INFO sqlalchemy.engine.Engine [generated in 0.00138s] (2, 0)

	id	age	is_old	name
0	2	75	1	user_1
1	4	25	0	user_3
2	6	40	0	user_5
3	8	30	0	user_7
4	10	58	1	user_9

condition = (rtab['id']>=2) & (rtab['id']<=4) & (rtab['name']!='user_2')
core.query().select(rtab.all_cols(), where=condition).df()

2023-11-13 08:38:33,142 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,143 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id >= ? AND "Record".id <= ? AND "Record".name != ?
2023-11-13 08:38:33,143 INFO sqlalchemy.engine.Engine [generated in 0.00147s] (2, 4, 'user_2')

	id	age	is_old	name
0	2	75	1	user_1
1	4	25	0	user_3

condition = rtab['name'].in_(('user_2','user_3'))
core.query().select(rtab.all_cols(), where=condition).df()

2023-11-13 08:38:33,194 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,195 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".name IN (?, ?)
2023-11-13 08:38:33,195 INFO sqlalchemy.engine.Engine [generated in 0.00146s] ('user_2', 'user_3')

	id	age	is_old	name
0	3	42	0	user_2
1	4	25	0	user_3

core.query().select(rtab.all_cols(), where=rtab['id'].between(2,4)).df()

2023-11-13 08:38:33,246 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,247 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id BETWEEN ? AND ?
2023-11-13 08:38:33,247 INFO sqlalchemy.engine.Engine [generated in 0.00134s] (2, 4)

	id	age	is_old	name
0	2	75	1	user_1
1	3	42	0	user_2
2	4	25	0	user_3

condition = ~(rtab['name'].in_(('user_2','user_3'))) & (rtab['id'] < 4)
core.query().select(rtab.all_cols(), where=condition).df()

2023-11-13 08:38:33,298 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,299 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE ("Record".name NOT IN (?, ?)) AND "Record".id < ?
2023-11-13 08:38:33,300 INFO sqlalchemy.engine.Engine [generated in 0.00171s] ('user_2', 'user_3', 4)

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1

condition = doctable.exp.or_(doctable.exp.not_(rtab['id']==4)) & (rtab['id'] <= 2)
core.query().select(rtab.all_cols(), where=condition).df()

2023-11-13 08:38:33,350 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,351 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".id != ? AND "Record".id <= ?
2023-11-13 08:38:33,352 INFO sqlalchemy.engine.Engine [generated in 0.00181s] (4, 2)

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1

with core.query() as q:
    ages = q.select([rtab['age']]).scalars().all()
    mean_age = sum(ages)/len(ages)
    result = q.select(rtab.all_cols(), where=rtab['age']>mean_age)
result.df()

2023-11-13 08:38:33,402 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,404 INFO sqlalchemy.engine.Engine SELECT "Record".age 
FROM "Record"
2023-11-13 08:38:33,404 INFO sqlalchemy.engine.Engine [generated in 0.00155s] ()
2023-11-13 08:38:33,406 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:33,406 INFO sqlalchemy.engine.Engine [generated in 0.00045s] (53.0,)
2023-11-13 08:38:33,407 INFO sqlalchemy.engine.Engine COMMIT

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1
2	7	78	1	user_6
3	10	58	1	user_9

Column Operators

In addition to any of the methods used for conditional selects, there are several additional methods that can be used to transform columns in the select statement.

Method	Description
`.label()`	rename column in result (particularly useful after transformations)
`.min()`, `doctable.f.min()`	max of column values
`.max()`, `doctable.f.max()`	max of column values
`.sum()`, `doctable.f.sum()`	sum of column
`.count()`, `doctable.f.count()`	count number of results. NOTE: must use `f.count()` when counting transformed columns.
`.distinct()`, `doctable.f.distinct()`	get distinct values
`/`	divide
`*`	multiply
`+`	add
`-`	subtract
`%`	modulo
`.concat()`	concatenate strings

columns = [
    (rtab['id'] % 2).label('mod_id'), 
    rtab['name'].label('myname')
]
core.query().select(columns, where=rtab['is_old']).df()

2023-11-13 08:38:33,457 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,458 INFO sqlalchemy.engine.Engine SELECT "Record".id % ? AS mod_id, "Record".name AS myname 
FROM "Record" 
WHERE "Record".is_old
2023-11-13 08:38:33,458 INFO sqlalchemy.engine.Engine [generated in 0.00153s] (2,)

	mod_id	myname
0	1	user_0
1	0	user_1
2	1	user_4
3	1	user_6
4	0	user_9

formula = rtab['age'].sum() / rtab['age'].count()
core.query().select([formula]).scalar_one()

2023-11-13 08:38:33,510 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,511 INFO sqlalchemy.engine.Engine SELECT sum("Record".age) / (count("Record".age) + 0.0) AS anon_1 
FROM "Record"
2023-11-13 08:38:33,511 INFO sqlalchemy.engine.Engine [generated in 0.00153s] ()





Decimal('53.0000000000')

formula = rtab['age'].max() - rtab['age'].min()
core.query().select([formula]).scalar_one()

2023-11-13 08:38:33,561 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,562 INFO sqlalchemy.engine.Engine SELECT max("Record".age) - min("Record".age) AS anon_1 
FROM "Record"
2023-11-13 08:38:33,563 INFO sqlalchemy.engine.Engine [generated in 0.00155s] ()





59

# average age of individuals over 30
formula = rtab['age'].sum() / rtab['age'].count()
condition = rtab['age'] > 30
core.query().select([formula], where=condition).scalar_one()

2023-11-13 08:38:33,614 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,615 INFO sqlalchemy.engine.Engine SELECT sum("Record".age) / (count("Record".age) + 0.0) AS anon_1 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:33,615 INFO sqlalchemy.engine.Engine [generated in 0.00140s] (30,)





Decimal('59.3750000000')

# descriptive stats on age of individuals over 30
columns = [
    (rtab['age'].sum() / rtab['age'].count()).label('mean'),
    rtab['age'].max().label('max'),
    rtab['age'].min().label('min'),
]
condition = rtab['age'] > 30
core.query().select(columns, where=condition).df()

2023-11-13 08:38:33,666 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,667 INFO sqlalchemy.engine.Engine SELECT sum("Record".age) / (count("Record".age) + 0.0) AS mean, max("Record".age) AS max, min("Record".age) AS min 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:33,667 INFO sqlalchemy.engine.Engine [generated in 0.00142s] (30,)

	mean	max	min
0	59.3750000000	84	40

# all distinct values
formula = rtab['is_old'].distinct()
core.query().select([formula]).df()

2023-11-13 08:38:33,717 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,718 INFO sqlalchemy.engine.Engine SELECT distinct("Record".is_old) AS distinct_1 
FROM "Record"
2023-11-13 08:38:33,718 INFO sqlalchemy.engine.Engine [generated in 0.00171s] ()

	distinct_1
0	1
1	0

# count individuals over 30
core.query().select([doctable.f.count()], where=rtab['age']>30).scalar_one()

2023-11-13 08:38:33,765 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,766 INFO sqlalchemy.engine.Engine SELECT count(*) AS count_1 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:33,767 INFO sqlalchemy.engine.Engine [generated in 0.00157s] (30,)





8

# similar to previous
core.query().select([rtab['id'].count()], where=rtab['age']>30).scalar_one()

2023-11-13 08:38:33,817 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,818 INFO sqlalchemy.engine.Engine SELECT count("Record".id) AS count_1 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:33,818 INFO sqlalchemy.engine.Engine [generated in 0.00127s] (30,)





8

Additional Parameters: Order By, Group By, Limit, Offset

More complicated queries involving ordering, grouping, limiting, and specifying offset can be specified using parameters to the .select() method.

Parameter	Description
`limit`	limit number of results
`order_by`	list of columns to order by
`group_by`	list of columns to group by
`offset`	offset results by specified number

Order By and Limits

# get the five youngest individuals in order
core.query().select(rtab.all_cols(), order_by=rtab['age'], limit=5).df()

2023-11-13 08:38:33,865 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,865 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" ORDER BY "Record".age
 LIMIT ? OFFSET ?
2023-11-13 08:38:33,866 INFO sqlalchemy.engine.Engine [generated in 0.00127s] (5, 0)

	id	age	name
0	4	25	user_3
1	8	30	user_7
2	6	40	user_5
3	3	42	user_2
4	9	47	user_8

# get the five oldest now
core.query().select(rtab.all_cols(), order_by=rtab['age'].desc(), limit=5).df()

2023-11-13 08:38:33,913 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,914 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" ORDER BY "Record".age DESC
 LIMIT ? OFFSET ?
2023-11-13 08:38:33,915 INFO sqlalchemy.engine.Engine [generated in 0.00165s] (5, 0)

	id	age	is_old	name
0	1	84	1	user_0
1	7	78	1	user_6
2	2	75	1	user_1
3	10	58	1	user_9
4	5	51	1	user_4

# order by is_old, but preserve order of id otherwise
order = [
    rtab['is_old'].desc(),
    rtab['id'].asc(),
]
core.query().select(rtab.all_cols(), order_by=order, limit=5).df()

2023-11-13 08:38:33,965 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:33,966 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" ORDER BY "Record".is_old DESC, "Record".id ASC
 LIMIT ? OFFSET ?
2023-11-13 08:38:33,966 INFO sqlalchemy.engine.Engine [generated in 0.00135s] (5, 0)

	id	age	is_old	name
0	1	84	1	user_0
1	2	75	1	user_1
2	5	51	1	user_4
3	7	78	1	user_6
4	10	58	1	user_9

Grouping and Column Operators

# summary stats by is_old
cols = [
    #rtab['is_old'].count().label('count'),
    doctable.f.count().label('count'),
    rtab['age'].min().label('min'),
    rtab['age'].max().label('max'),
    (rtab['age'].sum()/rtab['age'].count()).label('mean'),
]
core.query().select(cols, group_by=rtab['is_old']).df()

2023-11-13 08:38:34,018 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,019 INFO sqlalchemy.engine.Engine SELECT count(*) AS count, min("Record".age) AS min, max("Record".age) AS max, sum("Record".age) / (count("Record".age) + 0.0) AS mean 
FROM "Record" GROUP BY "Record".is_old
2023-11-13 08:38:34,019 INFO sqlalchemy.engine.Engine [generated in 0.00141s] ()

	count	min	max	mean
0	5	25	47	36.8000000000
1	5	51	84	69.2000000000

# summarize age by decade
decade_expression = doctable.f.round(rtab['age'] / 10)
cols = [
    decade_expression.label('decade'),
    rtab['age'].count().label('count'),
    rtab['age'].min().label('min'),
    rtab['age'].max().label('max'),
    (rtab['age'].sum()/rtab['age'].count()).label('mean'),
]
core.query().select(cols, group_by=decade_expression).df()

2023-11-13 08:38:34,070 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,071 INFO sqlalchemy.engine.Engine SELECT round("Record".age / (? + 0.0)) AS decade, count("Record".age) AS count, min("Record".age) AS min, max("Record".age) AS max, sum("Record".age) / (count("Record".age) + 0.0) AS mean 
FROM "Record" GROUP BY round("Record".age / (? + 0.0))
2023-11-13 08:38:34,072 INFO sqlalchemy.engine.Engine [generated in 0.00134s] (10, 10)

	decade	count	min	max	mean
0	3.0	2	25	30	27.5000000000
1	4.0	2	40	42	41.0000000000
2	5.0	2	47	51	49.0000000000
3	6.0	1	58	58	58.0000000000
4	8.0	3	75	84	79.0000000000

Offset and Selecting Chunks

The offset parameter is used to pagify results into multiple queries - something that is particularly useful if the result set is too larget to fit into memory.

# get the three oldest individuals, offset by three
core.query().select(rtab.all_cols(), order_by=rtab['age'].desc(), limit=3, offset=3).df()

2023-11-13 08:38:34,121 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,121 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" ORDER BY "Record".age DESC
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,122 INFO sqlalchemy.engine.Engine [generated in 0.00137s] (3, 3)

	id	age	is_old	name
0	10	58	1	user_9
1	5	51	1	user_4
2	9	47	0	user_8

for chunk in core.query().select_chunks(rtab.all_cols(), chunksize=3):
    print(chunk)

2023-11-13 08:38:34,169 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,169 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,170 INFO sqlalchemy.engine.Engine [generated in 0.00117s] (3, 0)
[(1, 84, 1, 'user_0'), (2, 75, 1, 'user_1'), (3, 42, 0, 'user_2')]
2023-11-13 08:38:34,171 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,171 INFO sqlalchemy.engine.Engine [cached since 0.002855s ago] (3, 3)
[(4, 25, 0, 'user_3'), (5, 51, 1, 'user_4'), (6, 40, 0, 'user_5')]
2023-11-13 08:38:34,172 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,173 INFO sqlalchemy.engine.Engine [cached since 0.004328s ago] (3, 6)
[(7, 78, 1, 'user_6'), (8, 30, 0, 'user_7'), (9, 47, 0, 'user_8')]
2023-11-13 08:38:34,175 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,175 INFO sqlalchemy.engine.Engine [cached since 0.006853s ago] (3, 9)
[(10, 58, 1, 'user_9')]

`TableQuery` Basics

The TableQuery interface is used to make table-specific queries, and, in exchange for this restriction, allows you to insert and select container objects directly. Queries on tables look much like their table-agnostic counterparts, with a few exceptions. Every query still begins with the .query() method, which returns a TableQuery object with methods for inserting and selecting container objects.

List of selected columns is optional - if you do not specify, the query will default to all columns in the table. Otherwise, you should provide a subset of the columns, where all attributes that were not received will refer to doctable.MISSING, which you may check for downstream.
Results of a select query are returned as a list of container objects. This means that we have called the .all() method on the sqlalchemy.CursorResult object.
All returned results must match the attributes of the container object. Most often you will want to select raw database rows, but transformations via group_by and other operators are also possible as long as the result set attributes match the container attributes - that is, they can be expanded to the container constructor.
Behavior of where, order_by, limit, are offset all operate as-expected.

Below you can see a few examples demonstrating this behavior.

rtab.query().select(where=rtab['age']>50)

2023-11-13 08:38:34,217 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,218 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record" 
WHERE "Record".age > ?
2023-11-13 08:38:34,219 INFO sqlalchemy.engine.Engine [generated in 0.00128s] (50,)





[Record(name='user_0', age=84, is_old=1, id=1),
 Record(name='user_1', age=75, is_old=1, id=2),
 Record(name='user_4', age=51, is_old=1, id=5),
 Record(name='user_6', age=78, is_old=1, id=7),
 Record(name='user_9', age=58, is_old=1, id=10)]

result = rtab.query().select(rtab.cols('id', 'age'), where=rtab['is_old'])
f'{result[0].name is doctable.MISSING=}', result

2023-11-13 08:38:34,268 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,269 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age 
FROM "Record" 
WHERE "Record".is_old
2023-11-13 08:38:34,270 INFO sqlalchemy.engine.Engine [generated in 0.00134s] ()





('result[0].name is doctable.MISSING=True',
 [Record(name=MISSING, age=84, is_old=MISSING, id=1),
  Record(name=MISSING, age=75, is_old=MISSING, id=2),
  Record(name=MISSING, age=51, is_old=MISSING, id=5),
  Record(name=MISSING, age=78, is_old=MISSING, id=7),
  Record(name=MISSING, age=58, is_old=MISSING, id=10)])

# this is valid, although perhaps not recommended
cols = [
    (rtab['age'].sum()/rtab['age'].count()).label('age'),
]
rtab.query().select(cols=cols)

2023-11-13 08:38:34,317 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,318 INFO sqlalchemy.engine.Engine SELECT sum("Record".age) / (count("Record".age) + 0.0) AS age 
FROM "Record"
2023-11-13 08:38:34,318 INFO sqlalchemy.engine.Engine [generated in 0.00123s] ()





[Record(name=MISSING, age=Decimal('53.0000000000'), is_old=MISSING, id=MISSING)]

.select_chunks() also more or less works as-expected, with the chunks being converted to container objects.

for chunk in rtab.query().select_chunks(chunksize=3):
    print(chunk)

2023-11-13 08:38:34,364 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-11-13 08:38:34,365 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,365 INFO sqlalchemy.engine.Engine [cached since 0.1964s ago] (3, 0)
[Record(name='user_0', age=84, is_old=1, id=1), Record(name='user_1', age=75, is_old=1, id=2), Record(name='user_2', age=42, is_old=0, id=3)]
2023-11-13 08:38:34,366 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,367 INFO sqlalchemy.engine.Engine [cached since 0.198s ago] (3, 3)
[Record(name='user_3', age=25, is_old=0, id=4), Record(name='user_4', age=51, is_old=1, id=5), Record(name='user_5', age=40, is_old=0, id=6)]
2023-11-13 08:38:34,368 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,368 INFO sqlalchemy.engine.Engine [cached since 0.1994s ago] (3, 6)
[Record(name='user_6', age=78, is_old=1, id=7), Record(name='user_7', age=30, is_old=0, id=8), Record(name='user_8', age=47, is_old=0, id=9)]
2023-11-13 08:38:34,369 INFO sqlalchemy.engine.Engine SELECT "Record".id, "Record".age, "Record".is_old, "Record".name 
FROM "Record"
 LIMIT ? OFFSET ?
2023-11-13 08:38:34,369 INFO sqlalchemy.engine.Engine [cached since 0.2007s ago] (3, 9)
[Record(name='user_9', age=58, is_old=1, id=10)]