DocTable Examples: Select

Here I show how to select data from a DocTable. We cover object-oriented conditional selects emulating the WHERE SQL clause, as well as some reduce functions.

Regular Selects

These functions all return lists of ResultProxy objects. As such, they can be accessed using numerical indices or keyword indices. For instance, if one select output row is row=(1, 'user_0') (after selecting "id" and "user"), it can be accessed such that row[0]==row['id'] and row[1]==row['user'].

Conditional Selects

Column Operators

I bind the .min, .max, .count, .sum, and .mode methods to the column objects. Additionally, I move the .count method to a separate DocTable2 method.

ORDER BY, GROUP BY, LIMIT

These additional arguments have also been provided.

SQL String Commands and Additional Clauses

For cases where DocTable2 does not provide a convenient interface, you may submit raw SQL commands. These may be a bit more unwieldly, but they offer maximum flexibility. They may be used either as simply an addition to the WHERE or arbitrary end clauses, or accessed in totality.

Count Method and Get Next ID

.count() is a convenience method. Mostly the same could be accomplished by db.select_first(db['id'].count), but this requires no reference to a specific column.

.next_id() is especially useful if one hopes to enter the id (or any primary key column) into new rows manually. Especially useful because SQL engines don't provide new ids except when a single insert is performed.

Select as Pandas Series and DataFrame

These are especially useful when working with metadata because Pandas provides robust descriptive and plotting features than SQL alone. Good for generating sample information.

Select with Buffer

In cases where you have many rows or each row contains a lot of data, you may want to perform a select query which makes requests in chunks. This is performed using the SQL OFFSET command, and querying up to buffsize while yielding each returned row. This system is designed this way because the underlying sql engine buffers all rows retreived from a query, and thus there is no way to stream data into memory without this system.

NOTE: The limit keyword is incompatible with this method - it will return all results. A workaround is to use the approx_max_rows param, which will return at minimum this number of rows, at max the specified number of rows plus buffsize.