Databases for text analysis

doctable is a Python package for designing and manipulating database tables through an object-oriented interface.

This package allows you to define and manipulate databases as regular Python objects. See the vignette examples.

  1. Create a database schema definition using the @doctable.schema decorator. In addition to defining the table schema, this class works much like a dataclass, encapsulating the data you insert or retrieve. See the schema guide for more details.
  2. Define the interface to your data model by creating a class that inherits from DocTable. The base class provides methods for inserting, retrieving, and updating database rows.
  3. Develop a text parsing pipeline using ParsePipeline. This class can distribute text parsing across multiple processes and can produce ParseTree objects for storage in your database.
  4. Merge your parsed text with relevant document metadata, and use your DocTable interface to store and retrieve the data.
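The workflow above can be sketched with the standard library alone to show the underlying idea. This is not doctable's API: a plain `dataclasses.dataclass` stands in for an `@doctable.schema` class, and the hypothetical `DocumentTable` class (a thin `sqlite3` wrapper) stands in for a DocTable subclass. The `Document` fields and the placeholder texts and years are illustrative assumptions.

```python
import dataclasses
import sqlite3
from typing import List, Optional


# Schema (stand-in): a dataclass describing one row. In doctable this role is
# played by an @doctable.schema class; these fields are hypothetical.
@dataclasses.dataclass
class Document:
    title: str
    year: int
    text: str
    id: Optional[int] = None  # filled in by the database on insert


# Table interface (stand-in): NOT DocTable. It only mimics the idea of a class
# that owns the insert/select logic for one table.
class DocumentTable:
    def __init__(self, path: str = ":memory:", tabname: str = "documents"):
        self.conn = sqlite3.connect(path)
        self.tabname = tabname
        # Non-id column names are taken from the dataclass fields.
        self.cols = [f.name for f in dataclasses.fields(Document) if f.name != "id"]
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {tabname} "
            f"(id INTEGER PRIMARY KEY, {', '.join(self.cols)})"
        )

    def insert(self, doc: Document) -> int:
        """Insert one row and return its new primary key."""
        placeholders = ", ".join("?" for _ in self.cols)
        cur = self.conn.execute(
            f"INSERT INTO {self.tabname} ({', '.join(self.cols)}) "
            f"VALUES ({placeholders})",
            [getattr(doc, c) for c in self.cols],
        )
        self.conn.commit()
        return cur.lastrowid

    def select(self) -> List[Document]:
        """Return every row as a Document object, ordered by id."""
        rows = self.conn.execute(
            f"SELECT id, {', '.join(self.cols)} FROM {self.tabname} ORDER BY id"
        ).fetchall()
        return [Document(**dict(zip(["id"] + self.cols, row))) for row in rows]


# Storing data (stand-in): merge parsed text with metadata, then insert.
# Both dictionaries hold placeholder values.
parsed = {"doc_a": "first parsed text", "doc_b": "second parsed text"}
metadata = {"doc_a": 2001, "doc_b": 2002}

table = DocumentTable()
for title, text in parsed.items():
    table.insert(Document(title=title, year=metadata[title], text=text))

stored = table.select()
```

Because rows go in and come out as `Document` objects, calling code never handles raw tuples; that is the same convenience the schema-class approach aims for.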

Install
Install the latest release from PyPI:
pip install doctable
Or install the development version directly from GitHub:
pip install --upgrade git+https://github.com/devincornell/doctable.git@master

DocTable

Object-oriented interface for querying and manipulating database tables.
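To illustrate what querying and updating "through an object-oriented interface" means, here is a minimal sketch with a hypothetical `Table` class built on `sqlite3`. It is not DocTable and does not reproduce its method signatures; it only shows selecting and updating rows through methods with bound parameters instead of hand-written SQL strings. The `docs` table and its rows are placeholders.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class Table:
    """Hypothetical method-based wrapper around one SQLite table (not DocTable)."""

    def __init__(self, conn: sqlite3.Connection, name: str):
        self.conn = conn
        self.conn.row_factory = sqlite3.Row  # rows behave like mappings
        self.name = name

    @staticmethod
    def _where(conditions: Dict[str, Any]) -> Tuple[str, List[Any]]:
        # Build "col1 = ? AND col2 = ?" with bound parameters. Values are never
        # interpolated into the SQL; column names must come from trusted code.
        if not conditions:
            return "", []
        clause = " AND ".join(f"{col} = ?" for col in conditions)
        return f" WHERE {clause}", list(conditions.values())

    def select(self, **conditions: Any) -> List[Dict[str, Any]]:
        """Return matching rows as plain dictionaries."""
        where, params = self._where(conditions)
        rows = self.conn.execute(f"SELECT * FROM {self.name}{where}", params)
        return [dict(row) for row in rows]

    def update(self, values: Dict[str, Any], **conditions: Any) -> int:
        """Set the columns in `values` on matching rows; return rows changed."""
        sets = ", ".join(f"{col} = ?" for col in values)
        where, params = self._where(conditions)
        cur = self.conn.execute(
            f"UPDATE {self.name} SET {sets}{where}", list(values.values()) + params
        )
        self.conn.commit()
        return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO docs (title, year) VALUES (?, ?)",
    [("a", 2001), ("b", 2002), ("c", 2002)],
)
docs = Table(conn, "docs")
changed = docs.update({"title": "revised"}, year=2002)
revised = docs.select(title="revised")
```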



ParsePipeline

Various tools for parsing and storing documents into databases.
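The core idea of a parsing pipeline, a fixed sequence of steps applied to every document with a pluggable way to distribute the work, can be sketched as follows. `SimplePipeline` and the step functions are hypothetical and are not doctable's ParsePipeline; real pipelines would typically wrap an NLP library rather than these toy string functions.

```python
from typing import Callable, Iterable, List, Sequence


class SimplePipeline:
    """Hypothetical stand-in for a parsing pipeline (not doctable's ParsePipeline).

    Applies each step to a document in order. `mapper` controls distribution:
    the built-in `map` runs serially, while `multiprocessing.Pool.map` could
    spread documents across processes (steps must then be picklable, i.e.
    defined at module level).
    """

    def __init__(self, steps: Sequence[Callable]):
        self.steps = list(steps)

    def parse(self, doc):
        for step in self.steps:
            doc = step(doc)
        return doc

    def parse_many(self, docs: Iterable, mapper: Callable = map) -> List:
        return list(mapper(self.parse, docs))


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def tokenize(text: str) -> List[str]:
    return text.split()


def drop_short(tokens: List[str]) -> List[str]:
    # Keep tokens longer than three characters (arbitrary illustrative rule).
    return [t for t in tokens if len(t) > 3]


pipeline = SimplePipeline([normalize, tokenize, drop_short])
docs = ["The  National Security Strategy", "An example of parsed text"]
parsed = pipeline.parse_many(docs)

# To parallelize (run under an `if __name__ == "__main__":` guard):
# from multiprocessing import Pool
# with Pool(4) as pool:
#     parsed = pipeline.parse_many(docs, mapper=pool.map)
```

Passing the mapper in, rather than hard-coding a process pool, keeps the pipeline easy to test serially and lets the caller decide how many processes to use.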



Other Utilities

doctable includes a number of other tools for text analysis and database management.

  • Timer: time function calls and log your script's progress at different stages.
  • FSStore: read and write data rows directly on the filesystem; useful for parallel processing.
  • TempFolder: create a temporary folder that is deleted upon garbage collection.
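One reason a filesystem store suits parallel jobs is that each worker can write its own file, so workers never contend for a single database lock. The standard-library sketch below illustrates that pattern; `write_rows` and `read_all_rows` are hypothetical helpers, not FSStore's API, and `tempfile.TemporaryDirectory` is used as a stand-in for TempFolder (it cleans up when the `with` block exits rather than on garbage collection).

```python
import os
import pickle
import tempfile
from typing import Any, Iterable, List


def write_rows(folder: str, worker_id: int, rows: Iterable[Any]) -> str:
    """Write one worker's rows to its own pickle file and return the path.

    Each worker writes a separate file, so no shared lock is needed.
    """
    path = os.path.join(folder, f"rows_{worker_id:04d}.pkl")
    with open(path, "wb") as f:
        pickle.dump(list(rows), f)
    return path


def read_all_rows(folder: str) -> List[Any]:
    """Read every row file in the folder, in filename (worker id) order."""
    rows: List[Any] = []
    for name in sorted(os.listdir(folder)):
        if name.endswith(".pkl"):
            with open(os.path.join(folder, name), "rb") as f:
                rows.extend(pickle.load(f))
    return rows


# The temporary folder and its files are removed when the block exits.
with tempfile.TemporaryDirectory() as folder:
    write_rows(folder, 1, [{"id": 3}, {"id": 4}])  # workers may finish in any order
    write_rows(folder, 0, [{"id": 1}, {"id": 2}])
    combined = read_all_rows(folder)
    folder_existed = os.path.isdir(folder)

folder_removed = not os.path.exists(folder)
```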


Vignette: US National Security Strategy Documents

This demonstration shows a typical DocTable workflow: creating a new DocTable, inserting the text and metadata of NSS documents, and parsing the text for storage in the table.


Created by

Devin J. Cornell


Devin uses computational methods to study cultural processes through which organizations and individuals produce meaning.

Twitter @devin_cornell