ffmddb¶
A flat-file-with-metadata database.
This is a reference implementation for a simple document database idea based on flat files. Each file contains at least one field (a large text blob, the document field) and potentially many other fields formed of structured data contained in a metadata blob within the file.
In short, it turns files written in a Jekyll fashion into objects in a database. The ‘post content’ turns into the document field, and the metadata blob turns into other fields. Indices are built, and querying becomes possible within the indices (full document querying should rely on something like Elasticsearch). The same data and relations are represented, but in a format easily edited in any text editor, easily readable or served from something like Jekyll, and easily stored in a VCS repo. The goal is not speed, but flexibility for manually interfacing with smaller datasets.
ffmddb Configuration¶
ffmddb relies on a single configuration file to figure out how to interact with the database. This file contains a YAML blob which describes the structure of the data: where each collection lives, which indices to build, and how the metadata is fenced off from the document.
Configuration file (.ffmddbrc) example:
my_log_files:
  version: 1
  collections:
    - name: logs
      path: ./logs
    - name: participants
      path: ./participants
  indices:
    - name: log_tag
      from: ['logs', 'metadata:tag']
    - name: participants_logs
      from: ['participants', 'name:']
      to: ['logs', 'metadata:participants']
  fence: ['<!--ffmddb', '-->']
Config entry | Default | Explanation |
---|---|---|
<Root level key> | N/A | The name of the database. |
collections | N/A | A list of YAML objects. Each object should contain a name and a path entry. The name should contain letters, numbers, and underscores and start with a letter. The path should be relative to the configuration file. Can be empty. |
indices | N/A | A list of YAML objects. Each object should contain a name, a from field, and an optional to field. The name should contain letters, numbers, and underscores and start with a letter. The from and to fields should each contain an array with the first item being the name of a collection and the second being a query for selecting one field. Can be empty; cannot contain data if collections is empty. |
fence | ['---+', '---+'] | The fence that delineates the metadata field from the document field. Should be a two-item array, with the two items being strings containing the open fence and the close fence. The strings will be interpreted as regular expressions (the default being an example, specifying both fences as three or more hyphens), so be careful to escape where needed. Fences occur on a line by themselves. Multiple metadata blocks may occur in a file; they will be merged before parsing. |
index_path | .ffmddb_idx | The folder, relative to the configuration file, which contains the indices. |
multiple_metadata | False | Whether or not to collect metadata from multiple fenced blocks. |
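The rules in the table can be checked mechanically. What follows is a minimal sketch, not ffmddb's own parser: `validate_config`, `NAME_RE`, and `DEFAULTS` are hypothetical names, the input is assumed to be the dict produced by loading the YAML file, and the checks for exactly one root key and for indices referring to known collections are assumptions that go slightly beyond what the table states.

```python
import re

# Names must start with a letter and contain only letters, numbers, and underscores.
NAME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9_]*$')

# Defaults copied from the table above.
DEFAULTS = {
    'fence': ['---+', '---+'],
    'index_path': '.ffmddb_idx',
    'multiple_metadata': False,
}


def validate_config(config_obj):
    """Check a parsed configuration dict and return (db_name, settings).

    Hypothetical helper: raises ValueError on the first rule violation.
    """
    if len(config_obj) != 1:
        # Assumption: the single root-level key is the database name.
        raise ValueError('expected exactly one root-level key')
    (db_name, body), = config_obj.items()
    settings = dict(DEFAULTS)
    settings.update(body or {})

    collections = settings.get('collections') or []
    indices = settings.get('indices') or []
    if indices and not collections:
        raise ValueError('indices cannot contain data if collections is empty')

    known = set()
    for coll in collections:
        if not NAME_RE.match(coll.get('name', '')) or 'path' not in coll:
            raise ValueError('bad collection entry: %r' % (coll,))
        known.add(coll['name'])

    for index in indices:
        if not NAME_RE.match(index.get('name', '')) or 'from' not in index:
            raise ValueError('bad index entry: %r' % (index,))
        for key in ('from', 'to'):  # 'to' is optional
            if key in index:
                if len(index[key]) != 2 or index[key][0] not in known:
                    # Assumption: the referenced collection must be declared.
                    raise ValueError('bad %r in index %r' % (key, index['name']))

    fence = settings['fence']
    if len(fence) != 2:
        raise ValueError('fence must be a two-item array')
    for pattern in fence:
        re.compile(pattern)  # raises re.error if the regex is invalid
    return db_name, settings
```

Calling `validate_config` on the dict loaded from the example `.ffmddbrc` would return the database name `my_log_files` together with the settings, with `index_path` and `multiple_metadata` filled in from the defaults.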
The ffmddb Database¶
Documents and indices¶
File format¶
Files which will be documents in the database should be textual. They can be of any format, so long as, when read, the fenced metadata may be found. For example, you could have a markdown file with a metadata block:
---
layout: post
title: My great document
tags:
- foxes
- cats
- dogs
---
Wow, foxes and dogs and cats are all *really great*!
In this instance, you can see that the file itself is an actual Jekyll file.
Fences do not need to be Jekyll style (three or more hyphens), but may be anything, so long as they’re specified in the configuration file. For example, you can specify the fence to be an XML comment if you’re storing XML-based documents.
In the configuration file:
mydb:
  ...
  fence: ['<!--ffmddb', '-->']
And in the document:
<mydoc>
<!--ffmddb
foo: bar
baz: qux
-->
...
</mydoc>
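To make the fence behaviour concrete, here is a minimal sketch of how a file could be split into its metadata and document fields. It is not ffmddb's implementation: `split_document` is a hypothetical helper, and it assumes that each fence regular expression must match an entire line and that only the first fenced block is collected unless `multiple_metadata` is set.

```python
import re


def split_document(text, fence=('---+', '---+'), multiple_metadata=False):
    """Split file text into (metadata_text, document_text).

    Hypothetical helper. Each fence is a regular expression that must match
    a whole line, ignoring surrounding whitespace.
    """
    open_re = re.compile(fence[0])
    close_re = re.compile(fence[1])
    metadata_lines = []
    document_lines = []
    in_metadata = False
    blocks_seen = 0
    for line in text.splitlines():
        stripped = line.strip()
        if in_metadata:
            if close_re.fullmatch(stripped):
                in_metadata = False
            else:
                # Keep the original indentation so nested YAML survives.
                metadata_lines.append(line)
        elif (open_re.fullmatch(stripped)
              and (multiple_metadata or blocks_seen == 0)):
            in_metadata = True
            blocks_seen += 1
        else:
            document_lines.append(line)
    # Multiple blocks are merged simply by concatenating their lines.
    return '\n'.join(metadata_lines), '\n'.join(document_lines)
```

The returned metadata text could then be handed to a YAML parser (for example PyYAML's `yaml.safe_load`), while the document text stays as the large text blob. With the default fence and `multiple_metadata` left off, a later line of hyphens in a Markdown file is treated as document content rather than a new fence.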
ffmddb¶
ffmddb package¶
Subpackages¶
ffmddb.core package¶
Subpackages¶
Models, to ffmddb, are anything that maps from a string or file to a Python object.
For files, this includes:
- a document, which maps to one of the files ffmddb knows about
- a folder acting as a collection of documents
- an index file
For strings, this includes:
- a configuration YAML blob (which may come from a file)
- a JSON query against the database (which may be a Python dict)
- a field spec
- class ffmddb.core.models.config.Configuration(name, collections, indices, options)
  Stores database configuration read from a file or the user.
  - exception MalformedConfiguration
    Bases: exceptions.Exception
  - classmethod Configuration.from_object(config_obj)
    Parses a configuration object (as generated by loading a YAML configuration file) into an internal object used by the database.
  - Configuration.marshal()
    Marshals the configuration object back to YAML.
- class ffmddb.core.models.document.Collection(name, path)
  Stores a reference to a collection of documents.
  - marshal()
- class ffmddb.core.models.index.CoreIndex
  Bases: ffmddb.core.models.index.Index
  Represents the core index, which tracks documents and metadata field names, as well as indices.
- class ffmddb.core.models.index.CrossCollectionIndex(name, from_collection_field, to_collection_field)
  Bases: ffmddb.core.models.index.Index
  Represents an index on one field common to a collection which maps to a field on another (or the same) collection.
  - marshal()
- class ffmddb.core.models.index.Index
  Provides an interface of common methods for collection types.
  - read()
  - write()
- class ffmddb.core.models.index.SingleCollectionIndex(name, collection_field)
  Bases: ffmddb.core.models.index.Index
  Represents an index on one field common to a collection.
  - marshal()
- class ffmddb.core.models.query.Filter(filter_obj)
  Stores a single filter for comparing a field to a value.
  - OPERATORS
    A dict mapping the operator names 'eq', 'ne', 'lt', 'le', 'gt', 'ge', 'in', and 'contains' to the functions that implement them.
  - classmethod is_filter(obj)
    Duck-types a dict to see if it looks like a filter object.
  - run(document)
    Runs the test against the document, comparing the metadata field specified by the filter’s field against the provided value using the provided operation.
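As an illustration of how a filter of this kind might work, here is a minimal, standalone sketch. It does not use ffmddb itself: the operator names come from the OPERATORS listing, but the functions bound to them, the assumed filter keys (`field`, `op`, `value`), and the helper names `is_filter` and `run_filter` are all assumptions made for the example.

```python
import operator

# Operator names are taken from the OPERATORS listing above; the functions
# bound to them here are an assumption about their meaning.
OPERATORS = {
    'eq': operator.eq,
    'ne': operator.ne,
    'lt': operator.lt,
    'le': operator.le,
    'gt': operator.gt,
    'ge': operator.ge,
    'in': lambda field_value, value: field_value in value,
    'contains': lambda field_value, value: value in field_value,
}


def is_filter(obj):
    """Duck-type a dict; the keys 'field', 'op', and 'value' are assumed."""
    return (isinstance(obj, dict)
            and {'field', 'op', 'value'} <= set(obj)
            and obj['op'] in OPERATORS)


def run_filter(filter_obj, metadata):
    """Compare metadata[field] to the filter's value with the named operator.

    A document that lacks the field never matches (an assumption).
    """
    if filter_obj['field'] not in metadata:
        return False
    compare = OPERATORS[filter_obj['op']]
    return compare(metadata[filter_obj['field']], filter_obj['value'])
```

Against the metadata of the Jekyll-style example document, a filter such as `{'field': 'tags', 'op': 'contains', 'value': 'foxes'}` would match, since `'foxes'` is in the `tags` list.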
- class ffmddb.core.models.query.FilterGroup(conjunction, filter_list)
  Stores a list of filter objects joined by a conjunction.
  - CONJUNCTIONS
    A dict mapping the conjunction names 'and', 'or', and 'not' to the functions that implement them.
  - classmethod is_filter_group(obj)
    Duck-types a dict to see if it looks like a filter group.
  - run(document)
    Runs each specified filter in the group and reduces the results to a single value with the provided conjunction.
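The reduction a filter group performs can be sketched in a few lines. This is an illustration under stated assumptions rather than ffmddb's code: `run_group` is a hypothetical helper, filters are modelled as plain callables, and 'not' is assumed to mean "none of the filters match", which the source does not specify.

```python
# Conjunction names are taken from the CONJUNCTIONS listing above; the
# reductions bound to them are assumptions.
CONJUNCTIONS = {
    'and': all,
    'or': any,
    'not': lambda results: not any(results),  # assumed: no filter matches
}


def run_group(conjunction, filters, document):
    """Run every filter against the document and reduce the results.

    `filters` is a list of callables that take the document and return a
    bool; a nested group can be passed in as a callable as well.
    """
    results = [run(document) for run in filters]
    return CONJUNCTIONS[conjunction](results)
```

Because each entry is just a callable, groups nest naturally: `lambda doc: run_group('or', inner_filters, doc)` can itself appear in another group's filter list.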
Submodules¶
- class ffmddb.core.database.Database(config_obj, config_file=None)
  Stores a reference to a database (a configuration file and the files it specifies), providing methods to interact with it.
  - close()
  - create_collection(name, path, mkdir_if_needed=True, keep_file=True)
  - create_document(document)
  - delete_collection(name, cascade=False)
  - delete_document(document)
  - classmethod from_file(config_file)
  - classmethod from_string(config_str, config_file=None)
  - get_collection(name)
  - get_documents(collection_name, query)
  - update_document(document, field, value)
ffmddb Use Case Scenario¶
ffmddb was born from the idea that the best tool for editing a text file is a text editor, and yet even text files benefit from managed metadata and relations between objects, as shown by a case study:
I’ve been on the ‘net for well over twenty years now, and over that period of time, I’ve amassed hundreds of log files. Some are notes, some are important conversations that led to relationships, some are inane conversations with individuals who have since passed away.
In that time, I’ve run through several different organizational schemes, databases, and projects to manage these files. I wanted the organizational benefits of a relational database, the freedom of a document database, and the flexibility of editing the files by hand in whatever editor I choose. Finally, I wanted the ability to keep the files in a repository.
For the above problem space, the relational solution would be:
- A table of participants (name, about)
- A table of logs (name, text, date)
- A mapping table (log, participant)
In ffmddb, that maps to:
- A folder of participant files, text files with any document data, named after the participant
- A folder of log files, text files with any document data, and metadata containing a list of participants and the date of the log
- An index file containing a mapping from participants to logs for faster queries
The same data and relations are represented, but in a format easily edited in any text editor, easily readable or served from something like Jekyll, and easily stored in a VCS repo. The goal is not speed, but flexibility for manually interfacing with smaller datasets.
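The index in the last bullet plays the role of the relational mapping table. As a minimal sketch of the idea (not ffmddb's index format), the hypothetical `build_participant_index` below inverts the log metadata into a participant-to-logs mapping; the file names and the `participants` metadata key are made up for illustration.

```python
from collections import defaultdict


def build_participant_index(logs):
    """Map each participant name to the sorted names of logs they appear in.

    `logs` maps a log file name to its parsed metadata dict, which is
    assumed to hold a 'participants' list. Hypothetical helper.
    """
    index = defaultdict(set)
    for log_name, metadata in logs.items():
        for participant in metadata.get('participants', []):
            index[participant].add(log_name)
    # Sort for stable output, which also keeps a stored index diff-friendly.
    return {name: sorted(names) for name, names in index.items()}


# Illustrative data only: these file names and participants are invented.
logs = {
    'first_chat.md': {'participants': ['alice', 'bob'], 'date': '2001-03-04'},
    'second_chat.md': {'participants': ['alice'], 'date': '2002-05-06'},
    'notes.md': {},
}
index = build_participant_index(logs)
```

Looking up `index['alice']` then answers "which logs did this participant take part in?" without rereading every log file, which is the job the mapping table does in the relational version.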