https://github.com/lueck/standoff-mode.git
git clone 'git://github.com/lueck/standoff-mode.git'
standoff-mode is a major mode for GNU Emacs that lets you create
annotations on texts in a stand-off manner. It is written for use in
the field of digital humanities and the manual annotation of training
data for named-entity recognition.
There are several tools for creating stand-off markup. Most of them
need to be deployed on a server in a network environment, which may be
a barrier. In contrast, standoff-mode does not need a network
environment. It aims to let you start annotating texts right away.
standoff-mode can store markup in several formats: as dumped Lisp
expressions (implemented), in a remote or local SQL database or as RDF
triples in a SPARQL endpoint following the emerging standard defined in
the OpenAnnotation ontology (roadmap), or as local files following
BRAT's plain-text format (planned).
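To give an idea of what a BRAT-style file contains, here is a minimal sketch that writes text-bound annotations in BRAT's plain-text stand-off layout (ID, tab, type and offsets, tab, covered text). The helper name `brat_text_bound_line` and the sample sentence are hypothetical, and how standoff-mode will map its own markup onto this format is not specified here.

```python
def brat_text_bound_line(ann_id, markup_type, start, end, source):
    """Format one text-bound annotation as a BRAT-style line.

    Offsets are zero-based and end-exclusive, as in BRAT's .ann files.
    This helper is illustrative only and is not part of standoff-mode.
    """
    covered = source[start:end]
    return f"{ann_id}\t{markup_type} {start} {end}\t{covered}"


SOURCE = "Goethe met Schiller in Jena."

lines = [
    brat_text_bound_line("T1", "Person", 0, 6, SOURCE),
    brat_text_bound_line("T2", "Person", 11, 19, SOURCE),
    brat_text_bound_line("T3", "Place", 23, 27, SOURCE),
]

if __name__ == "__main__":
    print("\n".join(lines))
```

The source text itself is never written to; the annotation file only points into it.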
standoff-mode does not try to be an all-in-one tool. It is just a tool
for the manual annotation of texts. Statistics must be done with
another tool.
Since it was written for the field of digital humanities, literary
studies in particular, standoff-mode works not only with plain-text
input (source) files but also with XML. Semantic stand-off markup
produced with it may therefore reference structural markup coded in
TEI/P5, which may be an advantage for further processing.
Stand-off markup is also known as external markup and means:
Stand-off markup refers to a source document by some kind of pointers. standoff-mode uses character offsets.
It is contained in an external document (or a database).
The source document is left unchanged and may be read-only.
The source document may contain markup too, called internal markup. standoff-mode facilitates reading XML source documents by hiding tags and showing glyphs for character references.
Cf. the TEI/P5 guidelines on stand-off markup and the OpenAnnotation ontology.
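The idea of pointing into an unchanged source by character offsets can be sketched in a few lines. This is only an illustration, not standoff-mode's storage format: the `Annotation` class, the type names and the zero-based, end-exclusive offset convention are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off markup element pointing into the source text."""
    markup_type: str
    start: int  # offset of the first annotated character (zero-based, assumed)
    end: int    # offset just past the last annotated character


def annotated_text(source, annotation):
    """Look up the text an annotation refers to; the source is only read."""
    return source[annotation.start:annotation.end]


# The source document stays untouched; markup lives in a separate list.
SOURCE = "Goethe met Schiller in Jena."
annotations = [
    Annotation("person", 0, 6),
    Annotation("person", 11, 19),
    Annotation("place", 23, 27),
]

if __name__ == "__main__":
    for a in annotations:
        print(a.markup_type, repr(annotated_text(SOURCE, a)))
```

Because the annotations hold only offsets, they stay valid exactly as long as the source document does not change, which is why a read-only source is a natural fit.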
allows discontinuous markup
allows relations between markup elements (RDF-like directed graphs)
allows attributes on markup elements
allows text comments anchored on one or several markup elements
generates the configuration for your annotation schema from OWL via XSLT
allows customizing the restrictiveness of annotations: restricted to the annotation schema plugged in via the configuration (a priori), restricted to the schema already in use (a posteriori), or free
offers completion of user input of markup types, relation predicates and attribute names
hides the fully qualified names (IRIs) of markup types, predicates and attributes behind labels (from OWL or RDFS), customizable
customization of highlighting faces
everything can be done with the keyboard and key codes
several pluggable back-ends (under development)
manual based on GNU Texinfo, English (under development) and German
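Some of the features listed above (discontinuous markup, attributes, and relations forming an RDF-like directed graph) can be pictured with a small data model. The sketch below is an assumption made for illustration only: the class names, the predicate `hasObject` and the attribute `certainty` are hypothetical and do not describe standoff-mode's internal representation.

```python
from dataclasses import dataclass, field


@dataclass
class MarkupElement:
    """A markup element that may cover several non-adjacent ranges."""
    element_id: str
    markup_type: str
    ranges: list  # list of (start, end) offsets, zero-based, end-exclusive
    attributes: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    """A directed edge between two markup elements, like an RDF triple."""
    subject_id: str
    predicate: str
    object_id: str


def element_text(source, element):
    """Join the covered fragments in document order."""
    return " ... ".join(source[s:e] for s, e in sorted(element.ranges))


def objects_of(relations, subject_id, predicate):
    """Follow outgoing edges with the given predicate."""
    return [r.object_id for r in relations
            if r.subject_id == subject_id and r.predicate == predicate]


SOURCE = "She looked the word up in a dictionary."

# "looked ... up" is one discontinuous element made of two ranges.
action = MarkupElement("m1", "action", [(4, 10), (20, 22)],
                       attributes={"certainty": "high"})
target = MarkupElement("m2", "object", [(11, 19)])
relations = [Relation("m1", "hasObject", "m2")]

if __name__ == "__main__":
    print(element_text(SOURCE, action))
    print(objects_of(relations, "m1", "hasObject"))
```

Keeping relations as separate triples rather than nesting elements is what allows overlapping and discontinuous markup, which inline XML cannot express directly.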
standoff-mode is under active development. Here's the roadmap:
text comments
SPARQL back-end
SQL back-end
BRAT-like back-end
Only GNU Emacs is required. After installing the editor, the
standoff-mode package has to be installed. It was tested on Windows,
Linux and Mac with Emacs versions 24.3 and 24.5.
If you want to store your markup in SQL tables or as RDF triples, an RDBMS or a SPARQL endpoint is required.