previous up contents next
Left: Opening declarations Up: Style sheet for DATR Right: Closing declarations

Code style

  Put the name of the node being defined on a line of its own, followed by all the equations associated with it, each on its own line and each indented four spaces from the left margin. Put at least one blank line between node definitions. For example:

V:
    <cat> == v
    <sem> == pred1.

Adjective:
    <> == V
    <cat> == a.

Rationale: if you put the first equation on the same line as the node name, then constant indentation of equations becomes highly problematic. The only solution that looks good is to indent every equation to the depth required by the longest node name in the theory. But this is a global matter, not one you can decide at the time you define a particular node, so it requires you to re-edit the whole file once you have completed it. It also has at least two further disadvantages: (i) it normally means that the indentation for equations is much greater than four spaces, which cramps the space available for material on the RHS and may make it necessary to split the RHS across two lines; and (ii) it makes the appearance of the file contingent on the choice of node names: you may subsequently want to change Very_Long_Node_Name to Foo and thus destroy the basis for the 20-space indentation that you were previously obliged to use. The alternatives, namely indenting all non-first equations to some standard depth or (worse) indenting the equations for each node differently depending on the length of its name, both look awful.
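For comparison, here is a sketch of the layout that this rationale warns against, reusing the two nodes from the example above. The equations are indented far enough to clear the longest node name, Adjective, so renaming that node, or adding a longer one, would force every equation in the file to be re-indented.

```datr
% Discouraged: the first equation shares a line with the node name,
% so every equation is indented to the width of "Adjective: ".
V:         <cat> == v
           <sem> == pred1.

Adjective: <> == V
           <cat> == a.
```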

Having two or more equations on the same line will normally make the code hard to read, as will failing to separate node definitions with blank lines. In exceptional circumstances, it may make sense to violate some or all of these recommendations. For example:

N1: <a0> == 0    <a1> == 1    <a2> == 0    <a3> == 1.
N2: <a0> == 1    <a1> == 1    <a2> == 0    <a3> == 1.
N3: <a0> == 0    <a1> == 1    <a2> == 1    <a3> == 1.
N4: <a0> == 0    <a1> == 1    <a2> == 0    <a3> == 0.

But this kind of case is rare in practice.

DATR allows you to call nodes, attributes and values whatever you like. However, it is worth giving your names some thought and adopting some principles and/or conventions that will help others to understand your code, either because the names are self-explanatory or because you include a comment explaining what your conventions are. A typical convention you might consider adopting is to write abstract non-terminal nodes in capitals (e.g., VERB) but to give the leaves of the inheritance tree (typically lexeme nodes) initial capitals only (e.g., Love).
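A minimal sketch of that convention, with the convention itself recorded in a comment at the top of the file. The nodes VERB and Love and their attributes are purely illustrative:

```datr
% Naming convention: abstract nodes are written in capitals (VERB);
% leaves of the hierarchy, typically lexemes, have an initial capital only (Love).

VERB:
    <syn cat> == v
    <mor past> == "<mor root>" ed.

Love:
    <> == VERB
    <mor root> == love.
```

With these definitions, Love:<syn cat> should evaluate to v and Love:<mor past> to the sequence love ed, both inherited from VERB.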

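Give some thought also to the length of your attribute names. This may sound a peculiar suggestion, but it has a very visible bearing on how the theorems of your theory will appear and, hence, on how intelligible they will be. A good policy is to use the same character length for all attributes that can appear in a given position in a path. To see why, compare the following two versions of the same theorems. In the first, every case attribute (nom, voc, acc) is three characters long and every number attribute (sing, plur) four, so the right-hand sides line up in a column; in the second, the attribute lengths vary and the theorems are much harder to scan: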

Puer:
    <mor nom sing> = puer
    <mor voc sing> = puer
    <mor acc sing> = puer um
    <mor nom plur> = puer ii
    <mor voc plur> = puer ii
    <mor acc plur> = puer oos.

Puer:
    <mor nomin sg> = puer
    <mor vc sg> = puer
    <mor accusative sg> = puer um
    <mor nomin plural> = puer ii
    <mor vc plural> = puer ii
    <mor accusative plural> = puer oos.

If you can, avoid long sentences that will require line breaks. If you cannot, one acceptable strategy is to break after the == and flush the right-hand side of the equation to the right margin:

Node:
    <an extremely long path full of attributes> ==
                      <another extremely long path full of attributes>.

Another acceptable strategy is to break the right-hand side into suitable components and align these:

Node:
    <a long path full of attributes> == <"<first rhs component>"
                                         "<second rhs component>"
                                         "<third rhs component>">.

If your DATR analysis deals, however marginally, with more than one level of linguistic description, then you should probably use attributes like phn, mor, syn, sem, etc., as the first items in all the relevant paths. This will make your code easier for others to read and may well make it easier for you to develop and maintain it.
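A sketch of what this might look like, assuming phn, mor, syn and sem as the level attributes. The nodes NOUN and Dog and their values are illustrative only:

```datr
% Every path begins with its level of description:
% phn (phonology), mor (morphology), syn (syntax), sem (semantics).
NOUN:
    <syn cat> == n
    <mor plur> == "<mor root>" s.

Dog:
    <> == NOUN
    <phn root> == d o g
    <mor root> == dog
    <sem> == dog1.
```

Here Dog:<mor plur> should evaluate to dog s and Dog:<syn cat> to n, and a reader can tell at a glance which level each equation belongs to.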

There is a price to be paid for putting comments on the same line as DATR code. If the % comment characters are not aligned throughout the file, then it will look a mess. But keeping them aligned is subject to all the problems associated with indentation discussed above. In particular, global substitutions of node, attribute, or value names will often destroy your alignment and require further tedious editing to restore it.
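One way to avoid paying that price is to put comments on lines of their own, where they need no alignment at all. A sketch, restating some of the Puer theorems above as definitional equations; the comment wording is illustrative:

```datr
% Puer: a second declension noun.
Puer:
    % singular
    <mor nom sing> == puer
    <mor acc sing> == puer um
    % plural
    <mor nom plur> == puer ii
    <mor acc plur> == puer oos.
```

Renaming the node, an attribute or a value leaves these comments exactly as readable as before.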

Aligning the == in equations is tempting and can sometimes improve readability. But it is usually impossible to maintain the alignment over more than a few nodes without leaving enormous gaps (that can themselves make the code less readable) and such an alignment is subject to the vagaries of subsequent changes to attribute names. Such alignment should thus be used sparingly and only when it makes an obvious contribution to the readability of the code.

Be orderly (in Grice's sense) in your presentation of the nodes in your theory. If the structure of the nodes approximates to a tree then one kind of orderly presentation would start with the root node and end with the leaves. But if the interest of the theory lies in the leaves, then an alternative presentation would have the leaves at the beginning. Give some thought to how the order of presentation can assist the human reader of your file in making sense of your analysis.

If your code contains utility nodes (like CASE or boolean connectives) or other nodes that serve a specialist function, then you may want to put them together near the end of the node definitions so that they do not clutter or obscure the logic of the substantive content of the file.
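For instance, boolean connectives might be grouped as utility nodes after the substantive nodes, as in the following sketch. The node names NOT, AND and OR and the atoms true and false are illustrative choices, not built-in parts of DATR. Because DATR uses the equation whose path is the longest prefix of the query path, a query such as AND:<true false> falls through to the <> equation and should evaluate to false.

```datr
% Utility nodes: boolean connectives over the atoms true and false.

NOT:
    <true> == false
    <false> == true.

AND:
    <> == false
    <true true> == true.

OR:
    <> == true
    <false false> == false.
```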

To separate sections of material in your file, use the sequence of spaces and percent symbols that appears immediately below. It has the advantage that you can simply copy it from your file header and it is consistent with the style set by that header (which the alternatives probably aren't).

% % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % % %

---------------------------------------------------------

Copyright © Roger Evans, Gerald Gazdar & Bill Keller, Tuesday 10 November 1998