Karin Achterholt, (1989) Phonological Underspecification in Phonology. Seminar paper: D. Gibbon: Methoden der Phonetik/Phonologie: Phonetische Beschreibungstechniken, Bielefeld: University of Bielefeld,
Francois Andry, Norman Fraser, Scott McGlashan, Simon Thornton, Nick Youd, (1992) Making DATR work for speech: lexicon compilation in SUNDIAL. Computational Linguistics 18, 3, 245-267.
This paper presents a modular inheritance-based tool which facilitates the rapid construction of linguistic knowledge bases. Simple lexical entries are added to an application-specific DATR lexicon which inherits morphosyntactic, syntactic, and lexico-semantic constraints from an application-independent set of structured base definitions. A lexicon generator expands the DATR lexicon out into a disjunctive normal form lexicon. This is then encoded either as an acceptance lexicon (in which the constraining features are bit-encoded for use in pruning word lattices), or as a full lexicon (which is used for assigning interpretations or for generating messages). Inheritance plays a vital role at each level in the compilation architecture.
Petra Barg, (1994) Automatic acquisition of DATR theories from observations. Duesseldorf: Heinrich-Heine University of Duesseldorf, Theorie des Lexikons: Arbeiten des Sonderforschungsbereichs 282, 59,
The automatic acquisition of linguistic knowledge from examples or observations is a topic of increasing interest. An approach to this task is presented where the acquired knowledge is represented in the lexical knowledge representation language DATR. The basic components of the learning approach are a set of transformation rules that define possible transformations of a given DATR theory and a default-inference algorithm that reduces a monotonic DATR theory to a default theory. Since the overall approach is not restricted to any special kind of knowledge, the heuristic inference strategy requires criteria to evaluate the quality of a DATR theory with respect to a given set of observations. Different domains may select different criteria or give different priority to a set of criteria.
Petra Barg, (1996) Automatic inference of DATR theories. H-H. Bock & W. Polasek, Data Analysis and Information Systems: Statistical and Conceptual Approaches Berlin: Springer, 506-515.
[Proceedings of the 19th Annual Conference of the Gesellschaft fuer Klassifikation e.V., University of Basel.]
Diana Billigheimer, (1990) A natural language interface to a speech therapy database. Brighton: University of Sussex, MSc thesis,
This thesis describes a DCG-based natural language interface to a semantic network that encodes information about patients, therapists, and communication defects. The network is implemented in DATR. A DCG parser translates an English question into a "logical form" and an evaluation module then uses this as the basis for one or more queries to the semantic network. A further module then formats the theorems that result from such queries into something that can be recognised as an answer to the original English question.
Doris Bleiching, (1990) Das Wortfeld 'family' als semantisches Netz. Thesis for Staatsexamen (L.A., Sekundarstufe II), Bielefeld: University of Bielefeld,
Doris Bleiching, (1991) Default-Hierarchien in der deutschen Wortprosodie. Bielefeld: University of Bielefeld, ASL-TR-19-91,
Doris Bleiching, (1992) Prosodisches Wissen im Lexikon. G. Goerz, KONVENS-92 Berlin: Springer-Verlag, 59-68.
Doris Bleiching, (1994) Integration von Morphophonologie und Prosodie in ein hierarchisches Lexikon. Harald Trost, Vienna: Oesterreichische Gesellschaft fuer Artificial Intelligence, Proceedings of KONVENS-94 32-41.
Doris Bleiching, Guido Drexel, Dafydd Gibbon, (1996) Ein Synkretismusmodell fuer die deutsche Morphologie. Dafydd Gibbon, Berlin: Mouton de Gruyter, Proceedings of KONVENS-96 237-248.
Morphology models in computational linguistics have tended to be language-specific, in that the data structures and operations used have reflected the typology of individual languages. Starting with a discussion of the syncretistic properties of German inflectional morphology, a generic denotational semantics for known language-independent inflectional structures is outlined. This semantics underlies the design of a generative morphological lexicon compiler for spoken German, which projects 7000 stems extracted from a corpus of spoken language dialogues to 30,000 fully inflected forms and 120,000 morphological category mappings (after resolution of syncretism).
Dunstan Brown, (1994) Getting your priorities right: a network morphology approach to morphological stress. Guildford: University of Surrey, Unpublished paper presented to the Spring Meeting of the Linguistics Association of Great Britain, Salford,
Dunstan Brown, (1994) Network Morphology and morphophonological selection. Guildford: University of Surrey, Unpublished paper presented to the Autumn Meeting of the Linguistics Association of Great Britain, Middlesex,
Dunstan Brown, (1995) Network Morphology and the Russian verb [abstract]. A.E. Kibrik, I.M. Kobozeva, A.I. Kuznecova, T.B. Nazarova, Linguistics at the end of the 20th century: Achievements and perspectives 1, Moscow: Filologiceskij fakultet MGU imeni M.V. Lomonosova, 74-76.
Dunstan Brown, (1995) Setevaja morfologija i russkaja glagol'naja sistema. Vestnik Moskovskogo Universiteta, ser. 9, Filologija 6, 91-108.
Dunstan Brown, (1998) Stem indexing and morphophonological selection in the Russian verb. Ray Fabri, Albert Ortmann, T. Parodi, Models of Inflection Tuebingen: Niemeyer, 196-221.
Dunstan Brown, (1998) Defining `subgender': virile and devirilized nouns in Polish. Lingua 104, 187-233.
We analyse dependencies between grammatical categories within the Network Morphology framework. The dependencies are expressed by Category Dependency Constraints, and they determine the dependency of case on number, of gender on number, and of gender on case. Within this area, a particularly interesting challenge is the notion of `subgender', which is an additional gender distinction within a minimal subset of the paradigm. We consider the difficult case of the Polish masculine-personal nouns.
Dunstan Brown, (1998) Declension and conjugation. D.A. Cruse, F. Hundschnurscher, M. Job, P. Lutzeier, Lexicology: an international handbook on the nature and structure of words and vocabularies Berlin: Walter de Gruyter, 00-00.
Dunstan Brown, (1998) The general and the exceptional in Russian nominal morphology. Guildford: University of Surrey, PhD dissertation,
Dunstan Brown, Andrew Hippisley, (1994) Conflict in Russian genitive plural assignment: A solution represented in DATR. Journal of Slavic Linguistics 2, 1, 48-76.
Inflectional endings are assigned in languages by general principles, but these can come into conflict. The paper addresses the question of how such conflict is resolved. A particularly complex example is the Russian genitive plural, where there is a conflict between exponent assignment according to declension class and a default exponent assignment for soft-stem nouns. What is specially interesting is that the conflict here can be resolved by reference to subsystems over and above the paradigm, such as stress. An explicit account of the conflict and its mediation is presented, based on default inheritance. For this purpose the lexical knowledge representation language DATR is used. This allows one to demonstrate in the output provided that the correct forms are indeed predicted by the theory.
Dunstan Brown, Andrew Hippisley, (1995) DATR for linguists. University of Surrey, Unpublished paper,
Dunstan Brown, Greville Corbett, Norman Fraser, Andrew Hippisley, Alan Timberlake, (1996) Russian noun stress and network morphology. Linguistics 34, 1, 53-107.
This paper presents a network morphology analysis of Russian noun stress. Nouns have a default fixed stem stress, but some nouns have nondefault stress that may deviate in a way that is determined by the form's position within the paradigm; different declensions prefer particular patterns as their nondefault choices. Membership of a particular declension, it is argued, constrains the range of possible stress patterns. Stress is represented as a hierarchy with limited deviation in terms of number and, less often, case. Indices in the declension hierarchy are addressed to nodes in the stress hierarchy. These indices correspond to rank orderings that declensions have for stress patterns. Lexical items inherit a default value for index rank but may override this.
Pierrette Bouillon, (1990) La morphologie automatiques du Francais avec DATR. Geneva: ISSCO, Unpublished manuscript,
This paper documents a rather comprehensive DATR fragment for the morphology of French adjectives, nouns and verbs.
Lynne Cahill, (1990) Syllable-based morphology for NLP. Brighton: University of Sussex, DPhil thesis,
Chapter 5 and Appendices A-D of this thesis show how expressions of the syllable sequence mapping language MOLUSC can be embedded in DATR theories so as to provide full accounts of the morphology and morphophonology of the Arabic, English and Sanskrit verbal systems. In this approach, DATR takes care of the distribution of morphemes whilst MOLUSC is responsible for their phonological realization.
Lynne Cahill, (1993) Morphonology in the lexicon. Sixth Conference of the European Chapter of the Association for Computational Linguistics 87-96.
This paper presents a means of defining morphonological phenomena in an inheritance based lexicon, making use of the theory behind the formal language MOLUSC, in which morphological alternations were defined as mappings between sequences of tree-structured syllables. The paper shows how such alternations can be defined in the inheritance based lexical representation language DATR, and how the phonological aspects can be built upon to create an integrated lexicon with representations that can be used by both the morphology and the phonology of a language.
Lynne Cahill, (1993) Some reflections on the conversion of the TIC lexicon into DATR. Ted Briscoe, Valeria de Paiva, Ann Copestake, Inheritance, defaults, and the lexicon Cambridge: Cambridge University Press, 47-57.
The Traffic Information Collator (TIC) is a prototype system which takes verbatim police reports of traffic incidents, interprets them, builds a picture of what is happening on the roads and broadcasts appropriate messages to motorists where necessary. Cahill & Evans (1990) describes the process of converting the main TIC lexicon (around 1000 words specific to the domain of traffic reports) into DATR. This paper reviews the strategy adopted in the conversion discussed in that paper, and discusses the results of converting the whole lexicon, together with statistics comparing efficiency and performance between the original lexicon and the DATR version.
Lynne Cahill, (1994) An inheritance-based lexicon for message understanding systems. Fourth ACL Conference on Applied Natural Language Processing 211-212.
Lynne Cahill, Roger Evans, (1990) An application of DATR: the TIC lexicon. ECAI-90 120-125.
Also in Evans & Gazdar (1990) The DATR Papers, Vol. 1, pp. 31-39. The Traffic Information Collator (TIC) is a natural language understanding system operating in the domain of road traffic incident reports. This paper describes the application of DATR to a fragment of the TIC's lexicon, and discusses a range of techniques which can be used to overcome the problems of practical lexical representation.
Lynne Cahill, Gerald Gazdar, (1997) A lexical analysis of numeral expressions in Dutch, English and German. Brighton: University of Sussex, Unpublished paper,
In this paper we present a synchronic lexical analysis of numeral expressions in the three major modern West Germanic languages, Dutch, English and German. We present an account which makes use of default inheritance, defining the similarities between the languages as default values which may be overridden for individual languages. We discuss how our account, which covers the syntax, morphology and phonology of numeral expressions, reflects the historical divergences that have taken place in the three languages from their common root.
Lynne Cahill, Gerald Gazdar, (1997) The inflectional phonology of German adjectives, determiners and pronouns. Linguistics 35, 2, 211-245.
This is the first of a series of papers that, taken together, will give an essentially complete account of inflection in standard German. In this paper we present that part of the account that covers adjectives, determiners and third person pronouns, one that captures all the regularities, subregularities and irregularities that are involved. The forms are defined in terms of their syllable structure, as proposed in Cahill (1990, 1993). The morphological treatment is based on ideas originally set out by Zwicky in the mid-1980s.
Lynne Cahill, Gerald Gazdar, (1998, in press) German noun inflection. Journal of Linguistics 35, 1, 00-00.
This is the second of a series of papers that, taken together, will give an essentially complete account of inflection in standard German.
Lynne Cahill, Gerald Gazdar, (1998, in press) The PolyLex architecture: multilingual lexicons for related languages. Traitement Automatique des Langues 38, ?, 00-00.
Lynne Cahill, Gerald Gazdar, (1998, in press) Allomorphy in PolyLex. A. Ralli, S. Scalise, Proceedings of the First Mediterranean Morphology Meeting ??: ??,
Lynne Cahill, Julie Carson-Berndsen, Gerald Gazdar, (1998, in press) Phonology-based lexical knowledge representation. Dafydd Gibbon, Frank van Eynde, Ineke Schuurman, Lexicon Development for Speech and Natural Language Processing Dordrecht: Kluwer, 00-00.
Greville Corbett, Norman Fraser, (1993) Network morphology: a DATR account of Russian nominal inflection. Journal of Linguistics 29, 113-142.
The paper presents an analysis of the inflectional morphology of Russian nominals which encodes information in terms of a network of nodes and facts. This approach, called network morphology, makes extensive use of default inheritance and is formalized in DATR. The analysis given has been tested and been shown to generate the correct forms for each of the regular declensional classes, and for a range of irregular items.
Greville Corbett, Norman Fraser, (1995) Computational linguistics meets typology [abstract]. A.E. Kibrik, I.M. Kobozeva, A.I. Kuznecova, T.B. Nazarova, Linguistics at the end of the 20th century: Achievements and perspectives 1, Moscow: Filologiceskij fakultet MGU imeni M.V. Lomonosova, 256-258.
Greville Corbett, Norman Fraser, (1996 (to appear)) Default genders. B. Unterbeck, Approaches to Gender Berlin: Mouton, 00-00.
Greville Corbett, Norman Fraser, (1996 (to appear)) Gender assignment: a typology and a model. Gunter Senft, Back to Basic Issues in Nominal Classification ?: ?, 00-00.
Guido Drexel, (1993) Repraesentation hierarchischer Lexika: DATR in einer objekt-orientierten Umgebung. MA Thesis, Bielefeld: University of Bielefeld,
Markus Duda, Gunter Gebhardi, (1994) DUTR -- A DATR-PATR interface formalism. Harald Trost, Vienna: Oesterreichische Gesellschaft fuer Artificial Intelligence, Proceedings of KONVENS-94 411-414.
This paper presents a *dynamic* interface between DATR and PATR.
Tomaz Erjavec, (1992) Treatments of Slovene verb morphology in inheritance models. MSc Thesis, Edinburgh: Centre for Cognitive Science, University of Edinburgh,
Nicholas Evans, Dunstan Brown, Greville Corbett, (1998) Emu divorce: a unified account of gender and noun class assignment in Mayali. Paper presented at the 34th Regional Meeting of the Chicago Linguistic Society 00-00.
Roger Evans, (1990) An introduction to the Sussex Prolog DATR system. Roger Evans, Gerald Gazdar, The DATR Papers Brighton: University of Sussex, Cognitive Science Research Paper CSRP 139, 63-71.
This paper documents installation and implementation-specific aspects of the Sussex Prolog DATR system. It explains how to use the compiler and the various ways in which a compiled DATR theory can be queried.
Roger Evans, (1992) Derivational morphology in DATR. Lynne Cahill, Richard Coates, Sussex Papers in General and Computational Linguistics Brighton: University of Sussex, Cognitive Science Research Paper CSRP 239, 55-69.
This paper presents a DATR analysis of some aspects of English derivational morphology, and demonstrates how the facilities of the language allow succinct description of derivational concepts. The aim is not to present a new theory of derivational morphology, but rather to show how existing ideas in the field can be expressed in terms of DATR's default and inheritance mechanisms. To this end, the analysis is based on a single, coherent, but informal account of the data, namely Bauer's "English Word-formation" (Cambridge University Press, 1983). The account presented is a description rather than a representation of derivational morphology. This entails that representational issues such as productivity and lexicalisation lie outside its scope.
Roger Evans, Gerald Gazdar, (1989) Inference in DATR. Fourth Conference of the European Chapter of the Association for Computational Linguistics 66-71.
Also in Evans & Gazdar (1990) The DATR Papers, Vol. 1, pp. 15-20. This paper provides a formal definition of the syntax of the DATR language and the theory of inference.
Roger Evans, Gerald Gazdar, (1989) The semantics of DATR. Anthony G. Cohn, Proceedings of the Seventh Conference of the Society for the Study of Artificial Intelligence and Simulation of Behaviour London: Pitman/Morgan Kaufmann, 79-87.
Also in Evans & Gazdar (1990) The DATR Papers, Vol. 1, pp. 21-30. This paper provides a formal definition of a semantics for the core of the DATR language (value sequences and evaluable paths are not covered) and shows how this semantics can be modelled using finite state automata.
Roger Evans, Gerald Gazdar, (1990) The DATR Papers. Brighton: University of Sussex, Cognitive Science Research Paper CSRP 139,
This volume brings together all the early Sussex-sourced papers relating to DATR (each of which is listed separately in this bibliography). Three of these papers have been published elsewhere, but, for the other four, this technical report is likely to remain the only source. In addition to these seven papers, the volume contains nine natural language DATR lexicon fragments (on Arabic, Baule, English, German, Japanese, Latin and Tem); eighteen formal DATR examples that illustrate a wide variety of representational techniques; and the complete Prolog source code for the Sussex DATR system.
Roger Evans, Gerald Gazdar, (1996) DATR: A language for lexical knowledge representation. Computational Linguistics 22, 2, 167-216.
This paper argues that DATR, though minimalist in conception, is sufficiently expressive to represent concisely the structure of lexical information at a variety of levels of linguistic analysis. The paper provides an informal example-based introduction to DATR and to techniques for its use, including finite state transduction, the encoding of DAGs and lexical rules, and the representation of ambiguity and alternation. Sample analyses of phenomena such as inflectional syncretism and verbal subcategorisation are given which show how the language can be used to squeeze out redundancy from lexical descriptions.
Roger Evans, Gerald Gazdar, Lionel Moser, (1993) Prioritised multiple inheritance in DATR. Ted Briscoe, Valeria de Paiva, Ann Copestake, Inheritance, defaults, and the lexicon Cambridge: Cambridge University Press, 38-46.
Also "Proceedings of the Acquilex Workshop on Default Inheritance in the Lexicon", Technical Report No. 238, University of Cambridge Computer Laboratory, October 1991. A notion of prioritised multiple inheritance (PMI) is characterised and contrasted with the more familiar orthogonal multiple inheritance (OMI). DATR was designed to facilitate OMI analyses of natural language lexicons: it contains no special purpose facility for PMI and this has led some researchers to conclude that PMI analyses are beyond the expressive capacity of DATR. Here, the authors present three different techniques for implementing PMI entirely within DATR's existing syntactic and semantic resources. In presenting them, they draw attention to their respective advantages and disadvantages.
Roger Evans, Gerald Gazdar, David Weir, (1994) Using default inheritance to describe LTAG. Colloque International sur les grammaires d'Arbres Adjoints (TAG+3), TALANA-RT-94-01, Paris: TALANA, Universite Paris VII, Jussieu,
The authors investigate how the set of elementary trees of a Lexicalized Tree Adjoining Grammar (LTAG) can be represented in DATR. DATR's default mechanism is used to eliminate the need for a non-immediate dominance relation in the descriptions of surface LTAG entries. This allows tree structures to be embedded in the feature theory in a manner reminiscent of HPSG subcategorization frames, and hence also allows lexical rules to be expressed as relations over feature structures.
Roger Evans, Gerald Gazdar, David Weir, (1995) Encoding lexicalized tree adjoining grammars with a nonmonotonic inheritance hierarchy. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics 77-84.
This paper shows how DATR can be used to define an LTAG lexicon as an inheritance hierarchy with internal lexical rules. A bottom-up featural encoding is used for LTAG trees and this allows lexical rules to be implemented as covariation constraints within feature structures. Such an approach eliminates the considerable redundancy otherwise associated with an LTAG lexicon.
Cecile Fabre, Anne Le Draoulec, (1992) Organisation d'un lexique bilingue pour les verbes Anglais et Francais en langage DATR. MA Project Report, Paris: Universite Paris VII, Jussieu,
Kerstin Fischer, (1993) Kompositionelle Semantik am Beispiel der englischen denominalen Nominalkomposita. MA Thesis, Bielefeld: University of Bielefeld,
Kerstin Fischer, (1996) Distributed representation formalisms for discourse particles. Dafydd Gibbon, Berlin: Mouton de Gruyter, Proceedings of KONVENS-96 212-224.
There are so far no descriptively and explanatorily adequate approaches to discourse particles. As spoken language phenomena, traditional representations either consist of enumerations of their possible readings, or of a single, very abstract invariant meaning, so that they are of no use for automatic speech processing. The present paper provides a conception, based on analyses of large corpora, in which different levels of generalization constitute a generative mechanism. Two formalisms, Construction Grammar (Fillmore and Kay 1995) and the inheritance formalism ILEX/DATR (Gibbon 1992), which support the distributed representation of discourse particles, will be discussed and evaluated according to the present purposes.
Kofi Folikpo, Dafydd Gibbon, Shu-chuan Tseng, (1997) Prosodic inheritance and phonetic interpretation: lexical tone. Unpublished manuscript, Bielefeld: University of Bielefeld,
Starting from the paradigm of finite state phonology, a theory of the phonetic interpretation of lexical tone in West African languages is presented. Phonetic interpretation is modelled as a three-level time-type mapping from lexical representations into prosodic event structures which define temporal relations between tonal events (relative time type), and from these into absolute temporal coordinates for pitch values (absolute time type). We show how quantitative pitch contours for lexical tone sequences can be predicted compositionally on the basis of lexical tone, segmental constraints, tone alternation rules, and an asymptotic syllable-based temporal function which differs in several respects from temporal functions proposed for pitch contours in intonation languages.
Norman Fraser, (1994) Derivational morphology in DATR: a new proposal. Guildford: University of Surrey, Unpublished manuscript,
This paper draws attention to a novel way of structuring the derivational morphology problem in DATR by mapping sequences of semantic attributes (interpreted as nested modifiers) into derived forms.
Norman Fraser, (1995) Practical DATR. University of Surrey, Unpublished paper,
Norman Fraser, Greville Corbett, (1995) Gender, animacy, and declensional class assignment: a unified account for Russian. Geert Booij, Jaap van Marle, Yearbook of Morphology 1994 Dordrecht: Kluwer, 123-150.
This paper extends the DATR analysis presented in Corbett and Fraser (1993) to allow for the complex interactions of meaning, gender, declensional class and phonology in the assignment of gender in Russian.
Norman Fraser, Greville Corbett, (1997) Defaults in Arapesh. Lingua 103, 25-57.
Network Morphology is a formally explicit approach to morphology which distributes information across a network in which generalizations can be optimally expressed. Generalizations become available in specific cases by the operation of default inheritance. The paper explores the notion of `default' in morphology by means of a Network Morphology analysis of the noun classes and genders of Arapesh -- a language which relies on a sophisticated understanding of defaults for a satisfactory treatment (Aronoff, 1992). The analysis lends support to Aronoff's account of the Arapesh data. It also reveals a confusion in the use of the term `default' by linguists. In one usage of the term, the (`normal case') default is that which applies in the absence of blocking information; in the other, the (`exceptional case') default is that which applies when some exceptional factors prevent normal processes from applying.
Gerald Gazdar, (1990) An introduction to DATR. Roger Evans, Gerald Gazdar, The DATR Papers Brighton: University of Sussex, Cognitive Science Research Paper CSRP 139, 1-14.
DATR is a declarative language for representing a restricted class of inheritance networks, permitting both multiple and default inheritance. The principal intended area of application is the representation of lexical entries for natural language processing. The goal of the DATR enterprise is the design of a simple language that (i) has the necessary expressive power to encode the lexical entries presupposed by contemporary work in the unification grammar tradition, (ii) can express all the evident generalizations about such entries, (iii) has an explicit theory of inference, (iv) is computationally tractable, and (v) has an explicit declarative semantics. The present paper sketches the brief history of default inheritance approaches to the lexicon and provides an extended tutorial example.
Gerald Gazdar, (1990) Ceteris paribus. Brighton: University of Sussex, Unpublished paper,
This paper uses the morphology of Latin nouns as an example on which to base an extended informal introduction to the DATR language, concentrating on default inheritance and the rules of inference. An appendix provides a full DATR treatment of Latin noun morphology involving 5 declensions and 18 subdeclensions.
Gerald Gazdar, (1992) Paradigm function morphology in DATR. Lynne Cahill, Richard Coates, Sussex Papers in General and Computational Linguistics Brighton: University of Sussex, Cognitive Science Research Paper CSRP 239, 43-53.
This paper shows how Stump's "paradigm function" (PFM) approach to inflectional morphology can be implemented in DATR. PFM analyses can be encoded in DATR without any loss in concision over Stump's own notation, but with a great gain in generality, since Stump's notation is ad hoc to PFM analyses of inflectional morphology. DATR is thus to be preferred to Stump's own notation on general methodological grounds. An appendix suggests that there may also be analytical grounds for preferring DATR in view of the difficulties that the Swahili object agreement facts cause for Stump's notation.
Dafydd Gibbon, (1989) PCS-DATR: A DATR implementation in PC Scheme. English/Linguistics Interim Report No. 3, Bielefeld: University of Bielefeld,
This paper documents the Bielefeld PC-Scheme DATR implementation. The latter is a menu-directed, window-oriented DATR development environment based on an interpreter. The paper includes an informal review of DATR, a guide to the installation and use of PCS-DATR, a description of implementation-specific aspects of the interpreter, a high-level explanation of how it works, and a set of example files.
Dafydd Gibbon, (1990) Prosodic association by template inheritance. Walter Daelemans, Gerald Gazdar, Proceedings of the Workshop on Inheritance in Natural Language Processing 65-81. Tilburg: ITK (Institute for Language Technology & AI),
The domain of morphophonological structures in natural language lexica is notoriously difficult to describe with standard formal approaches. The morphoprosodic subdomain, i.e. lexical suprasegmental structure (stress, tone, vowel harmony, vowel and consonant mutation) is one of the hardest parts to model explicitly and in a linguistically adequate fashion. In this paper, two examples -- the standard "benchmark" examples of subsets of Kikuyu tone and Arabic binyan systems -- are selected, and a new approach to lexical prosody description (morphoprosody) using prosodic inheritance with defaults (PI) is described and implemented in DATR.
Dafydd Gibbon, (1991) Lexical signs and lexicon structure: phonology and prosody in the ASL-Lexicon. Bielefeld: University of Bielefeld, Verbundprojekt ASL-MEMO-20-91/UBI,
Dafydd Gibbon, (1992) ILEX: a linguistic approach to computational lexica. Ursula Klenk, Computatio Linguae: Aufsaetze zur algorithmischen und quantitativen Analyse der Sprache (Zeitschrift fuer Dialektologie und Linguistik, Beiheft 73) Stuttgart: Franz Steiner Verlag, 32-53.
The present paper is an attempt to identify some of the linguistic criteria for lexicon development, and to present an integrated approach which addresses not only the question of the structure of individual lexical entries, but also the issue of the structure of the lexicon as a whole. A particularly neglected area is the integrated representation of the morphological and morphophonological generalisations in the lexicon. The ILEX approach (Inheritance Lexicon with EXceptions) was developed with the aim of ameliorating this situation on the basis of explicit linguistic and computational criteria of adequacy. ILEX models are currently implemented in DATR.
Dafydd Gibbon, (1993) The lexical representation of prosody. Bielefeld: University of Bielefeld, ELSNET Summer School on Prosody course booklet,
This 92-page course booklet provides an introduction to prosody and its role in the lexicon, and covers criteria for lexical representation, structural stress in English compounds, tone, and multi-linear morphology. The use of DATR for representing lexical prosody is discussed and extensive examples are given, drawn from Arabic, Yacouba, Kikuyu, Baule and Tem.
Dafydd Gibbon, (1993) Generalised DATR for flexible access: Prolog specification. Deliverable VM-TP5.3-D1, Bielefeld: University of Bielefeld,
A representation language with quantification over DATR theorem constituents, EDQL (Extended DATR Query Language), is introduced, with variables which also permit EDQL to be interfaced with Prolog and other formalisms by structure-sharing. The prototype implementation and applications are briefly described.
Dafydd Gibbon, (1994) Generalised DATR inference for lexicon development and interfacing. Unpublished manuscript, Bielefeld: University of Bielefeld,
Dafydd Gibbon, (1997) Compositionality in the inheritance lexicon: English nouns. Unpublished manuscript, Bielefeld: University of Bielefeld,
Dafydd Gibbon, Firmin Ahoua, (1991) DDATR: un logiciel de traitement d'heritage par defaut pour la modelisation lexicale. Cahiers Ivoiriens de Recherche Linguistique (CIRL) 27, 5-59.
The aim of this paper is to present the properties of DATR and directions for the use of the DDATR software for developing and testing DATR descriptions. The DATR language is capable of integrating recent developments in the lexical domain in linguistics and computational linguistics. It is presented as a means of formalising linguistic theories in the lexical domain in a homogeneous and explicit manner. It offers not only a means of expressing linguistically significant generalisations with respect to the criterion of descriptive adequacy, but also a means of testing the validity, the coherence and the exhaustivity of complex generalised lexical descriptions.
Dafydd Gibbon, Doris Bleiching, (1991) An ILEX model for German compound stress in DATR. Paper presented and distributed at the FORWISS-ASL Workshop on Prosody in Man-Machine Communication
This paper notes a number of conditions on German compound stress and suggests a description in terms of the ILEX (Integrated Lexicon with EXceptions) model.
Andrew Hippisley, (1994) Default inheritance and Russian word formation: An account of Russian denominal adjectives represented in DATR. Guildford: University of Surrey, Manuscript of paper presented to the Spring Meeting of the Linguistics Association of Great Britain, Salford,
Andrew Hippisley, (1995) Expressive derivation in Russian represented in DATR [abstract]. A.E. Kibrik, I.M. Kobozeva, A.I. Kuznecova, T.B. Nazarova, Linguistics at the end of the 20th century: Achievements and perspectives 1, Moscow: Filologiceskij fakultet MGU imeni M.V. Lomonosova, 525-526.
Andrew Hippisley, (1996) Russian expressive derivation: a Network Morphology account. The Slavonic and East European Review 74, 2, 201-222.
Andrew Hippisley, (1997) Declarative derivation: a Network Morphology account of Russian word formation with reference to nouns denoting `person'. Guildford: University of Surrey, PhD dissertation,
Sabine Jacob, (1993) Entwicklung eines DATR-Lexikons zur UCG-basierten Analyse natuerlichsprachlicher deutscher Saetze. Nuremberg: Friedrich Alexander University of Erlangen Nuremberg, MSc thesis,
Elizabeth Jenkins, (1990) Enhancements to the Sussex Prolog DATR implementation. Roger Evans, Gerald Gazdar, The DATR Papers Brighton: University of Sussex, Cognitive Science Research Paper CSRP 139, 41-61.
This paper describes a range of enhancements to the original (1989) Sussex Prolog DATR implementation. These include DATR declarations (for atoms, for nodes, and for theorem dumps); DATR variables (an abbreviatory notation); a procedural interface; and an interface that allows DATR queries to be expressed in DATR syntax.
Elizabeth Jenkins, (1990) Japanese verbs in DATR. Roger Evans, Gerald Gazdar, The DATR Papers Brighton: University of Sussex, Cognitive Science Research Paper CSRP 139, 73-78.
This short paper presents a DATR analysis of the morphology of the Japanese verbal system which covers the inflection of the 11 regular verb types and the 3 irregular verbs.
William Keller, (1995) DATR theories and DATR models. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics 55-62.
This paper presents a formal semantics for DATR which treats DATR theories as collections of function definitions.
William Keller, (1996) An evaluation semantics for DATR theories. COLING-96 646-651.
This paper describes an operational semantics for DATR theories that axiomatises the relationship between DATR expressions and their values. The inference rules provide a clearer picture of the way in which DATR works, and should lead to a better understanding of the mathematical and computational properties of the language.
James Kilbury, (1992) Strict inheritance and the taxonomy of lexical types in DATR. Duesseldorf: University of Duesseldorf, Unpublished manuscript (revised version to appear in 1994),
This paper describes a technique that allows one to assign lexical types represented by DATR nodes to individual DAGs associated with lexemes. The result is obtained by extending a highly restricted subclass of DATR theories to reflect the distinction between strict and defeasible information.
James Kilbury, (1993) Paradigm-based derivational morphology. Duesseldorf: University of Duesseldorf, Unpublished manuscript,
This paper sketches an approach to derivational morphology that is based on the notion of the paradigm and provides new possibilities for an integrated treatment of inflection and derivation. The principal innovation lies in the use of cross-subcategorization to describe derivational combinations. The notion of a derivational closure is also introduced. Advantages of the approach for computational morphology involve both the representation and the processing of derivational information. Primary attention is directed at derivational morphotactics.
James Kilbury, Petra Naerger, Ingrid Renz, (1991) DATR as a lexical component for PATR. Fifth Conference of the European Chapter of the Association for Computational Linguistics 137-142.
The representation of lexical entries requires special means which basic PATR systems do not include. The language DATR, however, can be used to define an inheritance network serving as the lexical component. The integration of such a module into an existing PATR system leads to various problems which are discussed together with possible solutions.
James Kilbury, Petra Naerger, Ingrid Renz, (1992) New lexical entries for unknown words. Duesseldorf: University of Duesseldorf, Unpublished manuscript,
This paper presents an approach for simulating the acquisition of new lexical entries for unknown words, an issue that is central to NLP since no lexicon can ever be complete. Acquisition involves two main tasks. First, the appropriate information about an unknown word in a given linguistic context (i.e. sentence) is identified. It is shown that this task requires new general considerations about shared information in unification-based representations. Second, the collected information is formulated in a new lexical entry according to a comprehensive theory of the lexicon which defines the form of lexical entries and the relations between them. This task is solved by a general algorithm that depends only on the form of the collected information and is independent of the content, i.e. treats all unknown words the same way.
James Kilbury, Petra Barg, Ingrid Renz, (1994) Simulation lexicalischen Erwerbs. Sascha W. Felix, Christopher Habel, Gert Rickheit, Kognitive Linguistik: Repraesentation und Prozesse Opladen: Westdeutscher Verlag, 251-271.
This paper is a (German) descendant of Kilbury, Naerger & Renz (1992). It presents a model for the processing of unknown words and the acquisition of corresponding lexical entries. The linguistic model was formulated in the unification-based paradigm as a computer simulation with the system QPATR. The central assumption is that the processing of unknown words is subject to the same principles as that of natural language in general. It is shown how information about unknown words is accumulated during parsing: an independent component using a DATR-based model of the lexicon builds new lexical entries for the unknown words and integrates these entries in the existing lexicon.
Adam Kilgarriff, (1993) Inheriting verb alternations. Sixth Conference of the European Chapter of the Association for Computational Linguistics 213-221.
This paper shows how the verbal lexicon can be formalised in a way that captures and exploits generalisations about the alternation behaviour of verb classes. An alternation is a pattern in which a number of words share the same relationship between a pair of senses. The alternations captured are ones where the different senses specify different relationships between syntactic complements and syntactic arguments, as between "bake" in "John is baking the cake" and "The cake is baking". The formal language used is DATR. The lexical entries built are those of HPSG. The complex alternation behaviour shared between families of verbs is elegantly represented in a way that makes generalisations explicit, and offers practical benefits to computational lexicographers.
Adam Kilgarriff, (1995) Inheriting polysemy. Patrick Saint-Dizier, Evelyne Viegas, Computational Lexical Semantics Cambridge: CUP, 319-335.
There are many patterns of variation in word sense, or `sense alternations', which apply to classes of words in English. A description of the lexical resources of a language would ideally make the alternations explicit, exploit the generalisations about them to give a concise representation, present them in a consistent and uniform manner, and indicate how they interact with each other and with other varieties of information to be stored in the lexicon. The paper presents dictionary data illustrating some facts and generalisations about sense alternations and shows how they can be expressed in DATR.
Adam Kilgarriff, Gerald Gazdar, (1995) Polysemous relations. Frank Palmer, Grammar and meaning: essays in honour of Sir John Lyons Cambridge: CUP, 1-25.
This paper uses DATR to represent polysemous relations such as those that hold between the fibre, yarn, cloth and garment senses of a lexeme like 'silk'. Such polysemous relations are pervasive in the lexicon, and yet their subregular character has only rarely been recognized.
Hagen Langer, (1992) DELASOUL: Eine constraintbasierte Beschreibungssprache fuer lexicalische Repraesentationen. Bielefeld: University of Bielefeld, Verbundprojekt ASL-TR-26-92/UBI,
Hagen Langer, (1994) Reverse queries in DATR. COLING-94 2, 1089-1095.
DATR is a declarative representation language for lexical information and as such, in principle, neutral with respect to particular processing strategies. Previous DATR compiler/interpreter systems support only one access strategy that closely resembles the set of inference rules of the procedural semantics of DATR. In this paper, we present an alternative access strategy (reverse query strategy) for a non-trivial subset of DATR.
Hagen Langer, (1994) DATR without nodes and global inheritance. Osnabrueck: Universitaet Osnabrueck, Unpublished manuscript,
This paper investigates which elements of the DATR language essentially contribute to its expressive capabilities and which are dispensable for the purposes DATR has been developed for. A subset of DATR is considered, called local path DATR (LDATR), that eliminates the concepts of node and global inheritance by redefining them in a pseudo-bootstrapping manner in terms of local path inheritance alone. For an arbitrary standard DATR theory D, there is an LDATR theory L such that each theorem of D corresponds to an equivalent theorem of L. This is shown by giving general translation rules which map an arbitrary standard DATR theory onto its LDATR counterpart. The main result of the paper is that restricting DATR to the rules of inference I and IV yields a DATR-equivalent formalism (and thus also a Turing-equivalent one).
Hagen Langer, Dafydd Gibbon, (1992) DATR as a graph representation language for ILEX speech oriented lexica. Bielefeld: University of Bielefeld, Verbundprojekt ASL-TR-43-92/UBI,
An approach to computational morphology and morphophonology based on DATR, a task-oriented implementation (DDATR), and a task-oriented modelling convention (ILEX: Integrated Lexicon with EXceptions) are described and discussed in terms of their adequacy for linguistic modelling in the context of constraint-based, incremental, and maximally deterministic speech recognition. It is shown that the approach meets these specifications, while in the case of other approaches proposed for the same purpose, in particular typed feature structure formalisms with distributed disjunction, either the specifications are not met, or their properties in respect of the specifications have not been described and are unknown.
Marc Light, Sabine Reinhard, Marie Boyle-Hinrichs, (1993) INSYST: an automatic inserter system for hierarchical lexica. Sixth Conference of the European Chapter of the Association for Computational Linguistics 471.
When using hierarchical formalisms for lexical information, the need arises to insert (i.e., classify) lexical items into these hierarchies. This includes at least the following two situations: (1) testing generalizations when designing a lexical hierarchy; (2) transferring large numbers of lexical items from raw data files to a finished lexical hierarchy when using it to build a large lexicon. Up until now, no automated system for these insertion tasks existed. INSYST (INserter SYSTem) can efficiently insert lexical items under the appropriate nodes in hierarchies. It currently handles hierarchies specified in the DATR formalism. The system uses a classification algorithm that maximizes the number of inherited features for each entry.
Marc Light, (1994) Classification in feature-based default inheritance hierarchies. Harald Trost, Vienna: Oesterreichische Gesellschaft fuer Artificial Intelligence, Proceedings of KONVENS-94 220-229.
[Also appeared as Technical Report 473, Computer Science Department, University of Rochester, 1993.] When one works with a system that utilizes inheritance hierarchies the following problem often arises. A new object is introduced and it must be integrated into a hierarchy: under which classes in the hierarchy should the new object be positioned? In this paper, the problem is formalized for feature-based default inheritance hierarchies. Since it turns out to be NP-complete, an approximation for it is presented. This algorithm is shown to be efficient and some of the possible problematic situations for the algorithm are examined. Although more analysis and experimentation are needed, these preliminary results show that the algorithm warrants such efforts.
Harald Luengen, (1992) A DATR description of Turkish noun inflection. Unpublished paper, Bielefeld: University of Bielefeld,
Paul McFetridge, Aline Villavicencio, (1995) A hierarchical description of the Portuguese verb. Campinas: Proceedings of the XIIth Brazilian Symposium on Artificial Intelligence 302-311.
Inge Mertins, (1993) Lexical Semantics: an ILEX-DATR account of English verbs of cooking. MA Thesis, Bielefeld: University of Bielefeld,
Lionel Moser, (1991) Multiple inheritance in DATR: a quick tour. Richard Dallaway, Teresa Del Soldato, Lionel Moser, The Fourth White House Papers: Graduate Research in the Cognitive and Computing Sciences at Sussex Brighton: University of Sussex, Cognitive Science Research Paper CSRP 200, 100-104.
Inheritance hierarchies with multiple inheritance have long been studied in AI as structures which have the potential to permit default reasoning. When a class or instance inherits from multiple parents, conflicting theorems may be provable. DATR is a knowledge representation language which supports path-based multiple inheritance, but is restricted to deterministic inference. In general, path-based inheritance requires that the inheritance for a given path be uniquely specified. In this paper I outline some recent research on representing default multiple inheritance within the constraints of deterministic inference such as is used in recent NLP lexical inheritance representations.
Lionel Moser, (1992) DATR paths as arguments. Brighton: University of Sussex, Cognitive Science Research Paper CSRP 215,
DATR is a lexical knowledge representation language which is designed to support the lexicon in an NLP system. Its syntax and semantics are designed to support the types of inference required in computational lexicography. It was not a design intention of the language to support general logic programming, yet in this paper we show that the types of inference permitted in the language do support a general type of logical inference. Drawing an analogy with Prolog, both are declarative languages, and each has its own inference engine or theorem prover, which are quite different. DATR allows at least a subset of Prolog-definable logic programs to be encoded.
Lionel Moser, (1992) Lexical constraints in DATR. Brighton: University of Sussex, Cognitive Science Research Paper CSRP 216,
DATR contains no special features to support testing of equality, negation, disjunction, or multiple inheritance. Nevertheless, given an appropriate interpretation it is possible, within DATR's existing syntax and semantics, to represent these operations. In this paper we review the technique known as `negative path extension', and show how it can be used to reconstruct negation, disjunction, and equality testing. We then show how these operations can be used to define what are essentially meta-level constraints on DATR lexical derivation.
Lionel Moser, (1992) More multiple inheritance in DATR. Brighton: University of Sussex, Manuscript,
In this paper we discuss the representation in DATR of two multiple inheritance paradigms: (a) prioritized multiple inheritance, and (b) skeptical multiple inheritance. The former has been presented in earlier work; in this paper we extend that work and show that another multiple inheritance paradigm, skeptical multiple inheritance, is also reconstructible in DATR.
Lionel Moser, (1992) Evaluation in DATR is co-NP-hard. Brighton: University of Sussex, Cognitive Science Research Paper CSRP 240,
A lower bound of co-NP for the time complexity of DATR query evaluation is established by showing that an NP-complete language can be recognized in DATR, and that its complement can be as well. An upper bound of co-NP is established as well, thus showing that the complexity of DATR query evaluation is co-NP.
Lionel Moser, (1992) Simulating Turing machines in DATR. Brighton: University of Sussex, Cognitive Science Research Paper CSRP 241,
This paper shows (i) how an arbitrary Turing machine can be simulated in DATR, (ii) that the computational complexity of DATR is Turing equivalent, and hence (iii) that the termination of DATR query evaluation is undecidable.
Martina Pampel, (1992) Die Repraesentation lexicalischen phonologischen Wissens am Beispiel der Wortbetonung. MA thesis, Bielefeld: University of Bielefeld,
Anna Poch, (1992) Representacion del conocimiento lexico: un analisis con DATR. Barcelona: University of Barcelona, PhD thesis,
This thesis shows how DATR may be used to encode a lexicon for Hudson's (1990) Word Grammar analysis of English.
Sabine Reinhard, (1990) Adaequatheitsprobleme automatenbasierter Morphologiemodelle am Beispiel der deutschen Umlautung. Trier: University of Trier, MA thesis,
Computational linguistic morphological models must not only be able to describe concatenation operations correctly but also more complex association operations (e.g. umlaut and word stress) as well as the conditions which hold for occurrence of these operations. Finite state models are criticised on the grounds of their linguistic inadequacy or fragmentary character. The thesis exploits Gibbon's DATR-based 'prosodic inheritance' (PI) approach to morphology and morphophonology, and applies it to inflectional and derivational umlauting in German nouns. The approach has the properties of compact lexical representation, integrated treatment of concatenation and association operations, and elegant description of complex dependencies between morphological operations and morphological and syntactic conditions.
Sabine Reinhard, (1990) Verarbeitungsprobleme nichtlinearer Morphologien: Umlautbeschreibung in einem hierarchischen Lexikon. Burghard Rieger, Burkhard Schaeder, Lexikon und Lexikographie Hildesheim: Olms Verlag, 45-61.
This article is a shortened version of the author's MA thesis on the adequacy problems of automaton-based morphological models.
Sabine Reinhard, Dafydd Gibbon, (1991) Prosodic inheritance and morphological generalisations. Fifth Conference of the European Chapter of the Association for Computational Linguistics 131-136.
Prosodic inheritance (PI) morphology provides uniform treatment of both concatenative and non-concatenative morphological and phonological generalisations using default inheritance. Models of an extensive range of German Umlaut and Arabic intercalation facts, implemented in DATR, show that the PI approach also covers "hard cases" more homogeneously and more extensively than previous computational treatments.
Suzanne Wolting, (1996) Representation of verb-alternations in an inheritance-based lexicon. Nico Weber, Semantik, Lexikographie und Computeranwendungen Tuebingen: Max Niemeyer Verlag, 245-259.
The paper documents the representation of three-place causative verbs of transition, which license the PP-dative-alternation, in an inheritance-based lexicon. The theoretical background of the research is provided by verb analyses by Claudia Kunze that are given in terms of "two-level semantics" following Bierwisch and Wunderlich. One major result is that aspectual features such as directionality are responsible for the alternations considered.
Klaus Zechner, (1995) Runtime access from a PATR-II grammar to lexical information in DATR. Unpublished paper, Edinburgh: Centre for Cognitive Science, University of Edinburgh,
Klaus Zechner, (1995) Building interfaces from DATR to PATR. Unpublished paper, Edinburgh: Centre for Cognitive Science, University of Edinburgh,
This paper describes the implementation of an interface which allows runtime queries from a PATR-II grammar to a precompiled DATR lexicon during the parse of a sentence. The lexical information requested by the PATR system is then unified with the PATR parse tree. Alternatively, there is the option to create all the possible dictionary entries from a DATR lexicon before parsing starts. These entries are stored in a file and can be loaded together with a PATR-II grammar. The main advantages of this interface are (i) that it allows the grammar writer to make use of the powerful multiple and default inheritance mechanisms that DATR provides, and thus to eliminate redundancy, and (ii) that the lexical information stored as a DATR theory can be accessed at parser runtime if and only if needed.