Master class, part of the BCN Summer School, July 1-12, 1996

lectures by (among others)

(Xerox Palo Alto Research Center)

**GIORGIO SATTA**

(University of Padua)

**ATRO VOUTILAINEN**

(University of Helsinki)

For any kind of mechanical processing of input, in whatever form, some kind of finite-state process is involved. Much theory has already been developed during the early days of computer science, much of it however very abstract, or at least not readily applicable to processing of natural language. The last few years have seen a surge of interesting publications that close the gap between the theory of finite-state techniques and the practice of computational linguistics.

During the course of our master class, students will be made familiar with
these new developments. Three prominent researchers will discuss a wide
range of topics, including some ideas just emerging in this field.
Apart from the lectures by the three invited speakers, some ongoing
research at the Humanities Computing Department (Alfa Informatica)
of the University of Groningen will be discussed, by **Gertjan van Noord**
and **Mark-Jan Nederhof**. To make the course
accessible to students without any prior knowledge of finite-state techniques,
we will start with some introductory lectures on formal language theory,
finite-state automata and transducers, regular languages, rational
transductions, etc.

A ``reader'' with papers from the speakers will become available during the event, price: DFL 5.

The master class will be held in the Educational Center of the University Hospital (AZG); room 15 (use the entrance at the Oostersingel and follow the signs to number 59). It comprises 5 sessions, each from 9:00 to 12:00. The schedule is:

**Mo, 8 July**- 9:00-12:00 Introduction to finite-state techniques (Nederhof)

**Tu, 9 July**- 9:00-12:00 FS techniques, properties and algorithms (Kay)

**We, 10 July**- 9:00-10:30 FS tree automata and transformation-based parsing (Satta)
- 10:30-12:00 FS transducers and constraint ranking (Satta)

**Th, 11 July**- 9:00-12:00 Surface-oriented reductionistic FS parsing (Voutilainen)

**Fr, 12 July**- 9:00-10:00 FS techniques in phonology (Kay)
- 10:00-11:00 A FS extension of context-free parsing (Nederhof)
- 11:00-12:00 Manipulation of FS automata implemented in Prolog (van Noord)

The registration fee for the master class also covers the other events of the summer school.

For more information concerning this master class, contact the coordinator

Mark-Jan NederhofFor registration and for more information concerning the summer school of which this master class is part, click here or contact

University of Groningen

Faculty of Arts

P.O. Box 716

NL-9700 AS Groningen

The Netherlands

markjan@let.rug.nl

Tel. +31-50-3635970

Fax. +31-50-3636855

Office of Graduate School BCN

Nijenburgh 4

NL-9747 AG Groningen

The Netherlands

bureau@bcn.rug.nl

Tel. +31-50-3634734

Fax. +31-50-3634740

## Martin Kay

The properties of classical finite-state automata and regular sets, as well as finite-state transducers and regular relations. Algorithms that implement the complete calculus of set theoretic operations on finite automata, plus some important additional ones such as minimization. Also algorithms for useful operations on finite transducers. The emphasis will be on methods that can be efficiently applied to large machines such as arise in phonology, morphology, and the lexicon.## Giorgio Satta

### Finite State Tree Automata and Transformation-Based Parsing

*(First lecture, 90min):*We present the paradigm of transformation-based parsing (Brill, 1993) and develop efficient parsing algorithms based on finite state tree automata.

This lecture will cover the following topics. Top-down tree automata, bottom-up tree automata. Tree regular expressions. Tree automata and tree pattern matching algorithms. Precomputation of tree transformations into tree automata. Transformation-based parsing algorithms. Overlapping redexes.

For the slides in postscript, see here.

### Finite State Transducers and Constraint Ranking

*(Second lecture, 90min):*We present the notion of constraint ranking as developed by recent phonological theories. Under the assumption that constraints are represented through regular expressions, we develop finite state transducer implementations of these theories.

This lecture will cover the following topics. Optimality Theory (Prince and Smolensky 1993). Constraint ranking and constraint violability. Optimality systems. Conditional intersection of regular languages. Computation of constraint violability through finite state transducers.

For the slides in postscript, see here.

## Atro Voutilainen

### Surface-oriented reductionistic finite-state parsing

These lectures (3 hours) outline recent work on FS parsing in Helsinki (Koskenniemi, Tapanainen, Voutilainen). Most of the attention is given to linguistic rather than algorithmic issues.

- Linguistic representation: morphological, syntactic and word boundary tags.
- Specification of grammatical representation. Grammar definition corpus.
- Representation of morphologically analysed (ambiguous) sentences and rules: regular expressions that are compiled into deterministic FSAs before parsing.
- Rule formalism: implication rules, rejection rules.
- After lexical analysis, parsing is reductionistic: illegitimate readings are discarded; no new readings are added. Parsing by intersection: all grammar automata are intersected with the (ambiguous) sentence automaton.
- How to write a realistic parsing grammar. Dealing with remaining ambiguities.

For the slides in postscript, see here.

## Gertjan van Noord

### FSA Utilities: Manipulation of Finite-state Automata implemented in Prolog

I will illustrate the FSA Utilities toolbox. The FSA Utilities toolbox is a Prolog implementation of a number of utilities to manipulate finite-state automata and finite-state transducers. Manipulations include determinization (both for finite-state recognizers and finite-state transducers), minimalization, composition, complementation, intersection, Kleene closure, etc. Furthermore various visualization tools are available to browse finite-state automata. The toolbox is implemented in SICStus Prolog.

The FSA Utilities toolbox is available under GNU General Public License from anonymous ftp. Alternatively you can obtain the toolbox via the World-wide Web. Click here or see ftp://ftp.let.rug.nl/pub/prolog-app/FSA/.

## Mark-Jan Nederhof

### A FS extension of context-free parsing

Context-free grammars have more generative capacity than finite-state formalisms. Regrettably, the complexity of parsing is non-linear in the general case. An exception is the subclass of deterministic context-free languages, which allow linear-time parsing. This property is very restrictive however. I show that this constraint can be considerably relaxed without the linear-time complexity being affected.

An event with related topics is Workshop on Extended finite state models of language at ECAI-96, Budapest, Hungary, August 11-12, 1996.