AELalight

To make it a little easier to write XML files containing transliterations and translations for AELalign, there is a Perl program AELalight. As input it takes a file containing the header of an XML file (as specified by the DTD of AELalign), and a simplified format for combinations of transliterations and translations. As output it produces a complete XML file that can be used as input to AELalign.

In the input file, the header is separated from the rest by a line containing "###" and nothing else. The lines before the line containing "###" are simply copied to the output, except that all strings of the form "#DATE#" are replaced by the current date.
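The header handling described above can be sketched as follows. This is an illustrative Python sketch, not the Perl source of AELalight; the function name copy_header, the ISO date format, and the error raised when no "###" line is present are all assumptions of mine.

```python
from datetime import date


def copy_header(lines, today=None):
    """Split lines at the first line that is exactly '###'.

    Returns (header_lines, body_lines). Every "#DATE#" in the header
    is replaced by the current date (ISO format is an assumption).
    """
    if today is None:
        today = date.today().isoformat()
    for i, line in enumerate(lines):
        # The separator line contains "###" and nothing else.
        if line.rstrip("\r\n") == "###":
            header = [h.replace("#DATE#", today) for h in lines[:i]]
            return header, lines[i + 1:]
    # Behaviour without a separator is not specified; raising is an assumption.
    raise ValueError("no '###' separator line found")
```

The body lines returned here are the ones that are subsequently divided into paragraphs.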

The lines following the line containing "###" will be divided into empty and non-empty lines, where a non-empty line is defined to be a line containing one or more visible characters (as opposed to spaces, tabs, etc). Let me call a sequence of one or more consecutive non-empty lines a paragraph.
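As a minimal sketch of this definition (in Python, with the hypothetical name split_paragraphs; the real program is written in Perl and may differ in detail), the grouping into paragraphs could look like this:

```python
def split_paragraphs(lines):
    """Group consecutive non-empty lines into paragraphs.

    A line counts as empty if it holds only whitespace (spaces, tabs, etc.).
    """
    paragraphs, current = [], []
    for line in lines:
        if line.strip():
            # Non-empty line: it extends the current paragraph.
            current.append(line.rstrip("\r\n"))
        elif current:
            # Empty line: it closes the current paragraph, if any.
            paragraphs.append(current)
            current = []
    if current:
        paragraphs.append(current)
    return paragraphs
```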

There are three kinds of paragraph. The first consists of a single line starting with "version =", followed by a version name. For example:

version = R

This line would be replaced by

<coord version="R" pos="@anon"/>

in the XML file.
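A possible way to express this replacement is sketched below in Python. The name version_to_coord is hypothetical. The sketch assumes that whitespace around "=" is optional (the example input file further on writes "version=R") and does no escaping of the version name.

```python
import re

# Optional whitespace around "=" is an assumption, not documented behaviour.
VERSION_RE = re.compile(r"^version\s*=\s*(.*?)\s*$")


def version_to_coord(line):
    """Return the <coord .../> element for a version line, else None."""
    match = VERSION_RE.match(line)
    if match is None:
        return None
    return '<coord version="%s" pos="@anon"/>' % match.group(1)
```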

The second kind of paragraph looks like this:

zero or more lines of transliteration
;
zero or more lines of translation

I will call this a transliteration/translation paragraph. The two parts of such a paragraph are separated by a line containing a semi-colon ";" and nothing else. The first part consists of zero or more lines of transliteration, and the second part consists of zero or more lines of translation. Such a paragraph is replaced by

<textal>
zero or more lines of transliteration
</textal>
<texttr>
zero or more lines of translation
</texttr>

in the XML file.
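The conversion of a transliteration/translation paragraph can be illustrated with the following Python sketch. The name convert_pair is hypothetical, and splitting at the first ";" line when there are several is my assumption; the description above does not say what AELalight itself does in that case.

```python
def convert_pair(paragraph):
    """Wrap the two parts of a paragraph in <textal> and <texttr>.

    paragraph is a list of lines. Returns the output lines, or None
    if no line consists of ";" alone.
    """
    for i, line in enumerate(paragraph):
        if line.strip() == ";":
            transliteration = paragraph[:i]
            translation = paragraph[i + 1:]
            return (["<textal>"] + transliteration + ["</textal>"]
                    + ["<texttr>"] + translation + ["</texttr>"])
    return None
```

Either part may be empty, in which case the corresponding element is simply empty.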

Inside of the transliteration and the translation in the input file, there may be positions, written as <, followed by a digit, possibly followed by more characters other than >, followed by >. An example is

some translation with <1.2> a position in the middle

which will be replaced by

some translation with <coord pos="1.2"/> a position in the middle

in the XML file.
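The pattern for positions translates naturally into a regular expression. The following is a Python sketch with hypothetical names, not the expression used in the Perl program:

```python
import re

# "<", a digit, then any characters other than ">", then ">".
POSITION_RE = re.compile(r"<(\d[^>]*)>")


def mark_positions(text):
    """Replace each position such as <1.2> by <coord pos="1.2"/>."""
    return POSITION_RE.sub(r'<coord pos="\1"/>', text)
```

Because a position must start with a digit, ordinary XML tags in the text, which start with a letter or "/", are left untouched by this pattern.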

There is also a third kind of paragraph, viz. one that is not of any of the above forms (it neither starts with "version =" nor includes a line containing a semi-colon ";" and nothing else). Such a normal-text paragraph cannot be interpreted as meaningful for the purposes of AELalign and is simply ignored. This kind of paragraph can be used for free comments or for parts of the text that one does not want to include in the XML file as yet.
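The distinction between the three kinds of paragraph can be summarized in a small Python sketch. The name classify and the labels it returns are hypothetical, and the order of the two tests is an assumption.

```python
import re


def classify(paragraph):
    """Return 'version', 'pair' or 'ignored' for a list of non-empty lines."""
    # First kind: a single line starting with "version =".
    if len(paragraph) == 1 and re.match(r"version\s*=", paragraph[0]):
        return "version"
    # Second kind: some line holds ";" and nothing else.
    if any(line.strip() == ";" for line in paragraph):
        return "pair"
    # Third kind: normal text, which is ignored.
    return "ignored"
```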

An example of the content of an input file is given below.

<?xml version="1.0"?>
<!DOCTYPE resource SYSTEM "AELalign.0.1.dtd">
<resource>
<created>
Created by Mark-Jan Nederhof, #DATE#.
</created>
<header name="Nederhof" url="http://www.dfki.uni-sb.de/~nederhof/example.xml">
<p>
This document contains stuff.
</p>
</header>
###
version=R

<1.1> s pw wn(.w)
;
<1.1> There once was a man,

This line will be ignored.

^xw.n-^jnpw rn=f
;
called Khunanup<no>The meaning of this proper name is explained by Allen (p. 356).</no>.

sxtj pw n ^sxt-HmAt
;
He was a peasant of the Wadi-Natrun.

Note: How a text is partitioned into paragraphs is largely a matter of taste. One reasonable constraint, though, is that each paragraph should be large enough to contain any change in word order between the Egyptian and its translation; in other words, a word in the transliteration and its translation should occur in one and the same paragraph. Likewise, if a certain position is included in both transliteration and translation, the two occurrences of the position should be in one and the same paragraph.

Under Unix, AELalight can be called in a number of ways:

AELalight inputfile outputfile
AELalight inputfile > outputfile
AELalight < inputfile > outputfile
cat inputfile | AELalight > outputfile
etc.
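
All of these calling conventions amount to reading from a named file or from standard input, and writing to a named file or to standard output. A sketch of such argument handling in Python (the name open_streams and the usage message are hypothetical, and nothing here is taken from the Perl program):

```python
import sys


def open_streams(args):
    """Map zero, one or two file arguments to (input, output) streams."""
    if len(args) > 2:
        # Handling of extra arguments is an assumption.
        raise SystemExit("usage: AELalight [inputfile [outputfile]]")
    # Without an input file name, read from standard input.
    source = open(args[0]) if len(args) >= 1 else sys.stdin
    # Without an output file name, write to standard output.
    target = open(args[1], "w") if len(args) == 2 else sys.stdout
    return source, target
```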

One application of AELalight was for the study of the Eloquent Peasant.