Concevoir un langage artificiel: Syntaxe
Auteur: Rick Morneau
Date MS: 07-26-1994
Date FL: 05-01-2019
Numéro FL: FL-00005C-00
Citation: Morneau, Rick. 1994. “Designing an Artificial
Langue: Syntax.” FL-00005C-00, Fiat
Lingua,
2019.
droits d'auteur: © 1994 Rick Morneau. Ce travail est sous licence
sous une attribution Creative Commons-
Non Commercial-Pas de Dérivés 3.0 Licence non transférée.
http://creativecommons.org/licenses/by-nc-nd/3.0/
Fiat Lingua est produit et maintenu par la Language Creation Society (LCS). Pour plus d'informations
à propos du LCS, visitez http://www.conlang.org/
Designing an Artificial Language:
Syntaxe
by Rick Morneau
August, 1992
Revised July 26, 1994
Copyright © 1992, 1994 by Richard A. Morneau,
all rights reserved.
1.0 INTRODUCTION
This essay is aimed at budding language designers who would like to learn something about
syntax in general, and about some of the syntactic variability that exists among the world’s many
langues naturelles. It is also aimed at those who would like to have a tool that they can use to
describe the syntax of their creations. By no means is this essay intended to be comprehensive –
any such attempt would be so long that no one would want to read it. Besides, I’ve got a life to
en direct. 🙂
To get this thing off to a good start, I’ll first talk a little about some of the major features of
syntax as they actually exist in natural languages. Suivant, I’ll discuss a formalism that you can use
to describe the syntax of your artificial language (henceforth AL), and I’ll illustrate its use by
describing the syntax of a small fragment of English. Enfin, I’ll use the formalism to describe a
simple but powerful syntax for a hypothetical AL.
2.0 LINGUISTIC TYPOLOGY AND BASIC WORD ORDER
A lot of work is currently being done in the area of linguistic typology and universals. Essentiellement,
this area of research tries to find patterns that exist across all natural languages. Comme il s'avère,
many universals are not truly universal, since exceptions to most patterns can often be found.
These exceptions, cependant, often indicate that a language is undergoing a change from one
pattern to another. Aussi, the exceptions themselves exhibit patterns indicating that the forms that
they can take are controlled by even more subtle universals.
Par conséquent, these patterns or tendencies can often reveal a lot about how languages change
et, plus important encore, how change can be limited into following along certain channels. Le
forms that human languages can take are actually quite limited, and the study of universals can
show us what these limitations are.
A study of linguistic universals can also be helpful to AL designers. This is especially true if you
are trying to develop a language that is quite different from the natural languages you are
familiar with. A study of universals can help you design a language that ends up being speakable
1
and learnable; c'est à dire., one that is compatible with the grey stuff most of us have between our ears. Si
toi “break the rules”, pour ainsi dire, your language may end up being unlearnable in any real
sens, and will end up being just a coding game rather than a real language.
2.1 SUBJECT, VERBE, AND OBJECT
One of the first patterns that typologists look at is the basic relationship between subject (S), verbe
(V) and object (Ô) in simple declarative sentences. Determining these patterns is not always that
simple, because many languages are inflected in such a way that they have a great deal of
freedom in ordering their words. But even these languages will have some restrictions, ou sera
tend to have dominant, preferred or unmarked word orders.
There are six possible orderings: VSO, ALORS, SOV, VOS, OVS, and OSV. It turns out that a very
large majority of the world’s languages fit within the first three categories; c'est à dire., where the subject
comes before the object. Voici quelques exemples:
SOV – Turkish, Tamil, Japanese, Tibetan, Quechua
This is the largest single grouping, and probably
accounts for slightly more than 40% of all
langues.
Sample sentence: John fish ate.
ALORS – Anglais, Swahili, Chinois, Indonesian
This is also a very large group, although not quite
as large as SOV. It probably accounts for
slightly less than 40% of all languages.
Sample sentence: John ate fish.
VSO – Welsh, Hawaiian, Berber, Classical Arabic
This is not a very large group, but it is still quite
significant. It probably accounts for about 15%
Sample sentence: Ate John fish.
of all languages.
Note that in all of the above, the subject comes before the object – only the verb’s position
changes. Also note that the percentages are very approximate. (Jusque récemment, most linguists felt
that SVO was the most common type. Toutefois, more recent knowledge seems to give SOV a
slight lead.)
The other groups contain relatively few languages, and many of them are likely to be languages
you’ve never heard of. Here are a few examples:
OVS – Guarijio (Azteco-Tanoan family, Mexico)
Hixkaryana (Carib family, Brazil)
Sample sentence: Fish ate John.
2
VOS – Fijian (Austronesian family, Fiji)
Terena (Arawakan family, Brazil)
Malagasy (Austronesian family, Madagascar)
Sample sentence: Ate fish John.
ETC – Jamamadi (Arawakan family, Brazil)
Language of Yoda, Jedi Master (Unknown language
famille, Dagobah star system)
Sample sentence: Fish John ate.
Ainsi, if you want your AL to be as typologically “typical” as possible, you will need to make it
SOV. If you want to make it similar to most European languages, then use SVO. Toutefois, if you
don’t want to play favorites but still want something that’s easy to learn, you can choose the more
neutral VSO format (qui, incidentally, is my own personal favorite :-). VSO is also easier for
human brains and computers to parse. Enfin, if you’re more interested in the exotic or the weird,
choose one of the last three word orders.
2.2 RELATIVE CLAUSES
Relative clauses are embedded sentences that modify nouns. Consider the following English
exemple:
1. The shirt (ce) you want is on the bed.
In this sentence, the relative clause is “you want” and it modifies the noun “shirt”. (Le
parentheses indicate that the relative pronoun “ce” is optional.) There are two important
observations we should make about this example: first, le mot “shirt” est, en effet, the direct
object of the verb “vouloir”; and second, le verbe “vouloir” does not have an explicit direct object.
The position where an object would normally go is called a _gap_.
It’s also possible for relative clauses to modify nouns that correspond to positions other than
direct object in the clause. Voici quelques exemples:
2. The police caught the man who robbed the bank.
(Ici, “homme” is the effective subject of the verb
“robbed”.)
3. This is the hammer (qui) he broke the window with.
=This is the hammer with which he broke the window.
(Ici, “hammer” is the effective object of the
preposition “avec”)
4. They examined the room (qui) the fire started in.
3
=They examined the room in which the fire started.
(Ici, “chambre” is the effective object of the preposition
“in”)
And so on. En anglais, any verb argument (c'est à dire., sujet, objet, indirect object or object of a
preposition) can be relativized, and when this happens, a gap is left in the clause. Incidentally, comme
illustrated in examples 3 and 4 above, English has the extremely rare and confusing habit of
splitting a compound relative pronoun into two parts, moving the first half to the end of the
clause, and optionally dropping the second half. Ainsi, we can have “in which” ou “(qui)…dans”. je
strongly recommend that you NOT allow such splitting in your design if one of your goals is
ease-of-learning. Either use single words that are inherently unsplittable, or keep the pieces
ensemble.
Gaps in relative clauses appear to be required by slightly less than half of the world’s languages.
A slight majority, cependant, either do not allow gaps or severely restrict them. Par exemple,
Palestinian and Egyptian Arabic allow gaps only if the noun being modified is the effective
subject of the relative clause, as in example 2 above. For all other verb arguments, a gap is not
allowed and must be filled by what is called a _resumptive pronoun_. Here is what Arabic
versions of sentences 1, 3 and 4 would look like using English words and English word order:
1. The shirt that you want it is on the table.
3. This is the hammer that he broke the window with it.
4. They examined the room that the fire started in it.
Note that in all three sentences, the resumptive pronoun “il” is required for the Arabic sentences
to be grammatical.
[Incidentally, the above applies to Egyptian and Palestinian Arabic. In Standard Arabic, the use
of resumptive pronouns is optional. Toutefois, in Standard Arabic, the relative pronouns are
inflected for gender and number. En effet, the resumptive pronoun is built into the relative
pronoun.]
In other languages, such as Irish, the use of resumptive pronouns is optional. Ici, mais, le
relative pronoun that introduces the clause (tel que “OMS”, “qui”, “ce”, “in which”, “avec
qui”, etc.) differs depending on whether the clause contains a gap or a resumptive pronoun.
Par exemple, languages like this will have two words or phrases for “in which”: one for use
when the clause contains a gap, and the other for use when the clause contains a resumptive
pronoun. Malheureusement, I have no information on how widespread this phenomenon is.
Enfin, a very small minority of languages, such as Persian, not only allow the use of resumptive
pronoms, but also allow gaps to be filled by modified or coordinated nouns. Ainsi, ce qui suit
sentence would be perfectly grammatical in Persian:
The police caught the man who he and his wife robbed the
bank.
4
Toutefois, this usage seems to be very rare.
2.3 THE NOUN PHRASE
Another ordering that typologists are concerned with is the relationship between adjectives (UN)
and the nouns (N) they modify in noun phrases. (I also include simple and complex numbers
within the adjective group, tel que “five”, “between ten and twenty”, etc.) Since we’re only
talking about two items, “UN” et “N”, there are only two ways to order them: AN or NA.
Toutefois, we can complicate things by including so-called heavy modifiers such as relative
clauses, descriptive prepositional phrases (“girls with red hair”), and prepositional arguments of
nominalized verbs (“destruction of the city”). I will include all such heavy modifiers in the same
category as relative clauses (R), since they all seem to pattern in the same way. Here are some
examples:
NAR – Thai, French, Hebrew, Swahili
Sample phrase: men big who eat quiche
ANR – Quechua, Anglais, Persian, russe
Sample phrase: big men who eat quiche
ARN – aucun
Sample phrase: big quiche eat who men
NRA – aucun
Sample phrase: men who eat quiche big
RNA – Basque, Abkhaz, Burmese
Sample phrase: quiche eat who men big
RAN – Turkish, Tamil, Korean
Sample phrase: quiche eat who big men
In the sample sentences, I have placed the object “quiche” and the relative pronoun “OMS” dans
their most likely positions, based on the most likely branching direction for the type (I’ll have
more to say about branching later). It can, cependant, appear before or after the relative clause. Dans
fait, in some languages, such as Hindi and Bengali, the relative pronoun appears before the noun
being modified and the clause appears after it! For languages like these, our sample sentence
would look something like this: “big who men quiche eat”. (For purists or students of Hindi who
are reading this, the clause must also be followed by a demonstrative pronoun, which actually
makes the gloss more like “big who men quiche eat that”.)
Quite a few languages, such as Turkish and Quechua, do not use relative pronouns, but instead
nominalize the verb (c'est à dire., convert it to a participle) in the relative clause to achieve the same
5
effet. The resulting participle can be inflected for tense. Ainsi, an English phrase such as “garçons
who broke windows” would be glossed in Turkish as “windows broking boys”. Neat, huh?
Notice that no languages place a relative clause between an adjective and a noun (types ARN and
NRA). I believe that this is an honest-to-god linguistic universal. Disregard it at your own peril.
Incidentally, it is important to note that the above discussion applies only to languages
that have adjectives. Since many languages use stative verbs instead of adjectives, “R” et “UN”
would be part of a single category. Some examples of this are Indonesian (where modifiers
follow the noun), and Chinese (where modifiers precede the noun). Ainsi, an English phrase like
“that stupid boy who breaks windows” would be glossed in Indonesian as “boy which be_stupid
which break windows”. In Chinese, it would sound something like “break windows which be-
stupid which boy”. (I have much more to say about stative verbs in my (very long!)
monograph Lexical Semantics.)
The other components of the noun phrase that are of interest to typologists are called
_specifiers_, and include articles (le, À, un), demonstratives (ce, those, etc.), quantifiers (chaque,
tous, chaque, etc.) and possessives (mon, their, John’s, etc.). These are called specifiers because they
precisely pinpoint one or more referents from among a set of possible referents. Adjectives and
Nombres, cependant, only narrow down the number of choices by creating a subset, but without
specifying particular referents. Considérer, par exemple, the difference between “big black dogs”
et “those big black dogs”.
Specifiers undergo the same variability in word order as adjectives with one exception:
it is possible for a relative clause to appear between a specifier and the noun it modifies. Ainsi,
even though types ARN and NRA do not seem to exist, the types SRN and NRS (where S =
specifier) do occur. In Indonesian, par exemple, the expression “those men who just left” would
apparaître comme “men who just left those”. Aside from this idiosyncrasy, specifiers can appear either
before an adjective (as in English), after an adjective (as in Vietnamese), or in either position (comme
in Swahili).
Incidentally, relative clauses and prepositional phrases can act semantically as either general
modifiers (“boys who like basketball”) or specifiers (“boy with the scar over his left eye”).
Toutefois, I’m not aware of any language that reflects this difference in syntax.
Enfin, I hope I haven’t given the impression that I’ve covered all of the possible forms of noun
phrase and relative clause that can exist in natural language. We’ve seen a few unusual cases,
such as English’s splittable relative pronouns and Persian’s odd resumptive structure. But this is
only the tip of the iceberg.
2.4 BRANCHING DIRECTION
There are certain ordering relationships that are so common, and which have so few exceptions
that they almost certainly indicate something very basic about the way our brains process
langue. En fait, when languages are found that appear to contradict such orderings, it is usually
6
the case that they are undergoing a transition from one pattern to another and haven’t yet “settled
bas”.
Par exemple, in VSO languages, the subject and object follow the verb (by definition). But we
also find that specifiers, adjectifs, genitives and relative clauses almost always follow the nouns
ils modifient, that adverbs and adjectival arguments almost always follow the adjectives they
modifier, and that noun phrases almost always follow the prepositions that govern them. En d'autre
mots, with very few exceptions, modifiers and arguments almost always FOLLOW the words
they modify or are governed by in VSO languages. (Linguists would say that modifiers follow
their _heads_ in VSO languages.) For SOV languages, the principle applies just as rigorously,
but in the opposite direction; c'est à dire., modifiers and arguments almost always PRECEDE the words
ils modifient, and the equivalent of prepositions (called _postpositions_) are preceded by the noun
phrases they govern. Autrement dit, modifiers precede their heads in SOV languages. Un autre
common way of describing this ordering is to say that VSO languages are predominantly right-
branching, and that SOV languages are predominantly left-branching.
Malheureusement, the relationship breaks down when we consider SVO languages, comme l'anglais.
Are these languages undergoing a transition from one form to another, or have they become
stuck in some kind of cul-de-sac that is neither left nor right-branching? For languages with long
written histories, we can often answer this question, and the answer seems to be that SVO is
transitional. Par exemple, Old English (spoken before 1200 AD) was SOV and heavily inflected.
It lost most of its inflections and made the switch to SVO at about the same time.
En outre, a lot has been learned recently about word order patterns. Par conséquent, linguists can
often make intelligent guesses about where SVO languages came from, simply by looking at
where they are now. Predicting the future, cependant, is essentially impossible. Languages
certainly have inertia, but it’s doubtful they have momentum.
There are some things, mais, that seem to be fairly certain. All natural languages are constantly
changing – not just in word order, but also in ways that I haven’t discussed here, such as in
phonology, morphology and semantics. Changes in word order, mais, seem to be bounded by
deux “pure” ou “ideal” endpoints: SOV and VSO. This probably explains why there are so few
truly pure VSO and SOV languages. Since all languages are constantly changing, it’s much more
likely that any particular language will be wandering around somewhere between the endpoints.
The point for AL designers is this: As a language designer, you have total control over what your
language will be. But even you cannot predict where your language will go. And since it will be
a NEW creation, it cannot have a history. The only freedom you have is to decide where it will
commencer.
3.0 A FORMALISM FOR DESCRIBING SYNTAX
Initialement, it had been my intent to provide a brief description of a few of the most widely used
linguistic formalisms, and to show how they could be applied to the description of an artificial
langue. I actually started work on this project using Government/Binding Theory, ce qui est
7
currently the most popular formalism among professional linguists. Toutefois, I quickly realized
that even a _brief_ description would require several dozen pages for each theory, and I doubted
very much if many people would have the fortitude to read it all. Ainsi, I had to come up with
something simpler. Heureusement, we don’t really need to delve into current linguistic theories,
since they must deal with the complexities and oddities of natural languages. ALs tend to be
simpler and more regular. Besides, et peut-être plus important encore, current linguistic theories are
continuously moving targets, always subject to change without notice.
Ainsi, I’m going to limit myself to a somewhat restricted view of syntax. Tout d'abord, I will not
discuss agreement/unification aspects of syntax, since these are normally handled with inflection
which may not even be present in your AL. Aussi, inflection is a morphological process, et
representing it along with word-order would be messy, time-consuming, boring, and not very
utile. Ainsi, I will limit myself primarily to a discussion of word order and how to represent it.
With this goal in mind, I will describe how to use simple, context-free, phrase structure rules.
The notational system I will use will be extremely simple – I will use a modified version of
Backus-Naur Form (BNF) because it is more powerful and less confusing than the system
normally used by linguists. Those among you who have worked with BNF will not learn
anything new here, especially if you’ve ever studied or worked in compiler design. You may find
it somewhat odd, mais, to see BNF applied to human language.
Enfin, I apologize to those readers who might be expecting a more up-to-date treatment of
syntax in terms of X-Bar Theory. Although X-Bar Theory provides desirable constraints on the
syntax of a NATURAL language, it is highly abstract and I was afraid that a lot of readers would
be turned off if I used it. Aussi, it’s quite possible that some AL designers may want to design a
language that is NOT constrained as natural languages are. Phrase structure rules will give them
this freedom – X-Bar Theory will not. Toutefois, I have not entirely abandoned X-Bar Theory,
since my sample syntax in section 4 was designed within the framework of X-Bar Theory, même
though I do not discuss it in those terms.
3.1 THE SYNTAX OF A SIMPLE ENGLISH FRAGMENT
Our first goal will be to learn how to use BNF to describe a simple subset of English. I will not
try to develop a complete BNF description of the entire English language because it is not
nécessaire, it would take too long, and I don’t think it’s possible. Plutôt, I’ll only work with a
rather small subset of English, one that is no more complex than is absolutely necessary to
illustrate all of the features of BNF needed to design an AL syntax. To achieve this goal, I will
start with very simple sentences and gradually increase their complexity.
Incidentally, my purpose here is to illustrate BNF – NOT to teach linguistics! Par conséquent, some of
the analyses below are not as linguistically precise as I would like, but to do the job right would
have taken much more time and would have required lots of digression into areas that are not
really relevant here.
For starters, consider the following simple sentences:
8
Sailors cuss.
Dogs bark.
Billy fell.
These sentences consist of two words each: a noun and a verb. To describe the structure of these
sentences in BNF, we would write:
sentence ::= noun verb
which can be read as “a sentence consists of a noun immediately followed by a verb”. Toutefois,
verbs can also have objects, as in the following examples:
Children like puppies.
Louise kissed Jimmy.
Tornadoes destroy buildings.
We can extend the syntax to deal with direct objects as follows:
sentence ::= noun verb (nom)
which can be read as “a sentence consists of a noun followed by a verb which may in turn be
followed by an optional noun”. Ainsi, in our BNF notation, we will use parentheses to indicate
that an item is optional.
Maintenant, the nouns that precede and follow the verb can be more complex than shown above, un péché
the following sentences:
Children like cute little puppies.
Silly dogs bark.
Little Billy stutters.
For these cases, the noun is modified by one or more adjectives. Autrement dit, one or more
adjectives are optional. In BNF notation, we would write:
sentence ::= {adjectif} noun verb ({adjectif} nom)
Ici, the curly braces mean “zero or more”. Ainsi, our definition of a sentence now reads: “À
sentence consists of zero or more adjectives followed by a noun, then a verb, and then an
optional ‘thing’ which itself consists of zero or more adjectives followed by a noun”.
Maintenant, let’s make things a little more complicated by adding articles:
The little boys saw a big angry dog.
A sad little girl watched the birds.
An angry man shouted.
We can add the articles “À”, “un” et “le” to our BNF representation as follows:
9
sentence ::= (article) {adjectif} noun verb ((article)
{adjectif} nom)
Note though, that we are repeating ourselves in that the same “chose” appears on both sides of the
verbe. It consists of an optional article, zero or more adjectives, and a noun. Comme il s'avère, ce
“chose” has a name – it’s called a _noun phrase_. Let’s take it out of the definition of sentence and
describe it separately, as follows:
sentence ::= noun_phrase verb (noun_phrase)
noun_phrase ::= (article) {adjectif} nom
Our syntax now reads like this: A sentence consists of a noun phrase followed by a verb followed
by an optional noun phrase. A noun phrase consists of an optional article followed by zero or
more adjectives, followed by a noun.
English allows demonstrative adjectives (“ce”, “ce”, “these” et “those”), possessive
adjectifs (“mon”, “son”, “notre”, “their”, etc.) and quantifiers (“chaque”, “les deux”, “chaque”, “tous”, etc.) À
appear in place of an article, as in the following sentences:
My mother read this book.
Her angry dog bit Danny.
That cat ate both mice.
Every good student does his homework.
We can deal with this by redefining the noun phrase, as follows:
noun_phrase ::= (specifier) {adjectif} nom
specifier ::= article | démonstratif | possessive |
quantifier
Ici, the vertical bar “|” can be read as “ou”. Ainsi, a specifier can be an article or a
demonstrative or a possessive or a quantifier. Aussi, if we really want to be thorough, nous pouvons
refine the definition of possessives as follows:
possessive ::= possessive_adjective | proper_noun’s
which can be read as: a possessive can be a possessive adjective (tel que “son”, “your”, “mon”,
etc.) or a proper noun followed by apostrophe-s (tel que “John’s”, “Boston’s”, “IBM’s”, etc.)
Suivant, let’s add numbers to our syntax. En anglais, numbers can only appear between a specifier
and an adjective, as in the following noun phrases:
all five boys
these three girls
dix
the two fat lazy cats
our seven goldfish
Ainsi, to account for numbers in English, we have to modify our definition of the noun phrase as
follows:
noun_phrase ::= (specifier) (nombre) {adjectif} nom
Maintenant, what do we do about pronouns? Comme il s'avère, an entire noun phrase can be replaced by a
single pronoun, as illustrated in the following:
John and Billy ate the whole apple pie. They ate it.
(“Ils” = “John and Billy”, “il” = “the whole apple
pie”)
Five angry dogs chased those three foolish boys. They chased
eux.
(“Ils” = “Five angry dogs”, “eux” = “those three
foolish boys”)
And so on. En anglais, pronouns cannot be directly modified by articles (*the she),
demonstratives (*that it), possessives (*her him), quantifiers (*each she) or numbers (*five they).
Autrement dit, an English pronoun stands alone and is equivalent to an entire noun phrase. Ainsi,
to account for pronouns, we have to redefine the noun phrase as follows:
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (nombre) {adjectif} nom
Maintenant, let’s summarize by showing the entire syntax of the fragment of English that we’ve dealt
with so far:
sentence ::= noun_phrase verb (noun_phrase)
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (nombre) {adjectif} nom
specifier ::= article | démonstratif | possessive |
quantifier
possessive ::= possessive_adjective | proper_noun’s
Dans tout, we’ve got five definitions. Each of these definitions is called a _production rule_, since it
defines how a structure within the language can be produced. Ainsi, our fragment of English
consists (so far) of five production rules.
11
Toutefois, we’ve still got a long way to go. By now, mais, you should be getting the idea. Ainsi
let’s speed things up a bit by handling two additional items all at once: indirect objects and
prepositional phrases, as in the following example:
John gave the children candy in the back of the bus on the
way to the park.
Handling the indirect object “the children” is easy. We simply add an optional noun phrase to our
definition of sentence:
sentence ::= noun_phrase verb ((noun_phrase) noun_phrase)
Note that the indirect object is doubly nested inside the parentheses, indicating that it must
precede the direct object, and that an indirect object cannot occur without a direct object.
The prepositional phrases are not so easy, since we are really talking about two different kinds of
phrase: a sentential prepositional phrase and a noun-modifying prepositional phrase. The phrase
“in the back” indicates where the action took place. The phrase “of the bus” simply further
defines the noun “dos”. Ainsi, the phrase “in the back” is a sentential prepositional phrase, alors que
“of the bus” is a noun-modifying prepositional phrase. De la même manière, “on the way” is sentential since
it indicates when the action took place, alors que “to the park” modifies the noun “chemin”. We can
handle these phrases as follows:
sentence ::= subject verb (objets) {prepositional_phrase}
sujet ::= noun_phrase
objets ::= (noun_phrase) noun_phrase
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (nombre) {adjectif} nom
{prepositional_phrase}
prepositional_phrase ::= preposition noun_phrase
Note that I added the constituents “sujet” et “objets” to make things a little easier to read. À
keep things simple, we will consider compound prepositions such as “from under”, “up to”, “sur
top of”, etc.. as if they were single words.
Maintenant, if you look carefully, you’ll see that something unusual is happening here. A noun phrase
can contain a prepositional phrase which, à son tour, contains another noun phrase. This kind of
circularity is called _recursion_, and is one of the features of language that makes it so flexible.
Essentiellement, recursion occurs when a lower level structure, such as a prepositional phrase, est
defined in terms of a higher level structure, such as a noun phrase.
12
Another example of recursion is the embedded sentence, as illustrated in the following examples:
He told me (ce) he wanted a new job.
That he needed so much money worried his friends.
Bill knew (ce) she broke the window.
He told me (ce) Bill knew (ce) she broke the window.
Note that an embedded sentence can never occur as an indirect object – it can only be either the
subject or direct object. Note also that the conjunction “ce” is required for an embedded
sentence that is the subject of a verb, even though it is optional for the direct object. Ainsi, nous
can represent this kind of embedded sentence as follows:
sujet ::= noun_phrase | “ce” sentence
objets ::= (noun_phrase) direct_object
direct_object ::= noun_phrase | (“ce”) sentence
Another recursive structure is the relative clause. Consider the following sentences:
The boy who broke the window apologized.
I saw the man who robbed the bank.
The textbook (that/which) he bought had a chapter on
linguistics.
John played the piano (that/which) his brother gave him.
Note that there are two types of relative clause shown above. In the first type, the relative
pronoun “OMS” links a noun phrase to the subject of the clause. Ainsi, “the boy” is the effective
subject of “broke”, et “the man” is the effective subject of “robbed”. In this type, the relative
pronoun is required. In the second type, the relative pronoun links a noun phrase to the object of
the direct clause. Ainsi, “the textbook” is the effective direct object of the verb “bought”, et “le
piano” is the effective direct object of the verb “gave”. In this type, the relative pronoun is
optional.
Note also that both nouns and pronouns (par exemple., “He who laughs last laughs best”) can be modified
by relative clauses. Ainsi, the addition to our syntax is at a very low level, as follows:
noun_phrase ::=
modified_noun
| pronoun (relative_clause)
modified_noun ::=
(specifier) (nombre) {adjectif} nom
{prepositional_phrase} (relative_clause)
relative_clause ::=
13
relative_pronoun verb (objets) {prepositional_phrase}
| (relative_pronoun) subject verb direct_object
{prepositional_phrase}
where a relative pronoun can be either “ce”, “OMS” ou “qui”. I will not discuss how to handle
other types of relative clause that can be formed with relative pronouns such as “whose”, “avec
whom”, “to which”, etc., but will leave it as an exercise for the interested reader.
Anyway, you should now have a pretty good idea about how to describe the syntax of an AL
using BNF. I won’t go any further with my analysis of English, since things are getting kind of
messy already, and I don’t think there’s much to gain by going any further. For those who are
interested, here’s a summary of the syntax of the English fragment that we’ve just analyzed:
sentence ::= subject verb (objets) {prepositional_phrase}
sujet ::= noun_phrase | “ce” sentence
objets ::= (noun_phrase) direct_object
direct_object ::= noun_phrase | (“ce”) sentence
prepositional_phrase ::= preposition noun_phrase
noun_phrase ::=
modified_noun
| pronoun (relative_clause)
modified_noun ::=
(specifier) (nombre) {adjectif} nom
{prepositional_phrase} (relative_clause)
specifier ::= article | démonstratif | possessive |
quantifier
possessive ::= possessive_adjective | proper_noun’s
relative_clause ::=
relative_pronoun verb (objets) {prepositional_phrase}
| (relative_pronoun) subject verb direct_object
{prepositional_phrase}
14
4.0 THE SYNTAX OF A HYPOTHETICAL ARTIFICIAL LANGUAGE
In this section, I’ll describe a simple yet highly effective syntax for an AL. It encompasses all of
the basic features, including recursion, needed for a language to be totally functional. The fact
that it is so simple, cependant, means that this language will depend heavily on the lexicon to
provide capabilities that are sometimes provided by syntax. Par exemple, instead of changing
word order and using auxiliaries to change a statement to a question (as is done in English), ce
language will simply append a question particle to the end of the sentence (as is done in
Japanese). Autrement dit, the syntax will be simple at the expense of the lexicon. Comme il s'avère,
this approach is inherently more flexible because the lexicon is infinitely expandable while the
syntax is not.
Malheureusement, simplicity has a price, and that price is boredom. The syntax I am about to show
you is extremely and undeniably boring. Toutefois, I am only trying to illustrate a minimal
configuration. I’m sure that most AL designers would want to expand upon it and come up with a
design that’s a little more exciting.
4.1 BASIC PRODUCTION RULES
My sample language will be purely right-branching. C'est, basic word order will be verb-
subject-object (VSO), and arguments, modifiers and specifiers will always follow their heads
(and the head of a sentence will always be a verb – by definition). I choose VSO mainly because
it appeals to me and because it illustrates structures that are different from English. I also like it
because it is inherently easier to parse, although this is not all that important unless you plan to
write computer programs to parse it. Other than this, there are no inherent advantages or
disadvantages over other word-orders. As we’ve already seen, every possible basic word-order
(VSO, ALORS, SOV, VOS, OVS and OSV) has counterparts among natural languages, and one is
not inherently “better” than the others.
The form of the sentence in the sample language will be:
sentence ::= verb {verb_modifier} {verb_argument}
{sentence_particle}
verb_modifier ::= adverb | tense_marker | etc..
Verb modifiers would be words equivalent to English adverbs, tel que “rapidement”, “demain”,
“juste”, etc.. They would also handle the tense and aspect of the verb, such as past progressive (“il
was going”), future perfect (“he will have gone”), simple present (“he goes”), etc..
Sentence particles would be used to modify the nature of the sentence. Par exemple, they could
convert a statement to a question or command, or they could indicate the attitude of the speaker
towards what he is saying.
15
Verb arguments would correspond to English subjects, objects and objects of prepositions, et
would take the forms of noun phrases, adjective phrases, and embedded sentences:
verb_argument ::= (case_tag) expression {argument_particle}
expression ::= noun_phrase | adjective_phrase | sentence
A case tag would be equivalent to many English prepositions that introduce verbal arguments,
tel que “À” dans “I went to Boston”. Note, mais, that they can also introduce subordinate
clauses, un péché “I saw her when I was in Boston”. Ici, “quand” is the case tag. A case tag is
optional if the argument is the subject or object of the verb; c'est à dire., it is optional if it is part of the
argument structure of the verb.
The argument particle would perform a function similar to the sentence particle, but will apply
only to the argument it follows. Par exemple, it can convert the argument to an interrogative or
add emphasis to it. Let me illustrate what we’ve got so far, using English words:
English Sample language
I like Boston. Like I Boston.
I went to Boston. Go did I to Boston.
He doesn’t know I went to Boston. Know not he go did I to
Boston.
Doesn’t he know I went to Boston? Know not he go did I to
Which book did he buy? Buy did he book which?
Boston huh?
où “did” is a verb modifier indicating that the preceding verb is in the simple past tense,
“huh” is a sentence particle indicating that the entire sentence is to be interpreted as an
interrogatif, et “qui” is a particle that converts the preceding noun phrase to an
interrogatif. Noter que, for all particles, the scope of the particle will depend on the particular
particle. Autrement dit, each particle will define its own scope. Note also that a particle
inherently terminates the item it applies to, unless it is immediately followed by another particle
that applies to the same item.
In case you’re wondering, I’m making a basic assumption about the nature of verbs in the above
structure. The verb basically has two kinds of arguments: those like the subject and object that
are required by the verb, and those that are marked by case tags (prepositions in English,
postpositions in Korean, inflections in Hungarian, etc.). Arguments that are required by a verb
are part of the valency and thematic structure of the verb. (I have much more to say about this
topic in my (very long!) monograph on Lexical Semantics.)
Maintenant, let’s fill in some of the blanks:
noun_phrase ::= noun {simple_noun_modifier}
{complex_noun_modifier}
16
simple_noun_modifier ::= adjective_phrase | nombre
| article | démonstratif
| possessive | quantifier
complex_noun_modifier ::= noun_phrase_tag noun_phrase
| noun_clause_tag sentence
adjective_phrase ::= adjective {adjective_modifier}
{adjective_argument} {adjective_particle}
adjective_modifier ::= adverb {adverb_particle}
adjective_argument ::= adjective_phrase_tag noun_phrase
Note that noun phrase tags are the equivalent of English prepositions in constructions such as
“Bring the book with the red cover” ou “Get the book on the table”.
Noun clause tags are relative pronouns that introduce relative clauses. For the embedded
sentence, you can require that it contain a gap or a resumptive pronoun, as we discussed earlier.
Take your pick. The above syntax allows the use of resumptive pronouns, but does not require
eux.
Note that this syntax DOES allow the type of relative clause that we discussed earlier and which
occurs in Persian:
The police caught the man who he and his wife robbed the
bank.
Whether or not you implement it in your own design is entirely up to you. Keep in mind though,
that if you decide to disallow it, you will have to add to the existing production rules of your
langue. Autrement dit, by disallowing a structure that makes perfectly good linguistic sense,
your syntax will actually become more complex.
Adjective arguments would handle cases like English “red with fever” et “blowing in the
vent”. Note, mais, that this syntax allows these arguments to be used more directly than in
Anglais. Par exemple, if English had a counterpart to this form, one could make a sentence like
“The blowing in the wind kite hit him on the head”.
Adjective and adverb particles would include emphasizers and de-emphasizers such as English
“very”, “plutôt”, “not too”, “quite”, etc..
Word order is somewhat freer than English, especially in simple noun modifiers. This is
intentional. It allows continuous refinement without the need for clumsy prepositional phrasing,
as is the case in English. Par exemple, the expression “his three of those five black dogs” would
17
be rendered as “dogs black five those three his”, qui, with English word order, would sound
comme “his three those five black dogs”.
Enfin, note that case tags, noun phrase tags, noun clause tags and adjective phrase tags all
correspond to English prepositions. Toutefois, each will be distinct, unlike English prepositions
which often fill multiple roles, and which often lead to ambiguities. I’ll have more to say later
about such ambiguities and how to prevent them.
4.2 GENERAL RULES
Sometimes it’s better to separately state a rule that applies to all of the structures of a language,
rather than clutter up the production rules. Par exemple, coordination is best described as a
general rule of the form:
COORDINATION RULE:
Where it makes sense, any constituent may be replaced by a
coordinated constituent (Utilisant “et”, “ou”, “mais non”,
etc.). Ainsi, for a constituent “X”, the following applies:
X ::= X {coordinating_conjunction X}
where a coordinating conjunction can be “et”, “ou”, “mais”, “mais non”, etc.. With this general rule,
our sample language can now handle constituents such as “John and Bill”, “She washed and I
dried the dishes”, “He bought the steaks at Joey’s Market or at Safeway”, “She prepared the
drinks and I washed the dishes.”, etc.. Coordination poses some special problems for syntax, et
I’ll have more to say about it later.
Another rule involves the use of particles:
PARTICLE RULE:
Where it makes sense, any constituent may be followed by one
or more particles that adds to the meaning of the
constituent and/or terminates it.
This rule allows us to unclutter the production rules defined in the previous section by
eliminating all references to particles, and also gives us the flexibility of adding particles where
they were not explicitly allowed.
Another very useful general rule allows us to create complex structures that are the syntactic
equivalents of lexical compounds. Consider the following English sentences:
They held an over-the-fence conversation.
Made-in-USA and built-with-confidence labels must be
prominently displayed.
18
He needed a powerful pick-me-up.
Mercenaries-for-hire ads appeared in several places in the
magazine.
In order to implement these hyphenated complexes, we could create the following general rule:
COMPLEX STRUCTURE RULE:
Complex nouns, verbs, adjectives and adverbs can be created
with an introductory tag followed by a verb argument, comme
follows:
complex_X ::= X_tag verb_argument
where X can be “nom”, “verbe”, “adjectif” ou “adverb”. Pour
exemple:
complex_adjective ::= adjective_tag verb_argument
A complex X can be used wherever an X appears in the syntax.
Par exemple, a complex adjective can appear in the place of
an adjective. Aussi, as with other structures, a terminating
particle can be appended if necessary.
Note that these complex structures are similar in nature to compound words. The main difference
is that they have precise structure and can be as long as you want. Aussi, if your morphology is
self-segregating (as I discussed in my earlier essay on morphology), then you could remove the
spaces between the words in the complex, creating in effect a single, polysynthetic word.
4.3 THE ULTIMATE SYNTAX FOR AN AL
Here is a complete listing of the production rules and general rules discussed above:
sentence ::= verb {verb_modifier} {verb_argument}
verb_modifier ::= adverb | tense_marker | etc..
verb_argument ::= (case_tag) expression
expression ::= noun_phrase | adjective_phrase | sentence
noun_phrase ::= noun {simple_noun_modifier}
{complex_noun_modifier}
simple_noun_modifier ::= adjective_phrase | nombre
19
| article | démonstratif
| possessive | quantifier
complex_noun_modifier ::= noun_phrase_tag noun_phrase
| noun_clause_tag sentence
adjective_phrase ::= adjective {adjective_modifier}
{adjective_argument}
adjective_modifier ::= adverb
adjective_argument ::= adjective_phrase_tag noun_phrase
COORDINATION RULE:
Where it makes sense, any constituent may be replace by a
coordinated constituent (Utilisant “et”, “ou”, “mais non”,
etc.). Ainsi, for a constituent “X”, the following applies:
X ::= X {coordinating_conjunction X}
PARTICLE RULE:
Where it makes sense, any constituent may be followed by one
or more particles that adds to the meaning of the
constituent and/or terminates it.
COMPLEX STRUCTURE RULE:
Complex nouns, verbs, adjectives and adverbs can be created
with an introductory tag followed by a verb argument, comme
follows:
complex_X ::= X_tag verb_argument
where X can be “nom”, “verbe”, “adjectif” ou “adverb”. UN
terminating particle can be appended if necessary.
4.4 SAMPLE SENTENCES
Here are some sample sentences that illustrate the above rules. Le “À” sentences are normal
English sentences. Le “b” sentences are the equivalents of the “À” sentences using the new
syntax:
20
À. The dog bit the boy.
b. Bite did dog the boy the. (did = past tense verb
modifier)
À. Did the dog bite the boy?
b. Bite did dog the boy the huh? (huh = sentential
question particle)
À. The book is on the table.
b. Be book the on table the. (on = “place on” = case
OR
tag)
b. Be_on book the table the. (be_on = transitive
verbe)
À. Did the boy who broke the window
run away?
b. Run_away did boy the who break (who = agentive noun
did window the huh?
clause tag)
À. I hate reading technical manuals.
b. Hate I read (je) manual technical
lots. (lots = plural
specifier)
For those of you who are uncomfortable with this word order, you can get something closer to
English by moving verb modifiers, simple noun modifiers and adjective modifiers before the
word they modify, rather than after. Autrement dit, all light modifiers will precede their heads,
and all heavy modifiers will follow them. You’ll still have VSO word order, but it will be less
“pure”, and modification will be more like English.
5.0 MISCELLANEOUS TOPICS IN SYNTAX
In the following sections, I will discuss some of the loose ends that would not have fit well in any
of the preceding sections.
5.1 COORDINATION
Coordination is the linking together of two or more constituents with the same structure,
creating, en effet, a single complex structure. The linking elements are called _coordinating
conjunctions_. A language can have simple coordinating conjunctions, comme l'anglais “et”,
“ou” et “mais”, as well as compound ones, comme l'anglais “soit…ou…”, “les deux…et…” et “not
seulement…but also…and finally…”.
21
My reason for discussing coordination is that coordinated structures are often ambiguous, et
you may want to take steps to prevent such ambiguity in your design. Considérer, par exemple, le
following sentence:
John wants to buy the painting of the fireplace and the
Persian carpet.
Does John want to buy one item (a painting in which one can see both a carpet and a fireplace),
or two items (a carpet and a painting)? Note that the ambiguity is present only in the written
sentence. When spoken, a different stress, timing and intonational pattern (c'est à dire., a different
_prosodic_ pattern) would be used for each meaning.
As a language designer, you must decide whether this kind of ambiguity is to be allowed in your
AL. If your major goal is to create a _spoken_ language, then you may feel secure in ignoring the
problem, and let context and prosody resolve potential ambiguities. Si, cependant, the speakers of
your AL will have different native languages, then a solution based on prosody may not be
practical, since natural languages differ considerably in the ways they use intonation, stress and
timing. You can, bien sûr, define the prosodic features of your AL and force students to learn
eux, but in doing so, you will make your AL much harder to learn.
An additional consideration could be whether or not you want your AL to be computer-tractable.
C'est, are you trying to design an AL that is readily amenable to analysis on a computer? If you
do want your AL to be computer-tractable, then you don’t really have a choice. Malheureusement,
computers are not very good at resolving ambiguity, whether spoken or written.
Ainsi, assuming you do NOT want coordination ambiguities in your AL (pour quelque raison que ce soit), comment
can you prevent them? Comme il s'avère, one possible solution becomes readily apparent if we
slightly paraphrase our problem sentence, as follows:
John wants to buy both the painting of the fireplace and
the Persian carpet.
OR
John wants to buy not only the painting of the fireplace
but also the Persian carpet.
Autrement dit, use of complex coordinating conjunctions is inherently less ambiguous, parce que
each constituent that is being linked is explicitly identified.
Before presenting a general solution, cependant, let’s look at what happens when one coordinated
structure is embedded inside another one. Considérons l'exemple suivant:
John wants to buy the painting of the fireplace and the
Persian carpet or the rocking chair.
22
Évidemment, when you embed a coordinated structure inside another one, the potential for
ambiguity becomes even greater. Ainsi, if we want to eliminate any ambiguity, we must mark
both the beginning and the end of a coordinated structure. En outre, if three or more items are
coordinated, the inner one(s) cannot be marked in the same way as the outer ones. Ainsi, if we wish
to eliminate all possible ambiguity, regardless of the number of constituents or the depth of
embedding, then each conjunction must have three forms: a start form, a continuation form and
an end form. There are many ways to implement this. An easy one to learn would be to have a
root morpheme to indicate the basic meaning of the conjunction, and an affix to differentiate the
three forms. Par exemple:
bi- “commencer” affix ko- “continuer” affix
no affix – “fin” and let “kwa” = “and”
and our new coordination rule would look like this:
X ::= (coordinating_conjunction) X
{coordinating_conjunction X}
Exemples:
apples and oranges = (bikwa) apples kwa oranges
apples and oranges and pears and plums =
(bikwa) apples kokwa oranges kokwa pears kwa plums
As shown in the examples, you can specify in your syntax that the initial “bikwa” is optional
unless it is needed to prevent an ambiguity. This has two implications: 1. “bikwa” must ALWAYS
be used to commence an embedded coordinated structure; 2. quand “kwa” appears alone, il
always links the constituent that immediately follows it with the constituent that MOST
CLOSELY precedes it. (Incidentally, note that the affixes can be re-used in creating other words.
En fait, le “bi-” affix is very similar to Esperanto’s “ek-” affix.)
Maintenant, if these words existed in English, here’s how we would have written our problem sentence:
À. If the carpet is in the painting:
John wants to buy the painting of (bikwa) the fireplace
kwa the Persian carpet.
b. If the carpet and painting are separate:
John wants to buy bikwa the painting of the fireplace
kwa the Persian carpet.
In the first sentence, “bikwa” is optional, since the default interpretation is to link “the Persian
carpet” with the noun phrase that immediately precedes it, qui, dans ce cas, est “the fireplace”.
23
If the second sentence sounds too strange to you, simply replace “bikwa” with the English term
“not only”, et “kwa” with the English term “but also”. Ainsi:
John wants to buy not only the painting of the
fireplace but also the Persian carpet.
Étonnamment, English CAN have unambiguous coordination, even when one coordinated
structure is embedded in another, as in the following example:
He baked not one or two pies or even three or four pies
but actually ten pies!
In natural languages, mais, coordination syntax can be inconsistent, et, par conséquent,
ambiguous. As an AL designer, you can enforce consistency and totally eliminate the possibility
of coordination ambiguities, without in any way requiring unnatural syntax.
5.2 OTHER ATTACHMENT AMBIGUITIES
Coordinated structures are not the only ones that can suffer from ambiguity. Dans certains cas, À
syntactic marker can have more than one interpretation, as in the following sentence:
John saw the man with a telescope.
Did John use a telescope to see the man, or did the man have the telescope? Dans ce cas, le
problem is with the English word “avec”, which can be interpreted in the sense of either
_accompaniment_ or _instrument_. The easiest solution is simply to use different words for the
different meanings. Ainsi, your AL could have a case tag meaning “Utilisant” and a noun phrase tag
pour “accompanied by”.
There are other cases of ambiguity, cependant, where the solution is not as obvious or as simple.
Considérons l'exemple suivant:
John watched the man observing the crowd with a
telescope.
Ici, the telescope is clearly being used as an instrument, but by whom? It’s interesting and
perhaps instructive to look at how natural languages deal with this ambiguity. En anglais, toi
could rephrase the sentence as follows (assuming John has the telescope):
John watched with a telescope the man observing the
crowd.
I’m not even sure if this sentence is grammatical in English, but even so, there’s no reason why
you can’t make it grammatical in your AL. The problem, mais, is that we are FORCING the
speaker to use a different word order to prevent an ambiguity, and I don’t believe that this is a
24
practical solution. In my opinion, a design is flawed if structures are created to be used only
when ambiguity can occur. A much better solution would be to design the syntax and lexicon so
that the ambiguity can NEVER occur.
Comme il s'avère, there are several natural languages in which the above ambiguity would not
se produire. These languages mark the verb (usually by specially inflecting it) to indicate that an
instrumental object follows the direct object. In Swahili, par exemple, the above ambiguous
sentence would take one of the following two forms depending on its meaning:
John watched-using the man observing the crowd a
telescope.
OR
John watched the man observing-using the crowd a
telescope.
(Although Swahili does not have articles, I’ve included them so that the results sound more
natural to the English-speaking reader.) En effet, the instrumental marking on the verb changes
the argument structure of the verb, making it ditransitive with an instrumental second object.
This approach will solve the problem, but many designers may not like it, perhaps considering it
too exotic or difficult to learn. It would also get very messy if a verb has several arguments. Et
even natural languages that use this approach, such as Swahili and Mokilese, do it for only a
small number of case roles. Although interesting, this approach does not appear to be a practical,
general solution. (Aussi, I’ve often wondered how Swahili speakers perceive the use of this
construction. It appears to be inflectional and that’s the way linguists treat it. Toutefois, it doesn’t
allow one to utter an afterthought instrumental role, un péché “He broke the window….uh, with a
hammer”. Perhaps it’s not really inflectional, but derivational.)
Another way of solving this problem is to explicitly terminate a structure with a particle. Such a
particle would, en effet, be similar to a right parenthesis. Each structure would have its own
terminating particle (par exemple., one for sentences, one for verb arguments, one for relative clauses,
etc.) Such a particle would terminate the immediately preceding applicable structure and prevent
anything that follows from attaching to it. Ainsi, si “John” is using the telescope, our test sentence
would be something like this:
John watched the man observing the crowd end-sen
with a telescope.
où “end-sen” is a particle that terminates the immediately preceding (embedded) sentence.
Ainsi, “with a telescope” can only attach to the main verb “watched”. (I assume here that “avec”
is a case tag that can only attach to a verb – NOT a noun phrase tag that could attach to “homme”.)
Toutefois, I do not consider this a good solution for the reason I mentioned earlier – we are
adding something that is only used to prevent ambiguity, and which won’t be needed most of the
temps. Aussi, I have not been able to find a single example of this approach among natural
langues. If it does, En fait, violate linguistic universals, then it may be effectively unlearnable.
Autrement dit, any form of explicit “parenthesization” could only be uttered with forethought
25
and analysis – it could not be uttered automatically. I imagine that this is the essential difference
between a real language and a non-linguistic coding game.
Another possible solution to this problem is to use a different case tag (c'est à dire., preposition or
postposition) for each possible interpretation. Since this is a lexical solution rather than a
syntactic or morphological one, some designers may find it easier to swallow. One way to
implement it would be to specially mark the case tag (c'est à dire., have a distinct case tag) if it attaches
to the main verb of the sentence, and leave it unmarked if it attaches to the immediately
preceding verb (or vice-versa). Par exemple, an English gloss using this technique would sound
something like this:
John watched the man observing the crowd main-with a
telescope.
OR
John watched the man observing the crowd with a
telescope.
Noter que “avec” in the second sentence is not marked – it’s attachment defaults to the verb
“observing” since it is closest.
Toutefois, by marking the case tag to indicate that it attaches to the MAIN verb, you will not be
able to attach to a middle verb, as in the following triple embedding:
John saw the man who observed the lady who watched the
crowd with a telescope.
You can, bien sûr, provide markers for case tags that attach to inner verbs, but I wouldn’t bother.
(My feeling is that people who use sentences like this deserve to be misunderstood. 🙂 Note also
that this approach is very much like the one used in Swahili and Mokilese, except that the case
tag is marked rather than the verb, and that the inherent messiness of dealing with multiple case
roles is avoided.
Probably the best way, one which would work even with multiple embeddings, would be to mark
the case tag with a part of the verb it attaches to. Ainsi:
John saw the man who observed the lady who watched the
crowd saw-with a telescope.
John saw the man who observed the lady who watched the
crowd obs-with a telescope.
All of the above discussion provides fodder for some interesting observations about syntax in
general. Since language allows recursion, syntax is inherently two-dimensional. Malheureusement,
since words are uttered one after another, the speech stream is inherently linear. Ainsi, when we
are using language, we are trying to force a two-dimensional entity into a one-dimensional mold.
The result is sometimes ambiguity. Note though, that real ambiguity is quite rare. The human
brain can bring other weapons to bear on syntactic ambiguity, such as semantic context and
26
world knowledge. Par conséquent, if you are designing your language for exclusive use by
humains, you can afford to ignore this potential for syntactic ambiguity. Si, cependant, you want
your language to be computer-tractable, then you must deal with the problem.
5.3 DO WORDS HAVE SYNTAX?
In the preceding essay in this series, I discussed ways to design the surface morphology of an
artificial language. Comme il s'avère, syntax and morphology have quite a lot in common. En fait,
some linguists are have proposed theories that describe morphology as if it were essentially an
extension of syntax. In these theories, rules that apply to the relationship between heads and
modifiers in syntax also apply to the relationship between heads and modifiers in morphology.
The difference is that heads and modifiers in syntax are complete words, whereas in morphology
they are morphemes.
I am not going to try to describe these theories here, since they are both complex and somewhat
controversial. All I would like to do here is to very briefly show how the syntactic formalism
we’ve been using can be applied to the shapes of words.
If your AL is at all agglutinating, as seems to be true to some degree of most natural languages,
then you will want to be able to create new and more complex words by combining more basic
primitives. One way that this is commonly done is through a process called _derivational
morphology_, in which a basic or root concept is modified to create a new word by adding a
morphème. Par exemple, Anglais “sad” -> “sadden” -> “tristesse”, “critic” -> “critique” ->
“criticize”, etc.. I do not want to discuss the semantics of derivational morphology here, since I
discuss it in much greater detail in my monograph Lexical Semantics. Plutôt, I just want to
show you how we can apply the formalism described above to the shapes of words.
Let’s start by looking at _open class_ words. En anglais, these are words such as nouns,
adjectifs, verbs and most adverbs. They are called _open class_ because new words can enter
and leave these groups with relative ease. Their counterpart, _closed class_ words, are words like
Prépositions, particles and specifiers which change much more slowly. We could define an open
class word of a hypothetical AL as having the following “syntax”:
open_class_word ::= {modifier} root (classifier)
part_of_speech
Such a structure is called _morphotactic_ because it describes the way the morphemes are
arranged or “touche” each other. In the above example, a modifier would be a morpheme that
narrows down the possible interpretation of the root. Par exemple, a modifier could apply the
meaning of “femaleness” to the basic word “chien” to create the new word “bitch”. Different
modifiers would be used to create other distinctions such as between “mutt”, “puppy”,
“purebred”, “male dog”, etc.. For nouns, a classifier can indicate a position of the root in a
hierarchy. Par exemple, “chien” is in the class “mammifère”. For verbs, it can indicate the thematic
roles and argument structure. Par exemple, the words for “escape” et “libérer” would have the
27
same root and part of speech, but would have different classifiers, because one is inherently
reflexive and the other is inherently causative. Aussi, if each root has a default classifier
associated with it, then the classifier does not have to be specified. This will help somewhat to
keep common words relatively short.
We can also describe the phonological “forme” of the individual morphemes, which linguists
would call the _phonotactics_ of the morphemes. Par exemple:
modifier ::= CV
root ::= CVCC
classifier ::= VCC
part_of_speech ::= V
where C is a consonant and V is a vowel. Ainsi, if the modifier “mi” indique “femaleness”, le
root “temb” means “chien” within the class of mammals indicated by the classifier
“anc” (pronounce ‘c’ like ‘ch’ in ‘church’), et “o” is the part-of-speech indicator for nouns, alors
le mot “mitembanco” would mean “bitch”. If the default classifier for the root “temb” est “anc”,
alors “anc” can be dropped, shortening the result to “mitembo”.
You can extend your word syntax to include other kinds of words. Par exemple, an anaphor
could be created from any open class word by using only the classifier and part of speech. Ainsi,
the pronoun “il”, when referring to a female dog (or any mammal), would be “anco”. En fait, dans
this particular scheme, le mot “anco” would also be the word for “mammifère”.
Enfin, you can use different forms for closed class words. Par exemple, specifiers could have
the form:
specifier ::= {modifier} specifier_class
specifier_class ::= CSV
where S is a semivowel. Ici, one possible modifier could be a numeral, and the specifier class
could indicate ordinality. Ainsi, if the modifier “à” means “Sept”, et “son” indique
ordinality, then the word “honte” would mean “seventh”.
And so forth. I won’t say any more about word design here, since I will talk much more about it
in my monograph Lexical Semantics. I simply wanted to whet your appetite and show you how
words can in fact have “syntax”.
[Postscript: Many thanks to Jacques Guy and Dan Maxwell for their helpful comments!]
End of Essay
28