Designing an Artificial Language: Syntax
Author: Rick Morneau
MS Date: 07-26-1994
FL Date: 05-01-2019
FL Number: FL-00005C-00
Citation: Morneau, Rick. 1994. «Designing an Artificial Language: Syntax.» FL-00005C-00, Fiat Lingua, 2019.
Copyright: © 1994 Rick Morneau. This work is licensed
under a Creative Commons Attribution-
NonCommercial-NoDerivs 3.0 Unported License.
http://creativecommons.org/licenses/by-nc-nd/3.0/
Fiat Lingua is produced and maintained by the Language Creation Society (LCS). For more information
about the LCS, visit http://www.conlang.org/
Designing an Artificial Language:
Syntax
by Rick Morneau
August, 1992
Revised July 26, 1994
Copyright © 1992, 1994 by Richard A. Morneau,
all rights reserved.
1.0 INTRODUCTION
This essay is aimed at budding language designers who would like to learn something about
syntax in general, and about some of the syntactic variability that exists among the world’s many
natural languages. It is also aimed at those who would like to have a tool that they can use to
describe the syntax of their creations. By no means is this essay intended to be comprehensive –
any such attempt would be so long that no one would want to read it. Besides, I’ve got a life to
live. 🙂
To get this thing off to a good start, I’ll first talk a little about some of the major features of
syntax as they actually exist in natural languages. Next, I’ll discuss a formalism that you can use
to describe the syntax of your artificial language (henceforth AL), and I’ll illustrate its use by
describing the syntax of a small fragment of English. Finally, I’ll use the formalism to describe a
simple but powerful syntax for a hypothetical AL.
2.0 LINGUISTIC TYPOLOGY AND BASIC WORD ORDER
A lot of work is currently being done in the area of linguistic typology and universals. Basically,
this area of research tries to find patterns that exist across all natural languages. As it turns out,
many universals are not truly universal, since exceptions to most patterns can often be found.
These exceptions, however, often indicate that a language is undergoing a change from one
pattern to another. Also, the exceptions themselves exhibit patterns indicating that the forms that
they can take are controlled by even more subtle universals.
Consequently, these patterns or tendencies can often reveal a lot about how languages change
and, more importantly, how change is constrained to follow certain channels. The
forms that human languages can take are actually quite limited, and the study of universals can
show us what these limitations are.
A study of linguistic universals can also be helpful to AL designers. This is especially true if you
are trying to develop a language that is quite different from the natural languages you are
familiar with. A study of universals can help you design a language that ends up being speakable
and learnable; i.e., one that is compatible with the grey stuff most of us have between our ears. If
you «break the rules», so to speak, your language may end up being unlearnable in any real
sense, and will end up being just a coding game rather than a real language.
2.1 SUBJECT, VERB, AND OBJECT
One of the first patterns that typologists look at is the basic relationship between subject (S), verb
(V) and object (O) in simple declarative sentences. Determining these patterns is not always that
simple, because many languages are inflected in such a way that they have a great deal of
freedom in ordering their words. But even these languages will have some restrictions, or will
tend to have dominant, preferred or unmarked word orders.
There are six possible orderings: VSO, SVO, SOV, VOS, OVS, and OSV. It turns out that a very
large majority of the world’s languages fit within the first three categories; i.e., where the subject
comes before the object. Here are some examples:
SOV – Turkish, Tamil, Japanese, Tibetan, Quechua
This is the largest single grouping, and probably
accounts for slightly more than 40% of all
languages.
Sample sentence: John fish ate.
SVO – English, Swahili, Chinese, Indonesian
This is also a very large group, although not quite
as large as SOV. It probably accounts for
slightly less than 40% of all languages.
Sample sentence: John ate fish.
VSO – Welsh, Hawaiian, Berber, Classical Arabic
This is not a very large group, but it is still quite
significant. It probably accounts for about 15%
of all languages.
Sample sentence: Ate John fish.
Note that in all of the above, the subject comes before the object – only the verb’s position
changes. Also note that the percentages are very approximate. (Until recently, most linguists felt
that SVO was the most common type. However, more recent knowledge seems to give SOV a
slight lead.)
The other groups contain relatively few languages, and many of them are likely to be languages
you’ve never heard of. Here are a few examples:
OVS – Guarijio (Azteco-Tanoan family, Mexico)
Hixkaryana (Carib family, Brazil)
Sample sentence: Fish ate John.
VOS – Fijian (Austronesian family, Fiji)
Terena (Arawakan family, Brazil)
Malagasy (Austronesian family, Madagascar)
Sample sentence: Ate fish John.
OSV – Jamamadi (Arawakan family, Brazil)
Language of Yoda, Jedi Master (Unknown language
family, Dagobah star system)
Sample sentence: Fish John ate.
Thus, if you want your AL to be as typologically «typical» as possible, you will need to make it
SOV. If you want to make it similar to most European languages, then use SVO. However, if you
don’t want to play favorites but still want something that’s easy to learn, you can choose the more
neutral VSO format (which, incidentally, is my own personal favorite :-). VSO is also easier for
human brains and computers to parse. Finally, if you’re more interested in the exotic or the weird,
choose one of the last three word orders.
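The arithmetic behind this section is easy to check mechanically. The six basic orders are just the 3! = 6 permutations of S, V and O, and exactly three of them put the subject before the object. The short Python sketch below (the function names are only illustrative) generates all six orders along with the «John ate fish» sample sentences used above.

```python
from itertools import permutations

# Words of the running example, keyed by constituent.
WORDS = {"S": "John", "V": "ate", "O": "fish"}

def basic_orders():
    """All 3! = 6 orderings of subject, verb and object, e.g. 'SOV'."""
    return ["".join(p) for p in permutations("SVO")]

def sample_sentence(order):
    """Render the running example in a given order, e.g. 'VSO' -> 'Ate John fish.'"""
    text = " ".join(WORDS[c] for c in order)
    return text[0].upper() + text[1:] + "."   # capitalize only the first letter

def subject_first(orders):
    """Keep only the orders in which the subject precedes the object."""
    return [o for o in orders if o.index("S") < o.index("O")]

for order in basic_orders():
    print(order, sample_sentence(order))
```

Running it prints the orders in the sequence SVO, SOV, VSO, VOS, OSV, OVS. The percentages quoted above come from surveys of real languages, of course, not from anything a program like this can compute.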
2.2 RELATIVE CLAUSES
Relative clauses are embedded sentences that modify nouns. Consider the following English
example:
1. The shirt (that) you want is on the bed.
In this sentence, the relative clause is «you want» and it modifies the noun «shirt». (The
parentheses indicate that the relative pronoun «that» is optional.) There are two important
observations we should make about this example: first, the word «shirt» is, in effect, the direct
object of the verb «want»; and second, the verb «want» does not have an explicit direct object.
The position where an object would normally go is called a _gap_.
It’s also possible for relative clauses to modify nouns that correspond to positions other than
direct object in the clause. Here are some examples:
2. The police caught the man who robbed the bank.
(Here, «man» is the effective subject of the verb
«robbed».)
3. This is the hammer (which) he broke the window with.
=This is the hammer with which he broke the window.
(Here, «hammer» is the effective object of the
preposition «with».)
4. They examined the room (which) the fire started in.
=They examined the room in which the fire started.
(Here, «room» is the effective object of the preposition
«in».)
And so on. In English, any verb argument (i.e., subject, object, indirect object or object of a
preposition) can be relativized, and when this happens, a gap is left in the clause. Incidentally, as
illustrated in examples 3 and 4 above, English has the extremely rare and confusing habit of
splitting a compound relative pronoun into two parts, moving the first half (the preposition) to the
end of the clause, and optionally dropping the second half; linguists call this
_preposition stranding_. Thus, we can have «in which» or «(which)…in». I
strongly recommend that you NOT allow such splitting in your design if one of your goals is
ease-of-learning. Either use single words that are inherently unsplittable, or keep the pieces
together.
Gaps in relative clauses appear to be required by slightly less than half of the world’s languages.
A slight majority, however, either do not allow gaps or severely restrict them. For example,
Palestinian and Egyptian Arabic allow gaps only if the noun being modified is the effective
subject of the relative clause, as in example 2 above. For all other verb arguments, a gap is not
allowed and must be filled by what is called a _resumptive pronoun_. Here is what Arabic
versions of sentences 1, 3 and 4 would look like using English words and English word order:
1. The shirt that you want it is on the bed.
3. This is the hammer that he broke the window with it.
4. They examined the room that the fire started in it.
Note that in all three sentences, the resumptive pronoun «it» is required for the Arabic sentences
to be grammatical.
[Incidentally, the above applies to Egyptian and Palestinian Arabic. In Standard Arabic, the use
of resumptive pronouns is optional. However, in Standard Arabic, the relative pronouns are
inflected for gender and number. In effect, the resumptive pronoun is built into the relative
pronoun.]
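The difference between the English pattern and the Egyptian/Palestinian Arabic pattern comes down to one decision per relativized argument: leave a gap or insert a pronoun. The toy Python sketch below makes that decision explicit, using English words and English word order as the glosses above do. The clause representation (a list of role/word pairs), the strategy names and the fixed resumptive pronoun «it» are assumptions made for illustration. The sketch does not model Standard Arabic or Irish.

```python
def relativize(head, clause, role, strategy):
    """Attach a relative clause to `head`, relativizing the argument that fills `role`.

    clause:   list of (role, word) pairs in English word order (a toy representation).
    strategy: "gap"         -> always leave a gap (the English pattern)
              "subject_gap" -> gap only for subjects, otherwise the resumptive
                               pronoun "it" (the Egyptian/Palestinian Arabic pattern)
    """
    use_gap = strategy == "gap" or role == "subject"
    words = []
    for r, w in clause:
        if r != role:
            words.append(w)
        elif not use_gap:
            words.append("it")   # simplification: a real pronoun would agree with `head`
    return f"the {head} that " + " ".join(words)

want = [("subject", "you"), ("verb", "want"), ("object", "shirt")]
rob = [("subject", "man"), ("verb", "robbed"), ("object", "the bank")]
hammer = [("subject", "he"), ("verb", "broke"), ("object", "the window"),
          ("preposition", "with"), ("oblique", "hammer")]

print(relativize("shirt", want, "object", "gap"))              # the shirt that you want
print(relativize("shirt", want, "object", "subject_gap"))      # the shirt that you want it
print(relativize("man", rob, "subject", "subject_gap"))        # the man that robbed the bank
print(relativize("hammer", hammer, "oblique", "subject_gap"))  # the hammer that he broke the window with it
```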
In other languages, such as Irish, the use of resumptive pronouns is optional. Here, though, the
relative pronoun that introduces the clause (such as «who», «which», «that», «in which», «with
which», etc.) differs depending on whether the clause contains a gap or a resumptive pronoun.
For example, languages like this will have two words or phrases for «in which»: one for use
when the clause contains a gap, and the other for use when the clause contains a resumptive
pronoun. Unfortunately, I have no information on how widespread this phenomenon is.
Finally, a very small minority of languages, such as Persian, not only allow the use of resumptive
pronouns, but also allow gaps to be filled by modified or coordinated nouns. Thus, the following
sentence would be perfectly grammatical in Persian:
The police caught the man who he and his wife robbed the
bank.
However, this usage seems to be very rare.
2.3 THE NOUN PHRASE
Another ordering that typologists are concerned with is the relationship between adjectives (A)
and the nouns (N) they modify in noun phrases. (I also include simple and complex numbers
within the adjective group, such as «five», «between ten and twenty», etc.) Since we’re only
talking about two items, «A» and «N», there are only two ways to order them: AN or NA.
However, we can complicate things by including so-called heavy modifiers such as relative
clauses, descriptive prepositional phrases («girls with red hair»), and prepositional arguments of
nominalized verbs («destruction of the city»). I will include all such heavy modifiers in the same
category as relative clauses (R), since they all seem to pattern in the same way. Here are some
examples:
NAR – Thai, French, Hebrew, Swahili
Sample phrase: men big who eat quiche
ANR – Quechua, English, Persian, Russian
Sample phrase: big men who eat quiche
ARN – none
Sample phrase: big quiche eat who men
NRA – none
Sample phrase: men who eat quiche big
RNA – Basque, Abkhaz, Burmese
Sample phrase: quiche eat who men big
RAN – Turkish, Tamil, Korean
Sample phrase: quiche eat who big men
In the sample phrases, I have placed the object «quiche» and the relative pronoun «who» in
their most likely positions, based on the most likely branching direction for the type (I’ll have
more to say about branching later). The relative pronoun can, however, appear at either the
beginning or the end of the relative clause. In
fact, in some languages, such as Hindi and Bengali, the relative pronoun appears before the noun
being modified and the clause appears after it! For languages like these, our sample phrase
would look something like this: «big who men quiche eat». (For purists or students of Hindi who
are reading this, the clause must also be followed by a demonstrative pronoun, which actually
makes the gloss more like «big who men quiche eat that».)
Quite a few languages, such as Turkish and Quechua, do not use relative pronouns, but instead
nominalize the verb (i.e., convert it to a participle) in the relative clause to achieve the same
effect. The resulting participle can be inflected for tense. Thus, an English phrase such as «boys
who broke windows» would be glossed in Turkish as «windows broking boys». Neat, huh?
Notice that no languages place a relative clause between an adjective and a noun (types ARN and
NRA). I believe that this is an honest-to-god linguistic universal. Disregard it at your own peril.
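The proposed universal amounts to a simple adjacency constraint: a relative clause may not sit between an adjective and its noun. As a small illustration, the Python sketch below enumerates the six possible orders of A, N and R and filters them with that constraint. What survives is exactly the four types attested in the list above. The function names are invented for the example, and the filter deliberately covers adjectives only.

```python
from itertools import permutations

def noun_phrase_orders():
    """All six orderings of adjective (A), noun (N) and relative clause (R)."""
    return ["".join(p) for p in permutations("ANR")]

def relative_clause_intervenes(order):
    """True if R sits between A and N, which the universal rules out."""
    a, n, r = order.index("A"), order.index("N"), order.index("R")
    return min(a, n) < r < max(a, n)

allowed = [o for o in noun_phrase_orders() if not relative_clause_intervenes(o)]
ruled_out = [o for o in noun_phrase_orders() if relative_clause_intervenes(o)]
print("allowed:", allowed)      # the four attested types
print("ruled out:", ruled_out)  # ARN and NRA
```

The same check would be wrong for specifiers, since, as discussed below, the types SRN and NRS do occur.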
Incidentally, it is important to note that the above discussion applies only to languages
that have adjectives. Since many languages use stative verbs instead of adjectives, «R» and «A»
would be part of a single category. Some examples of this are Indonesian (where modifiers
follow the noun), and Chinese (where modifiers precede the noun). Thus, an English phrase like
«that stupid boy who breaks windows» would be glossed in Indonesian as «boy which be_stupid
which break windows». In Chinese, it would sound something like «break windows which
be_stupid which boy». (I have much more to say about stative verbs in my (very long!)
monograph Lexical Semantics.)
The other components of the noun phrase that are of interest to typologists are called
_specifiers_, and include articles (the, a, an), demonstratives (this, those, etc.), quantifiers (each,
all, every, etc.) and possessives (my, their, John’s, etc.). These are called specifiers because they
precisely pinpoint one or more referents from among a set of possible referents. Adjectives and
numbers, however, only narrow down the number of choices by creating a subset, but without
specifying particular referents. Consider, for example, the difference between «big black dogs»
and «those big black dogs».
Specifiers undergo the same variability in word order as adjectives, with one exception:
it is possible for a relative clause to appear between a specifier and the noun it modifies. Thus,
even though types ARN and NRA do not seem to exist, the types SRN and NRS (where S =
specifier) do occur. In Indonesian, for example, the expression «those men who just left» would
appear as «men who just left those». Aside from this idiosyncrasy, specifiers can appear either
before an adjective (as in English), after an adjective (as in Vietnamese), or in either position (as
in Swahili).
Incidentally, relative clauses and prepositional phrases can act semantically as either general
modifiers («boys who like basketball») or specifiers («boy with the scar over his left eye»).
However, I’m not aware of any language that reflects this difference in syntax.
Finally, I hope I haven’t given the impression that I’ve covered all of the possible forms of noun
phrase and relative clause that can exist in natural language. We’ve seen a few unusual cases,
such as English’s splittable relative pronouns and Persian’s odd resumptive structure. But this is
only the tip of the iceberg.
2.4 BRANCHING DIRECTION
There are certain ordering relationships that are so common, and which have so few exceptions
that they almost certainly indicate something very basic about the way our brains process
language. In fact, when languages are found that appear to contradict such orderings, it is usually
the case that they are undergoing a transition from one pattern to another and haven’t yet «settled
down».
For example, in VSO languages, the subject and object follow the verb (by definition). But we
also find that specifiers, adjectives, genitives and relative clauses almost always follow the nouns
they modify, that adverbs and adjectival arguments almost always follow the adjectives they
modify, and that noun phrases almost always follow the prepositions that govern them. In other
words, with very few exceptions, modifiers and arguments almost always FOLLOW the words
they modify or are governed by in VSO languages. (Linguists would say that modifiers follow
their _heads_ in VSO languages.) For SOV languages, the principle applies just as rigorously,
but in the opposite direction; i.e., modifiers and arguments almost always PRECEDE the words
they modify, and the equivalent of prepositions (called _postpositions_) are preceded by the noun
phrases they govern. In other words, modifiers precede their heads in SOV languages. Another
common way of describing this ordering is to say that VSO languages are predominantly right-
branching, and that SOV languages are predominantly left-branching.
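The contrast between right-branching and left-branching orders can be pictured as one tree flattened in two directions. In the Python sketch below (an illustration under simplifying assumptions, with made-up class and function names), each head carries a list of dependents. Flattening with heads first gives VSO-style order with postnominal adjectives and prepositions; flattening with heads last gives SOV-style order with prenominal adjectives and postpositions. Articles are omitted, and dependents are assumed to keep the same relative order (subject before object) in both directions.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    """A head word plus its dependents (arguments and modifiers), in a fixed order."""
    head: str
    dependents: list[Node] = field(default_factory=list)

def linearize(node: Node, head_first: bool) -> list[str]:
    """Flatten a tree, putting every head before (head_first=True) or after its dependents."""
    words: list[str] = []
    for dep in node.dependents:
        words.extend(linearize(dep, head_first))
    return [node.head] + words if head_first else words + [node.head]

# ate -> subject "men" (modified by "big"), object "fish", and "with" governing "hammer"
clause = Node("ate", [
    Node("men", [Node("big")]),
    Node("fish"),
    Node("with", [Node("hammer")]),
])

print(" ".join(linearize(clause, head_first=True)))   # ate men big fish with hammer
print(" ".join(linearize(clause, head_first=False)))  # big men fish hammer with ate
```

No single setting of head_first yields SVO order, which fits the observation that follows about SVO languages.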
Unfortunately, the relationship breaks down when we consider SVO languages, such as English.
Are these languages undergoing a transition from one form to another, or have they become
stuck in some kind of cul-de-sac that is neither left- nor right-branching? For languages with long
written histories, we can often answer this question, and the answer seems to be that SVO is
transitional. For example, Old English (spoken before 1200 AD) was SOV and heavily inflected.
It lost most of its inflections and made the switch to SVO at about the same time.
Furthermore, a lot has been learned recently about word order patterns. As a result, linguists can
often make intelligent guesses about where SVO languages came from, simply by looking at
where they are now. Predicting the future, however, is essentially impossible. Languages
certainly have inertia, but it’s doubtful they have momentum.
There are some things, though, that seem to be fairly certain. All natural languages are constantly
changing – not just in word order, but also in ways that I haven’t discussed here, such as in
phonology, morphology and semantics. Changes in word order, though, seem to be bounded by
two «pure» or «ideal» endpoints: SOV and VSO. This probably explains why there are so few
truly pure VSO and SOV languages. Since all languages are constantly changing, it’s much more
likely that any particular language will be wandering around somewhere between the endpoints.
The point for AL designers is this: As a language designer, you have total control over what your
language will be. But even you cannot predict where your language will go. And since it will be
a NEW creation, it cannot have a history. The only freedom you have is to decide where it will
start.
3.0 A FORMALISM FOR DESCRIBING SYNTAX
Originally, it had been my intent to provide a brief description of a few of the most widely used
linguistic formalisms, and to show how they could be applied to the description of an artificial
language. I actually started work on this project using Government/Binding Theory, which is
currently the most popular formalism among professional linguists. However, I quickly realized
that even a _brief_ description would require several dozen pages for each theory, and I doubted
very much if many people would have the fortitude to read it all. So, I had to come up with
something simpler. Fortunately, we don’t really need to delve into current linguistic theories,
since they must deal with the complexities and oddities of natural languages. ALs tend to be
simpler and more regular. Besides, and perhaps more importantly, current linguistic theories are
continuously moving targets, always subject to change without notice.
So, I’m going to limit myself to a somewhat restricted view of syntax. First of all, I will not
discuss agreement/unification aspects of syntax, since these are normally handled with inflection
which may not even be present in your AL. Also, inflection is a morphological process, and
representing it along with word-order would be messy, time-consuming, boring, and not very
useful. Thus, I will limit myself primarily to a discussion of word order and how to represent it.
With this goal in mind, I will describe how to use simple, context-free, phrase structure rules.
The notational system I will use will be extremely simple – I will use a modified version of
Backus-Naur Form (BNF) because it is more powerful and less confusing than the system
normally used by linguists. Those among you who have worked with BNF will not learn
anything new here, especially if you’ve ever studied or worked in compiler design. You may find
it somewhat odd, though, to see BNF applied to human language.
Finally, I apologize to those readers who might be expecting a more up-to-date treatment of
syntax in terms of X-Bar Theory. Although X-Bar Theory provides desirable constraints on the
syntax of a NATURAL language, it is highly abstract and I was afraid that a lot of readers would
be turned off if I used it. Also, it’s quite possible that some AL designers may want to design a
language that is NOT constrained as natural languages are. Phrase structure rules will give them
this freedom – X-Bar Theory will not. However, I have not entirely abandoned X-Bar Theory,
since my sample syntax in section 4 was designed within the framework of X-Bar Theory, even
though I do not discuss it in those terms.
3.1 THE SYNTAX OF A SIMPLE ENGLISH FRAGMENT
Our first goal will be to learn how to use BNF to describe a simple subset of English. I will not
try to develop a complete BNF description of the entire English language because it is not
necessary, it would take too long, and I don’t think it’s possible. Instead, I’ll only work with a
rather small subset of English, one that is no more complex than is absolutely necessary to
illustrate all of the features of BNF needed to design an AL syntax. To achieve this goal, I will
start with very simple sentences and gradually increase their complexity.
Incidentally, my purpose here is to illustrate BNF – NOT to teach linguistics! As a result, some of
the analyses below are not as linguistically precise as I would like, but to do the job right would
have taken much more time and would have required lots of digression into areas that are not
really relevant here.
For starters, consider the following simple sentences:
Sailors cuss.
Dogs bark.
Billy fell.
These sentences consist of two words each: a noun and a verb. To describe the structure of these
sentences in BNF, we would write:
sentence ::= noun verb
which can be read as «a sentence consists of a noun immediately followed by a verb». However,
verbs can also have objects, as in the following examples:
Children like puppies.
Louise kissed Jimmy.
Tornadoes destroy buildings.
We can extend the syntax to deal with direct objects as follows:
sentence ::= noun verb (noun)
which can be read as «a sentence consists of a noun followed by a verb which may in turn be
followed by an optional noun». Thus, in our BNF notation, we will use parentheses to indicate
that an item is optional.
Now, the nouns that precede and follow the verb can be more complex than shown above, as in
the following sentences:
Children like cute little puppies.
Silly dogs bark.
Little Billy stutters.
For these cases, the noun is modified by one or more adjectives. In other words, one or more
adjectives are optional. In BNF notation, we would write:
sentence ::= {adjective} noun verb ({adjective} noun)
Here, the curly braces mean «zero or more». Thus, our definition of a sentence now reads: «a
sentence consists of zero or more adjectives followed by a noun, then a verb, and then an
optional ‘thing’ which itself consists of zero or more adjectives followed by a noun».
Now, let’s make things a little more complicated by adding articles:
The little boys saw a big angry dog.
A sad little girl watched the birds.
An angry man shouted.
We can add the articles «a», «an» and «the» to our BNF representation as follows:
sentence ::= (article) {adjective} noun verb ((article)
{adjective} noun)
Note though, that we are repeating ourselves in that the same «thing» appears on both sides of the
verb. It consists of an optional article, zero or more adjectives, and a noun. As it turns out, this
«thing» has a name – it’s called a _noun phrase_. Let’s take it out of the definition of sentence and
describe it separately, as follows:
sentence ::= noun_phrase verb (noun_phrase)
noun_phrase ::= (article) {adjective} noun
Our syntax now reads like this: A sentence consists of a noun phrase followed by a verb followed
by an optional noun phrase. A noun phrase consists of an optional article followed by zero or
more adjectives, followed by a noun.
English allows demonstrative adjectives («this», «that», «these» and «those»), possessive
adjectives («my», «his», «our», «their», etc.) and quantifiers («each», «both», «every», «all», etc.) to
appear in place of an article, as in the following sentences:
My mother read this book.
Her angry dog bit Danny.
That cat ate both mice.
Every good student does his homework.
We can deal with this by redefining the noun phrase, as follows:
noun_phrase ::= (specifier) {adjective} noun
specifier ::= article | demonstrative | possessive |
quantifier
Here, the vertical bar «|» can be read as «or». Thus, a specifier can be an article or a
demonstrative or a possessive or a quantifier. Also, if we really want to be thorough, we can
refine the definition of possessives as follows:
possessive ::= possessive_adjective | proper_noun’s
which can be read as: a possessive can be a possessive adjective (such as «his», «your», «my»,
etc.) or a proper noun followed by apostrophe-s (such as «John’s», «Boston’s», «IBM’s», etc.)
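Because none of the rules so far refers back to itself, each of the three notational devices has a direct counterpart in regular-expression syntax: parentheses become ?, curly braces become *, and the vertical bar stays |. As an illustration (not part of the method described here), the Python sketch below compiles the rules so far into a regular expression over a string of one-letter word-class codes. The codes and names are assumptions made for this example, and the words are assumed to be tagged already.

```python
import re

# One-letter codes for word classes (a convention invented for this sketch):
#   a = article   d = demonstrative   q = quantifier   j = adjective
#   p = possessive adjective   s = proper noun + 's   n = noun   v = verb
POSSESSIVE = "[ps]"                        # possessive_adjective | proper_noun's
SPECIFIER = f"(?:a|d|{POSSESSIVE}|q)"      # article | demonstrative | possessive | quantifier
NOUN_PHRASE = f"{SPECIFIER}?j*n"           # (specifier) {adjective} noun
SENTENCE = re.compile(f"{NOUN_PHRASE}v(?:{NOUN_PHRASE})?")  # noun_phrase verb (noun_phrase)

def is_sentence(tags: str) -> bool:
    """True if the whole tag string matches the sentence rule."""
    return SENTENCE.fullmatch(tags) is not None

print(is_sentence("pnvdn"))     # My mother read this book.
print(is_sentence("ajnvajjn"))  # The little boys saw a big angry dog.
print(is_sentence("njv"))       # noun, adjective, verb: rejected
```

This works only because the rules are not recursive; for the recursive rules introduced later in this section, a recursive-descent parser is the more natural tool.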
Next, let’s add numbers to our syntax. In English, a number must follow any specifier and precede
any adjectives, as in the following noun phrases:
all five boys
these three girls
the two fat lazy cats
our seven goldfish
So, to account for numbers in English, we have to modify our definition of the noun phrase as
follows:
noun_phrase ::= (specifier) (number) {adjective} noun
Now, what do we do about pronouns? As it turns out, an entire noun phrase can be replaced by a
single pronoun, as illustrated in the following:
John and Billy ate the whole apple pie. They ate it.
(«They» = «John and Billy», «it» = «the whole apple
pie»)
Five angry dogs chased those three foolish boys. They chased
them.
(«They» = «Five angry dogs», «them» = «those three
foolish boys”)
And so on. In English, pronouns cannot be directly modified by articles (*the she),
demonstratives (*that it), possessives (*her him), quantifiers (*each she) or numbers (*five they).
In other words, an English pronoun stands alone and is equivalent to an entire noun phrase. Thus,
to account for pronouns, we have to redefine the noun phrase as follows:
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (number) {adjective} noun
Now, let’s summarize by showing the entire syntax of the fragment of English that we’ve dealt
with so far:
sentence ::= noun_phrase verb (noun_phrase)
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (number) {adjective} noun
specifier ::= article | demonstrative | possessive |
quantifier
possessive ::= possessive_adjective | proper_noun’s
In all, we’ve got five definitions. Each of these definitions is called a _production rule_, since it
defines how a structure within the language can be produced. Thus, our fragment of English
consists (so far) of five production rules.
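Since production rules describe how structures are produced, they can also be run «forwards» to generate sentences. The Python sketch below stores the five rules as data, using tuples to mark the parenthesized and curly-braced items, and expands them at random. The word lists and the probabilities used for optional and repeated items are arbitrary assumptions made for this example.

```python
import random

# The five production rules, written as data. Each symbol maps to a list of
# alternatives (the "|" in the BNF); each alternative is a list of items. An item
# is a symbol name, ("opt", x) for (x), or ("many", x) for {x}.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb", ("opt", "noun_phrase")]],
    "noun_phrase": [["pronoun"], ["modified_noun"]],
    "modified_noun": [[("opt", "specifier"), ("opt", "number"),
                       ("many", "adjective"), "noun"]],
    "specifier": [["article"], ["demonstrative"], ["possessive"], ["quantifier"]],
    "possessive": [["possessive_adjective"], ["proper_noun's"]],
}

# A tiny hypothetical word list for each word class.
LEXICON = {
    "pronoun": ["it", "you"], "verb": ["chased", "saw", "ate"],
    "noun": ["dogs", "boys", "pie"], "article": ["the"], "demonstrative": ["those"],
    "quantifier": ["all"], "number": ["three"], "adjective": ["angry", "foolish"],
    "possessive_adjective": ["our"], "proper_noun's": ["John's"],
}

def expand(item, rng, words):
    """Append to `words` one random expansion of a symbol or pattern item."""
    if isinstance(item, tuple):
        kind, inner = item
        if kind == "opt":
            if rng.random() < 0.5:        # include the optional item half the time
                expand(inner, rng, words)
        else:                              # "many": zero or more repetitions
            while rng.random() < 0.4:
                expand(inner, rng, words)
    elif item in GRAMMAR:                  # nonterminal: pick one alternative
        for part in rng.choice(GRAMMAR[item]):
            expand(part, rng, words)
    else:                                  # word class: pick a word
        words.append(rng.choice(LEXICON[item]))

def generate(seed=None):
    """Produce one random sentence of the fragment."""
    rng = random.Random(seed)
    words = []
    expand("sentence", rng, words)
    return " ".join(words)

for seed in range(3):
    print(generate(seed))
```

Because the fragment says nothing about agreement, which the essay deliberately leaves aside, the generator can produce strings such as «three pie», a reminder of what a fuller grammar would have to rule out.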
However, we’ve still got a long way to go. By now, though, you should be getting the idea. So
let’s speed things up a bit by handling two additional items all at once: indirect objects and
prepositional phrases, as in the following example:
John gave the children candy in the back of the bus on the
way to the park.
Handling the indirect object «the children» is easy. We simply add an optional noun phrase to our
definition of sentence:
sentence ::= noun_phrase verb ((noun_phrase) noun_phrase)
Note that the indirect object is doubly nested inside the parentheses, indicating that it must
precede the direct object, and that an indirect object cannot occur without a direct object.
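One way to see what the double nesting buys is to expand the optional groups mechanically and list every sequence they permit. The Python sketch below uses a hypothetical helper, expansions, to do that for the rule above and, for comparison, for the flatter alternative noun_phrase verb (noun_phrase) (noun_phrase), which does not appear in the text.

```python
from itertools import product

def expansions(items):
    """List every sequence of symbols allowed by a BNF pattern.

    A pattern is a list whose items are plain symbol names or ("opt", [subpattern])
    for a parenthesized, optional group (nesting allowed).
    """
    choices = []
    for item in items:
        if isinstance(item, tuple) and item[0] == "opt":
            choices.append([[]] + expansions(item[1]))   # omitted, or any expansion
        else:
            choices.append([[item]])
    return [sum(combo, []) for combo in product(*choices)]

NP = "noun_phrase"
nested = [NP, "verb", ("opt", [("opt", [NP]), NP])]   # noun_phrase verb ((noun_phrase) noun_phrase)
flat = [NP, "verb", ("opt", [NP]), ("opt", [NP])]     # noun_phrase verb (noun_phrase) (noun_phrase)

for symbols in expansions(nested):
    print(" ".join(symbols))
```

The nested rule yields exactly three frames: no object, a direct object only, or an indirect object followed by a direct object. The flat version yields four expansions, two of which are the same string, so a lone object after the verb would have two derivations; the nesting removes that ambiguity.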
The prepositional phrases are not so easy, since we are really talking about two different kinds of
phrase: a sentential prepositional phrase and a noun-modifying prepositional phrase. The phrase
«in the back» indicates where the action took place. The phrase «of the bus» simply further
defines the noun «back». Thus, the phrase «in the back» is a sentential prepositional phrase, while
«of the bus» is a noun-modifying prepositional phrase. Similarly, «on the way» is sentential since
it indicates when the action took place, while «to the park» modifies the noun «way». We can
handle these phrases as follows:
sentence ::= subject verb (objects) {prepositional_phrase}
subject ::= noun_phrase
objects ::= (noun_phrase) noun_phrase
noun_phrase ::= pronoun | modified_noun
modified_noun ::= (specifier) (number) {adjective} noun
{prepositional_phrase}
prepositional_phrase ::= preposition noun_phrase
Note that I added the constituents «subject» and «objects» to make things a little easier to read. To
keep things simple, we will consider compound prepositions such as «from under», «up to», «on
top of», etc. as if they were single words.
Now, if you look carefully, you’ll see that something unusual is happening here. A noun phrase
can contain a prepositional phrase which, in turn, contains another noun phrase. This kind of
circularity is called _recursion_, and is one of the features of language that makes it so flexible.
Basically, recursion occurs when a lower level structure, such as a prepositional phrase, is
defined in terms of a higher level structure, such as a noun phrase.
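Recursion is exactly what a recursive-descent parser exploits: each rule becomes a function, and the function for a noun phrase ends up calling itself indirectly through the function for a prepositional phrase. The Python sketch below is illustrative rather than part of the method described here. It parses bare noun phrases using a small hypothetical lexicon, restricts specifiers to articles and omits numbers to stay short, and prints labelled brackets.

```python
# Hypothetical mini-lexicon mapping each word to its word class.
LEXICON = {
    "the": "article", "a": "article", "old": "adjective", "red": "adjective",
    "back": "noun", "bus": "noun", "way": "noun", "park": "noun", "it": "pronoun",
    "of": "preposition", "to": "preposition", "in": "preposition", "on": "preposition",
}

class Parser:
    """Recursive-descent parser: one method per production rule."""

    def __init__(self, words):
        self.words = words
        self.pos = 0

    def peek(self):
        """Word class of the next word (None at the end or for unknown words)."""
        if self.pos < len(self.words):
            return LEXICON.get(self.words[self.pos])
        return None

    def take(self, word_class):
        """Consume the next word if it belongs to word_class, else fail."""
        if self.peek() != word_class:
            raise ValueError(f"expected {word_class} at word {self.pos + 1}")
        self.pos += 1
        return self.words[self.pos - 1]

    def noun_phrase(self):
        # noun_phrase ::= pronoun | modified_noun
        if self.peek() == "pronoun":
            return ["NP", self.take("pronoun")]
        return self.modified_noun()

    def modified_noun(self):
        # (article) {adjective} noun {prepositional_phrase}   (simplified: no numbers)
        node = ["NP"]
        if self.peek() == "article":
            node.append(self.take("article"))
        while self.peek() == "adjective":
            node.append(self.take("adjective"))
        node.append(self.take("noun"))
        while self.peek() == "preposition":
            node.append(self.prepositional_phrase())   # NP -> PP -> NP: the recursion
        return node

    def prepositional_phrase(self):
        # prepositional_phrase ::= preposition noun_phrase
        return ["PP", self.take("preposition"), self.noun_phrase()]

def parse_np(text):
    """Parse a whole string as a single noun phrase."""
    parser = Parser(text.split())
    tree = parser.noun_phrase()
    if parser.pos != len(parser.words):
        raise ValueError("words left over after the noun phrase")
    return tree

def bracket(tree):
    """Render a tree as labelled brackets, e.g. [NP the bus]."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"

print(bracket(parse_np("the back of the bus")))
# [NP the back [PP of [NP the bus]]]
```

This parser greedily attaches each prepositional phrase to the nearest preceding noun, so «the back of the bus to the park» comes out with «to the park» inside «the bus». The grammar also allows attaching it to «back», and a parser that wanted every reading would have to explore both.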
Another example of recursion is the embedded sentence, as illustrated in the following examples:
He told me (that) he wanted a new job.
That he needed so much money worried his friends.
Bill knew (that) she broke the window.
He told me (that) Bill knew (that) she broke the window.
Note that an embedded sentence can never occur as an indirect object – it can only be either the
subject or direct object. Note also that the conjunction «that» is required for an embedded
sentence that is the subject of a verb, even though it is optional for the direct object. Thus, we
can represent this kind of embedded sentence as follows:
subject ::= noun_phrase | «that» sentence
objects ::= (noun_phrase) direct_object
direct_object ::= noun_phrase | («that») sentence
Another recursive structure is the relative clause. Consider the following sentences:
The boy who broke the window apologized.
I saw the man who robbed the bank.
The textbook (that/which) he bought had a chapter on
linguistics.
John played the piano (that/which) his brother gave him.
Note that there are two types of relative clause shown above. In the first type, the relative
pronoun «who» links a noun phrase to the subject of the clause. Thus, «the boy» is the effective
subject of «broke», and «the man» is the effective subject of «robbed». In this type, the relative
pronoun is required. In the second type, the relative pronoun links a noun phrase to the object of
the relative clause. Thus, «the textbook» is the effective direct object of the verb «bought», and «the
piano» is the effective direct object of the verb «gave». In this type, the relative pronoun is
optional.
Note also that both nouns and pronouns (e.g., «He who laughs last laughs best») can be modified
by relative clauses. Thus, the addition to our syntax is at a very low level, as follows:
noun_phrase ::=
modified_noun
| pronoun (relative_clause)
modified_noun ::=
(specifier) (number) {adjective} noun
{prepositional_phrase} (relative_clause)
relative_clause ::=
13
relative_pronoun verb (objects) {prepositional_phrase}
| (relative_pronoun) subject verb direct_object
{prepositional_phrase}
where a relative pronoun can be either «that», «who» or «which». I will not discuss how to handle
other types of relative clause that can be formed with relative pronouns such as «whose», «with
whom», «to which», etc., but will leave it as an exercise for the interested reader.
Anyway, you should now have a pretty good idea about how to describe the syntax of an AL
using BNF. I won’t go any further with my analysis of English, since things are getting kind of
messy already, and I don’t think there’s much to gain by going any further. For those who are
interested, here’s a summary of the syntax of the English fragment that we’ve just analyzed:
sentence ::= subject verb (objects) {prepositional_phrase}
subject ::= noun_phrase | «that» sentence
objects ::= (noun_phrase) direct_object
direct_object ::= noun_phrase | («that») sentence
prepositional_phrase ::= preposition noun_phrase
noun_phrase ::=
modified_noun
| pronoun (relative_clause)
modified_noun ::=
(specifier) (number) {adjective} noun
{prepositional_phrase} (relative_clause)
specifier ::= article | demonstrative | possessive |
quantifier
possessive ::= possessive_adjective | proper_noun’s
relative_clause ::=
relative_pronoun verb (objects) {prepositional_phrase}
| (relative_pronoun) subject verb direct_object
{prepositional_phrase}
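If you want to check that a set of rules like this really accepts the sentences it was meant to, it can help to run it. The following Python sketch is a minimal backtracking recognizer for the summary rules above, taken exactly as written, over a tiny hypothetical lexicon. The `LEXICON` entries and the helper names (`word`, `seq`, `alt`, `opt`, `many`, `is_sentence`) are illustrative assumptions, and «his» is simply tagged as a possessive. Each rule yields every position at which a match could end, so optional and repeated elements are tried in all combinations: round brackets become `opt`, curly brackets become `many`, and «|» becomes `alt`.

```python
# Hypothetical toy lexicon: each word maps to the categories it can belong to.
LEXICON = {
    "the": {"article"}, "his": {"possessive"},
    "boy": {"noun"}, "window": {"noun"}, "piano": {"noun"},
    "brother": {"noun"}, "john": {"noun"},
    "he": {"pronoun"}, "him": {"pronoun"},
    "who": {"relative_pronoun"}, "which": {"relative_pronoun"},
    "that": {"relative_pronoun", "that"},
    "broke": {"verb"}, "ran": {"verb"}, "played": {"verb"}, "gave": {"verb"},
    "on": {"preposition"},
}
RULES = {}

def ref(name):
    # Late-bound reference, so that rules can refer to each other recursively.
    return lambda toks, i: RULES[name](toks, i)

def word(cat):
    # Match one token whose lexicon entry includes the category `cat`.
    def parse(toks, i):
        if i < len(toks) and cat in LEXICON.get(toks[i], set()):
            yield i + 1
    return parse

def seq(*parts):
    # Match the parts in order, trying every way each part can end.
    def parse(toks, i, k=0):
        if k == len(parts):
            yield i
            return
        for j in parts[k](toks, i):
            yield from parse(toks, j, k + 1)
    return parse

def alt(*parts):
    # "|": try each alternative in turn.
    def parse(toks, i):
        for p in parts:
            yield from p(toks, i)
    return parse

def opt(p):
    # "( )": zero or one occurrence.
    def parse(toks, i):
        yield i
        yield from p(toks, i)
    return parse

def many(p):
    # "{ }": zero or more occurrences.
    def parse(toks, i):
        yield i
        for j in p(toks, i):
            if j > i:
                yield from parse(toks, j)
    return parse

RULES.update({
    "sentence": seq(ref("subject"), word("verb"), opt(ref("objects")), many(ref("pp"))),
    "subject": alt(ref("noun_phrase"), seq(word("that"), ref("sentence"))),
    "objects": seq(opt(ref("noun_phrase")), ref("direct_object")),
    "direct_object": alt(ref("noun_phrase"), seq(opt(word("that")), ref("sentence"))),
    "pp": seq(word("preposition"), ref("noun_phrase")),
    "noun_phrase": alt(ref("modified_noun"),
                       seq(word("pronoun"), opt(ref("relative_clause")))),
    "modified_noun": seq(opt(ref("specifier")), opt(word("number")),
                         many(word("adjective")), word("noun"),
                         many(ref("pp")), opt(ref("relative_clause"))),
    "specifier": alt(word("article"), word("demonstrative"),
                     word("possessive"), word("quantifier")),
    "relative_clause": alt(
        seq(word("relative_pronoun"), word("verb"), opt(ref("objects")), many(ref("pp"))),
        seq(opt(word("relative_pronoun")), ref("subject"), word("verb"),
            ref("direct_object"), many(ref("pp")))),
})

def is_sentence(text):
    toks = text.lower().split()
    return any(end == len(toks) for end in RULES["sentence"](toks, 0))

print(is_sentence("The boy who broke the window ran"))            # True
print(is_sentence("John played the piano his brother gave him"))  # True
print(is_sentence("The boy who broke"))                           # False
```

A lexicon entry can list several categories (as «that» does here), and the recognizer simply tries each one.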
4.0 THE SYNTAX OF A HYPOTHETICAL ARTIFICIAL LANGUAGE
In this section, I’ll describe a simple yet highly effective syntax for an AL. It encompasses all of
the basic features, including recursion, needed for a language to be totally functional. The fact
that it is so simple, however, means that this language will depend heavily on the lexicon to
provide capabilities that are sometimes provided by syntax. For example, instead of changing
word order and using auxiliaries to change a statement to a question (as is done in English), this
language will simply append a question particle to the end of the sentence (as is done in
Japanese). In other words, the syntax will be simple at the expense of the lexicon. As it turns out,
this approach is inherently more flexible because the lexicon is infinitely expandable while the
syntax is not.
Unfortunately, simplicity has a price, and that price is boredom. The syntax I am about to show
you is extremely and undeniably boring. However, I am only trying to illustrate a minimal
configuration. I’m sure that most AL designers would want to expand upon it and come up with a
design that’s a little more exciting.
4.1 BASIC PRODUCTION RULES
My sample language will be purely right-branching. That is, basic word order will be verb-
subject-object (VSO), and arguments, modifiers and specifiers will always follow their heads
(and the head of a sentence will always be a verb – by definition). I choose VSO mainly because
it appeals to me and because it illustrates structures that are different from English. I also like it
because it is inherently easier to parse, although this is not all that important unless you plan to
write computer programs to parse it. Other than this, there are no inherent advantages or
disadvantages over other word orders. As we’ve already seen, every possible basic word order
(VSO, SVO, SOV, VOS, OVS and OSV) has counterparts among natural languages, and one is
not inherently «better» than the others.
The form of the sentence in the sample language will be:
sentence ::= verb {verb_modifier} {verb_argument}
{sentence_particle}
verb_modifier ::= adverb | tense_marker | etc.
Verb modifiers would be words equivalent to English adverbs, such as «quickly», «tomorrow»,
«just», etc. They would also handle the tense and aspect of the verb, such as past progressive («he
was going»), future perfect («he will have gone»), simple present («he goes»), etc.
Sentence particles would be used to modify the nature of the sentence. For example, they could
convert a statement to a question or command, or they could indicate the attitude of the speaker
towards what he is saying.
Verb arguments would correspond to English subjects, objects and objects of prepositions, and
would take the forms of noun phrases, adjective phrases, and embedded sentences:
verb_argument ::= (case_tag) expression {argument_particle}
expression ::= noun_phrase | adjective_phrase | sentence
A case tag would be equivalent to many English prepositions that introduce verbal arguments,
such as «to» in «I went to Boston». Note, though, that case tags can also introduce subordinate
clauses, as in «I saw her when I was in Boston». Here, «when» is the case tag. A case tag is
optional if the argument is the subject or object of the verb; i.e., it is optional if it is part of the
argument structure of the verb.
The argument particle would perform a function similar to the sentence particle, but would apply
only to the argument it follows. For example, it can convert the argument to an interrogative or
add emphasis to it. Let me illustrate what we’ve got so far, using English words:
English                              Sample language
I like Boston.                       Like I Boston.
I went to Boston.                    Go did I to Boston.
He doesn’t know I went to Boston.    Know not he go did I to Boston.
Doesn’t he know I went to Boston?    Know not he go did I to Boston huh?
Which book did he buy?               Buy did he book which?
where «did» is a verb modifier indicating that the preceding verb is in the simple past tense,
«huh» is a sentence particle indicating that the entire sentence is to be interpreted as an
interrogative, and «which» is a particle that converts the preceding noun phrase to an
interrogative. Note that, for all particles, the scope of the particle will depend on the particular
particle. In other words, each particle will define its own scope. Note also that a particle
inherently terminates the item it applies to, unless it is immediately followed by another particle
that applies to the same item.
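To see how little machinery this syntax needs, here is a minimal Python sketch of a parser for the rules introduced so far, working on words that have already been tagged with their part of speech. The tag names (`V`, `MOD`, `CASE`, `N`, `APART`, `SPART`) and the classes are assumptions made for illustration; noun phrases are reduced to single nouns, and adjective phrases are left out. An embedded sentence is recognized simply because a verb turns up where an argument is expected.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

# Assumed tag set: V = verb, MOD = verb modifier, CASE = case tag,
# N = noun or pronoun, APART = argument particle, SPART = sentence particle.
Token = Tuple[str, str]

@dataclass
class Argument:
    case_tag: Optional[str]
    expression: Union[str, "Sentence"]   # a bare noun or an embedded sentence
    particles: List[str] = field(default_factory=list)

@dataclass
class Sentence:
    verb: str
    modifiers: List[str] = field(default_factory=list)
    arguments: List[Argument] = field(default_factory=list)
    particles: List[str] = field(default_factory=list)

def tag_at(toks: List[Token], i: int) -> Optional[str]:
    return toks[i][1] if i < len(toks) else None

def parse_sentence(toks: List[Token], i: int = 0):
    # sentence ::= verb {verb_modifier} {verb_argument} {sentence_particle}
    if tag_at(toks, i) != "V":
        raise ValueError(f"expected a verb at position {i}")
    s = Sentence(verb=toks[i][0])
    i += 1
    while tag_at(toks, i) == "MOD":
        s.modifiers.append(toks[i][0])
        i += 1
    while tag_at(toks, i) in ("CASE", "N", "V"):
        arg, i = parse_argument(toks, i)
        s.arguments.append(arg)
    while tag_at(toks, i) == "SPART":
        s.particles.append(toks[i][0])
        i += 1
    return s, i

def parse_argument(toks: List[Token], i: int):
    # verb_argument ::= (case_tag) expression {argument_particle}
    case_tag = None
    if tag_at(toks, i) == "CASE":
        case_tag = toks[i][0]
        i += 1
    if tag_at(toks, i) == "V":
        expr, i = parse_sentence(toks, i)   # an embedded sentence starts with its verb
    elif tag_at(toks, i) == "N":
        expr = toks[i][0]
        i += 1
    else:
        raise ValueError(f"expected an expression at position {i}")
    arg = Argument(case_tag, expr)
    while tag_at(toks, i) == "APART":
        arg.particles.append(toks[i][0])
        i += 1
    return arg, i

toks = [("know", "V"), ("not", "MOD"), ("he", "N"),
        ("go", "V"), ("did", "MOD"), ("I", "N"), ("to", "CASE"), ("Boston", "N")]
tree, end = parse_sentence(toks)
print(tree.verb, tree.modifiers)           # know ['not']
print(tree.arguments[1].expression.verb)   # go
```

One simplification is worth flagging: this sketch attaches a particle to the innermost item that is still open. Given «Know not he go did I to Boston huh», it would hand «huh» to the embedded sentence, whereas the essay gives it scope over the whole sentence. Since each particle defines its own scope, a fuller parser would look that scope up in the particle's lexical entry.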
In case you’re wondering, I’m making a basic assumption about the nature of verbs in the above
structure. The verb basically has two kinds of arguments: those like the subject and object that
are required by the verb, and those that are marked by case tags (prepositions in English,
postpositions in Korean, inflections in Hungarian, etc.). Arguments that are required by a verb
are part of the valency and thematic structure of the verb. (I have much more to say about this
topic in my (very long!) monograph on Lexical Semantics.)
Now, let’s fill in some of the blanks:
noun_phrase ::= noun {simple_noun_modifier}
{complex_noun_modifier}
simple_noun_modifier ::= adjective_phrase | number
| article | demonstrative
| possessive | quantifier
complex_noun_modifier ::= noun_phrase_tag noun_phrase
| noun_clause_tag sentence
adjective_phrase ::= adjective {adjective_modifier}
{adjective_argument} {adjective_particle}
adjective_modifier ::= adverb {adverb_particle}
adjective_argument ::= adjective_phrase_tag noun_phrase
Note that noun phrase tags are the equivalent of English prepositions in constructions such as
«Bring the book with the red cover» or «Get the book on the table».
Noun clause tags are relative pronouns that introduce relative clauses. For the embedded
sentence, you can require that it contain a gap or a resumptive pronoun, as we discussed earlier.
Take your pick. The above syntax allows the use of resumptive pronouns, but does not require
them.
Note that this syntax DOES allow the type of relative clause that we discussed earlier and which
occurs in Persian:
The police caught the man who he and his wife robbed the
bank.
Whether or not you implement it in your own design is entirely up to you. Keep in mind, though,
that if you decide to disallow it, you will have to add to the existing production rules of your
language. In other words, by disallowing a structure that makes perfectly good linguistic sense,
your syntax will actually become more complex.
Adjective arguments would handle cases like English «red with fever» and «blowing in the
wind». Note, though, that this syntax allows these arguments to be used more directly than in
English. For example, if English had a counterpart to this form, one could make a sentence like
«The blowing in the wind kite hit him on the head».
Adjective and adverb particles would include emphasizers and de-emphasizers such as English
«very», «rather», «not too», «quite», etc.
Word order is somewhat freer than in English, especially in simple noun modifiers. This is
intentional. It allows continuous refinement without the clumsy prepositional phrasing that
English needs. For example, the expression «his three of those five black dogs» would
be rendered as «dogs black five those three his», which, with English word order, would sound
like «his three those five black dogs».
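Because every modifier follows its head, the order of the simple noun modifiers in this example is the mirror image of the English order (once «of» is removed). A tiny Python sketch of that assumption, with hypothetical function names:

```python
def to_al_order(head, english_premodifiers):
    # The head comes first, then the English pre-head modifiers in mirror-image
    # order, so the modifier closest to the head in English stays closest here.
    return [head] + english_premodifiers[::-1]

def english_gloss(al_phrase):
    # Reading the AL phrase backwards gives English order (without "of").
    return al_phrase[::-1]

al = to_al_order("dogs", ["his", "three", "those", "five", "black"])
print(" ".join(al))                 # dogs black five those three his
print(" ".join(english_gloss(al)))  # his three those five black dogs
```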
Finally, note that case tags, noun phrase tags, noun clause tags and adjective phrase tags all
correspond to English prepositions. However, each will be distinct, unlike English prepositions
which often fill multiple roles, and which often lead to ambiguities. I’ll have more to say later
about such ambiguities and how to prevent them.
4.2 GENERAL RULES
Sometimes it’s better to separately state a rule that applies to all of the structures of a language,
rather than clutter up the production rules. For example, coordination is best described as a
general rule of the form:
COORDINATION RULE:
Where it makes sense, any constituent may be replaced by a
coordinated constituent (using «and», «or», «but not»,
etc.). Thus, for a constituent «X», the following applies:
X ::= X {coordinating_conjunction X}
where a coordinating conjunction can be «and», «or», «but», «but not», etc. With this general rule,
our sample language can now handle constituents such as «John and Bill», «She washed and I
dried the dishes», «He bought the steaks at Joey’s Market or at Safeway», «She prepared the
drinks and I washed the dishes», etc. Coordination poses some special problems for syntax, and
I’ll have more to say about it later.
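Note that the rule as written is left-recursive: X appears as the first symbol of its own expansion. A recursive-descent parser that took it literally would call itself forever, so in practice it is read as «one X followed by zero or more conjunction-plus-X pairs». The Python sketch below assumes single-word conjunctions (a multiword one like «but not» would need a longer match) and uses a hypothetical `parse_name` as a stand-in for whatever parser handles the constituent X.

```python
from typing import Callable, List, Tuple

CONJUNCTIONS = {"and", "or", "but"}   # assumed single-word conjunctions

def parse_coordinated(toks: List[str], i: int,
                      parse_x: Callable[[List[str], int], Tuple[str, int]]):
    # X ::= X {coordinating_conjunction X}, implemented as a loop:
    # one X, then zero or more (conjunction, X) pairs.
    first, i = parse_x(toks, i)
    conjuncts, conjunctions = [first], []
    while i < len(toks) and toks[i] in CONJUNCTIONS:
        conjunctions.append(toks[i])
        x, i = parse_x(toks, i + 1)
        conjuncts.append(x)
    return conjuncts, conjunctions, i

def parse_name(toks: List[str], i: int) -> Tuple[str, int]:
    # Hypothetical stand-in for any constituent parser: one capitalized word.
    if i < len(toks) and toks[i][:1].isupper():
        return toks[i], i + 1
    raise ValueError(f"expected a name at position {i}")

print(parse_coordinated("John and Bill or Mary".split(), 0, parse_name))
# (['John', 'Bill', 'Mary'], ['and', 'or'], 5)
```

The result is a flat list of conjuncts that says nothing about how «and» and «or» group relative to each other, which is exactly the problem taken up in the section on coordination below.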
Another rule involves the use of particles:
PARTICLE RULE:
Where it makes sense, any constituent may be followed by one
or more particles that add to the meaning of the
constituent and/or terminates it.
This rule allows us to unclutter the production rules defined in the previous section by
eliminating all references to particles, and also gives us the flexibility of adding particles where
they were not explicitly allowed.
Another very useful general rule allows us to create complex structures that are the syntactic
equivalents of lexical compounds. Consider the following English sentences:
They held an over-the-fence conversation.
Made-in-USA and built-with-confidence labels must be
prominently displayed.
He needed a powerful pick-me-up.
Mercenaries-for-hire ads appeared in several places in the
magazine.
In order to implement these hyphenated complexes, we could create the following general rule:
COMPLEX STRUCTURE RULE:
Complex nouns, verbs, adjectives and adverbs can be created
with an introductory tag followed by a verb argument, as
follows:
complex_X ::= X_tag verb_argument
where X can be «noun», «verb», «adjective» or «adverb». For
example:
complex_adjective ::= adjective_tag verb_argument
A complex X can be used wherever an X appears in the syntax.
For example, a complex adjective can appear in the place of
an adjective. Also, as with other structures, a terminating
particle can be appended if necessary.
Note that these complex structures are similar in nature to compound words. The main difference
is that they have precise structure and can be as long as you want. Also, if your morphology is
self-segregating (as I discussed in my earlier essay on morphology), then you could remove the
spaces between the words in the complex, creating in effect a single, polysynthetic word.
4.3 THE ULTIMATE SYNTAX FOR AN AL
Here is a complete listing of the production rules and general rules discussed above:
sentence ::= verb {verb_modifier} {verb_argument}
verb_modifier ::= adverb | tense_marker | etc.
verb_argument ::= (case_tag) expression
expression ::= noun_phrase | adjective_phrase | sentence
noun_phrase ::= noun {simple_noun_modifier}
{complex_noun_modifier}
simple_noun_modifier ::= adjective_phrase | number
| article | demonstrative
| possessive | quantifier
complex_noun_modifier ::= noun_phrase_tag noun_phrase
| noun_clause_tag sentence
adjective_phrase ::= adjective {adjective_modifier}
{adjective_argument}
adjective_modifier ::= adverb
adjective_argument ::= adjective_phrase_tag noun_phrase
COORDINATION RULE:
Where it makes sense, any constituent may be replaced by a
coordinated constituent (using «and», «or», «but not»,
etc.). Thus, for a constituent «X», the following applies:
X ::= X {coordinating_conjunction X}
PARTICLE RULE:
Where it makes sense, any constituent may be followed by one
or more particles that add to the meaning of the
constituent and/or terminates it.
COMPLEX STRUCTURE RULE:
Complex nouns, verbs, adjectives and adverbs can be created
with an introductory tag followed by a verb argument, as
follows:
complex_X ::= X_tag verb_argument
where X can be «noun», «verb», «adjective» or «adverb». A
terminating particle can be appended if necessary.
4.4 SAMPLE SENTENCES
Here are some sample sentences that illustrate the above rules. The «a» sentences are normal
English sentences. The «b» sentences are the equivalents of the «a» sentences using the new
syntax:
a. The dog bit the boy.
b. Bite did dog the boy the.          (did = past tense verb modifier)
a. Did the dog bite the boy?
b. Bite did dog the boy the huh?      (huh = sentential question particle)
a. The book is on the table.
b. Be book the on table the.          (on = «place on» = case tag)
   OR
b. Be_on book the table the.          (be_on = transitive verb)
a. Did the boy who broke the window run away?
b. Run_away did boy the who break did window the huh?
                                      (who = agentive noun clause tag)
a. I hate reading technical manuals.
b. Hate I read (I) manual technical lots.
                                      (lots = plural specifier)
For those of you who are uncomfortable with this word order, you can get something closer to
English by moving verb modifiers, simple noun modifiers and adjective modifiers before the
word they modify, rather than after. In other words, all light modifiers will precede their heads,
and all heavy modifiers will follow them. You’ll still have VSO word order, but it will be less
«pure», and modification will be more like English.
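Here is a small Python sketch of the two orderings, using the first sample sentence above. It assumes a toy representation (a verb with its modifiers, plus a list of arguments, each a noun with its light modifiers listed nearest-the-head first); the function names are hypothetical. Moving the light modifiers in front of their heads turns «Bite did dog the boy the» into «did bite the dog the boy», while the verb still precedes its subject and object.

```python
def linearize_np(noun, light, light_first):
    # Light modifiers are listed nearest-the-head first; placing them before
    # the head reverses them so the nearest one stays adjacent to the noun.
    return (light[::-1] + [noun]) if light_first else ([noun] + light)

def linearize_sentence(verb, verb_modifiers, arguments, light_first=False):
    # Verb modifiers are light; whole arguments are heavy and always follow the verb.
    words = (verb_modifiers[::-1] + [verb]) if light_first else ([verb] + verb_modifiers)
    for noun, light in arguments:
        words += linearize_np(noun, light, light_first)
    return " ".join(words)

args = [("dog", ["the"]), ("boy", ["the"])]
print(linearize_sentence("bite", ["did"], args))                    # bite did dog the boy the
print(linearize_sentence("bite", ["did"], args, light_first=True))  # did bite the dog the boy
```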
5.0 MISCELLANEOUS TOPICS IN SYNTAX
In the following sections, I will discuss some of the loose ends that would not have fit well in any
of the preceding sections.
5.1 COORDINATION
Coordination is the linking together of two or more constituents with the same structure,
creating, in effect, a single complex structure. The linking elements are called _coordinating
conjunctions_. A language can have simple coordinating conjunctions, such as English «and»,
«or» and «but», as well as compound ones, such as English «either…or…», «both…and…» and «not
only…but also…and finally…».
My reason for discussing coordination is that coordinated structures are often ambiguous, and
you may want to take steps to prevent such ambiguity in your design. Consider, for example, the
following sentence:
John wants to buy the painting of the fireplace and the
Persian carpet.
Does John want to buy one item (a painting in which one can see both a carpet and a fireplace),
or two items (a carpet and a painting)? Note that the ambiguity is present only in the written
sentence. When spoken, a different stress, timing and intonational pattern (i.e., a different
_prosodic_ pattern) would be used for each meaning.
As a language designer, you must decide whether this kind of ambiguity is to be allowed in your
AL. If your major goal is to create a _spoken_ language, then you may feel secure in ignoring the
problem, and let context and prosody resolve potential ambiguities. If, however, the speakers of
your AL will have different native languages, then a solution based on prosody may not be
practical, since natural languages differ considerably in the ways they use intonation, stress and
timing. You can, of course, define the prosodic features of your AL and force students to learn
them, but in doing so, you will make your AL much harder to learn.
An additional consideration could be whether or not you want your AL to be computer-tractable.
That is, are you trying to design an AL that is readily amenable to analysis on a computer? If you
do want your AL to be computer-tractable, then you don’t really have a choice. Unfortunately,
computers are not very good at resolving ambiguity, whether spoken or written.
So, assuming you do NOT want coordination ambiguities in your AL (for whatever reason), how
can you prevent them? As it turns out, one possible solution becomes readily apparent if we
slightly paraphrase our problem sentence, as follows:
John wants to buy both the painting of the fireplace and
the Persian carpet.
OR
John wants to buy not only the painting of the fireplace
but also the Persian carpet.
In other words, use of complex coordinating conjunctions is inherently less ambiguous, because
each constituent that is being linked is explicitly identified.
Before presenting a general solution, however, let’s look at what happens when one coordinated
structure is embedded inside another one. Consider the following example:
John wants to buy the painting of the fireplace and the
Persian carpet or the rocking chair.
Obviously, when you embed a coordinated structure inside another one, the potential for
ambiguity becomes even greater. Thus, if we want to eliminate any ambiguity, we must mark
both the beginning and the end of a coordinated structure. Furthermore, if three or more items are
coordinated, the inner one(s) cannot be marked in the same way as the outer ones. So, if we wish
to eliminate all possible ambiguity, regardless of the number of constituents or the depth of
embedding, then each conjunction must have three forms: a start form, a continuation form and
an end form. There are many ways to implement this. An easy one to learn would be to have a
root morpheme to indicate the basic meaning of the conjunction, and an affix to differentiate the
three forms. For example:
bi- = «begin» affix
ko- = «continue» affix
no affix = «end»
and let «kwa» = «and»
and our new coordination rule would look like this:
X ::= (coordinating_conjunction) X
{coordinating_conjunction X}
Examples:
apples and oranges = (bikwa) apples kwa oranges
apples and oranges and pears and plums =
(bikwa) apples kokwa oranges kokwa pears kwa plums
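The three forms are easy to generate mechanically. The Python sketch below renders a nested coordination (a list of two or more conjuncts, any of which can itself be a list) using the begin, continue and end forms of «kwa». It always uses «bikwa» for an embedded structure, and lets the initial «bikwa» of the outermost structure be omitted. The function name and the list representation are assumptions for illustration.

```python
ROOT = "kwa"   # "and"; the prefixes bi- (begin) and ko- (continue) come from the text

def render(coord, outermost=True, omit_outer_start=True):
    # coord: a list of two or more conjuncts; a conjunct is a string or a nested list.
    if len(coord) < 2:
        raise ValueError("a coordination needs at least two conjuncts")

    def conjunct(x):
        return render(x, outermost=False) if isinstance(x, list) else x

    words = []
    if not (outermost and omit_outer_start):
        words.append("bi" + ROOT)               # begin form, required when embedded
    words.append(conjunct(coord[0]))
    for x in coord[1:-1]:
        words += ["ko" + ROOT, conjunct(x)]     # continue form
    words += [ROOT, conjunct(coord[-1])]        # end form: the bare root
    return " ".join(words)

print(render(["apples", "oranges"]))
# apples kwa oranges
print(render(["apples", "oranges", "pears", "plums"], omit_outer_start=False))
# bikwa apples kokwa oranges kokwa pears kwa plums
print(render([["apples", "oranges"], "pears"]))
# bikwa apples kwa oranges kwa pears
print(render(["apples", ["oranges", "pears"]]))
# apples kwa bikwa oranges kwa pears
```

Note how the two groupings of three items come out differently: «bikwa apples kwa oranges kwa pears» versus «apples kwa bikwa oranges kwa pears».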
As shown in the examples, you can specify in your syntax that the initial «bikwa» is optional
unless it is needed to prevent an ambiguity. This has two implications: 1. «bikwa» must ALWAYS
be used to commence an embedded coordinated structure; 2. when «kwa» appears alone, it
always links the constituent that immediately follows it with the constituent that MOST
CLOSELY precedes it. (Incidentally, note that the affixes can be re-used in creating other words.
In fact, the «bi-» affix is very similar to Esperanto’s «ek-» affix.)
Now, if these words existed in English, here’s how we would have written our problem sentence:
a. If the carpet is in the painting:
John wants to buy the painting of (bikwa) the fireplace
kwa the Persian carpet.
b. If the carpet and painting are separate:
John wants to buy bikwa the painting of the fireplace
kwa the Persian carpet.
In the first sentence, «bikwa» is optional, since the default interpretation is to link «the Persian
carpet» with the noun phrase that immediately precedes it, which, in this case, is «the fireplace».
If the second sentence sounds too strange to you, simply replace «bikwa» with the English term
«not only», and «kwa» with the English term «but also». Thus:
John wants to buy not only the painting of the
fireplace but also the Persian carpet.
Surprisingly, English CAN have unambiguous coordination, even when one coordinated
structure is embedded in another, as in the following example:
He baked not one or two pies or even three or four pies
but actually ten pies!
In natural languages, though, coordination syntax can be inconsistent, and, as a result,
ambiguous. As an AL designer, you can enforce consistency and totally eliminate the possibility
of coordination ambiguities, without in any way requiring unnatural syntax.
5.2 OTHER ATTACHMENT AMBIGUITIES
Coordinated structures are not the only ones that can suffer from ambiguity. In some cases, a
syntactic marker can have more than one interpretation, as in the following sentence:
John saw the man with a telescope.
Did John use a telescope to see the man, or did the man have the telescope? In this case, the
problem is with the English word «with», which can be interpreted in the sense of either
_accompaniment_ or _instrument_. The easiest solution is simply to use different words for the
different meanings. Thus, your AL could have a case tag meaning «using» and a noun phrase tag
for «accompanied by».
There are other cases of ambiguity, however, where the solution is not as obvious or as simple.
Consider the following example:
John watched the man observing the crowd with a
telescope.
Here, the telescope is clearly being used as an instrument, but by whom? It’s interesting and
perhaps instructive to look at how natural languages deal with this ambiguity. In English, you
could rephrase the sentence as follows (assuming John has the telescope):
John watched with a telescope the man observing the
crowd.
I’m not even sure if this sentence is grammatical in English, but even so, there’s no reason why
you can’t make it grammatical in your AL. The problem, though, is that we are FORCING the
speaker to use a different word order to prevent an ambiguity, and I don’t believe that this is a
practical solution. In my opinion, a design is flawed if structures are created to be used only
when ambiguity can occur. A much better solution would be to design the syntax and lexicon so
that the ambiguity can NEVER occur.
As it turns out, there are several natural languages in which the above ambiguity would not
occur. These languages mark the verb (usually by specially inflecting it) to indicate that an
instrumental object follows the direct object. In Swahili, for example, the above ambiguous
sentence would take one of the following two forms depending on its meaning:
John watched-using the man observing the crowd a
telescope.
OR
John watched the man observing-using the crowd a
telescope.
(Although Swahili does not have articles, I’ve included them so that the results sound more
natural to the English-speaking reader.) In effect, the instrumental marking on the verb changes
the argument structure of the verb, making it ditransitive with an instrumental second object.
This approach will solve the problem, but many designers may not like it, perhaps considering it
too exotic or difficult to learn. It would also get very messy if a verb has several arguments. And
even natural languages that use this approach, such as Swahili and Mokilese, do it for only a
small number of case roles. Although interesting, this approach does not appear to be a practical,
general solution. (Also, I’ve often wondered how Swahili speakers perceive the use of this
construction. It appears to be inflectional and that’s the way linguists treat it. However, it doesn’t
allow one to utter an afterthought instrumental role, as in «He broke the window….uh, with a
hammer». Perhaps it’s not really inflectional, but derivational.)
Another way of solving this problem is to explicitly terminate a structure with a particle. Such a
particle would, in effect, be similar to a right parenthesis. Each structure would have its own
terminating particle (e.g., one for sentences, one for verb arguments, one for relative clauses,
etc.). Such a particle would terminate the immediately preceding applicable structure and prevent
anything that follows from attaching to it. Thus, if «John» is using the telescope, our test sentence
would be something like this:
John watched the man observing the crowd end-sen
with a telescope.
where «end-sen» is a particle that terminates the immediately preceding (embedded) sentence.
Thus, «with a telescope» can only attach to the main verb «watched». (I assume here that «with»
is a case tag that can only attach to a verb – NOT a noun phrase tag that could attach to «man».)
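The effect of such a terminator can be simulated with a stack of open sentences. In this Python sketch, the lexicon is a toy one, «with» is assumed to be a case tag that only attaches to verbs, and every embedded sentence is assumed to stay open until it is explicitly closed or the utterance ends. Each verb opens a sentence, «end-sen» closes the innermost one, and a case tag attaches to whichever verb is innermost at that point.

```python
# Hypothetical toy lexicon for the test sentence.
VERBS = {"watched", "observing"}
CASE_TAGS = {"with"}   # assumed: "with" is a case tag that attaches only to verbs

def case_tag_attachments(words):
    open_verbs = []    # verbs of the sentences still open, innermost last
    attached = []
    for w in words:
        if w in VERBS:
            open_verbs.append(w)                  # a verb opens a (possibly embedded) sentence
        elif w == "end-sen":
            open_verbs.pop()                      # close the innermost open sentence
        elif w in CASE_TAGS:
            attached.append((w, open_verbs[-1]))  # attach to the innermost open verb
    return attached

print(case_tag_attachments("John watched the man observing the crowd with a telescope".split()))
# [('with', 'observing')]
print(case_tag_attachments("John watched the man observing the crowd end-sen with a telescope".split()))
# [('with', 'watched')]
```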
However, I do not consider this a good solution for the reason I mentioned earlier – we are
adding something that is only used to prevent ambiguity, and which won’t be needed most of the
time. Also, I have not been able to find a single example of this approach among natural
languages. If it does, in fact, violate linguistic universals, then it may be effectively unlearnable.
In other words, any form of explicit «parenthesization» could only be uttered with forethought
and analysis – it could not be uttered automatically. I imagine that this is the essential difference
between a real language and a non-linguistic coding game.
Another possible solution to this problem is to use a different case tag (i.e., preposition or
postposition) for each possible interpretation. Since this is a lexical solution rather than a
syntactic or morphological one, some designers may find it easier to swallow. One way to
implement it would be to specially mark the case tag (i.e., have a distinct case tag) if it attaches
to the main verb of the sentence, and leave it unmarked if it attaches to the immediately
preceding verb (or vice-versa). For example, an English gloss using this technique would sound
something like this:
John watched the man observing the crowd main-with a
telescope.
OR
John watched the man observing the crowd with a
telescope.
Note that «with» in the second sentence is not marked – its attachment defaults to the verb
«observing» since it is closest.
However, by marking the case tag to indicate that it attaches to the MAIN verb, you will not be
able to attach to a middle verb, as in the following triple embedding:
John saw the man who observed the lady who watched the
crowd with a telescope.
You can, of course, provide markers for case tags that attach to inner verbs, but I wouldn’t bother.
(My feeling is that people who use sentences like this deserve to be misunderstood. 🙂) Note also
that this approach is very much like the one used in Swahili and Mokilese, except that the case
tag is marked rather than the verb, and that the inherent messiness of dealing with multiple case
roles is avoided.
Probably the best way, one which would work even with multiple embeddings, would be to mark
the case tag with a part of the verb it attaches to. Thus:
John saw the man who observed the lady who watched the
crowd saw-with a telescope.
John saw the man who observed the lady who watched the
crowd obs-with a telescope.
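Resolving such a marked case tag is also mechanical: the marker is compared with the beginnings of the verbs that are still open, and an unmarked tag falls back to the nearest preceding verb. A minimal Python sketch, in which the hyphenated spelling of the marked forms and the function name are assumptions, and every verb seen so far is treated as still open (as it is in these right-branching examples):

```python
def attach_case_tag(words, verbs, root="with"):
    # Return the verb that the first occurrence of the case tag attaches to.
    open_verbs = []                       # verbs seen so far, innermost last
    for w in words:
        if w in verbs:
            open_verbs.append(w)
        elif w == root:
            return open_verbs[-1]         # unmarked: the nearest preceding verb
        elif w.endswith("-" + root):
            mark = w[: -len(root) - 1]    # e.g. "obs" from "obs-with"
            matches = [v for v in open_verbs if v.startswith(mark)]
            return matches[-1] if matches else None
    return None

verbs = {"saw", "observed", "watched"}
base = "John saw the man who observed the lady who watched the crowd"
for tag in ("saw-with", "obs-with", "with"):
    print(tag, "->", attach_case_tag(f"{base} {tag} a telescope".split(), verbs))
# saw-with -> saw
# obs-with -> observed
# with -> watched
```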
All of the above discussion provides fodder for some interesting observations about syntax in
general. Since language allows recursion, syntax is inherently two-dimensional. Unfortunately,
since words are uttered one after another, the speech stream is inherently linear. Thus, when we
are using language, we are trying to force a two-dimensional entity into a one-dimensional mold.
The result is sometimes ambiguity. Note though, that real ambiguity is quite rare. The human
brain can bring other weapons to bear on syntactic ambiguity, such as semantic context and
world knowledge. Consequently, if you are designing your language for exclusive use by
humans, you can afford to ignore this potential for syntactic ambiguity. If, however, you want
your language to be computer-tractable, then you must deal with the problem.
5.3 DO WORDS HAVE SYNTAX?
In the preceding essay in this series, I discussed ways to design the surface morphology of an
artificial language. As it turns out, syntax and morphology have quite a lot in common. In fact,
some linguists have proposed theories that describe morphology as if it were essentially an
extension of syntax. In these theories, rules that apply to the relationship between heads and
modifiers in syntax also apply to the relationship between heads and modifiers in morphology.
The difference is that heads and modifiers in syntax are complete words, whereas in morphology
they are morphemes.
I am not going to try to describe these theories here, since they are both complex and somewhat
controversial. All I would like to do here is to very briefly show how the syntactic formalism
we’ve been using can be applied to the shapes of words.
If your AL is at all agglutinating, as seems to be true to some degree of most natural languages,
then you will want to be able to create new and more complex words by combining more basic
primitives. One way that this is commonly done is through a process called _derivational
morphology_, in which a basic or root concept is modified to create a new word by adding a
morpheme. For example, English «sad» -> «sadden» -> «sadness», «critic» -> «criticism» ->
«criticize», etc. I do not want to discuss the semantics of derivational morphology here, since I
discuss it in much greater detail in my monograph Lexical Semantics. Instead, I just want to
show you how we can apply the formalism described above to the shapes of words.
Let’s start by looking at _open class_ words. In English, these are words such as nouns,
adjectives, verbs and most adverbs. They are called _open class_ because new words can enter
and leave these groups with relative ease. Their counterpart, _closed class_ words, are words like
prepositions, particles and specifiers which change much more slowly. We could define an open
class word of a hypothetical AL as having the following «syntax»:
open_class_word ::= {modifier} root (classifier)
part_of_speech
Such a structure is called _morphotactic_ because it describes the way the morphemes are
arranged or «touch» each other. In the above example, a modifier would be a morpheme that
narrows down the possible interpretation of the root. For example, a modifier could apply the
meaning of «femaleness» to the basic word «dog» to create the new word «bitch». Different
modifiers would be used to create other distinctions such as between «mutt», «puppy»,
«purebred», «male dog», etc. For nouns, a classifier can indicate a position of the root in a
hierarchy. For example, «dog» is in the class «mammal». For verbs, it can indicate the thematic
roles and argument structure. For example, the words for «escape» and «release» would have the
same root and part of speech, but would have different classifiers, because one is inherently
reflexive and the other is inherently causative. Also, if each root has a default classifier
associated with it, then the classifier does not have to be specified. This will help somewhat to
keep common words relatively short.
We can also describe the phonological «shape» of the individual morphemes, which linguists
would call the _phonotactics_ of the morphemes. For example:
modifier ::= CV
root ::= CVCC
classifier ::= VCC
part_of_speech ::= V
where C is a consonant and V is a vowel. Thus, if the modifier «mi» indicates «femaleness», the
root «temb» means «dog» within the class of mammals indicated by the classifier
«anc» (pronounce ‘c’ like ‘ch’ in ‘church’), and «o» is the part-of-speech indicator for nouns, then
the word «mitembanco» would mean «bitch». If the default classifier for the root «temb» is «anc»,
then «anc» can be dropped, shortening the result to «mitembo».
You can extend your word syntax to include other kinds of words. For example, an anaphor
could be created from any open class word by using only the classifier and part of speech. Thus,
the pronoun «it», when referring to a female dog (or any mammal), would be «anco». In fact, in
this particular scheme, the word «anco» would also be the word for «mammal».
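The morphotactic and phonotactic rules above can be checked and applied together. This Python sketch builds open class words from morphemes, verifies each morpheme against its C/V template, optionally drops a root's default classifier, and forms anaphors from the classifier and part of speech alone. It assumes that «a e i o u» are the only vowels and that every other letter counts as a consonant; the table of default classifiers and the function names are hypothetical.

```python
VOWELS = set("aeiou")   # assumed vowel inventory; every other letter counts as C

def fits(morpheme, template):
    # True if each letter matches its slot: V = vowel, C = consonant.
    return len(morpheme) == len(template) and all(
        (ch in VOWELS) == (slot == "V") for ch, slot in zip(morpheme, template))

# Hypothetical lexicon data: the default classifier of each root.
DEFAULT_CLASSIFIER = {"temb": "anc"}   # "dog" is by default in the class "mammal"

def open_class_word(root, pos, modifiers=(), classifier=None, drop_default=True):
    # open_class_word ::= {modifier} root (classifier) part_of_speech
    # modifier ::= CV, root ::= CVCC, classifier ::= VCC, part_of_speech ::= V
    checks = [(m, "CV") for m in modifiers] + [(root, "CVCC"), (pos, "V")]
    if classifier is not None:
        checks.append((classifier, "VCC"))
    for morpheme, template in checks:
        if not fits(morpheme, template):
            raise ValueError(f"{morpheme!r} does not have the shape {template}")
    if drop_default and classifier == DEFAULT_CLASSIFIER.get(root):
        classifier = None   # the root's default classifier can be left out
    return "".join(modifiers) + root + (classifier or "") + pos

def anaphor(classifier, pos):
    # An anaphor keeps only the classifier and the part of speech.
    if not (fits(classifier, "VCC") and fits(pos, "V")):
        raise ValueError("bad classifier or part-of-speech shape")
    return classifier + pos

print(open_class_word("temb", "o", ["mi"], "anc", drop_default=False))  # mitembanco
print(open_class_word("temb", "o", ["mi"], "anc"))                      # mitembo
print(anaphor("anc", "o"))                                              # anco
```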
Finally, you can use different forms for closed class words. For example, specifiers could have
the form:
specifier ::= {modifier} specifier_class
specifier_class ::= CSV
where S is a semivowel. Here, one possible modifier could be a numeral, and the specifier class
could indicate ordinality. Thus, if the modifier «ku» means «seven», and «nya» indicates
ordinality, then the word «kunya» would mean «seventh».
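The same kind of template check works for closed class words. A minimal Python sketch, assuming «w» and «y» are the only semivowels and «a e i o u» the only vowels (the function names are hypothetical):

```python
VOWELS, SEMIVOWELS = set("aeiou"), set("wy")   # assumed inventories

def slot(ch):
    # Classify a letter as V (vowel), S (semivowel) or C (consonant).
    if ch in VOWELS:
        return "V"
    if ch in SEMIVOWELS:
        return "S"
    return "C"

def fits(morpheme, template):
    return len(morpheme) == len(template) and all(
        slot(ch) == s for ch, s in zip(morpheme, template))

def specifier(modifiers, specifier_class):
    # specifier ::= {modifier} specifier_class, modifier ::= CV, specifier_class ::= CSV
    for m in modifiers:
        if not fits(m, "CV"):
            raise ValueError(f"modifier {m!r} does not have the shape CV")
    if not fits(specifier_class, "CSV"):
        raise ValueError(f"{specifier_class!r} does not have the shape CSV")
    return "".join(modifiers) + specifier_class

print(specifier(["ku"], "nya"))   # kunya ("seventh", if ku = "seven" and nya = ordinal)
```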
And so forth. I won’t say any more about word design here, since I will talk much more about it
in my monograph Lexical Semantics. I simply wanted to whet your appetite and show you how
words can in fact have «syntax».
[Postscript: Many thanks to Jacques Guy and Dan Maxwell for their helpful comments!]
End of Essay