A Map to the Syntax of All Spoken Languages
Reading Level: C2 (advanced)
Over the course of the last decade I have been working on a list of universal structures for as many of the world's languages as possible. I have now completed most of that work in both fields of syntax and semantics, as the two fields are deeply intertwined. From this work it is obvious that semantics affects syntax. For example, the motion of a verb affects an object with velocity, direction, rotation, thereby giving rise to prepositional and/or locational structures tied with such verbs. We can even see the mutual influence of culture and syntax, which I will touch on in this article.
Is Syntax Worth Learning?
With a better understanding of how all languages are structured, you can use this information to decipher how a language works, regardless of what you know about the language, where it's spoken, its size, whether it's written or not, or what it sounds like.
Learning to speak a foreign language fluently as native speakers do requires understanding how to encode the world around you using the infrastructure of the particular language. Most grammars focus on surface grammar rules, whereas this paper discusses the underlying syntactic-semantic structure of human communication, free from the constraints of any particular language. This is a top-down approach.
The semantic representation of the world can be broken down into a couple dozen broad semantic fields depending on the level of granularity you choose. I recommend as little granularity as possible so that easily understood patterns emerge. The semantic entities within the string of communication have relationships with each other that we label with syntactic roles.
When we observe and describe the world, we construct semantic entities and string them together in an arbitrary order to construct a sentence. Our choice of putting one word before another depends on the constraints of specific language surface rules, such as the rules of English or Russian. In this paper I seek the description of universal roles completely independent of such surface rules.
The surface (grammar) rules for each language are actually well-defined in academic literature. It is the link between the observation of the real world and a syntactic-semantic construct that is not yet well-defined. One could say that each language can be deconstructed in reverse back to its core construct and then reconstructed into another human language as an improved definition of "translation".
Defining Parts of Speech
We have all heard of nouns and verbs. But we need to reevaluate how we think of these concepts. They are not concrete concepts but rather fluid concepts. Think of them as occurring at several positions on a continuum. We cannot say that every sentence must contain a verb, or that every language has nouns, but we can say that every sentence in every language has an element occurring somewhere on this continuum. This continuum is represented graphically under Noun or Verb? below.
At one end of the continuum are stable states (nouns) and at the other end are unstable states (verbs). Between these two states a variety of things can occur, including various forms of adjectives and adverbs.
This means that any event that is uttered contains something on this continuum and that the entities on this continuum are contained within a "speech time".
Coordinating conjunctions fall outside of this continuum and instead connect two speech times. Some languages encode coordinating conjunctions grammatically into the verbal structure (e.g. Korean, for which insertion rules would be required for that specific grammar), in which case no separate coordinating conjunction needs to exist outside the speech time.
Defining Event, Reference, and Speech Time
When somebody talks, this is speech time, which I label as 詎. Reference time 指 is the relation between speech time 詎 and the talked-about action 場. Therefore, we can set up the following relationships:
1. ∀(φ)↦∃(詎)
The complete structure ∀(φ) maps to the existence ∃ of a speech-time utterance 詎.
2. ∃(詎)↦指
The utterance maps to a reference. This reference is the relationship between the action and the speech time.
3. 指↦指⊏場
The reference contains the events within the speech.
4. 場↦場⊔指⊔場′
The event itself has a beginning point and an ending point containing the reference.
The following lines show the timeline of an event (1) from left to right, and possible observations (2) and discussions (3) of that event. The reference point does not overtly encode tense, which is determined by knowing the combination of the event, the reference, and the speech time.
1 - ..... 場 ..... - ..... 場′ ..... -
(the beginning and end of the event)
2 指 ..... 指 ..... 指 ..... 指 ..... 指
(reference time: past, present, future?)
3 詎 ..... - ..... 詎 ..... - ..... 詎
(possible speech times)
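The three rows of the timeline can be sketched in code. This is only a minimal illustration under stated assumptions, not part of the framework itself: times are modelled as plain numbers, and the names `Event` and `locate` and the labels "before", "between", and "after" are hypothetical. A point that falls exactly on a boundary is treated as "between" purely for convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    start: float  # 場, the instantaneous beginning of the event
    end: float    # 場′, the instantaneous end of the event


def locate(point: float, event: Event) -> str:
    """Place a reference time (指) or speech time (詎) relative to an event."""
    if point < event.start:
        return "before"
    if point > event.end:
        return "after"
    return "between"  # assumption: boundary points count as "between"


accident = Event(start=2.0, end=5.0)
for speech_time in (1.0, 3.0, 9.0):
    print(speech_time, locate(speech_time, accident))
```

Tense is not stored anywhere in this sketch; it would have to be read off the combination of the two positions (reference time and speech time) relative to the event.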
The start and end of an event are considered instantaneous and semelfactive, having no duration at all, which is why the time of speech occurs either before, between, or after these points in time. Some complete events are also semelfactive, such as "the bubble burst", with no particular start or end. Semantics of this type cannot appear in durative form ("bursting"), unless the semantics are borrowed in a metaphorical sense ("[X ate so much that he appeared to be] bursting at the seams") or used as an iterative action ("X keeps popping all the bubbles"). Although "pop" is lexically entirely different from "burst", it is actually the complementary transitive verb for the same event, and is therefore lexicalized as the same word in other human languages.
Defining Syntactic Relationships
The problem with syntactic trees is that a hierarchical relationship is assumed to exist between all parts of speech within a sentence. This is not universal. Some languages display agreement between dependent clauses and head clauses, and other such variations (e.g. as found in Lakhota and Dyirbal).
To develop a truly universal descriptive system of human language, we need to establish a completely free-standing syntactic infrastructure. This structure must not have any reliance on word order. This means everything has a specific role and is marked as such. This reduces sub-clauses and recursion.
The hierarchical structure of X̅ (X-bar) syntax and phrase structure is not conducive to such an independent structure. The independent structure allows agreement across any element within the action 場.
Syntactic description should be as simple as possible. The problem with such grammars as Minimalist Grammar and Minimalist Insertion Grammar is that highly convoluted explanations are required just to process the simplest of concepts. I think of these grammars as surface grammars for specific languages rather than a universal tool for observing and describing the world, so it is important to understand what kind of grammar is appropriate for what you're trying to achieve.
Defining Subjects and Objects
The terms subject and object are misnomers and are convenient labels used by the layman to describe syntax and grammatical relations. These terms only describe surface realizations of the grammar and have no relationship with the underlying roles which are defined by semantics. Therefore, subjects and objects have no relationship with semantics.
The terms "subject" and "object" refer only to the arbitrary position of any syntactic role in relation to the verb core. Some languages have strict ordering of all elements in a sentence, such as German, so trying to label the position of "subject" and "object" in such languages is complex.
The fact that I can say "the glass shattered" with 'glass' in the subject position has nothing to do with the agent / patient / experiential / etc. roles in the sentence. This is not a passive sentence, but rather an unmarked ergative sentence (made up of certain kinds of intransitive verbs). The 'glass' didn't do anything of its own volition and therefore could not be the agent or source of such an occurrence. Despite this, it occurs in the "subject" position of the sentence in English, and therefore adopts surface rules specific to that language.
Compare the following example of "A car accident happened" in several languages, and pay attention to whether the accident is a subject or object:
rus: Произошла авария.
proizošla avarija.
Syntax: verb + subject
zho: 發生了車禍。
fasheng-le chehuo
Syntax: verb + object
kor: 사고가 일어났어요.
sago-ga irŏnassŏyo.
Syntax: theme + verb
eng: A car accident happened.
Syntax: subject + verb
English surface rules do not allow: *Happened a car accident.
Chinese surface rules do not allow: *車禍發生了。
chehuo fasheng-le
Syntax: subject + verb
Although the semantics is exactly the same in every language, the syntax produces different surface realizations for each language. However, in a universal syntactic-semantic framework, the above sentence describes only one and the same event: the existence ∃ (occurrence or happening) of a theme (a car accident). We can write this as follows:
場↦{∃ THEME}
Since this event is existential in nature and not bound by telicity, the tense that manifests in each language is unstable. The reference time is now, and the theme does exist, therefore the aspect is completed. This completed aspect gives rise to the use of a perfective verb in Russian and a past tense in English.
We can add granularity to the syntactic function by using a reference time of now 現 thus:
場↦現{∃ THEME}
A car accident happened.
If the car accident were still in the process of happening (though impossible if this were a semelfactive occurrence) at the moment of speech time, we would define the verb as durative v̿:
場↦現{∃v̿ THEME}
A car accident is occurring (as I speak).
Since every element is a single character and can be arranged in any order, this gives our processing algorithms great range of freedom. We can extract any kind of possible collocation at the deepest and most meta level of syntax. Rearranging all the elements in any order does not change the semantics or the syntax:
場↦{THEME ∃v̿}現
A car accident is occurring (as I speak).
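The claim that order carries no information can be illustrated with an unordered collection. The following Python sketch is an assumption-laden illustration rather than the actual encoding: the constant names are hypothetical, and a `frozenset` stands in for the braces of the notation.

```python
# Hypothetical constants standing in for the elements of the notation.
NOW = "現"                 # reference time: now
EXISTS_DURATIVE = "∃v̿"    # existential core marked as durative
THEME = "THEME"

# Two "spellings" of the same event with the elements in different orders.
first = frozenset([NOW, EXISTS_DURATIVE, THEME])
second = frozenset([THEME, EXISTS_DURATIVE, NOW])

# A set has no order, so both spellings denote the same structure.
print(first == second)  # True
```

Because equality ignores order, any collocation search over such structures is free to scan the elements in whatever sequence is convenient.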
In every language, this theme will be assigned to a different grammatical position: a subject behind the verb in Russian, an object behind the verb in Chinese, a theme before the verb in Korean, a subject before the verb in English. These rules are defined by each language individually and represent a "translation" from the real-world observation.
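The per-language placement of the theme can be sketched as a small lookup table. This is a toy illustration only: the table `RULES` and the function `realize` are hypothetical names, the word forms are copied from the four examples above, and capitalization and punctuation are ignored.

```python
# Each entry: (slot order, grammatical label given to the theme, verb, theme).
RULES = {
    "rus": (("verb", "theme"), "subject", "Произошла", "авария"),
    "zho": (("verb", "theme"), "object", "發生了", "車禍"),
    "kor": (("theme", "verb"), "theme", "일어났어요", "사고가"),
    "eng": (("theme", "verb"), "subject", "happened", "A car accident"),
}


def realize(lang: str) -> tuple:
    """Return (surface string, label of the theme) for one language."""
    order, label, verb, theme = RULES[lang]
    words = {"verb": verb, "theme": theme}
    joiner = "" if lang == "zho" else " "  # Chinese is written without spaces
    return joiner.join(words[slot] for slot in order), label


for code in RULES:
    print(code, realize(code))
```

The underlying structure (an existential core plus a theme) is identical for all four entries; only the ordering rule and the grammatical label differ.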
The effects of whether something occurs in subject or object position on a particular culture is questionable and inconclusive. I believe that speakers of any language can easily understand or acquire comprehension of the underlying roles without distortion from their own language or culture.
Defining Independent Syntactic Entities
At the core of every action 場 is the verb phrase. In Role and Reference Grammar adjuncts may be added outside of the verb phrase, but we treat certain kinds of adjuncts here as functions of the core verb, which I demonstrate below.
This means that a verb of motion can take arguments such as agent, location, comitative, etc. Please note that not every preposition in European languages maps neatly to these roles, nor do the many cases that manifest in Uralic languages. Roles are dependent upon the underlying semantics of the sentence. Therefore, semantics drives syntactic structure and the relationship is a very tight one.
If this verb of motion is not time-bound (i.e. not telic v͆) and is a dynamic verb v͌, I encode this as follows:
場↦{v͌ AGENT LOCATION COMITATIVE}
(again, all the elements have free order)
A possible translation of this phrase into English is:
eng: A man (AGENT) is walking to a store (LOCATION) with a friend (COMITATIVE).
We have not defined reference time here, but it appears that the action and the speech time coincide 場⊔指⊔場′. The phrase "to a store" is treated as a single "noun-like" entity here, and in fact does appear as only one word in many languages, if we define a word as a string of letters not interrupted by a space. The same "noun-like" treatment also applies to "with a friend".
The accusative/directional "to a store" is also an example of an achievement.
Here is an alternative structure using a telic verb v͆ with a possible English translation.
場↦{v͆ AGENT LOCATION COMITATIVE TIME}
eng: A man (AGENT) walks (and arrives) at a store (LOCATION) with a friend (COMITATIVE) after five minutes (TIME).
This is an example of an accomplishment.
Or:
eng: A man arrived at a store with [his] friend after walking for five minutes.
Labels such as determiners DET are language-specific rules and too granular, but they can be derived from the underlying structure. Here it would be most appropriate to translate this as 'his friend' rather than 'a friend' in English. Most languages would omit this DET label, whereas some languages may refer back to the agent using a reflexive, such as Russian свой.
Since I have discarded the use of the terms "subject" and "object", how about the terms "noun" and "verb"? What evidence do we have that nouns and verbs are concrete concepts? In the next section I discuss how to re-define these terms.
Noun or Verb?
What we call nouns (things) and verbs (actions) appear on a continuum between stationary and moving. Concepts change easily between nouns and verbs and everywhere in between. Adjectives exist at an intermediary point between nouns and verbs. Adjectives can behave either like verbs (predicates) or like nouns (nominal adjectives). Adjectives can also take on active and passive roles just like verbs (compare English 'interesting' and 'interested').
Every semantic concept in human language therefore has a continuum state between noun and verb. Not all concepts are realized in every role. For example, we can speak of "hope", "to be hopeful", and "to hope" thereby fulfilling all three roles. But we cannot speak of "window", "to be windowed", and "to window", unless we change the way the verb operates. Some languages may have a single word for "to put in windows" based on the noun "window". In fact, I can create such a verb in English by saying "We'll be windowing the new house today." Though not necessarily this specific word, it happens to be something that prolific writers like Stephen King use for dramatic effect in their writing as literary devices.
In some languages adjectives stand independently between nouns and verbs:
stable
N ..... ..... A ..... ..... V
unstable
In some languages, adjectives function more like nouns:
stable
N ..... A ..... ..... ..... V
unstable
In many languages, adjectives function more like verbs:
stable
N ..... ..... ..... A ..... V
unstable
It appears that most languages are able to adjust the positions of N/A/V on this continuum farther to the left or right.
Both nouns and verbs can manifest in many forms. Verbs can be existential (∃) as the most stable form, closer to nouns. Verbs can be stative v͇, acting like adjectives. And verbs can be unstable or dynamic v͌, as actions. Here is where they appear on this continuum (now replacing adjective with stative verb):
stable
N ..... ∃ ..... ..... v͇ ..... v͌
unstable
Existential ∃, stative verbs v͇, and dynamic verbs v͌ can all take the core position in a sentence, and they do not invoke one another. These elements usually require one or more arguments, although zero arguments are common as well. Stable entities (nouns) rarely appear in isolation, and therefore almost always manifest as an argument of a verb. In non-agglutinative languages such as English, these arguments manifest as noun phrases consisting of multiple words.
{core_verb(argument1,argument2,argument3)}
Nouns (and non-functional adverbs, "adjuncts") seem to be the only things that can appear as arguments, though adjuncts tend to be independent. Everything else requires arguments, and is therefore considered a "core verb".
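The `{core_verb(argument1,argument2,argument3)}` shape can be sketched as a small data structure. This is a minimal sketch under assumptions: the class name `Core`, the three kind labels, and the use of plain role names as arguments are all illustrative choices, not the notation's official form.

```python
from dataclasses import dataclass

# The three elements that can occupy the core position.
KINDS = ("existential", "stative", "dynamic")


@dataclass(frozen=True)
class Core:
    kind: str                            # one of KINDS
    arguments: frozenset = frozenset()   # unordered role names; may be empty

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown core kind: {self.kind}")


walk = Core("dynamic", frozenset({"AGENT", "LOCATION", "COMITATIVE"}))
bare = Core("stative")  # a core with zero arguments is allowed

print(sorted(walk.arguments))
print(len(bare.arguments))
```

Keeping the arguments in a `frozenset` reflects the free ordering of elements, and the empty default covers cores that take no arguments at all.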
One may ask how to deal with compound verb phrases if these verbs do not invoke one another. This is resolved via verb functions, which is where most adverbs end up in our syntactic description, and is described in great detail under Verb Functions below.
Here is an example of a semantic entity that manifests across the spectrum.
When we say "it is dark", in terms of 'rays of light' we can deduce that darkness is a temporary state. If we refer to an object's attribute then it is a permanent state. In English we use the "be" verb with the adjective to indicate the stative nature (note: "-stat-" means "being" or "standing"; compare Italian "stato" (been) and Latin "status"). We can treat "be dark" either as a predicate (acting as a verb), or as an adjective (which is what "dark" is classified as in English). I label the two words together "be dark" as v͇. I label adjectives that appear together with nouns as nominal adjectives å.
stable
N ..... ∃ ..... å ..... v͇ ..... v͌
unstable
N = darkness
∃ = there is darkness, let there be darkness
(this includes hortatives)
å = dark (as in 'a dark thing')
v͇ = be dark, get dark, become dark, feels dark
v͌ = to darken
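The continuum for "dark" can be sketched as an ordered enumeration. The names below (`Continuum`, `DARK`) are hypothetical, and the integer values only encode relative position from stable to unstable; they are not measurements of any kind.

```python
from enum import IntEnum


class Continuum(IntEnum):
    """Positions from most stable (0) to most unstable (4)."""
    NOUN = 0               # N
    EXISTENTIAL = 1        # ∃
    NOMINAL_ADJECTIVE = 2  # å
    STATIVE_VERB = 3       # v͇
    DYNAMIC_VERB = 4       # v͌


# One realization of the concept "dark" per position, taken from the list above.
DARK = {
    Continuum.DYNAMIC_VERB: "to darken",
    Continuum.NOUN: "darkness",
    Continuum.STATIVE_VERB: "be dark",
    Continuum.EXISTENTIAL: "there is darkness",
    Continuum.NOMINAL_ADJECTIVE: "dark",
}

# Sorting the keys recovers the stable-to-unstable order.
for position in sorted(DARK):
    print(position.name, "->", DARK[position])
```

A concept that is not realized at some position (the "window" case discussed earlier) would simply lack that key.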
Also known as a predicate adjective, v͇ can take on various predicate verbs in English, usually related to the senses (seems, finds, feels, looks, sounds, tastes, smells), change of state (gets, becomes), or behavior (acts, resembles, appears). Some of these verbs take a predicate noun δ as a possible argument. Van Valin has a complete list that also includes verbs such as "costs" in this category.
eng: I find it strange.
'I' is the experiencer; 'to find strange' the stative verb v͇; 'it' the theme.
In many of the world's languages, a v͇ takes on aspect rather than tense (but tense can be added). Aspect refers to whether a change in state has happened, and tense refers to when an event happened regardless of any change.
For example, in many languages, to express that "it is now dark" is the same as "it has already become dark" and may use a past tense or perfect aspect, meaning that the process of becoming dark has finished. English makes use of the word "now" semantically as aspect rather than tense, whereas many languages would only use the word "now" temporally. Without the word "now", the English "it is dark" disregards aspect and only focuses on the temporary state during the tense of the verb.
Languages that lack aspect express "it is dark now" using past tense to refer to the completed change in the present state. The English "now" is your cue for knowing this completed change and is equivalent to saying "already". Since "now" as a function of change is encoded in the core verb, translating or saying this "now" in many other languages would be considered an error or a redundancy.
This feature is commonly found in East Asian languages. Japanese marks tense, so it applies past tense to verbs to indicate this change.
jpn: 疲れる
tsukareru
to get/be tired
jpn: 疲れました
tsukaremashita (past tense)
I'm tired now/already.
Chinese marks aspect with 了, so it applies this marker to indicate the change in state:
zho: 我累
wo lei
I tire / I am tiring.
zho: 我累了
wo lei-le
I'm tired now/already.
In Slavic languages, nominal adjectives can take a large variety of endings. Predicate adjectives in Slavic languages appear as "short-form" (simplified versions of nominal adjectives):
rus: согласный человек
soglasnyj čelovek
an agreeable person (nominal adjective)
rus: Я согласен
ja soglasen
I(male) [am-in-agreement] (predicate adjective)
Arguments and Valency
Verbs are said to carry a valency, which tells us how many arguments are tied to that verb. The stated valency is usually the minimum, since further arguments can be added to the base number.
Intransitive verbs have a valency of 1 (the agent, the experiencer, or in ergative sentences the patient -- frequently occurring in subject position across languages).
Transitive and ditransitive verbs have objects, therefore having a minimum valency of 2.
Zero Valency
It can be argued that impersonal constructions describing weather in English, such as "it's sunny", which require "it" in subject position, are in fact zero-valency verbs.
In some cases it is not so easily defined. Many languages describe "it's raining" as "rain falling" with a noun+verb construct, others as simply verb, yet others as a "state":
1. {v͌()}
(Active verb with zero arguments)
Examples: kat: წვიმს (ʦ⁼ᶹims)
lit: Lỹja
slk: Prší
ron: Plouă
ell: Βρέχει (vréxi)
swa: inanyesha
dru: uda-udal-e
trk: q[m]uyux
tgl: u[mu]ulan
2. {v͌(THEME)}
(Active verb with a theme argument)
Examples: srp: Pȁdā kȉša
nan: 落雨 (lo̍h-hōo)
3. {v͇()}
(Stative verb with zero arguments)
Examples: ssf: quraz-iza
xsy: 'o[mo]ral-ila
(The SaiSiyat (xsy) example contains an active verb infix, but the perfective ending gives it a stative verb structure parallel to that of the Thau (ssf) example.)
4. {∃()}
(Existential with zero or one argument)
(seeking examples)
If the rain falls [itself] from the clouds, is it an ergative construction? If raining is a state, can it actually be a stative verb without any arguments? Or does rain just exist as an existential in some cultures?
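The four candidate structures for "it's raining" can be tabulated. The sketch below only restates the language codes listed above under each pattern; the dictionary name `RAIN_PATTERNS`, the pattern labels, and the helper `pattern_of` are hypothetical, and the existential pattern is left empty because no examples have been collected for it.

```python
# Language codes grouped by the structure they use for "it's raining".
RAIN_PATTERNS = {
    "dynamic verb, zero arguments": [
        "kat", "lit", "slk", "ron", "ell", "swa", "dru", "trk", "tgl",
    ],
    "dynamic verb, theme argument": ["srp", "nan"],
    "stative verb, zero arguments": ["ssf", "xsy"],
    "existential": [],  # no examples collected yet
}


def pattern_of(code: str):
    """Return the pattern label for a language code, or None if unlisted."""
    for pattern, codes in RAIN_PATTERNS.items():
        if code in codes:
            return pattern
    return None


print(pattern_of("ell"))
print(pattern_of("nan"))
print(pattern_of("eng"))
```

English is deliberately absent from the table, since whether its dummy "it" counts as an argument is exactly the open question raised above.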
Arguments as Roles
Agents, patients, beneficiaries (benrecs), experientials, predicate nouns, themes, and comitatives have already been explained. Causative topics are discussed under Verb Functions.
Also including humans are: vocative, genitive, and self. Objects can also be oblique, partitive, and incorporative (a verb like "fly" that incorporates another object as a tool in order to execute the verb).
In addition to these are locational and directionals: ablative, locative, and spatial/directional.
Temporals are given an argument where they are not redundant and not already expressed as functions of the verb. Certain kinds of adverbs that describe method/manner not encoded in the verb are positioned as an argument.
Arguments can also occur as an amount, a degree, or as a measure.
It was necessary to also add environmental Natural Causes as an argument.
Verb Functions
If we look specifically at English, many verb functions require ad hoc constructions of other verbs (or modals) that cause the core verb to change into an infinitive or a gerund, or perhaps a base form with no change at all. English frequently makes use of adverbs to .
Phrase Functions
Includes: 1) conditional, if; 2) purpose followed by a verb phrase; 3) reason for doing; 4) derived clause, sub-clause, {that, which}; 5) therefore; 6) because; 7) and; 8) not; 9) but; 10) either; 11) or.
Intention Functions
Includes: 1) will; 2) prepare, about to do; 3) want; 4) would; 5) should; 6) attempt/try/fail; 7) certainly; 8) absolutely; 9) voluntarily, on purpose, purposefully; 10) involuntarily, on accident, happened by itself; 11) deliberately, with motive/intention; 12) accidentally, without motive/intention; 13) willingly; 14) agree; 15) unwillingly agree; 16) forcibly agree (under duress).
Potentiality
Includes: 1) able; 2) could; 3) might; 4) must, need, necessary; 5) may, supposedly; 6) possibly, possible that.
Valency-Increasing Functions (Jussives and Causatives)
We add a "topic" to the sentence, another kind of agent that is not the actual agent of the core verb. The topic is encoded as an argument and the functions encoded with the verb.
These include: 1) help (an agent do something); 2) make (an agent do something), by command/order; 3) have (an agent do something), by request; 4) get (an agent to do something voluntarily), by asking softly or persuasively; 5) let (an agent do something), by allowing; 6) manipulate/force (an agent to do something), by telling; 7) show (an agent how to do something), though this frequently does not manifest as a causative in many languages, but rather "do something + beneficiary onlooker".
Though neither jussive nor causative in nature, indirect speech is also a valency-increasing function.
Examples:
Valency 1:
場↦{STATIVE(V) AGENT(1)} He teaches.
Valency 2:
場↦{STATIVE(V) AGENT(1) THEME(2)} He teaches math.
場↦{CAUSATIVE-STATIVE(V) CAUSER(1) AGENT(2)} I asked him to teach.
Valency 3:
場↦{STATIVE(V) AGENT(1) THEME(2) BENEFICIARY(3)} He teaches children math.
場↦{CAUSATIVE-STATIVE(V) CAUSER(1) AGENT(2) THEME(3)} I got him to teach math.
Valency 4:
場↦{CAUSATIVE-STATIVE(V) CAUSER(1) AGENT(2) THEME(3) BENEFICIARY(4)} I helped him teach the children math.
Complex Valency:
場↦{CAUSATIVE-CAUSATIVE-STATIVE(V) CAUSER(1) CAUSER(2) AGENT(3) METHOD(4) THEME(5) BENEFICIARY(6)} I allowed her to show him how to teach the children math.
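The way each causative layer adds one argument can be sketched as a function that wraps a frame. This is an illustrative sketch only: the tuple representation and the name `causative` are assumptions, and the METHOD argument that appears in the complex example is left out for simplicity.

```python
def causative(frame):
    """Wrap a (verb label, role list) frame in one causative layer.

    The new CAUSER goes in front; the original agent keeps its role.
    """
    label, roles = frame
    return ("CAUSATIVE-" + label, ["CAUSER"] + list(roles))


teach_math = ("STATIVE", ["AGENT", "THEME"])   # He teaches math.
got_him = causative(teach_math)                # I got him to teach math.
double = causative(causative(
    ("STATIVE", ["AGENT", "THEME", "BENEFICIARY"])
))

for label, roles in (teach_math, got_him, double):
    print(label, roles, "valency:", len(roles))
```

Each call raises the valency by exactly one, which matches the step from valency 2 ("He teaches math") to valency 3 ("I got him to teach math") in the examples.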
Suggestion
This may decrease valency, depending on the semantics.
Includes: 1) let's (hortative, suggestion to do something), 2) suggest and recommend doing.
Cognitive
Many of these cognitive verbs, although able to stand as independent core verbs, are almost always coupled with core verbs. In English we find that the core verbs are frequently implied but omitted in discourse.
Includes: 1) hope/wish; 2) fear/afraid of; 3) assume/consider; 4) decide to; 5) forget/remember to; 6) believe; 7) think of/would like to; 8) like to; 9) dare to; 10) claim to; 11) admit to; 12) deny; 13) allege/etc.
Custom
Includes: 1) habit of doing, used to doing; 2) used to do.
Appearance
Appearance includes direct and indirect perception of events. You may only know of an event indirectly through someone else, or merely heard about or of it. Some events happen clandestinely or intentionally made to look false, and such events have a range of perception that can be expressed.
Includes: 1) apparition/apparent (sounds like, looks like, tastes like, smells like, feels like, appears to be, know/heard of, know/heard that); 2) expectedly; 3) unexpectedly; 4) obviously; 5) falsely, pretend to; 6) secretly; 7) supposedly secretly; 8) obviously secretly; 9) deliberately secretly.
Relational
Sometimes the verb binds the agent and patient/beneficiary/comitative through reciprocation.
Includes: 1) expressly apart (agent and patient separated); 2) expressly together (agent and patient); 3) expressly reciprocal (one another, with each other).
Manner of Action
There are a myriad of ways an action unfolds, from the inchoative to the perfective and every step in between (imperfective).
Includes: 1) iterative, action happens again; 2) continuously iterative, again and again, repeated action, including repeated semelfactive verbs (hitting, smacking, ...); 3) simultaneously, during, when, while; 4) not simultaneously, afterwards, and then (syntax: 場1 ≢ 場2: 場1 and then 場2, 場1 before 場2, 場2 after 場1); 5) stay doing, keep doing; 6) start to, begin to; 7) stop doing (not a command, as in "stop smoking"; see also "purpose followed by a verb phrase" for structures like "to stop to smoke", meaning "in order to"); 8) finish doing, all the way done.
Frequency
Other than those mentioned in the previous section, frequency or explicit mention of time of action is often encoded in separate words in the sentence. If these additional words are not mere reinforcements of the core verb structure (such as "aspectual change" by English "now"), then we encode the adverb phrase as an additional argument.
TAM
As demonstrated, aspect and modality have already been encoded in the categories above. What is missing is tense, which is encoded in the complex relationship between reference time 指 and the action 場. Since we encode this relationship, explicit marking of tense is not a necessary grammatical function.
Languages that explicitly encode tense translate the reference time in arbitrary ways specific to that particular language, which is beyond the scope of this paper.
Registers
Registers often deal with social class. There are several things to note here:
- Gender
- Social position
- Respect levels
- Clusivity
Registers frequently manifest on pronouns, but can also be encoded into verbs, in that completely separate verbs are required in different registers (e.g. in Javanese, Indonesian, Thai, Vietnamese, Korean, Japanese, and sometimes even English). In some cases, social position can cause pronouns not to be used.
Gender is an interesting topic. We have already witnessed degenderization in languages like Swedish, where it has happened grammatically and naturally over time. So although some languages have a male/female mixed register (e.g. Spanish, Croatian), it is important to account for a neuter register, both as speaker and listener.
In many languages, gender plays a role either as speaker, or as listener, or as both (compare usage in Polish and Arabic).
Inclusive and exclusive marking mostly occurs on first-person pronouns. An example of this is the "we" used by a married couple in English. Though unmarked in English, it conveys an "exclusive" nuance. Some language families such as Austronesian have separate words for these pronouns. In addition to inclusive and exclusive pronouns, it is important to account for possible registers such as married, divorced, or a "potential inclusive" for partnerships.
Semantic Fields
The goal of this grammar is to achieve advanced capability at sorting any language in terms of complexity, vocabulary, collocations, and with this new level of granularity, develop new Natural Language Processing tools. We have already made use of our syntax-semantic mapping in Glossika's Machine Learning framework. The framework not only allows us to track complex relationships in tense and aspect, but also more accurate nuances in human discourse.
Being too granular, however, leads to problems when running algorithms: they fail to produce the expected results. If your data is too granular, then you need to build a larger database of training data. At our initial stages, we have over-tagged everything in anticipation of more training data, and we ignore specific types of granularity when running algorithms.
When considering the lexicon of any language, just by taking a glance at any dictionary, one finds that the typical "word" entry has multiple meanings attached to it. One of these meanings may have its own lexical entry elsewhere in the dictionary under a different word, known as a "synonym". For example, "get up" may be listed under "get", though these should be considered separate lexical entries, but a synonym may be found under "wake up" and even "awake", though each of these uses may differ in terms of valency and transitivity, perhaps ergativity (whether marked or not).
The number of words in a dictionary does not equal the number of meanings found in human communication.
How many "meanings" exist in human communication? This is extremely difficult to answer. But as we move to a more detailed markup of our database, in theory we get closer to knowing the answer. The problem is that we cannot be too granular (there are literally a million objects in our world that each have a name -- so we find it meaningless to encode every noun in existence like the exact names for all the pieces and parts and transistors and atoms found in every machine and computing device, not to mention every other thing known to man; instead, we can lump like things together into a specific class of nouns). More importantly, we look at verbs, their extrapolation from unstable to stable (e.g. the verb "to freeze" has a corresponding noun "ice" -- even though the two look nothing like each other in English, they do in Chinese and many other languages), and then map their possible valencies. Valencies tend to adopt specific types or "fields" of semantics rather than specific "objects". For example, if a verb acts upon David, then why can't it act upon Mary and John, and him or her, ad infinitum? If this is so, we can deduce that the verb acts upon "living people". Can it act upon plants and animals as well? Are plants and animals in complementary distribution (semantically) with living people? We can prove otherwise, which gives rise to each their own semantic "field".
Although it seems that Anna Wierzbicka's semantic primitives would be a great fit for our goal, and do prove useful for determining the specific breakdown of all kinds of complex ideas, they play no role in our organization of semantic fields, listed here:
Existence 在, measures and numbers 量, times and dates 時, astronomy 宇, geology/geometry/geography and inorganic matter 地, directions and positions 向, shapes and things 型, clothing 衣, motion 動, food 食, biological organisms 生, the body 身, botany 植, zoology 獸, physical senses 感, knowledge and learning 智, communication 傳, volition/needs/success/action 願, social constructs and trade 社, the mind and entertainment and beliefs 心. We have mapped dozens of sub-fields for each of these.
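The idea that a verb's arguments select a semantic field rather than individual objects can be sketched with the labels above. The `FIELDS` table restates the list; everything else is a loudly hypothetical toy: the noun-to-field assignments in `NOUN_FIELD`, the verb frames in `VERB_THEME_FIELD`, and the function `accepts` are invented for illustration and are not taken from the actual database.

```python
# The twenty field labels, restated from the list above.
FIELDS = {
    "在": "existence", "量": "measures and numbers", "時": "times and dates",
    "宇": "astronomy", "地": "geology/geometry/geography and inorganic matter",
    "向": "directions and positions", "型": "shapes and things",
    "衣": "clothing", "動": "motion", "食": "food",
    "生": "biological organisms", "身": "the body", "植": "botany",
    "獸": "zoology", "感": "physical senses", "智": "knowledge and learning",
    "傳": "communication", "願": "volition/needs/success/action",
    "社": "social constructs and trade",
    "心": "the mind and entertainment and beliefs",
}

# Hypothetical toy lexicon: which field each noun belongs to.
NOUN_FIELD = {"bread": "食", "coat": "衣", "dog": "獸"}

# Hypothetical frames: the field a verb's theme is drawn from.
VERB_THEME_FIELD = {"eat": "食", "wear": "衣"}


def accepts(verb: str, noun: str) -> bool:
    """True if the noun's field matches the field the verb's theme selects."""
    return VERB_THEME_FIELD[verb] == NOUN_FIELD[noun]


print(accepts("eat", "bread"))  # True
print(accepts("eat", "coat"))   # False
```

Checking fields instead of individual words keeps the frame inventory finite even as the noun list grows, which is the point of grouping like things into classes.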
If we split up each lexical entry in a given language by each definition with an ascribed valency matrix, we find that the individual semantic meanings in a given language exceed the number of lexical entries by an order of magnitude. After doing so, we find that a finite number of base syntactic patterns emerges and, by grouping semantics into specific fields, a finite number of syntactic-semantic patterns that together represent a complete map of human communication, independent of human languages.
It is precisely these groupings into "patterns" that enable humans to communicate fluently in any language. Mastering these sets of patterns enables the human to manipulate sentences, and therefore ideas. Mastering more granular "vocabulary" enables the human to expand expression and speak ever more precisely.
This is how Machine Learning works as well. Pattern recognition at low granularities passes through more convolutions until more and more detail is added to the machine's "understanding". In other words, we have taken the methods by which machines learn and reverse-engineered human language to discover the underlying patterns that drive fluency and expression in humans. This is what the Glossika algorithms deliver on our training platform.
Influences
Those who have had the biggest influence on this work include the following individuals:
Daniel Everett, Robert Van Valin, Michael Tomasello, Thomas Givon, Tim Hunter, Jeffrey Lidz, Tim Fernando, Maria Bittner, Anna Wierzbicka.