Writing systems tend to be lossy format records of spoken language.
Standard languages were created from natural languages
In fact, most spoken languages have so many accents and dialects that writing systems grew out of the need to accommodate a large variety of lects and unify how everybody communicated with each other.
Once these standards of writing became more widespread, it became common to learn to speak like the standard, and thus standard spoken languages were born. However, in real life, most standard languages are not spoken that much by average folk.
An example of this can be found in the Philippines. Tagalog has been the most widely spoken language, and it was upon this that the national standard was based, going by the name Filipino. But Filipino also incorporates a lot of things not found in Tagalog, for example, standardized words from English and earlier borrowings from Spanish.
I have found from personal experience that Tagalog does not have a high register of speech. For official purposes Filipino will be used. But outside of official purposes, people still need a higher register of speech, especially among strangers, and English seems to fill this void. As such, most public notices tend to be written out in English, as it sounds more official than Tagalog.
The languages of Europe have undergone standardization over the last couple of centuries. Most people in Finland speak a regional dialect that has shorter words than the Finnish national standard language. Most people in Germany are bilingual in Standard High German and their local language or dialect. The same can be said of Italy, France, and Spain. Slovenia is a complex hodgepodge of dialect variety in spite of having one standard writing system. The amount of variation differs from country to country, so that countries like Poland and Czechia have less variation than some of their smaller neighbors. Despite their size, both Russia and the United States tend to have little to no variation in their respective languages.
Two Things Writing Systems are Used For
There are two things that writing does. The first is to record the sounds of the language, which we can only understand when we hear the sounds to unlock the meanings. The second is to record the meanings of what we want to convey.
Some people may believe that English writing is a sound-based system whereas Chinese is a meaning-based system. This is not true. The two are roughly equal in this respect, for several reasons.
English words are based on a very old spelling, so at most they are a very loose guide to how words are pronounced. Chinese, likewise, is a very old writing system where more than 90% of characters were based on glyphs that represent specific sounds, in this case, syllables. So both English and Chinese are loosely based on sound, and are both equally difficult to learn how to read and write.
I've observed that Chinese-speaking children learn to read more of their language in less time than English-speaking children. English has an ever-growing vocabulary the higher you go in school, whereas Chinese recombines characters already learned in primary school as roots in larger words with more complex ideas. As a result, the higher stages of learning in Chinese focus less on vocabulary and more on abstract concepts and meanings.
Few adult users of English or Chinese actually look at the inner details of words, such as the individual letters in a word, or individual strokes in Chinese characters. In fact, as you read these words now, you're reading whole words or even phrases at a time. We've trained our minds to rely less on the actual spoken value of words and rely more on the meanings that are conveyed. So Chinese and English are very much alike in this regard, and they both require a lot of memorization to be able to read aloud or write correctly.
Most writing systems that keep up to date with the spoken language generally need to be updated once every 100 years.
In computer science you may have heard of the term lossy format. This is most obvious when looking at pixelated pictures of things. Since the resolution is bad, if you make the picture larger, you just can't see it very clearly. You need to zoom out in order to see the picture clearly, but even then, it will still be somewhat of a lossy format.
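To make the pixel analogy concrete, here is a minimal Python sketch of what "lossy" means. The function names `downsample` and `upsample` and the brightness values are made up for illustration, and the sketch assumes the row length is a multiple of the factor. It shrinks a row of pixels by averaging neighbours, then enlarges it again, and the enlarged row no longer matches the original.

```python
def downsample(pixels, factor):
    """Average each run of `factor` pixels into one value (detail is lost)."""
    return [
        sum(pixels[i:i + factor]) / factor
        for i in range(0, len(pixels), factor)
    ]


def upsample(pixels, factor):
    """Enlarge the row by repeating each value `factor` times."""
    return [value for value in pixels for _ in range(factor)]


# One row of made-up brightness values.
original = [0, 10, 200, 210, 90, 100, 30, 40]

small = downsample(original, 2)
enlarged = upsample(small, 2)

print(small)                 # [5.0, 205.0, 95.0, 35.0]
print(enlarged)              # [5.0, 5.0, 205.0, 205.0, 95.0, 95.0, 35.0, 35.0]
print(enlarged == original)  # False: the fine detail cannot be recovered
```

The enlarged row has the right size, but the differences between neighbouring pixels are gone for good.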
Most writing systems are lossy formats. We can't possibly encode all the nuances of meaning and all the details of our pronunciation into a single writing system. When we encode language with the International Phonetic Alphabet, what we do is find the least common denominator among all speakers of that particular language. If we try to record an even larger area that includes many variations, dialects, or other languages, it becomes harder and harder to encode with this so-called common denominator.
The common denominator cannot carry many details about an individual's speech patterns, because those details would differ from someone else who speaks a different variety of the language. Many languages discovered in recent years are given their writing in precisely the same way. For example, when we first started writing down the indigenous languages of Taiwan, we borrowed the letter {q} from the International Phonetic Alphabet, pronounced /q/, to write this very sound for all of the indigenous languages of Taiwan. Some sounds like /ɬ/ cannot be found on the keyboard, so the indigenous language Thao spells it {lh} whereas another language, Hlaarua, spells it {hl}.
We also find these regional differences in English. For example, as I've been speaking, you may have noticed a nondescript North American accent. With just a couple of words such as "better innovative solutions" you would recognise this from my use of the flap and my stress placement.
However, if I were to switch to some nondescript British accent and say "better innovative solutions", you would immediately recognise this other accent.
However, the words that we write down do not reflect these differences, which makes it very easy for us to communicate in writing across borders where speech differs. These differences are much greater in other languages. Some languages, such as Serbian, Bosnian, Croatian, and Montenegrin, have approximately the same number of differences, yet are considered different languages due to political borders.
Writing in a lossy format is a great way for people to communicate. It's faster and requires less effort, and we stay focused on communicating rather than trying to focus on specific details of how I personally speak which is usually irrelevant to the message being conveyed.
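The same idea can be sketched for spelling. In the short Python example below, two different pronunciations of "better" collapse into one written word, and nothing in the spelling tells you which accent it came from. The transcriptions are rough illustrative ones that I am assuming for this sketch, and the names `SPELLING`, `write_down` and `possible_sources` are hypothetical.

```python
# Rough, illustrative transcriptions (assumed for this sketch).
SPELLING = {
    ("North American", "ˈbɛɾɚ"): "better",
    ("British", "ˈbɛtə"): "better",
}


def write_down(accent, pronunciation):
    """Encode a spoken form as its standard spelling (accent detail is dropped)."""
    return SPELLING[(accent, pronunciation)]


def possible_sources(spelling):
    """List every spoken form that could lie behind one spelling."""
    return sorted(key for key, value in SPELLING.items() if value == spelling)


print(write_down("North American", "ˈbɛɾɚ"))  # better
print(write_down("British", "ˈbɛtə"))         # better
print(len(possible_sources("better")))        # 2: the spelling alone cannot pick one
```

Going from speech to writing is easy. Going back from writing to one particular person's speech is not possible without extra knowledge about the speaker.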
For the foreign language learner, this is detrimental, since so much effort is spent trying to pick up the correct accent, stress, pronunciation, prosody, etc., and very little of this can be derived from the writing alone. If learners are taught that English {t} is always /tʰ/, as most Chinese children are, then they will not understand the word "better" unless I pronounce it "bettʰer".
So a sentence such as "I finished editing the article earlier today" can be super confusing to someone who's never been exposed to the spoken language. And most students would be at a loss to even find the words in a dictionary, not to mention what the sentence even means.
We can easily parse the sentence when we hear it based on stress and word boundaries. But someone just learning English doesn't know where the word boundaries are, because none of their textbooks have taught them how to parse based on stress. In fact, the only way they can figure this sentence out is how we do it natively: through extensive exposure.
Phonemic IPA: /aɪ ˈfɪnɪʃt ˈɛdɪtɪŋ ði ˈartɪkəl ˈərliər təˈdeɪ/
Phonetic IPA: [aːʲfɪ́nɪʃt⁼ɛːdɪɾɪŋd̪iʲáʴɾɪkɫ̩ə́ːʴliəʴɾədéːʲ]
The English learner hears: "I finish daddying D are diggler leered day" or something equally confusing or incoherent. What this learner needs are clearly defined boundaries to each word and everything pronounced in the Phonemic line in order to really understand clearly. This kind of learner would just be at a loss listening to anybody speaking at native fluency.
I strongly recommend that teachers of any foreign language try to avoid speaking in hypercorrect pronunciations, such as the phonemic transcription above, because this deletes all surface rules for the learner and reinforces bad habits. It's as if you were teaching them the underlying base form of the language rather than the real spoken language itself.
Some languages like Russian, Korean, English, Thai and many others have so many rules for getting from the writing system to the way people actually speak that it can be very overwhelming for the student of these languages.
There is another way that we can record the sounds of the language using the International Phonetic Alphabet. The method I just described above is called a phonemic spelling of the language.
The second method puts more details into how the sounds change in a language. This second method is called phonetic. Phonetic transcriptions should show you the changes that native speakers make when they pronounce words. This is how we record languages in IPA here at Glossika. Typically we choose the way someone from the capital of their country pronounces words in a natural and relaxed way.
In the example sentence above, you can see how the /t/ in "editing" and "article" has changed into a flap [ɾ] in the phonetic transcription line.
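A surface rule like this can be written down as a small program. The Python sketch below is a deliberately simplified rule that I am assuming for illustration, not the way Glossika produces its transcriptions: inside a word, /t/ becomes [ɾ] when it follows a vowel or /r/ and comes before an unstressed vowel. The function name `flap` and the vowel list are hypothetical, and the sketch ignores flapping across word boundaries (such as the /t/ of "today" in the phonetic line) as well as every other change shown there.

```python
# Vowel symbols used in the phonemic line, plus a few common extras (assumed set).
VOWELS = set("aeiouɪɛəæɑɔʊʌ")


def flap(word: str) -> str:
    """Turn word-internal /t/ into [ɾ] between a vowel or r and an unstressed vowel."""
    chars = list(word)
    for i in range(1, len(chars) - 1):
        if chars[i] != "t":
            continue
        before, after = chars[i - 1], chars[i + 1]
        # A stress mark right after /t/ is not a vowel, so stressed syllables are skipped.
        if (before in VOWELS or before == "r") and after in VOWELS:
            chars[i] = "ɾ"
    return "".join(chars)


phonemic = "aɪ ˈfɪnɪʃt ˈɛdɪtɪŋ ði ˈartɪkəl ˈərliər təˈdeɪ"
print(" ".join(flap(word) for word in phonemic.split()))
# aɪ ˈfɪnɪʃt ˈɛdɪɾɪŋ ði ˈarɾɪkəl ˈərliər təˈdeɪ
```

Only "editing" and "article" change. The final /t/ of "finished" and the initial /t/ of "today" are left alone because this sketch only looks inside each word.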
As a result, using the IPA transcription in Glossika can help you identify problems in your own pronunciation.
If you'd like to know more about IPA, please visit the Glossika Phonics channel. If you'd like to see IPA in action, log in to your Glossika account and turn on any language that has an IPA transcription available.