List of Language Resources
In addition to accumulating thousands of paper books and photographing thousands of pages in national libraries, I also regularly use many of the following websites and online tools, which I'm sure will help many readers, especially those actively involved in linguistics.
Contents
Linguistics
Language Description
Machine Translation
Maps, Media, Fun
Phonology
Pidgins, Creoles, Other Languages
Reading Tools
Semantics, Corpus, Etymology
Syntax
Transcription Tools
Typology
Languages
African (Sub-Saharan) Language Families
Altaic Language Family
Non-Altaic Languages
Afroasiatic Language Family
American (North, Central, South) Language Families
Australian Language Families
Austronesian Language Family
Formosan Branch Languages
Philippine Branch Languages
Malayo-Polynesian Branch Languages
Oceanic & Polynesian Languages
Caucasian Languages: Northeast, Northwest, Kartvelian
Dravidian Language Family
Indo-European Language Family
Armenian
Balto-Slavic Branch Languages
Celtic Branch Languages
Germanic Branch Languages
Greek and Albanian
Indo-Iranian Branch Languages
Italic/Romance Branch Languages
Non-Indo-European Languages
Mon-Khmer Language Family
Sino-Tibetan Language Family
Tai-Kadai Language Family
Trans-New Guinea Language Family
Uralic Language Family
African (Sub-Saharan) Language Families
- Bantu Basic Vocabulary Database.
- Pronouns in 500 African languages: Les marques personnelles dans les langues africaines (Personal markers in African languages).
- Reference Lexicon of the Languages of Africa (RefLex): the RefLex project aims to provide the scientific community with a reference lexical corpus for the languages of Africa, together with processing and analysis tools suited to this corpus.
- Glossika Courses:
Hausa Fluency Training (coming soon)
Samburu Fluency Training (coming soon)
Swahili Fluency Training
Xhosa Fluency Training (coming soon)
Yoruba Fluency Training (coming soon)
Zulu Fluency Training (coming soon)
- Virtual Zulu: the website starts singing as soon as you enter.
Altaic Language Family
- Glossika Courses:
Azerbaijani Fluency Training
Chuvash Fluency Training (coming soon)
Kazakh Fluency Training
Kyrgyz Fluency Training (coming soon)
Mongolian Fluency Training
Tatar Fluency Training (coming soon)
Turkish Fluency Training
Turkmen Fluency Training (coming soon)
Uyghur Fluency Training (coming soon)
Uzbek Fluency Training
Yakut Fluency Training (coming soon)
Non-Altaic Languages
- Handbook of Japanese Verbs: the 1,000 most frequent Japanese Language Proficiency Test (JLPT) verbs in reverse lookup, grouped by transitive/intransitive, to help students acquire kanji and vocabulary.
- Glossika Courses:
Japanese Fluency Training
Korean Fluency Training
Afroasiatic Language Family
- Arabic Learner Corpus.
- Arabic - Quranic Corpus.
- Glossika Courses:
Amharic Fluency Training (coming soon)
Berber Fluency Training (coming soon)
Egyptian Arabic Fluency Training
Levantine Arabic Fluency Training (coming soon)
Maltese Fluency Training (coming soon)
Modern Standard Arabic Fluency Training
Oromo Fluency Training (coming soon)
Somali Fluency Training (coming soon)
Syrian Arabic Fluency Training (coming soon)
Tigrinya Fluency Training (coming soon)
American (North, Central, South) Language Families
- California Language Archive at the University of California, Berkeley.
- Corpus in Caddoan Languages at Indiana University.
- Glossika Courses:
Aymara Fluency Training (coming soon)
Cherokee Fluency Training (coming soon)
Cree Fluency Training (coming soon)
Guarani Fluency Training (coming soon)
Kichwa Fluency Training (coming soon)
Nahuatl Fluency Training (coming soon)
Navajo Fluency Training (coming soon)
Ojibwe Fluency Training (coming soon)
Quechua Fluency Training (coming soon)
- South American Indigenous Language Structures (SAILS).
- Línguas e culturas indígenas sul-americanas (South American indigenous languages and cultures): links to resources for over a hundred South American languages.
- South American Phonological Inventory Database at UC Berkeley: shares data on the phonological inventories of South American indigenous languages for linguistic research and education. Inventories of specific languages can be accessed through the map browser, the sortable name table, or the phoneme search function. It currently includes inventories for 363 languages and varieties.
Australian Language Families
Austronesian Language Family
- Austronesian Basic Vocabulary Database includes Swadesh lists for about 1000 languages (some transcriptions can be misleading).
- Austronesian Comparative Dictionary.
Formosan Branch Languages
Philippine Branch Languages
Malayo-Polynesian Branch Languages
- Balinese Dictionary Resources.
- Indonesian Dictionary Resources.
- Glossika Courses:
Cebuano Fluency Training (coming soon)
Formosan Languages Fluency Training (coming soon)
Ilokano Fluency Training (coming soon)
Indonesian Fluency Training
Malagasy Fluency Training (coming soon)
Malay Fluency Training (coming soon)
Tagalog Fluency Training
- Javanese Dictionary Resources.
- Malay Dictionary Resources.
- Vanuatu and Oceanic Language Resources: Alexandre François' excellent contributions cover many of his fieldwork stories and a large number of downloadable resources on the languages.
Oceanic & Polynesian Languages
- Polynesian Lexicon Project Online.
- Glossika Courses:
Fijian Fluency Training (coming soon)
Māori Fluency Training (coming soon)
Samoan Fluency Training (coming soon)
Caucasian Languages: Northeast, Northwest, Kartvelian
- Glossika Courses:
Abkhaz Fluency Training (coming soon)
Adyghe Fluency Training (coming soon)
Chechen Fluency Training (coming soon)
Georgian Fluency Training
Kabardian Fluency Training (coming soon)
Dravidian Language Family
- Glossika Courses:
Kannada Fluency Training (coming soon)
Malayalam Fluency Training (coming soon)
Tamil Fluency Training (coming soon)
Telugu Fluency Training (coming soon)
Indo-European Language Family
- Early Indo-European Online, University of Texas: introductory grammar lessons covering a wide range of older Indo-European languages.
- European Parliament Parallel Corpus of European languages.
- Indo-European Lexical Cognacy Database.
- Indo-Wordnet.
- PIE: The Primary Phoneme Inventory and Sound Law System for Proto-Indo-European.
- PIE Lexikon.
Armenian Branch
- Glossika Courses: Armenian Fluency Training
Balto-Slavic Branch Languages
- False Friends of the Slavist.
- Glossika Courses:
Belarusian Fluency Training
Bosnian Fluency Training (coming soon)
Bulgarian Fluency Training (coming soon)
Czech Fluency Training
Latvian Fluency Training
Lithuanian Fluency Training
Macedonian Fluency Training (coming soon)
Montenegrin Fluency Training (coming soon)
Polish Fluency Training
Russian Fluency Training
Rusyn Fluency Training (coming soon)
Serbian Fluency Training
Slovene Fluency Training
Slovak Fluency Training
Ukrainian Fluency Training
- Introduction to Lithuanian (ebook).
- Polish Frequency Database: based on 101 million words from subtitles.
- Po polsku: wolne książki "audiobooki" (In Polish: free books, "audiobooks"): over 4,000 books available, including a few in Lithuanian (search: lietuvių).
- Slovak Parallel Corpus (includes Czech).
- Словарь трудных слов из богослужения: Церковнославяно-русские паронимы (Dictionary of Difficult Words from the Liturgy: Church Slavonic-Russian Paronyms).
- Толковый словарь русского языка (Explanatory Dictionary of the Russian Language).
Germanic Branch Languages
- Dutch Word Frequencies based on 44 million words from film and television subtitles.
- Glossika Courses:
Afrikaans Fluency Training (coming soon)
Danish Fluency Training
Faroese Fluency Training (coming soon)
Frisian Fluency Training (coming soon)
German Fluency Training
Icelandic Fluency Training
Norwegian Fluency Training
Swedish Fluency Training
- WordNet: a lexical database for English, Princeton University.
Greek Branch and Albanian Branch
- Glossika Courses:
Albanian Fluency Training (coming soon)
Greek (Modern) Fluency Training
- Greek (Modern) Frequency Database: based on 6,000 subtitled films.
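Several entries in this list (the Polish, Dutch and Modern Greek frequency databases, and the Chinese word-frequency list further down) derive word frequencies from film subtitles. The sites' exact pipelines aren't described here, so what follows is only a minimal Python sketch of the general idea, assuming subtitles in SubRip (.srt) format: skip cue numbers and timing lines, strip formatting tags, extract runs of letters, and count lowercased words. The function and pattern names are my own, hypothetical choices.

```python
import re
from collections import Counter

# SubRip timing line, e.g. "00:00:01,000 --> 00:00:02,500"
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
TAG = re.compile(r"<[^>]+>")  # formatting tags such as <i> and </i>
# A run of letters in any script, optionally joined by apostrophes ("don't")
WORD = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def srt_word_counts(srt_text: str) -> Counter:
    """Count lowercased word tokens in the dialogue lines of an .srt file."""
    counts: Counter = Counter()
    for raw in srt_text.splitlines():
        line = raw.strip()
        # Skip blank separators, numeric cue indices and timing lines.
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = TAG.sub("", line)
        counts.update(word.lower() for word in WORD.findall(line))
    return counts


if __name__ == "__main__":
    sample = """1
00:00:01,000 --> 00:00:02,500
<i>Καλημέρα</i>, Μαρία!

2
00:00:03,000 --> 00:00:04,000
Καλημέρα σου.
"""
    for word, n in srt_word_counts(sample).most_common():
        print(word, n)
```

Summing such counts over thousands of subtitle files gives a raw frequency list. Languages written without spaces between words, such as Chinese, would need a proper word segmenter instead of the letter-run pattern used here.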
Indo-Iranian Branch Languages
- Glossika Courses:
Assamese Fluency Training (coming soon)
Bengali Fluency Training
Dari Fluency Training (coming soon)
Divehi Fluency Training (coming soon)
Gujarati Fluency Training (coming soon)
Hindi Fluency Training
Marathi Fluency Training (coming soon)
Nepali Fluency Training (coming soon)
Odia Fluency Training (coming soon)
Pashto Fluency Training (coming soon)
Persian Fluency Training
Sinhalese Fluency Training (coming soon)
Sorani Kurdish Fluency Training
Tajiki Fluency Training (coming soon)
Urdu Fluency Training (coming soon)
- Hindi-Wordnet.
- Romani Morpho-Syntax Database.
- Sanskrit Corpus.
Italic/Romance Branch Languages
- French: Casual Spoken contains 35 hours of high-quality recordings featuring 46 French speakers conversing among friends.
- Glossika Courses:
Catalan Fluency Training
French Fluency Training
Galician Fluency Training (coming soon)
Italian Fluency Training
Mexican Spanish Fluency Training
Portuguese Fluency Training
Portuguese Brazilian Fluency Training
Romanian Fluency Training (coming soon)
Spanish Fluency Training
- Latin Library.
Non-Indo-European Languages
- Glossika Courses:
Basque Fluency Training (coming soon)
Language Description
- About World Languages: introductory material on individual languages and language families. I love the layout and data here.
- Ethnologue: information on 7,097 known living languages, including alternate names, population, location, language maps, language status, classification, dialect names, language use, language development, and writing systems. It links to research, and each language page also points to resources and research articles from OLAC (Open Language Archives Community).
- Glottolog: a comprehensive catalogue of the world's language families, languages and dialects.
- University of Hawaiʻi ScholarSpace: a sample search of documents using the query "grammar".
- Language Gulper: a blog about language families.
- Linguasphere: Le Répertoire de la linguasphère (The Linguasphere Register) contains the geolinguistic classification, coding, and alphabetical index of all the world's languages, varieties, and language groups.
- MultiTree: a searchable database of hypotheses on language relationships.
Machine Translation
- Apertium includes: Asturian, Aragonese, Breton, Northern Sami, Occitan, Tatar.
- Bing includes: Hmong Daw, Klingon, Otomi, Yucatec Maya.
- DeepL: extremely high-quality AI-driven translations. Supports Spanish, French, Italian, German, Dutch, Polish.
- Google recently added: Amharic, Corsican, Frisian, Kyrgyz, Hawaiian, Kurmanji, Luxembourgish, Samoan, Scots Gaelic, Shona, Sindhi, Pashto, Xhosa.
- PROMPT: very good for Russian.
Maps, Media, Fun
- Books and novels in many languages: perfect for extensive and intensive language practice.
- Langscape: a gateway to language diversity. It allows users with a wide array of interests, from recreational to academic, to discover the world's languages via interactive tools and access to established research. Sample and listen to over 3,000 languages here.
- Sound Comparisons: compare recordings of key vocabulary within a language branch (Romance, Slavic, etc.) on an interactive map.
- Language Landscape: language samples and recordings from around the world on an interactive map. The site aims to raise the profile of minority and endangered languages.
- Subtitles in multiple languages: open.
- Subtitles in multiple languages: yify.
- Watch TV in any language.
- Generate a name in any language.
Mon-Khmer Language Family
- Hán Việt Từ Điển Trích Dẫn (Sino-Vietnamese dictionary with citations).
- Mon Dictionary Resources.
- Mon-Khmer Languages Project. See also Huffman Papers which includes Huffman's Outline of Cambodian Grammar.
- Vietnamese Dictionary Resources.
- Glossika Courses:
Khmer Fluency Training (coming soon)
Northern Vietnamese Fluency Training
Southern Vietnamese Fluency Training
Phonology
- Derivational Phonology: MA Thesis
- Glossika Phonics Videos: a video for each IPA symbol.
- Lyon-Albuquerque Phonological Systems Database.
- PHOIBLE: Repository of cross-linguistic phonological inventory data. The 2014 edition includes 2155 inventories that contain 2160 segment types found in 1672 distinct languages.
- Stress and Accent Patterns: a typological database of stress and accent patterns in 750 languages.
- Tonal Database.
- UCLA Phonological Segment Inventory Data (UPSID) Contains phonological inventories for 451 languages.
- UD Phonology Lab Stress Pattern Database: dominant stress patterns of the world's languages.
- World Phonotactics Database at Australian National University: a searchable database containing information about phonotactic restrictions of languages of the world. Using it, you can compare and contrast phonotactic patterns in different languages, group languages by features, investigate the frequencies of different settings for different features, and view the areal distribution of such patterns through the use of the interactive map. Phonotactic data on over 2000 languages.
Reading Tools
- Метод чтения Ильи Франка (Ilya Frank's reading method): glossed readers in Russian for over 50 languages.
Pidgins, Creoles, Other Languages
- The Atlas of Pidgin and Creole Language Structures Online.
- Endangered Languages Archives.
- Teaching resources for less commonly taught languages.
Semantics, Corpus, Etymology
- Affix Borrowing Database: a database of 101 languages in which affixes have been borrowed.
- Automated Similarity Judgment Program (ASJP): 40-item word lists for the majority of the world's languages. Comparing the word lists yields a lexical distance, which is useful, for instance, for classifying a language group and estimating when it diverged.
- Concepticon: links 9,611 concepts from 51 different concept lists to 2,206 concept sets and defines 243 relations between concepts.
- Cross-Linguistic Colexification Database: polysemy information for 221 languages covering 64 families (more than 300,000 words and 10,000 concepts).
- DFG Project "Algorithmic corpus-based approaches to typological comparison": large-scale typological comparison. Its Bible corpus contains 1,169 unique translations, assigned to 906 different ISO 639-3 codes.
- Global Lexicostatistical Database: George Starostin is carrying on the work of his late father, Sergey Starostin, a famous proponent of macrofamilies and deep etymological work. It contains a lot of data on Sub-Saharan (including Bantu and Khoisan), Nilo-Saharan, Caucasian, and Amerind languages. However, many of the deep etymologies are not widely accepted by the scientific community.
- Google N-Gram Viewer.
- Korean corpus.
- List of dictionaries available online: links to over 6,300 dictionary resources.
- NLTK Corpora.
- Numerals in 4000 languages.
- Open Parallel Corpus.
- Personal Pronoun System database.
- Reduplication Database.
- Semantic shifts in the languages of the world (database): thousands of semantic connections in the world's languages (polysemy, semantic changes).
- Unicode's Universal Declaration of Human Rights in multiple languages.
- Wordbank: children's vocabulary development and acquisition.
- World Loanword Database (WOLD).
- Wortschatz Universität Leipzig: search across 246 corpus-based monolingual dictionaries in 222 languages.
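The ASJP entry notes that comparing word lists yields a lexical distance. Below is a minimal Python sketch of that idea, assuming each list maps a concept label to a transcribed word form: take the Levenshtein (edit) distance between the two words for each shared concept, normalize it by the length of the longer word (LDN), and average. As I understand ASJP's published measure (LDND), it further divides that average by the mean LDN between words with different meanings, correcting for chance similarity caused by similar sound inventories; the `ldnd` function follows that definition. The function names and the toy word forms are hypothetical, not real ASJP data or ASJPcode transcriptions.

```python
from itertools import permutations
from typing import Dict, List

WordList = Dict[str, str]  # concept label -> transcribed word form


def levenshtein(a: str, b: str) -> int:
    """Edit distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Levenshtein distance normalized by the length of the longer word."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def shared_concepts(l1: WordList, l2: WordList, minimum: int = 1) -> List[str]:
    shared = sorted(set(l1) & set(l2))
    if len(shared) < minimum:
        raise ValueError(f"need at least {minimum} shared concept(s)")
    return shared


def lexical_distance(l1: WordList, l2: WordList) -> float:
    """Mean LDN over same-meaning word pairs (0 = identical lists)."""
    shared = shared_concepts(l1, l2)
    return sum(ldn(l1[c], l2[c]) for c in shared) / len(shared)


def ldnd(l1: WordList, l2: WordList) -> float:
    """Mean same-meaning LDN divided by mean different-meaning LDN."""
    shared = shared_concepts(l1, l2, minimum=2)
    same = sum(ldn(l1[c], l2[c]) for c in shared) / len(shared)
    pairs = list(permutations(shared, 2))  # ordered (c1, c2) with c1 != c2
    diff = sum(ldn(l1[c1], l2[c2]) for c1, c2 in pairs) / len(pairs)
    if diff == 0:
        raise ValueError("different-meaning words are identical; LDND undefined")
    return same / diff


if __name__ == "__main__":
    a = {"water": "agua", "hand": "mano", "two": "dos"}  # toy forms
    b = {"water": "akwa", "hand": "mano", "two": "due"}  # toy forms
    print(round(lexical_distance(a, b), 3))  # 0.389
    print(round(ldnd(a, b), 3))
```

With only three concepts the numbers mean little; ASJP's 40-item lists give the averages enough data points to separate related from unrelated languages.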
Sino-Tibetan Language Family
- Burmese Dictionary Resources.
- Chinese Word Frequency based on film subtitles (download frequency lists).
- Glossika Courses:
Beijing Mandarin Fluency Training
Burmese Fluency Training (coming soon)
Cantonese Fluency Training
Eastern Min (Fuzhou) Fluency Training (coming soon)
Hakka Fluency Training
Hunanese (Xiang) Fluency Training (coming soon)
Shanghainese Fluency Training (coming soon)
Sichuanese Fluency Training (coming soon)
Taiwan Mandarin Fluency Training
Taiwanese Hokkien Fluency Training
Teochew Hokkien Fluency Training (coming soon)
Tibetan Fluency Training (coming soon)
Wenzhounese Dialect Fluency Training
- Phonemica: listen to hundreds of Chinese dialects on an interactive map to which individuals from around China upload their own recordings. Some phonetic transcriptions and glosses are available.
- Sgaw Karen Dictionary Resources.
- Sinica Treebank.
- Taiwanese and Hakka Dictionary.
- Thesaurus Linguae Sericae An Historical and Comparative Encyclopaedia of Chinese Conceptual Schemes.
- Tibeto-Burman languages of Assam.
Syntax
- Anaphora Typology Database by the Utrecht Institute of Linguistics.
- Irvine Phonotactic Online Dictionary (IPhOD).
- Get Started in Role and Reference Grammar
- Syntactic Structures of the World's Languages.
- ValPaL: Valency Patterns Leipzig online database, based on a questionnaire covering a selected sample of 80 verbs. These verbs are conceived of as representative of the verbal lexicon and have been reported in the literature to show distinctive syntactic behaviour both within and across languages.
Tai-Kadai Language Family
- Lao Dictionary Resources.
- Shan Dictionary Resources.
- Glossika Courses:
Lao Fluency Training (coming soon)
Tai Le Fluency Training (coming soon)
Thai Fluency Training
- Thai Language Dictionary.
Transcription Tools
- Add stress marks to German
- Add stress marks to Russian
- Ishida Unicode code converter: converts text into multiple output formats simultaneously: HTML, JavaScript, CSS, URI, hex, etc.
- Unicode Character Table
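To show what the converter's output formats look like, here is a minimal Python sketch using only the standard library. It renders a string as code points, HTML hex character references, JavaScript \u escapes (with UTF-16 surrogate pairs for characters outside the Basic Multilingual Plane), CSS escapes, and percent-encoded UTF-8 for URIs. This is not the converter's own code: the function names are hypothetical, and the real tool offers more formats and options.

```python
from urllib.parse import quote


def js_escape(ch: str) -> str:
    """JavaScript \\uXXXX escape; astral characters become a surrogate pair."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (cp & 0x3FF)  # bottom 10 bits -> low surrogate
    return f"\\u{high:04X}\\u{low:04X}"


def convert(text: str) -> dict:
    """Render every character of text in several escape formats."""
    return {
        "code points": " ".join(f"U+{ord(c):04X}" for c in text),
        "HTML": "".join(f"&#x{ord(c):X};" for c in text),
        "JavaScript": "".join(js_escape(c) for c in text),
        "CSS": "".join(f"\\{ord(c):X} " for c in text),  # trailing space ends each escape
        "URI": quote(text, safe=""),  # percent-encoded UTF-8 bytes
    }


if __name__ == "__main__":
    # The ʻokina (U+02BB) used in "Hawaiʻi"
    for fmt, value in convert("Hawai\u02bbi").items():
        print(f"{fmt:12} {value}")
```

For the ʻokina alone, this gives U+02BB, &#x2BB; in HTML, \u02BB in JavaScript, and %CA%BB as a URI component.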
Trans-New Guinea Language Family
- Database of the languages of New Guinea: the Trans-New Guinea language family covers most of the interior of New Guinea. It is possibly the third-largest family in the world, with about 400 languages, and is tentatively thought to have originated with root-crop agriculture around 10,000 years ago.
Typology
- Language Universals Archive at Universität Konstanz.
- Pangloss: a database of audio materials from several of the world's languages.
- Rarities among languages: Das Grammatische Raritätenkabinett (The Grammatical Rarities Cabinet) at Universität Konstanz.
- Reciprocal Markers Database.
- Typological Database.
- Typological Database of Intensifiers and Reflexives.
- World Atlas of Language Structures (WALS).
- WALS Sunburst Explorer: shows the values for all WALS features by combining the geolocation of the respective languages with their genealogy in a sunburst visualization, designed to help users distinguish between language contact and genealogical inheritance.
Uralic Language Family
- Glossika Courses:
Estonian Fluency Training
Finnish Fluency Training
Hungarian Fluency Training
Northern Sami Fluency Training (coming soon)