Lexical Database of Pakistani Regional Languages
Pakistan is a multilingual country whose people speak many different languages. Cross-lingual information processing therefore requires a centralized repository of words, usually known as a lexical database; this need is the main motivation behind this research. Lexical databases already exist for well-resourced languages, but less attention has been given to under-resourced languages such as Punjabi and Saraiki.
Natural language processing (NLP), also called natural language engineering, covers many tasks, such as word sense disambiguation (WSD), machine translation (MT), and part-of-speech tagging (POST). All of these tasks need large-scale lexical databases, so there is a strong need to develop such resources, usually known as machine-readable dictionaries (MRDs), lexicons, or lexical databases (LDBs).
A lexical database stores the information about a language that can be attached to a structural unit of that language. Such a unit might be a word, a morpheme, or even a whole sentence. An LDB is an essential part of a natural language processing system. It stores lexical and semantic information for the words of a language, for example pronunciation, part-of-speech tags, definitions, example sentences, glosses, synonymy, and hyponymy. The English WordNet developed at Princeton University is a well-known example of a lexical database. This work includes the design and construction of such a database.
This thesis describes the approaches adopted for constructing lexical databases for different languages of the world and the methodology followed in constructing the proposed system. A web interface is provided for updating database entries and for retrieving them through queries.
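As a rough illustration of the kind of record an LDB is described as holding (pronunciation, part-of-speech tag, gloss, example sentences, synonymy, and hyponymy), the following minimal Python sketch may help. It is not the schema used in this thesis: the class names `LexicalEntry` and `LexicalDatabase`, their fields, and the English placeholder words are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LexicalEntry:
    """Hypothetical LDB record for one sense of one word."""
    lemma: str
    pos: str                      # part-of-speech tag, e.g. "noun"
    pronunciation: str = ""
    gloss: str = ""               # short definition of this sense
    examples: List[str] = field(default_factory=list)
    synonyms: Set[str] = field(default_factory=set)
    hypernyms: Set[str] = field(default_factory=set)  # more general terms


class LexicalDatabase:
    """Hypothetical in-memory store keyed by lemma."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[LexicalEntry]] = {}

    def add(self, entry: LexicalEntry) -> None:
        # A lemma may have several senses, so keep a list per lemma.
        self._entries.setdefault(entry.lemma, []).append(entry)

    def lookup(self, lemma: str) -> List[LexicalEntry]:
        return list(self._entries.get(lemma, []))

    def hyponyms(self, lemma: str) -> List[str]:
        # Hyponymy is derived by inverting the stored hypernym links.
        return sorted(
            entry.lemma
            for senses in self._entries.values()
            for entry in senses
            if lemma in entry.hypernyms
        )


# Placeholder English data, used only to exercise the sketch.
ldb = LexicalDatabase()
ldb.add(LexicalEntry("animal", "noun", gloss="a living organism that can move"))
ldb.add(LexicalEntry("dog", "noun", gloss="a domesticated canine",
                     synonyms={"hound"}, hypernyms={"animal"}))
ldb.add(LexicalEntry("cat", "noun", gloss="a small domesticated feline",
                     hypernyms={"animal"}))
```

Storing only the hypernym link and deriving hyponyms by inversion keeps each relation recorded once; a production system would more likely use relational tables behind the web interface.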
TABLE OF CONTENTS
CHAPTER 1: INTRODUCTION
1.1 NATURAL LANGUAGE PROCESSING (NLP)
1.1.1 NLP TASKS
1.1.2 NLP LEVELS
1.2 LEXICAL DATABASE (LDB)
1.3 DIFFERENCE BETWEEN TRADITIONAL DATABASE AND LDB
1.4 NEED OF LEXICAL DATABASE
1.5 ROLE OF WORDNET / LEXICAL DATABASE IN NLP
1.6 BASIC STORAGE ENTITY OF LDB
1.7 STANDARD LEXICON MODELS
1.8 PROPOSED LDB CREATION PROCESS
CHAPTER 2: LITERATURE REVIEW
2.1 LANGUAGES OF PAKISTAN
2.2 LEXICAL DATABASE / WORDNET PRINCIPLE
2.2.1 LEXICAL MATRIX
2.2.2 SYNSET AND SENSE
2.3 RELATIONS IN WORDNET
2.3.1 SEMANTIC RELATIONS
2.3.2 LEXICAL RELATIONS
2.4 IMPLEMENTED LDB RELATIONS
2.5 EXISTING LEXICAL DATABASES
2.5.1 English WordNet
2.5.5 Kannada WordNet
CHAPTER 3: PROBLEM STATEMENT
3.1 MOTIVATION AND SCOPE BEHIND THE STUDY
3.2 PROBLEM STATEMENT
3.3 PREVIOUS WORK
3.4 RESEARCH OBJECTIVES
CHAPTER 4: DESIGN AND DEVELOPMENT OF PUNJABI LEXICAL DATABASE
4.1 SYNSETS CREATION
4.2 BACKEND (DATA-ENTRY INTERFACE)
CHAPTER 5: EXPERIMENTAL RESULTS
5.1 FRONTEND (USER WEB INTERFACE)
5.2 PROBLEMS DETECTED
CHAPTER 6: CONCLUSION
6.2 FUTURE SCOPE
6.2.1 IMPLEMENTING MISSING RELATIONS
6.2.2 EMBEDDING MORPHOLOGICAL MODEL
6.2.3 DIFFERENT NLP APPLICATIONS
6.2.4 LANGUAGE TEACHING TOOLS