Treebanks, corpora annotated with syntactic structures, play a crucial role in the recent success of natural language processing applications such as speech recognition, spoken language systems, parsing, and machine translation. In the first phase of this thesis, we introduce a bootstrapping method for creating a kind of treebank, named XTAG-Treebank. In terms of how they are developed, treebanks are generally either manually crafted or automatically extracted. Because of the large number of sentences involved, manual creation of a treebank can be very expensive and time-consuming. These difficulties have led researchers to adopt automatic and semi-automatic methods of treebank development. On the other hand, automatically extracted treebanks are not as accurate as manually created ones.
In the first phase of this thesis, we introduce a fully automated, bootstrapping-inspired method for developing a treebank based on a complex, linguistically rich structure called the supertag. To this end, we propose a hybrid supertagging method that combines generative and discriminative approaches.
In the second phase, we try to correct the errors in the treebank generated automatically in the first phase. One of the methods used for error correction is classification.