In The Name Of God

Special Topic in Database

Professor: Doctor Rahgozar

Student: Mostafa Haghir Chgehreghani

List of contents:

Presentation 1: In this presentation I have tried to explain some basic ideas in data warehousing. Topics covered in this presentation include: why we need a data warehouse, properties of a data warehouse, data warehouse architectures, data transformation, its stages and some of the tools used in these stages, data cleansing and its stages and tools, data marts, OLAP, multidimensional data, data mining, ...

 

Presentation 2: In this presentation I have tried to explain some research topics in data warehousing. I discussed lineage tracing, Automed, IQL and the use of IQL in Automed, incremental view maintenance, indexing in data warehouses and bitmap indexing, data quality in data warehouses and a model for it (GQM), ...

 

Presentation 3: In this presentation I have tried to explain one tool that is used in data mining: Bayesian reasoning. This tool is based on Bayes' formula and is used in classification, clustering and regression. In this presentation I discussed Bayes' theorem and its role, prior and posterior probability in data mining, the maximum a posteriori hypothesis, the maximum likelihood hypothesis, the Neyman-Pearson lemma, Learning a Real Valued Function, the Minimum Description Length Principle, the Bayes Optimal Classifier, the Gibbs Classifier, the Naive Bayes Classifier, Inference in Bayesian Networks, Learning Bayes Nets, the EM algorithm, ...
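As a small illustration of the maximum a posteriori (MAP) and maximum likelihood (ML) hypotheses mentioned above, the following Python sketch applies Bayes' theorem to two hypotheses. The hypothesis names and all probabilities are made-up assumptions chosen only for illustration; they are not taken from the presentation.

```python
def posteriors(priors, likelihoods):
    """Return P(h | D) for every hypothesis h using Bayes' theorem.

    priors[h]      is P(h)
    likelihoods[h] is P(D | h) for the observed data D
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # P(D), by the law of total probability
    return {h: p / evidence for h, p in joint.items()}


# Hypothetical numbers: D is "the test came back positive".
priors = {"disease": 0.008, "healthy": 0.992}
likelihoods = {"disease": 0.98, "healthy": 0.03}

post = posteriors(priors, likelihoods)

# MAP picks the hypothesis with the largest posterior P(h | D).
h_map = max(post, key=post.get)
# ML ignores the priors and picks the largest likelihood P(D | h).
h_ml = max(likelihoods, key=likelihoods.get)
```

With these assumed numbers the two criteria disagree: the ML hypothesis is "disease", while the strong prior makes "healthy" the MAP hypothesis.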

 

Report 1: In this report, I have surveyed data warehousing. Some of the topics in this report are: the definition of a data warehouse and why we need one, the properties of a data warehouse, differences between data warehousing and data mining, data warehouse architectures (2-layer architecture, 3-layer architecture, dolean architecture), the main stages in constructing a data warehouse (capture, scrub (data cleansing), transform, load and index), tools for each of these stages, data marts and their types, metadata and its roles, OLAP, multidimensional data, data schemas, star schema, snowflake schema, fact constellations, operations on multidimensional data, drill-up, drill-down, data mining, association rules, clustering, classification, regression, data visualization, ...

 

Report 2: In this report, I have tried to explain Bayesian reasoning. Some of the topics in this report are: the importance of Bayesian reasoning, Bayes' theorem, Maximum a Posteriori (MAP), Maximum Likelihood (ML), the Neyman-Pearson lemma, Learning a Real Valued Function, the Minimum Description Length Principle, the Bayes Optimal Classifier, the Gibbs Classifier, the Naive Bayes Classifier, Inference in Bayesian Networks, Learning Bayes Nets, the EM algorithm and its two steps, Derivation of the K-Means Algorithm, ...

 

Report 3: This is a report on my own work. I have implemented the Naive Bayes classifier and the EM algorithm and compared them from different aspects. In this study we have tried to compare the EM algorithm and the Naive Bayes classifier by detecting their weaknesses and presenting good solutions for them. Some of these weaknesses, such as initializing the parameters in the Naive Bayes classifier, were described in previous work. In these cases I have tried to present a more efficient solution than the earlier ones. Other weaknesses, such as the problem of two or more clusters having equal means, are described for the first time in this paper.

 

Source code: I have implemented the EM algorithm, the Naive Bayes classifier, and a third algorithm that lies between them. The third algorithm is described in Report 3. These three algorithms are implemented with .NET technology, so the .NET Framework must be installed to run the program. The implementation has four main classes: Initial, NaiveBayesClassifier, K-Mean, and Normal. The Initial class performs preliminary operations such as connecting to the database and converting the data to a two-dimensional array. The NaiveBayesClassifier, K-Mean and Normal classes each implement the related algorithm. The main work of these two classes is done in their constructors.
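To give an idea of what a Naive Bayes classifier over a two-dimensional array looks like, here is a minimal Python sketch. It is not the .NET source code described above: the function names fit and predict, the Gaussian model for each feature, and the sample data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fit(rows, labels):
    """Estimate, per class, the prior and each feature's mean and variance."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    model = {}
    for label, members in groups.items():
        n = len(members)
        columns = list(zip(*members))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((x - m) ** 2 for x in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one row."""

    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        # Naive assumption: features are independent given the class,
        # so the per-feature Gaussian log-densities simply add up.
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score

    return max(model, key=log_posterior)


# Hypothetical training data: two features, two classes.
rows = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [5.0, 7.9], [5.2, 8.1], [4.8, 8.0]]
labels = ["a", "a", "a", "b", "b", "b"]
model = fit(rows, labels)
```

Calling predict(model, [1.1, 2.0]) then returns the class whose estimated Gaussians best explain the row. Working with log-probabilities instead of raw products is a common choice here because it avoids numeric underflow when there are many features.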

 

Paper: This paper is the result of our implementation and research.