# Domain-oriented Data-driven Data Mining

Abstract. Recent developments in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies, make it possible to gather and store incredible vol- umes of data. It creates unprecedented opportunities for knowledge discovery large-scale database. Data mining technology is a useful tool for this task. It is anemerging area of computational in- telligence that offers new theories, techniques, and toolsfor processing large volumes of data, such as data analysis, decision making, etc. There are countlessresearchers working on designing effi- cient data mining techniques, methods, and algorithms. Unfortunately, most data mining researchers pay much attention to technique problems for developing data mining models and methods, while little to basic issues of data mining. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? What is the rule we would obey in a data mining process? What is the relationship between the prior knowledge of domain experts and the knowledge mind from data? In this paper, we will address these basic issues of data mining from the viewpoint of informatics[1]. Data is taken as a manmade format for encoding knowledge about the natural world. We take data mining as a process of knowledge transformation. A domain-oriented data-driven data mining (3DM) model based on a conceptual data mining model is proposed. Some data-driven data mining algorithms are also proposed to show the validity of this model, e.g., the data-driven default rule generation algorithm, data-driven decision tree pre-pruning algorithm and data-driven knowledge acquisition from concept lattice.

Keywords: Domain-oriented, Data-driven, Data Mining

∗Address for correspondence: Institute of Computer Scienceand Technology, Chongqing University, of Posts and Telecom- munications, Chongqing, 400065, P.R.China Also works: School of Information Science & Technology, Southwest Jiaotong, University, Chengdu, 610031, P.R.China. †This paper is partially supported by National Natural Science Foundation of P. R. China under Grants No.60573068 and No.60773113, Natural Science Foundation of Chongqing under Grants No.2008BA2017 and No.2008BA2041 ‡Also works: College of Computer and Communication, LanzhouUniversity of Technolegy, Lanzhou, 730050, P.R.China

396 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

1. Introduction

Recent developments in computing, communications, digital storage technologies, and high-throughput data-acquisition technologies, make it possible to gatherand store incredible volumes of data. It creates unprecedented opportunities for knowledge discovery formlarge-scale database. Data mining technol- ogy is a useful tool for this problem.

Data mining, as a relatively new branch of computer science,has got much attention in recent years. It is motivated by our desire of obtaining knowledge from huge[2]. It uses machine learning, statistical and visualization techniques to discover knowledge from data and represent it in a form that is easily comprehensible and useable for humans. Data mining has become a hot field in artificial intelligence.

Data mining is an interdisciplinary field. Many data mining methods are based on the extensions, combinations, and adaptation of machine learning algorithms, statistical methods, knowledge extraction and abstraction. During the past twenty years, many techniques are used in data mining such as artificial neural network, fuzzy set, rough set, decision tree, genetic algorithm, nearest neighbor method, statistics based rule induction, linear regression and linear predictive coding, et al. There are many views for the study of data mining. The vast existing studies of data mining can be classified roughly into three views[2]. The first view is the function-oriented view. It defines data mining as ”the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns from data”. The function-oriented approaches put forth efforts on searching, mining and utilizing the functionalities of different patterns embedded in various databases. The second view is the theory-oriented view. It fixes the attention on the theoretical aspects of data mining, andalso the related disciplines. The third view is the procedure-oriented view. It pays attention to making the processes of mining become effective and efficient. In other words, its objective is to speed up the performance of algorithms.

Regardless of which view is adopted in the process of the study of data mining, most data mining researchers pay much attention to technique problems for developing data mining models and methods, while little to basic issues of data mining processes. In other words, it is little to internal information processing mechanisms of data mining processes. What is data mining? What is the product of a data mining process? What are we doing in a data mining process? What is the rule we would obey in a data mining process? What is the relationship between the prior knowledge of domain experts and the knowledge mind from data?

To answer the above questions, we need to study the basic mining process. At present, a few results about these questions have been reported. A three-layered conceptual framework is proposed by Yao[3]. It consists of the philosophy layer, the technique layer, and the application layer. The layer framework represents the understanding, discovery, and utilizationof knowledge respectively. The philosophy layer investigates the essentials of knowledge. There are many related issues to this question, such as the representation of knowledge, the expression and communication of knowledge in languages. Peng et al propose a systemic framework for the field of data mining and knowledge discovery[4]. Its objective is to identify the research areas of data mining and knowledge discovery. S.Ohsuga[5] consider data mining technology from the viewpoint of knowledge acquisition is atranslation from non-symbolic to symbolic representation. A relation between symbolic processing and non-symbolic processing is discussed. In addition, international workshops on foundation of data mining were also held[6, 7, 8]. Unfortunately, there is still no well-accepted and non-controversial answer to many basic questions mentioned above. In

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 397

this paper, we will address these questions and propose our answers based on a conceptual data mining model. Our answer would be ”data mining is a process of knowledge transformation”. According to this understanding, a domain-oriented data-driven data mining (3DM) model is proposed. Some recent achievements of our work on data-driven data mining techniques are also presented to show the validity of the 3DM model.

The rest of the paper is organized as follows. In section 2, a model of domain-oriented data-driven data mining is proposed. The problem of knowledge uncertainty measurement is discussed in section 3. In section 4, some data-driven knowledge acquisition methods are presented. Experiment results are discussed in the section 5. At last, in section 6, we concludethis paper.

2. A Model of Domain-oriented Data-driven Data Mining

2.1. Data-driven Data Mining

Data mining is defined as ”the nontrivial extraction of implicit, previously unknown,and potentially useful knowledge from data”[9].Knowledge exists everywhere and it is very important for our daily life and work. Knowledge could be expressed in many different forms. There are many forms for encoding knowledge. The easiest form might be such symbolic forms as formula, equation, rule, and theorem. It is very easy for people to understand and use knowledge encoded in these forms. These forms are often used in books, documents, and even expert systems. Data is also a man-made form for encoding knowledge. There are numerals data records generated in many fields. Many natural phenomenon, rules, and even human experience are record into databases everyday. Much useful information could be concluded from data. Unfortunately, people could not read,understand, or use the knowledge expressed in data. So, we think, in a data mining process, knowledge aretransformed from a data form, which is not understandable for human, into another understandablesymbolic form like rule, formula, theorem, etc. No new knowledge will be generated in a data mining process. That is, we are just transforming knowledge from one form into another while not producing newknowledge.

To understand the knowledge transformation process of datamining, we’d better have a look at the knowledge transformation process between different systems at first.

The knowledge transformation process between different systems could be completed in many dif- ferent ways. Reading and understanding are simple ways to transform knowledge from symbolic form into biological neural link form, while speaking and writing are two converse processes to transform knowledge from biological neural link form into symbolic form. Translating a book in one language into another is also a process of knowledge transformation from one symbolic form into another symbolic form. We are learning from and studying on the natural world everyday. This is a process of knowledge transformation from natural phenomenon form into biological neural link form. People exchange their knowledge through spoken language and body language also. In a data mining process, we are trans- forming knowledge from data form into symbolic form. Data could also be taken as a measure result of the natural real world. Thus, there are many channels and ways for knowledge transformation between different systems. Fig. 1 is an illustration for such knowledge transformation processes.

398 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Figure 1. Knowledge transformation among different forms

Figure 2. Translating a book in English into Chinese

From Fig. 1, one can find that data mining is a kind of knowledgetransformation process to transform knowledge from data form into symbolic form. Thus, no new knowledge will be generated in a data mining process. In a data mining process, knowledge is just transformed from data form, which is not understandable for human, into symbolic form, which is understandable for human and easy for application. It is similar to the process of translating a book from English into Chinese. In this translation process, the knowledge in the book itself should remain unchanged. What will be changed is just the coding form (language) of the knowledge. That is, the knowledge of the Chinese book should be the same as the knowledge in the English one. Fig. 2 is an illustrationfor this case. Following this understanding of data mining, we could have the knowledge transformation framework for data mining as shown in Fig. 3.

From Fig. 3 , one can find that knowledge could be encoded into natural form, data form, symbolic form, and neural link form. That is, knowledge could be stored in a natural world system, a data system, a symbol system, or a biological neural network system. The knowledge expressed in each form should have some properties, that is, Pi’s. There should be some relationship between the different forms of the same knowledge. In order to keep the knowledge unchangedin a data mining process, properties of the knowledge should remain unchanged during the knowledgetransformation process. Otherwise, there should be some mistake in the knowledge transformation process.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 399

Figure 3. Knowledge transformation framework for data mining

The relationship between knowledge in natural form and dataform, natural form and neural link form, symbolic form and neural link form is omitted in Fig. 3.It is just like the relationship between the knowledge in data form and symbolic form.

In a data mining process, the properties of knowledge in the data form should remain unchanged. This information could provide some guideline for designing data mining algorithms. It would also be helpful for us to keep the knowledge in the data form unchanged in a data mining process. Unfortunately, the representation of knowledge is still an unsolved problem inartificial intelligence. We do not know all the properties of knowledge. It is still not known how much properties are enough or needed for knowledge representation. So, how could we keep the knowledge unchanged in a data mining process? Fortunately, we know some properties of knowledge representation, for example, uncertainty of knowledge. These properties should not be changed in a data mining process in order to keep the knowledge unchanged. Thus, in order to keep the knowledge unchanged in a data mining process, we need to know some properties of the knowledge in data form, and use it to control the data mining process and keep it unchanged. This is the key idea of a data-driven data mining model. There would be three steps for designing a data-driven data mining method.

Step 1. Select a property of knowledge which could be measured in both the data form and the symbolic form for encoding knowledge generated from data.

Step 2. Measure the property of the knowledge in the data formand the symbolic form. Step 3. Use the property to control the data mining process and keep knowledge unchanged. The knowledge property is measured in two different systems, data system and symbolic system.

There might be a problem. Is the measured result of the knowledge property in data form comparable to the result from a symbolic form? If not, how could we know whether it is unchanged in the data mining process? So, we need to design a comparable measuring methodfor the selected property. That is, we need to establish some relationship between the knowledge property in data form and symbolic form.

2.2. User-driven (Domain-driven) Data Mining

Many real world data mining tasks, for instance financial data mining in capital markets, are highly constraint-based and domain-oriented. Thus, it targets actionable knowledge discovery, which can afford important grounds for performing appropriate actions. Some domain-driven or user-driven data mining methods for such tasks have also been developed in recent years[10, 11, 12, 13, 14, 15, 16].

400 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Zhang, Cao and Lin proposed the domain-driven in-depth pattern discovery framework shown in Fig. 4 for financial data mining in capital markets [10, 11].

Figure 4. Domain-driven in-depth pattern discovery framework

Their key ideas are as follows.

1. Dealing with constraint-based context.

• Data constraints. • Domain constraints. • Interestingness constraints. • Rule constraints.

2. Mining in-depth patterns.

• In-depth patterns are highly interesting and actionable patterns in business decision-making. • In-depth patterns are not only interesting to data miners, but also to business decision-makers.

Actionable trading strategies can be found via model refinement or parameter tuning.

3. Supporting human-machine-cooperated interactive knowledge discovery.

• The in-depth pattern discovery is conducted under the cooperation of business analysts and data analysts.

4. Viewing data mining as a loop-closed iterative refinementprocess.

• The data mining process is closed with iterative refinement and feedbacks of hypotheses, features, models, evaluation and explanations in the human-involved or centered context.

Yao and Zhao proposed an interactive user-driven classification method using a granule network also[12]. The key ideas of their method are:

1. It allows users to suggest preferred classifiers and structures.

2. It is an interactive manner between users and machines.

3. Its input and output are interleaved, like a conversation.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 401

Figure 5. Users access different knowledge from a data form knowledge base

4. A user can freely explore the dataset according to his/herpreference and priority, ensure that each classification stage and the corresponding results are all understandable and comprehensible.

Kuntz, Guillet, Lehn, and Briand also developed a human-centered process for discovering association rules where the user is considered as a heuristic which drives the mining algorithms via a well-adapted interface[13]. Han and Lakshmanan integrated both constraint-based and multidimensional mining into one framework that provided an interactive, exploratory environment for effective and efficient data analysis and mining[14]. For creating lexical knowledge bases, Patrick, Palko, Munro and Zappavigna proposed a semi-automatic approach that exploits trainingfrom a knowledgeable user to identify struc- tural elements in the dictionary’s stream of text. Once learnt from the user the structures are then applied automatically to other text streams in the same document or to other documents[15]. In semantic im- age classification, Dorado, Pedrycz and Izquierdo used somedomain knowledge about the classification problem as part of the training procedures[16].

Through analyzing the above user-driven or domain-driven data mining methods, we find that there are some common basic ideas in these methods.

1. A user-driven data mining process is constraint based. 2. User’s interests are considered in a user-driven data mining process. 3. Prior knowledge of domain experts is required in a user-driven data mining process. 4. Interaction between user and machine is required in a user-driven data mining process.

2.3. Domain-oriented Data-driven Data Mining (3DM)

Is there any confliction between data-driven data mining anduser-driven (or domain-driven) data mining? Could they be integrated into one system? We will discuss about this problem in this section.

In a database management system (DBMS), different users could access different data of a whole database system from their own view. If we take data as a form of knowledge representation, a database (data set) could be also taken as a knowledge base. So, different user could find and use different subset of the whole knowledge base for his/her task. That is, through his/her view, a user could access a subset of knowledge in the data form and transform it from data form into another form he/she required. The knowledge transformation process for each user could stillbe done in a data-driven manner. Fig. 5 is an illustration of this understanding.

402 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Figure 6. User’s interests, constraint, and prior domain knowledge are all taken as input of a data mining process (Domain-oriented data-driven data mining, 3DM)

In a domain-driven data mining process, user’s interesting, constraint, and prior domain knowledge are very important. An interaction between user and machineis needed. The data mining process might be controlled by a user. In this case, the knowledge source ofthis mining process includes data and the user, while not just data. So, the prior domain knowledge is also a source for the data mining process. The control of a user to the data mining process could be takenas additional input of the data mining process. It is just like the data generation process in an incremental dynamic data mining process. So, we may also deal with the user’s control using incremental data-driven data mining methods. Fig. 6 is an illustration of this idea.

From the discussion above, we know that the domain-driven data mining does not conflict with the data-driven data mining. They could be integrated into one system. There are still a lot of works to be done to implement such a domain-oriented data-driven data mining process.

1. Design a form for encoding prior domain knowledge.

2. Design a form for encoding user’s interest and constraintfor a specific task.

3. Design a form for encoding user’s control.

4. Design an incremental data-driven data mining method that could take data, prior domain knowl- edge, user’s interest, user’s constraint, user’s control together as its inputs. Here, initial data, prior domain knowledge, user’s interest, and the constraint for aspecific task could be taken as static inputs of a 3DM system, while incremental data and user’s dynamic control as its dynamic inputs.

So, there would be a long way to implement domain-oriented data-driven data mining.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 403

3. Measuring of Knowledge Uncertainty

Knowledge uncertainty is one of important intrinsic properties of knowledge. According to the data- driven data mining model proposed in the last section, the uncertainty of knowledge should not be changed in a data mining process. Consequently, through measuring the knowledge uncertainty in data form and symbolic form, some relationship should be established between the knowledge uncertainties in the two forms. This relationship could be used to control adata mining process and keep knowledge unchanged.

Knowledge uncertainty affects not only data mining process, but also the performance of mined knowledge[17].Many researchers presented different methods for knowledge uncertainty. Professor Skowron presented a rough set framework for data mining of propositional default rules. Through setting a threshold of knowledge uncertainty and the decision rules can be generated under uncertain condition[18]. The threshold may be not correct or enough ifwe have not any prior knowledge about the domain problem to be studied. It will result in mined knowledge can’t satisfy requirement of user or reflect internal structure of data sets. Thus, it is necessary to effectively measure and handle knowledge uncertainty to find correct threshold control the process ofgenerating rules. Consequently, maximum correct rate will be got in the testing process of rule sets. In fact, several theoretical models and mathe- matical tools, such as probability theory, evidence theory, fuzzy set, and vague set etc, have already been applied in solving this problem.

Rough set theory is an important tool for processing uncertain information. Based on this theory, we proposed several methods to measure knowledge uncertainty in both data form (decision tables) and symbolic form (decision rule)[19, 20]. Through comparing the knowledge uncertainty of a decision table and the decision rules generated from it, we find that the minimal local certainty of a decision table could be used to control the process of generating rules from it.

3.1. Basic Concepts

Some basic concepts about rough set theory are introduced atfirst, and then the knowledge uncertainty based on rough set is discussed in this section.

Def.1 (Decision table)A decision table is a formal representation of a data set to be analyzed. It is defined as a pairS = (U,R, V, f),whereU is a finite set of objects andR = C

⋃ D is a finite

set of attributes,C is the condition attribute set andD = {d} is the decision attribute set. With every attributea ∈ R, set of its valuesVa is associated. Each attributea determines a functionfa : U −→ Va.

Def.2 (Indiscernibility relation)[21] Given a decision tableS,R = C ⋃ D is a finite set of attributes.

Each subset of attributesB ⊆ R determines a binary indiscernibility relationIND(B),as follows:

IND(B) = {x, y|(x, y) ∈ U × U,∀b ∈ B(b(x) = b(y))}

Def.3 (Condition class and decision class) Given a decision tableS, C andD are its condition at- tribute set and decision attribute set respectively. Condition classesEis are defined asEi = U/IND(C), where,i = 1, …,m, andm = |U/IND(C)|. Decision classesXjs are defined asXj ∈ U/IND(D), where,j = 1, …n, andn = |U/IND(D).

Def.4 (Consistency and inconsistency of a condition class) Givena decision tableS,C is its condition attribute set.A condition classEi is called consistent if and only if its all objects have a samedecision value. Otherwise, it is called inconsistent.

404 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Def.5 (Certain decision and uncertain decision table)A decisiontableS,is called certain if and only if its condition classes are all consistent. Otherwise, it is uncertain.

Eg.1. In Table 1,a ,b andc are its condition attributes, andd is its decision attribute. There are 5 condition classes, that isE1, E2, E3, E4 andE5.Its decision classes areX1,X2,X3 andX4.The condi- tion classE5 is further divided into two classes by its decision attributed,that is,E5,1 andE5,2.Thus, this decision table is an uncertain one.

Table 1. A decision table

U a b c d X

E1 1 2 3 1(50x) X1 E2 1 2 1 2(5x)

E3 2 2 3 2(30x) X2 E4 2 3 3 2(10x)

E5,1 3 5 1 3(4x) X3 E5,2 3 5 1 4(1x) X4

3.2. Measuring Method for Knowledge Uncertainty

Given a decision tableS = (U,R, V, f), whereR = C ⋃ D,C is its condition attribute set andD =

{d} is its decision attribute set. Its condition classes areEis, whereEi = U/IND(C), and i = 1, …,m,m = |U/IND(C)|. Its decision classes areXjs, whereXj ∈ U/IND(D), andj = 1, …n, n = |U/IND(D)|. There must exist a setTi for each condition classEi, andTi = max{Ei

⋂ Xj |Xj ∈

U/IND(D)} Def.6 (Global certainty and uncertainty of a decision table) Given a decision tableS = (U,R, V, f),

E1, …, Ei, …, Em are its all condition classes, then its global certaintyµc is defined asµc = ∑m

i=1|Ti|/|U |, and its global uncertaintyµuc = 1 −

∑m i=1 |Ti|/|U |.

Def.7 (Certainty of a condition class)[22]Given a decision tableS = (U,R, V, f),whereR = C

⋃ D,C is its condition attribute set andD = {d} is its decision attribute set.The certainty of its

condition classEi = U/IND(C),i = 1, …,m is defined asκ(Ei) = max{|Ei ⋂ Xj |/|Ei||Xj ∈

IND(D)} = |Ti|/|Ej |. Def.8 (Minimum local certainty and maximum local uncertainty of adecision table) Given a decision

tableS = (U,R, V, f) ,the certainties of its all condition classes areκ(E1), …, κ(Ei), …, κ(Em), then its minimum local certaintyαc is defined asαc = min{κ(E1), …κ(Ei), …κ(Em)},and its maxi- mum local uncertaintyαuc is defined asαuc = 1 − αc.

The global inconsistency of a decision table is described byits global uncertainty and global cer- tainty, while the maximum inconsistency of its all condition classes is described by its maximum local uncertainty and minimum local certainty.

Theorem 1. Given a decision tableS = (U,R, V, f) and a rule setF ,generated from it. If the decision table can represent the whole domain space, then the maximum possible correct rateη for testingF equals to its global certaintyuc,that isη = uc =

∑m i=1 |Ti|/|U |, wherem is the number of its

condition classes. This theorem can be proved according to the above definitionseasily.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 405

Now, let’s discuss the measuring of the uncertainty of a decision rule system. The uncertainty of a decision rule system can be evaluated by certainty factor (CF ).Certainty factor is a convenient way for measuring the uncertainty of rules.

Def.9 (Certainty factor)Given a decision tableS = (U,R, V, f), whereR = C ⋃ D, C is its con-

dition attribute set andD = {d} is its decision attribute set.The certainty factorCF (A −→ B) of a decision ruleA −→ B is defined asCF (A −→ B) = |X

⋂ Y |/|X|, whereX is the object set which

condition attribute values satisfying the formulaA while Y is the object set which decision attribute value satisfying the formulaB.

Def.10 Minimum certainty factor of a decision rule system) Given a decision tableS = (U,R, V, f) and a decision rule systemF = {f1, …, fi, …, fn} generated from it,cf1, …, cfi, …, cfn are the corre- sponding certainty factors of thesen decision rules, then the minimum certainty factorβ of this decision rule systemF is defined asβ = min({cf1, …, cfi, …, cfn}.

Generally, in the process of generating a decision rule system from a decision table, we need to set a threshold for choosing rules. If theCF of a rule is greater than this threshold, the rule will be gener- ated.Otherwise, the rule will not be generated. Unfortunately, it is very difficult to set the threshold using domain experts’ prior knowledge in many cases. Here, using the relationship between the uncertainty of a decision table and its corresponding decision rule system, this problem could be solved.

Theorem 2. Given a decision tableS = (U,R, V, f), whereR = C ⋃ D,C is its condition attribute

set andD = {d} is its decision attribute set.If the threshold for generating decision rules is set to be its minimum local certainty when a rule systemF is generated form this decision table, then the maximum correct rate may be got in the testing process of F theoretically.

Proof: The condition classes of the decision tableS areE(k,C) ∈ U/IND(C), where,k = 1, . . ., |U/IND(C)|.

For eachE(k,C),there will exist aT(k,C) = max{E(k,C) ⋂ Xi|Xi ∈ U/IND(D)},and aXj ∈

U/IND(D) satisfyingE(k,C) ⋂ Xj = max{E(k,C)

⋂ Xi|Xi ∈ U/IND(D)} = T(k,C).

|E(k,C) ⋂ Xj |/|E(k,C)| = |T(k,C)|/|E(k,C)| is theCF of rule Des(E(k,C), C) −→ Des(Xj ,D),

thus|T(k,C)|/|E(k,C)| = κ(E(k,C)). Sinceαc = min{κ(E(1,C), …, κ(E(k,C), …, κ(E(|U/IND(C)|,C)},then|T(k,C)|/|E(k,C)| ≥ αc. Thus,the following rule can be generated.

Des(E(k,C), C) −→ Des(Xj ,D), |T(k,C)|/|E(k,C)|.

where,|T(k,C)|/|E(k,C)| is theCF of ruleDes(E(k,C), C) −→ Des(Xj ,D). If this rule is used to test the object in|E(k,C)| in the testing process, correct decisions will be made

for the following object set:

T(k,C) = max{E(k,C) ⋂ Xi|Xi ∈ U/IND(D)}.

Thus, the object set having correct decision result in the testing process for the whole decision table S will be

∑|U/IND(C)| k=1 T(k,C).

So, the correct rate for the whole decision table isη = ∑|U/IND(C)|

k=1 |T(k,C)|/|U |. �

According to Theorem 1, we know this theorem holds.

406 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Theorem 3. Given a decision tableS = (U,R, V, f) and a rule setF generated from it, a rule f ∈ F .If the training data can represent the whole domain space, then the accuracy for testing rulef equals to its certainty factor.

Proof: Suppose the rulef isA −→ B. According to Def.9, we knowCF (A −→ B) = |X ⋂ Y |/|X|,

where,X is the object set which condition attribute values satisfying the formulaA while Y is the object set which decision attribute value satisfying the formulaB.

Since the training data can represent the whole domain space, when this rule is used to test the objects in the decision tableS, the number of correctly recognized objects isK|X

⋂ Y |, and the number

of objects identified by this rule isK|X|, whereK is a coefficient. Then, the test accuracy for this rule is:

η(f) = K|X ⋂ Y |/K|X|.

So,η(f) = CF (A −→ B). �

4. Data-driven Data Mining Methods Based on Knowledge Uncertainty

To verify the validity of the 3DM model proposed in the paper,we developed several data-driven data mining methods. In these data mining methods, knowledge uncertainty is used as a knowledge property to control the data mining process.

4.1. Data-driven Default Rule Generation Algorithm

Skowron developed an efficient propositional default rule generation algorithm to generate classification rules. However, the threshold for choosing rules of this algorithm is based on prior domain knowledge. It would be a data-driven knowledge acquisition algorithm if the threshold could be got through analyzing the data of the original decision table. According to Theorem 2, we can find that if a decision table can represent the whole domain space, its minimum local certainty can be taken as the threshold for choos- ing rules in the rule generation process. Thus, we can implement a data-driven knowledge acquisition algorithm which is fully controlled by the uncertainty of a decision table. In this way, we propose a data-driven knowledge acquisition algorithm by improvingSkowron’s default rule generation algorithm.

Algorithm 1. Data-driven default rule generation algorithm

Input:A decision tableS = (U,R, V, f), whereU is the set of objects,R = C ⋃ D is the set of

attributes,C is the condition attribute set andD = {d} is the decision attribute set.

Output: A default rule systemF . Step1: According to the condition attribute setC, calculate the condition classes ofS,E(k,C) =

U/IND(C), wherek = 1, …, U/IND(C). Use Def.7 and Def.8 to calculate the minimum local cer- taintyαc of S(αc is a threshold to control the process of generating rules).

Step 2: If|E(k,C) ⋂ Xj |/|E(k,C)| ≥ αc, then a rule will be generated.

Rule:Des(E(k,C), C) −→ Des(xj,D)||E(k,C) ⋂ Xj |/|E(k,C)|,

where,|E(k,C) ⋂ Xj |/|E(k,C)| is the certainty factor of ruleDes(E(k,C), C) −→ Des(xj ,D)

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 407

Step 3: Add the decision table into a decision table setψ,ψ = {S}. Step 4: Ifψ = φ, then stop. Otherwise, extract a decision tableS∗ = (U,R∗, V ∗, f∗) from ψ,

calculate its attribute coreCore(C∗). Through deleting a core attribute(eg.Ccut, projections of the con- dition attributesCpr = C∗ − Ccut could be generated, wherer = 1, …, |Core(C∗)|, C∗ is the condition attribute set ofS∗,andCcut is the core condition attribute set deleted. Deal with everyprojectionCpr using the following steps:

➀ If Cpr 6= φ, do the following 4 steps. Otherwise, do nothing. ➁ Add a new decision tableS

′ = (U,R

′ , V

′ , f

′ ) obtained from a projection intoψ,ψ = ψ

⋃ {S

′ },

whereR ′ = Cpr

⋃ D.

➂ The condition classes of the projectionCpr could be calculated as follows:

E(k,Cpr)(Ek,Cpr ∈ U |IND(Cpr), k = 1, …, |U/IND(Cpr))|

➃ If |E(k,Cpr) ⋂ Xj |/|E(k,Cpr)| ≥ αc, then a rule could be obtained as follows:

Rule ′ : Des(E(k,Cpr), Cpr) −→ Des(Xj ,D)||E(k,Cpr)

⋂ Xj |/|E(k,Cpr).

➄ Construct the fact for blocking the rule for every default ruleRule ′ :

If there exists anEi, Ei ∈ U/IND(C), Ei is a subset ofE(k,Cpr), andEi ⋂ Xj =φ, then the fact is:

F ′ : Des(Ei, Ccut) −→ NOT (Rule

′ ).

Step 5: Go to step 4.

4.2. Data-driven Decision Tree Learning Algorithm Based on Knowledge Uncertainty

Decision tree learning is one of the most widely used machinelearning methods. However, if there are some noises in the training data set or only very few data is available, a decision tree created from such a data set will over-fit the training data. Pruning is the mosteffective method to deal with over-fitting problem in decision tree learning process. It could be further classified into two types, post-pruning and pre-pruning. It is difficult to estimate the exact time to stop the growing process of a decision tree in a pre-pruning method, which limits its development and application[23]. According to the definition of the uncertainty of decision table in the section 3, a data-driven decision-tree pre-pruning algorithm (DDPA) based on knowledge uncertainty is proposed. In this algorithm, the global certainty of a decision table with reference to a given condition attribute is used as the measure to choose split-attributes, and control the growing of a decision tree in its pre-pruning process. Using this method, the knowledge acquisition process could be controlled by the training data set itself automatically.

Def.11 (Split-attribute of a decision tree node) The condition attribute, used to further divide the objects of a decision tree node in order to create its sub-layer nodes, is defined as its splitting-attribute.

Def.12 (Decision table of a decision tree node) A decision table containing objects included in a decision tree node is defined as the decision table of this decision tree node.

Theorem 4. Suppose the training data can represent the whole domain space. In the process of generating a decision tree from a decision table, each node,except the root node, is generated by a condition class of its father node. If the certainty of a condition class, which is used to generate a decision

408 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

tree node, is not less than the global certainty of the decision table of this tree node with reference to its splitting-attribute, then it would not improve the test accuracy to further split this node one more layer.

Proof: Suppose there is a decision tree branch as shown in Fig. 7. Thecondition class of nodea, which is used to generate nodeb, is denoted asEab.

Figure 7. A branch of a decision tree

Its certainty is denoted asκ(Eab).Denote the splitting-attribute of nodeb asb also,and denote its decision table asSb = (U

′ , R

′ , V

′ , f

′ ),R

′ = C

′ ⋃ D

′ .The total certainty ofSb with reference to the

attributeb is denoted asµc(b).Some sub-nodes will be genarated by splitting nodeb using its splitting- attributeb. If nodeb is a leaf node itself, then a decision rulefb could be generated for the path from the roota to nodeb. If nodeb is splitted by its splitting attributeb, and all its sub-nodes are leaf nodes, then a rule setFb could be generated for the paths from the root nodea to its each sub-node. Suppose the test accuracy for rulefb is η(fb) and the test accuracy for rule setFb is η(Fb), we can have a) and b) as follows:

a)η(fb) = κ(Eab) Let’s suppose the rulefb is A −→ B.The objects satisfying formulaA just equal to the condition

classEab,and the objects satisfying formulaA ⋂ B can be expressed asTb = max{Eab

⋂ Xj

|Xj ∈ U ′ /IND(D

′ )}. Thus, the certainty factor of rulefb should be:

CF (A −→ B) = |Tb|/|Eab| (1)

According to Def.7, we know: κ(Eab) = |Tb|/|Eab| (2)

According to Theorem 3, we know the test accuracy of rulefb equal to its certainty factor, that is:

η(fb) = CF (A −→ B) (3)

From equation (1), (2)and(3), we can haveη(fb) = κ(Eab).

b) η(Fb) = µc(b). Dividing the decision tableSb with condition attributeb, we can get the condition classesEks,Ek ∈

U ′ /IND(b), wherek = 1, …, n, andn = |U

′ /IND(b)|. There must exist a setTk for each condition

classEk, andTk = maxj{Ek ⋂ Xj |Xj ∈ U

′ /IND(D

′ )}. If the rule setFb is used to test the objects

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 409

in the decision tableSb, then the correct recognized objects are ∑|U ′/IND(b)|

k=1 Tk. So, the test accuracy for the rule setFb is:

η(Fb) =

|U ′ /IND(b)|∑

k=1

|Tk|/|U ′

| (4)

According to Def.6, we can have:

µc(b) =

|U ′ /IND(b)|∑

k=1

|Tk|/|U ′

| (5)

From equation (4) and (5),we can haveη(Fb) = µc(b). So, ifκ(Eab) ≥ µc(b), from a) and b) we can knowη(fb) ≥ η(Fb). Thus, Theorem 4 holds. �

It is a NP-hard problem to generate an optimal decision tree[24]. Theorem 4 cannot assure that further splitting the node two or more layers will not improve the test accuracy either. It can be taken as a strategy for partition optimization. Theorem 4 describes the relationship between a decision table and the test accuracy for the decision tree generated from it. Using this relationship, we can draw the following conclusion. If the certainty of a condition classused to generate a decision tree node is not less than the global certainty of the decision table of this tree node, then we should stop splitting this node, and generate a leaf node for it. In other words, if the certainty of a condition class, which is used to generate a decision tree node, is greater than the total certainty of the decision table of its father node with reference to its splitting-attribute, the node can be taken as a leaf node. Based on this conclusion, we can develop an algorithm for decision tree learning pre-pruning. The decision tree created by this method has high test accuracy but with a large tree size.

Algorithm 2. DDPA: Data-driven decision tree pre-pruning algorithm

Input: A decision tableS = (U,R, V, f),whereR = C ⋃ D.

Output:A decision tree.

Step1. Initialize the root node. Generate a noderoot with all the objects of the decision tableS. Calculate the global certainty of the decision tableS with reference to each of its condition attributes. Select the maximum oneµc(a) , and denote the associated condition attribute asa, then take it as the splitting-attribute of the noderoot.

Step2. Create-tree(root,S,a,µc(a)). Create-tree(root,S,a,µc(a)) {

1. Divide the decision tableS using condition attributea, create a sub-nodesub-rooti of root node for each condition classEai.Calculate the certainty of each condition class,and denote it asκ(Eai),where, i = 1, 2, …,m,m = |U/IND(a)|.

2. Fori = 1 tom do: If κ(Eai) ≥ µc(a), or there is no other condition attribute left, then create aleaf node forsub-rooti; Else {

410 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Generate a sub decision tableS ′

with the objects included in nodesub-rooti.Calculate the global certainty of the sub decision tableS

′ with reference to its each condition attribute, select the maximum

oneµc(b),denote the associated attribute asb, and take it as the spliting-attribute of the nodesub-rooti. If κ(Eai) ≥ µc(b), then create a leaf node forsub-rooti; Else Create-tree(sub-rooti ,S

′ ,b,µc(b)).

}

3. Return the noderoot. }

In order to illustrate the algorithm DDPA more clearly, let’s look at the following example.

Table 2. A decision table

Outlook Tem Hum Windy d

1 Overcast Hot High Not Y

2 Overcast Hot High Very N 3 Overcast Hot High Medium N 4 Sunny Hot High Not Y 5 Sunny Hot High Medium Y 6 Rain Mild High Not N 7 Rain Mild High Medium N 8 Rain Hot Normal Not Y 9 Rain Cool Normal Medium N 10 Rain Hot Normal Very N 11 Sunny Cool Normal Very Y 12 Sunny Cool Normal Medium Y 13 Overcast Mild High Not N 14 Overcast Mild High Medium N 15 Overcast Cool Normal Not Y 16 Overcast Cool Normal Medium Y 17 Rain Mild Normal Not N 18 Rain Mild Normal Medium N 19 Overcast Mild Normal Medium Y 20 Overcast Mild Normal Very Y 21 Sunny Mild High Very Y 22 Sunny Mild High Medium Y 23 Sunny Hot Normal Not Y 24 Rain Mild High Very N 25 Overcast Hot High Not N

Eg.2. Given a decision table as shown in Table 2, where the condition attributes areOutlook, Tem(temperature), Hum(humidity)andWindy, the decision attribute isd. We use the algorithm DDPA, ID3[25] and a pre-pruning algorithm J-pruning[26] to create three decision trees as shown in Fig. 8, Fig. 9 and Fig. 10 respectively, where, an ellipse represents a splitting-attribute, and a rectangle represents a leaf node, which includes its decision class and confidence.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 411

Figure 8. Decision tree created by DDPA Figure 9. Decision tree created by J-pruning

Figure 10. Decision tree created by ID3

The steps for creating a decision tree with DDPA are as follows.

1. Calculate the global certainty of the decision table withreference to each of its four condition attributes. They areµc(Outlook) = 0.76, µc(Temperature) = 0.64, µc(Humidity) = 0.64, µc(Windy) = 0.52.The maximum is 0.76. It corresponds to the attributeOutlook.

2. Create the root node (Outlook) of the tree. It includes all instances of the decision table. Use the attributeOutlookto divide the decision table into some classes. The certainties of each classes are κ(E(Overcast)) = 0.5, κ(E(Sunny)) = 1, κ(E(Rain)) = 0.875.

3. Sinceκ(E(Sunny))>µc(Outlook), κ(E(Rain))>µc(Outlook), κ(E(Overcast))<µc(Outlook), the Sunny class and Rain class can be taken as leaf nodes. So, we need only to process the Overcast class (1,2,3,13,14,15,16,19,20,25). We look fora next splitting-attribute from the re- maining condition attributes,Temperature, Humidity andWindy. Sinceµc(Temperature) = 0.7, µc(Humidity) = 0.9, µc(Windy) = 0.5, the attribute Humidity is selected as the next splitting- attribute. Becauseµc(Humidity) = 0.9¿κ(E(Overcast)),the tree should grow with the attribute Humidity.

412 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

4. Create a nodeHumidity, and use the attributeHumidity to further divide the Overcast class.Since κ(E(High)) = 0.83 < µc(Humidity), κ(E(Normal)) = 1 > µc(Humidity), the normal class can be taken as a leaf node. We should further look for a next splitting-attribute from the at- tributesTemperatureand Windy of the current decision table that is corresponding to the class High.Now, µc(Temperature) = 0.83, µc(Windy) = 0.67.They are both less than or equal to κ(E(High)).Thus, the classHigh should be taken as a leaf node too.

5. Return the tree shown in Fig. 8.

From the above figures, one can find that the tree produced by DDPA is the smallest one among the three trees.

4.3. Data-driven Knowledge Acquisition Based on Concept Lattice

Concept lattice[27] is the key data structure of knowledge presentation in Formal Concept Analysis. Ex- tracting knowledge from concept lattice has been studied bymany machine learning researchers. Knowl- edge acquisition based on concept lattice consists of mining association rules and classification rules[28]. Some existing concept lattice based machine learning algorithms could only deal with certain data. They could not extract probability rules. Some other concept lattice based machine learning algorithms can deal with uncertain data. However, their learning parameters have to be set by users in advance. It is very difficult to set these parameters unless the whole domain knowledge is known in advance. GALOIS[29] and RULEARNER[30] belong to the former kind. They adopt an incremental method to build a complete lattice and then to extract rules. LEGAL[31], LACS[32] and CBALattice[33] belong to the latter one. They use the learning parameters provided by users. If the user does not understand the related domain knowledge quite well, it will be much difficult for he/she to set appropriate learning parameters.

In order to study from data itself better, we adopt the model of data-driven data mining. Using the uncertainties of decision table and decision rule, the relationship of the knowledge uncertainties of three different knowledge presentation models, decision table,decision rule, and concept lattice, is discovered through analyzing their knowledge representation styles.A data-driven uncertain knowledge acquisition algorithm based on concept lattice is proposed using this relationship.

The basic concepts of Formal Concept Analysis and Concept Lattice could be referred to[27]. Some basic concepts of Concept Lattice used in this algorithm areintroduced here. Suppose thatF andG are two mapping relations,X is an object set andA is an attribute set. LetF (X) denotes the attribute set, which is common to the objects inX,G(A)denotes the set of objects which have all attributes inA. Now, we can analyze the relationship of the uncertainties measurement of the three different knowledge representation models of decision table, decision rule, and concept lattice.

The uncertainty is an essential feature of data. For different knowledge presentation models, the measurements of uncertainties are distinct. The relationship of the knowledge uncertainties of the three different knowledge presentation models of decision table, decision rules, and concept lattice, is dis- covered by analyzing their knowledge presentation styles.According to Def.1, a decision table can be a multi-value system. In other words, each attribute can haveseveral attribute values. However,the formal context of concept lattice is a binary system in general. So,a decision table has to be converted from multi-value to binary value when building a concept latticebased on a multi-value decision table.

Def.13 Let S = (U,R, V, f) be a decision table, andC ⊂ R are condition attributes. For anyc(c ∈ C), Tc is called the conversion attribute of the attributec,andTc = Vc.The set of the conversion attributes

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 413

of all condition attributes is denoted asConConversion(C) = ⋃

c∈C Vc.The set of the conversion attributes of all decision attributes is denoted asDesConversion(D) =

⋃ d∈D Vd.

The formal context of the original decision table can be obtained by combining the conversion at- tributes of its condition attributes and decision attributes as the attribute set of the new decision table.

Def.14 Let S ′ = (U,R

′ , V

′ , f

′ ) be the decision table after attribute conversion. The condition and

decision part ofS ′ are called condition formal context and decision formal context respectively, denoted

as(U,Concorrespond(C), RC) and(U,DesCorrespomd(D), RD) respectively, whereRC is a binary relation betweenU andConcorrespond(C),RD is a binary relation betweenU andDescorrespond(D).

Both the condition formal context and the decision formal context represent the concepts on the same finite object set. Therefore, there must be some relation between the condition formal context and the decision formal context.

Def.15 For any(X,F (X))((X,F (X)) ∈ (U,ConCorreapond(C), RC)) and a concept(X ′ , F (X

′ ))

in the decision formal context satisfyingX ⊆β X ′ , β = |X �

X ′ |

|X| ,where| • | is the number of objects included in the concept extent, then it is called that the concept(X,F (X)) is β included in the concept (X

′ , F (X

′ )).

This β inclusion relation between the concept of a condition formal context and the concept of a decision formal context shows the uncertainty within concepts. The rules based onβ inclusion relation are uncertain rules. In order to minimize the uncertainty ofrules, theβ value within concepts must be maximized.

Def.16 For any (X,F (X))((X,F (X)) ∈ (U,ConCorreapond(C), RC)) and all concepts of a decision formal context(X

′

k, F (X ′

k)) ∈ (U,DesCorrespond(D), RD)(1 ≤ k ≤ ND,ND is the number

of all concepts in the decision formal context),letβk = |X �

X ′

k |

|X| . βM = max(βk) is called the maximum certainty of the concept(X,F (X)) in the condition formal context with respect to the concept of the decision formal context.

βM not only expresses the uncertainty between concepts, but also is consistent with the confidence of rules. So, when extracting an uncertain rule from a concept lattice built with the formal context, theβM needs to be calculated and estimated.IfβM is greater than the certainty threshold, a rule can be obtained.

In section 3, it is proved that the local minimum certaintyαc derived from a decision table, could represent the uncertainty of the decision table and can be utilized as the threshold for choosing rules in the rule generation process.βM reveals the uncertainty of concepts in the formal context converted from a decision table. In the process of building concept latticefrom a decision table, two steps are needed. The first step is to convert the attributes to obtain formal contexts. The second is to build up a concept lattice. The essence of the conversion is that attribute values in the former decision table is viewed as new attributes in the later which have values of attribute 0 and 1 only. In this process, the uncertainty of a decision table is not changed. Building a concept lattice using the converted decision table as formal context does not influence the uncertainty of the data. In other words,αc andβM is consistent. Soαc can also be a threshold to acquire uncertain rules.

Generally, a threshold of uncertainty rules is required in an uncertain knowledge acquisition process, which is set in advance according to prior domain experience. In the following data-driven knowledge acquisition method based on concept lattice (DDKCCL), the local minimum certaintyαc of a data set is adopted as an uncertainty threshold. The novelty of a rule defined below is employed to prune redundant rules in the algorithm also. Since the parameter of the algorithm can be calculated from training data directly, it is not necessary to set up it in advance. Thus, a data-driven knowledge acquisition method

414 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

is designed. In the classification rule algorithm LACS[32],which is based on concept lattice and deals with uncertain data using concept purity and concept intensity to control the building of a concept lattice and pruning rules for multi-classification problems. The parameter used in LACS needs to be set up in advance. In order to verify whether theαc could be used as a threshold to acquire uncertainty rules in a data-driven knowledge acquisition process, we adoptαc as the threshold of concept purity on the basis of LACS. The novelty of rules and other concepts used in our algorithm are defined here.

Def.17 Given a decision tableS = (U,R, V, f).CoverSetD is the object set covered by the rules from a rule set. For a new rulef(X) −→ f(X

′ )(X,X

′ ⊂ U) if X * CoverSet,then the rulef(X) −→

f(X ′ ) is called novelty, otherwise is called redundancy.

According to the definition of novelty, if a new rule is implicated by some other rules in the rule set, it will be viewed as redundant and should not be added into therule set. This method can not only reduce the rule set, but also improve the efficiency of the algorithm.

Def.18 ∀(X,A) ∈ (U,ConCorrespond(C), RC),X ⊂ U,A ⊂ ConConversion(C), if the pair (X,A) satisfiesX = G(A) andA ⊆ F (X), then(X,A) is called the extended concept in the condition formal context which the decision table corresponds to.

Theorem 5. Given a condition formal context(U,ConCorrespond(C), RC), if (X,A) is an ex- tended concept in the condition formal context, then there must be a concept(X,B) in the condition formal context such thatA ⊆ B.

Proof: According to the definition of the formal concept, the concept intent is the maximum attribute set corresponding to the concept extent. Since(X,A) is an extended concept,A is not a maximum attribute set corresponding to the object setX, there must be an attribute setB with X = G(B) andB = F (X), where(X,B) is a concept in the formal context which the decision table corresponds to. In terms of Def.18,X = G(A) andA ⊆ F (X), thenA ⊆ B. �

As a result, in the formal context which a decision table corresponds to, an extended concept has the same object set with certain concept, but its attribute set of objects is a part of the attribute set of objects of this concept. So, the rules obtained by this extended concept can become simpler, and it can be proved that the confidence of these rules is the same as that of the rules obtained by the original concept.

Therefore, in the process of obtaining classification rules, we search the concepts or extended con- cepts satisfying certain conditions with a top-down method. At each level, the quadruple(X,A, Y,B) is used to denote the concepts or extended concepts, whereX,Y are object set,A ⊂ ConConversion(C), B ⊂ ConConversion(D),X = G(A),A ⊆ F (X), Y = G(B),B = F (Y ). A concept or an extended concept withβM greater thanαc will be used to obtain a rule, otherwise, it will be ignored. After the processing is finished at each level, we test whether the objects are covered by the current rule set con- tain all objects in the original decision table and then forecast if some novelty rules could be obtained by the concepts or extended concepts in the next level. If yes, the subsequent steps start, otherwise, the algorithm stops.

In this algorithm,αc is calculated from the training data. It is not necessary to be set up in advance. At the same time, by choosing novelty rules, the search spaceof the concepts will be reduced and the efficiency of the algorithm will be improved. In the following description of the algorithm, both concept and extended concept are called concept. �

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 415

Algorithm 3.Data-driven knowledge acquisition based on Concept Lattice (DDKCCL)

Input: A decision tableS = (U,R, V, f),a condition formal context(U,ConCorrespond(C), RC) and a decision formal context(U,DesCorrespond(D), RD).LetRuleSet = φ denotes the classifica- tion rule set.

Output: A rule setRuleSet.

Step1: According to Def.8, calculate the thresholdαc and useαc as the threshold to control the generation of rules.

Step2: Initialize the top level concept:L1 = {(G(A), A,G(B), B)}, whereG(A), G(B) ⊂ U,A ⊂ ConConversion(C), B ⊂ DesConversion(D), (G(A), A) and(G(B), B) are concepts of the condi- tion formal context and decision formal context respectively, satisfyingG(B) = maxM⊂DesConversion(D) (G(A)

⋂ G(M)). For the concepts in theL1, if βM =

|G(A) �

G(B)| |G(A)| > αc, then a ruleA −→ B is ob-

tained, andRuleSet = RuleSet ⋃ {A −→ B}.

Step3: Build the next level concept lattice, and obtain rules, initializei = 1.

➀ If the number of the objects covered by the current rule set isequal to that of all objects in the original decision table and then forecast if some novelty rules could be obtained by the concepts in the next level. If yes, the subsequent step starts, otherwise, the algorithm stops.

➁ When there is more than one concept in the levelLi,supposeLi+1 = φ. Denote the object set of the union of two concepts(G(A), A,G(B), B) and (G(A

′ ), A

′ , G(B

′ ), B

′ ) in Li asG(Z) =

G(A) ⋂ G(A

′ ).

➂ ForG(Z), if G(Z) 6= φ, then the condition part of the new concept is denoted as(G(Z), A ⋃ A

′ ),

and its decision part is denoted as(G(B), B), whereG(B) = maxM⊂DesConversion(D)(G(Z) ⋂ G(M)).

Then,Li+1 = Li ⋃

(G(Z), A ⋃ A

′ , G(B), B).

➃ If |G(Z)| 6= |G(A)| and|G(Z)| 6= |G(A ′ )|, βM =

|G(Z) �

G(B)| |G(Z)| > αc, then a ruleA

⋃ A

′ −→ B

is obtained, andRuleSet = RuleSet ⋃ {A

⋃ A

′ −→ B}.

➄ If |G(Z)| = |G(A)| or |G(Z)| = |G(A ′ )|, then no rule will be obtained,and the next combination

continues. ➅ If the pairwise combination of concepts inLi is finished,theni = i+ 1, and go to➁

5. Experiment Results

We have done a lot of simulation experiments to test the performance of our data-driven knowledge acquisition algorithms. In our simulation experiments, the UCI data sets[34] are used to test the three algorithms proposed above. In order to increase the uncertainty of data, some attributes are deleted from the original data sets randomly.

5.1. Data-driven Default Rule Generation Algorithm

In this experiment, the continuous attributes are discretized with the method of[35]. The inference strat- egy of lower frequency first is used to deal with inconsistentrules[36]. For each data set, 50% data is used as training set and the whole data set as testing set. Thebasic information of the data sets is listed in Table 3.

416 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Table 3. Data sets

Datasets Number of training samples Number of testing samples

Tic-tac-toe 479 958

Buf 172 345

Hayer-roth3 66 132

Iris-1 75 150

Iris 75 150

Car 487 974

Balance-scale 312 625

Krkopt 517 1035

Houst-vote 217 435

Yeast 185 371

Tic-tac-5000 1000 5000

Heart10000 1500 10000

Hayes-1 66 132

Glass 107 217

Ecolt 168 336

Postoperative 45 90

Heart 135 270

Lenses 12 24

Flare 161 323

Monk-3 216 432

Adult+Stretch 10 20

New-thyroid 107 215

For each data set (a decision table), the experiment processis as follows:

Step1: Discretize the training data set with the attribute importance based discretizition algorithm in [35].

Step 2: Calculate the minimum local certaintyαc of the training decision table, generate a rule system F with our data-driven knowledge acquisition algorithm, andtest the rule system with the whole data set.

Step 3: Generate rule systems with Skowron’s default rule generation algorithm by taking 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0 as threshold respectively, and test these rule systems with the whole data sets.

The experiment results are shown in Table 4. It can be found from Table 4 that the correct recognition rateincreases quickly with the decreasing of

the threshold when it is decreasing from 1 toαc, while the correct recognition rate changes a little after the threshold is further decreasing fromαc to 0.

The curve of the correct recognition rate changing with the threshold can be drawn for each data set. Fig. 11 shows two such curves for the data sets Ecolt (real line) and Lenses (dashed). From this figure,

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 417

Table 4. Accuracy for different thresholds(%)

Threshold 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0 αc Data sets αc

Tic-tac-toe 0.5 15 15 20 57 69 71 69 67 67 67 67 71

Buf 0.5 5 5 17 21 45 59 59 57 54 54 54 59

Hayer-roth3 0.5 73 73 75 83 87 87 87 87 87 87 87 87

Iris-1 0.42 0 0 11 35 35 35 55 55 55 55 55 55

Iris 0.5 19 19 19 25 32 49 56 56 49 49 49 49

Car 0.5 60 60 61 79 79 80 80 78 78 78 78 80

Balance-scale 0.5 8 11 44 56 65 70 70 70 65 55 52 70

Krkopt 0.5 42 57 80 83 87 86 86 86 86 81 81 86

Houst-vote 0.5 6 54 54 86 86 86 86 86 86 86 86 86

Yeast 0.25 10 10 10 18 18 35 41 44 40 39 39 40

Tic-tac-5000 0.5 51 60 71 80 82 82 62 81 80 78 78 82

Heart10000 0.5 14 47 73 83 83 84 84 82 80 78 78 84

Hayes-1 0.42 9 9 9 9 9 9 48 48 48 32 32 48

Glass 0.33 14 14 14 14 17 41 41 41 41 41 41 41

Ecolt 0.43 26 26 31 41 52 63 63 65 64 64 64 63

Postoperative 0.5 36 36 50 73 73 68 68 63 63 63 83 68

Heart 0.5 15 54 64 67 71 82 82 82 74 74 71 82

Lenses 0.38 50 50 50 50 50 50 50 67 67 67 67 67

Flare 0.54 28 32 43 75 74 74 74 74 74 65 65 74

Monk-3 0.5 34 34 34 36 48 50 50 50 50 50 50 50

Adult+Stretch 0.5 60 60 60 70 70 80 80 80 80 80 80 80

New-thyroid 0.5 51 51 64 72 79 81 81 81 81 81 81 81

one can find that the correct recognition rate is increasing quickly when the threshold is decreasing from 1 toαc while it changes a little when the threshold is further decreasing fromαc to 0. In the process of rule generation, we hope to get a high correct recognition rate and get a rather small number of rules. Thus, we can find that the thresholds decided by our data-driven knowledge acquisition algorithm for these two data sets are very good. A high correct recognitionrate is got with a rather small number of rules.

5.2. Data-driven Decision Tree Learning Algorithm Based on Knowledge Uncertainty

In this simulation experiments, fourteen datasets from UCImachine learning repository[34] are used.

A. Feasibility test of the selection measure for splitting-attribute

Since Quinlan’s original work, there have been a number of alternative suggestions for measures for selecting splitting-attribute. In order to prove the feasibility of the selection measure for splitting- attribute proposed in this paper, we develop an algorithm Gc(global certainty), which is a transmutation of ID3. The only difference between them is the selection measure for splitting-attributes. ID3 selects splitting-attributes by information gain while Gc by the global certainty of a decision table with reference to a given condition attribute. We compare the algorithm ID3with Gc from the tree size, the test accuracy and the time cost. The experiment results are shown in Table 5.

418 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Figure 11. Simulation result of datasets Ecolt and Lenses

Table 5. Ratio of experiment result

Algorithm ID3 Gc

Ratio of leaf number 1 1.01

Ratio of test accuracy 1 1

Ratio of time 1 1

In this experiment, the missing attribute values in all datasets are processed with a method based on discernible matrix[37], and all the numeric attributes arediscretized by Nguyen greedy algorithm[38]. 70 percent of each data set is used as the training data, and the remainder as testing data. We repeat it 10 times for each data set, and then the average is taken as the final result.

From the above experiment results, it is easy to find that the ratio between the two trees created by ID3 and Gc nearly to 1:1 from the viewpoint of tree size, test accuracy and time. It proves that the measure method for selecting splitting-attribute used in this paper has the same effect with the information gain, so the measure for selecting splitting-attribute is feasible.

B.Performance test

In order to test the performance of the algorithm DDPA proposed in this paper, we compare it with the pre-pruning algorithm J-pruning [26] and post-pruningmethod reduced-error pruning (REP)[39]. Missing attribute values are assigned as the most common one. All numeric attributes are discretized into four intervals using equal-width discretization. Each data set is splitted into 10 equal parts approximately, the first of these 10 subsets is used for testing and the remainder for training, then the second subset is used for testing and all the other 9 subsets are used for training. This process is repeated for all 10 subsets and the results are averaged to obtain the final result. The result is shown in Table 6. All best results of each data set are marked with boldface.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 419

Table 6. Test accuracy and tree size for DDPA compared to REP and J-Pruning (J-PRU)

Data sets Accuracy (%) Tree size

DDPA REP J-pru DDPA REP J-pru

Autos 71.0 63.3 62.0 52.8 53.6 43

Balance-scale 62.5 78.4 68.1 35 54.6 19

Breast-w 95.4 94.3 92.4 12 15.1 20.4

Credit approval 87.0 83.2 85.4 8 8 7.6

Glass 63.2 66.6 51.4 13.3 23.4 20.9

Heart-c 64.3 60.0 54.6 18 20 20.6

Heart-h 82.0 78.0 81.0 10 15.3 15.8

Hepatitis 86.7 80.8 58.0 7 6.5 7.7

Horse-colic 85.5 84.5 71.6 26.7 17.3 36.8

Iris 95.0 94.6 95.0 10 6.4 8

Labor 93.3 79.5 88.0 6.2 6.7 9

Pima-indians 74.2 74.0 77.2 29 35.8 17.4

Vote 95.6 95.6 95.0 3 7.5 5

Zoo 86.0 90.2 89 9 14.1 10.4

Average 81.6 80.2 76.3 17.1 20.3 17.2

From Table 6, we can find that the algorithm DDPA has higher test accuracy and produced smaller trees than algorithm REP and J-pruning. Tapio Elomaa provedthat the algorithm REP can produce the smallest tree that has minimal error with respect to the validation set[40]. So, if the training set and the validation set can represent the whole domain objects well,the algorithm REP is an ideal method for generating a decision tree. From the experiment results, one can find that the algorithm DDPA is even better than REP. From this viewpoint, the algorithm DDPA hasa higher adaptability than REP. Further more, the algorithm DDPA need not validate the feasibility by using the validation set in each pruning process, and need not produce a total tree before finishing the learning process. So, it spends less time and less space than the algorithm REP.

It has been proved that post-pruning is more effective than pre-pruning in decision tree learning process[41]. From the above experiment results, we can find that the pre-pruning algorithm DDPA reach and exceed the effect of an ideal post-pruning algorithm REP, so it is an ideal method for decision tree learning also.

C.Efficiency test

Since the most time spent in generating a decision tree is used to select splitting-attributes, and both the global certainty proposed in this paper and the information gain selection measure for splitting- attributes have the same time complexity, there is little difference among the algorithms to create a decision tree in time complexity. In order to show the efficiency of the algorithm DDPA, we test the time spent in generating a decision tree by it. The training data are generated automatically, and the conflict ratio of each data set is 100%, which means that there are no two records which have the same condition

420 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

attributes values but with different decision attribute value. There are 15 condition attributes and one decision attribute in each data set, and they have 2,000, 5,000, 10,000, 20,000, 50,000 and 100,000 records respectively. The results are shown in Table 7, wherem ∗n means that there are m attributes and n records in a data set.

Table 7. Efficiency test result

Data Set Size of the data set Time(minute)

Data Set1 16∗2000 1.29

Data Set2 16∗5000 5.87

Data Set3 16∗10000 6.21

Data Set4 16∗20000 16.89

Data Set5 16∗50000 49.71

Data Set6 16∗100000 150.23

Since in real applications the conflict ratio of a data set is much less than these experiment data sets, they will generate smaller trees. Thus, the time spent in real applications will be much less than the testing results.

Above experiment results show that algorithm of the data-driven decision-tree pre-pruning is valid.

5.3. Data-driven Knowledge Acquisition Based on Concept Lattice

In this experiment, the inference strategy of high confidence priority is used to deal with inconsistent rules. 50 percent of each data set is used as the training data, and the remainder as testing data. The basic information of the data sets used is listed in Table 8.

Table 8. Data sets

Data sets Total number of samples Number of training samples Number of testing samples

Breast-Cancer 699 349 350

Balance-Scale 625 312 313

Letter 20000 2000 18000

Mushroom 8124 4012 4012

Car 1728 691 1037

Iris 150 75 75

Solar-flare 1066 533 533

Tic-Tac-Toe 958 479 479

The validation of the local minimum certaintyαc in the algorithm LACS is examined at first. The value of the concept purity starts from 0.1with a shift step of 0.1. The value of concept intensity starts from 0.01 with a shift step of 0.01. The algorithm LACS is executed on the data set Balance-Scale to obtain the accuracy and the number of rules. Then, initialize the value of the concept purity asαc ,run the algorithm to obtain the accuracy and the number of rules again. The test results are shown in Table 9 and Table 10.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 421

Table 9. Accuracy generated from the data set Balance-Scale(%) P

P P

P P

P P

P P

P P

concept intensity

concept purity

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 αc=0.33

0.01 75.7 75.7 75.7 75.7 75.7 75.3 69 62.9 34.8 34.8 75.7

0.02 77.3 77.3 77.3 77.3 77.3 76.4 66.7 57.8 32.6 14.7 77.3

0.03 78.3 78.3 78.3 78.3 78.3 76.7 67.1 56.5 32.6 14.7 78.2

0.04 75.1 75.1 75.1 75.1 75.1 73.2 47.9 21.1 9.6 3.5 75.1

0.05 72.8 72.8 72.8 72.8 72.8 71.2 47.9 9.26 4.47 0 72.8

0.06 73.1 73.1 73.1 73.1 73.1 70.6 42.8 1.27 0 0 73.1

0.07 73.1 73.1 73.1 73.1 73.1 70.6 42.8 0 0 0 73.1

0.08 73.1 73.1 73.1 73.1 73.1 70.6 42.8 0 0 0 73.1

0.09 73.1 73.1 73.1 73.1 73.1 70.6 42.8 0 0 0 73.1

0.1 73.1 73.1 73.1 73.1 73.1 70.6 42.8 0 0 0 73.1

Table 10. Number of rules generated from the data set Balance-scale P

P P

P P

P P

P P

P P

concept intensity

concept purity

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 αc=0.33

0.01 245 201 142 120 99 75 49 36 15 8 136

0.02 188 145 116 94 76 58 39 26 11 4 110

0.03 160 120 99 80 63 50 36 25 11 4 93

0.04 117 87 72 57 41 30 21 12 4 1 67

0.05 83 56 47 39 29 19 14 5 2 0 45

0.06 55 38 33 27 19 10 6 1 0 0 31

0.07 39 28 24 19 14 8 4 0 0 0 23

0.08 36 26 22 18 14 8 4 0 0 0 21

0.09 36 26 22 18 14 8 4 0 0 0 21

0.1 36 26 22 18 14 8 4 0 0 0 21

As the Table 9 and Table 10 show, when the concept intensity isfixed and the concept purity decreases from 1 to a certain value, the accuracy will increase at a highspeed, but when the purity continues decreases after that, the change of accuracy is not remarkable. For example, when the value of concept intensity is 0.01 and the value of concept purity decreases from 1 to 0.5, the accuracy increases from 34.8% to 75.7%,but when the value of concept purity goes on decreasing, the accuracy stays around 75.7%. Further more, when the value of concept purity is fixed, the value of concept intensity increases from 0.01 to 0.1, the change of accuracy is not regular and it decreases slowly from the overview, the number of rules decreases constantly as well. Therefore, replacing the concept purity asαc, a relatively larger accuracy can be reached and fewer rules will be generated.

Thus, it is valid to useαc as the threshold to generate rules in the algorithm LACS.

Now let’s test the validity of usingαc as threshold to generate rules in the algorithm DDKCCL. The initial value of the uncertainty threshold is 0.1, the shiftstep is 0.1. The algorithm is executed on all data sets in Table 8 to obtain the accuracy. Then initialize the value of the uncertainty threshold asαc , run this algorithm to obtain the accuracy again. The test results are shown in Table 11.

422 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

Table 11. Accuracy of the DDKCCL(%)

Threshold 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 αc Data Set αc

Breast-Cancer 0.5 89 92 92 92 92 92 92 92 92 92 92

Balance-Scale 0.33 41 42 64 68 73 73 73 73 73 73 73

Letter 0.2 1 11 11 11 13 19 35 40 40 40 40

Mushroom 0.65 65 72 74 74 74 74 74 74 74 74 74

Car 0.6 21 33 46 64 64 64 64 64 64 64 64

Iris 0.5 88 88 89 89 89 89 89 89 89 89 89

Solar-flare 0.5 89 89 97 97 97 97 97 97 97 97 97

Tic-Tac-Toe 0.5 21 35 49 69 71 71 71 71 71 71 71

Table 11 shows when the uncertainty threshold decreases from 1 to a certain value, the accuracy will increase at a high speed, but when the threshold goes on decreasing after that, the change of accuracy is not much. Whenαc is used as the threshold to generate rules, the accuracy obtained is the highest one for each data set. So, it is valid to useαc as the threshold to generate rules in the DDKCCL.

The validity ofαc is illustrated Fig. 12. The two curves describe the changes of the accuracy for Balance-scale and Letter data sets when the threshold changes. It can be found obviously from Fig. 12, when the uncertainty threshold isαc , the accuracy maintains a certain value in the curves and itsvalue is relatively higher.

Figure 12. Distribution of accuracy for balance-scale and letter data sets using different threshold

All above experiment results show the validity ofαc. To further interpret the validity of DDKCCL, we compare it with LACS. The comparison results are shown in Table 12.

From Table 12, we can find that for different data sets, the value of concept purity and concept inten- sity are different for guaranteeing higher accuracy and fewer rules. So, if one user does not understand the related knowledge quite well, it is much difficult to set up appropriate learning parameters in ad- vance. Although the highest accuracy could not be reached byusing DDKCCL, the results are close to the maximal accuracy, and the number of rules is relatively less as well. As the experiment results show, it is valid to acquire rules from the concept lattice by usingthe local minimum certainty and the novelty of rules.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 423

Table 12. Comparison of algorithms LACS with DDKCCL

LACS DDKCCL

Data Set concept concept accuracy number αc accuracy number

purity intensity (%) of rules (%) of rules

Breast-Cancer 0.8 0.02 92.8 66 0.5 92.3 39

Balance-Scale 0.5 0.03 78.3 63 0.33 73.1 21

Letter-Recognition 0.1 0.01 40.1 63 0.2 40.1 40

Mushroom 0.8 0.01 60 33 0.65 74.5 10

Car 0.3 0.02 64 9 0.6 64 6

Iris 0.6 0.02 90.7 10 0.5 89.3 12

Solar-flare 0.7 0.04 97.6 40 0.5 97.4 21

Tic-Tac-Toe 0.6 0.02 72 43 0.5 71 9

All experiment results show that the data-driven knowledgeacquisition algorithm for uncertain knowledge extracting from concept lattice proposed in thispaper is valid for acquiring uncertain knowl- edge. The parameters used in the algorithm can be calculatedfrom training data, so it is not necessary for users to initiate it. The algorithm indicates the essence of data adequately. At the same time, by estimating the novelty of rules redundant rules can be deleted, and the efficiency of the algorithm is also improved.

6. Conclusion

In this paper, the essence of data mining is examined. Data mining is defined as a knowledge transfor- mation process to transform knowledge from a data form, which is not understandable for human, into a symbolic form, which is understandable for human and easy to be used. We could not generate new knowledge from data base. The properties of knowledge couldbe helpful for us to keep knowledge in the data form unchanged in a data mining process. Using the relationship between the knowledge in data form and symbolic form, a data-driven data mining model is discussed. User-driven and domain-driven data mining can deal with many real world data mining tasks. These tasks are highly constraint-based and domain-oriented. Thus, it targets actionable knowledge discovery, which can afford important grounds for performing appropriate actions. In this paper, relationship between user-driven and domain-driven data mining and data-driven data mining is discussed. From the discussion, we know that the domain- driven data mining does not conflict with the data-driven data mining. They could be integrated into one system. So a new model of domain-oriented data-driven data mining is proposed. Three data-driven data mining algorithms are proposed. Experiment results show the model is accurate and valid.

In the future, we will address the following problems: (1) Encoding prior domain knowledge, user’s interest and constraint for a specific task and user’s control. (2) Designing an incremental method that could take data, prior domain knowledge, user’s interest, user’s constraint, user’s control together as its input. (3) Applying our incremental mining method to real-world data mining task, such as instance financial data mining in capital markets.

424 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

References

[1] Wang, Y.X.: On Cognitive Informatics. Brain and Mind,A Transdisciplinary Journal of Neuroscience and Neurophilosophy,Vol.4, No.2,151-167,2003.

[2] Y.Y.Yao, Zhong, N., Zhao,Y.:A three-layered conceptual framework of data mining,In Workshop Proceed- ings Foundations of Data Mining ICDM 2004,Brighton,UK, 205-212.

[3] Y.Y. Yao:A step towards the foundations of data mining,Data Mining and Knowledge Discovery: Theory, Tools, Technology V,Orlando, Florida, USA, April 21-22, 2003,Dasarathy, B.V.(ed), The International Soci- ety for Optical Engineering, 254-263,2003.

[4] Peng, Y., Kou, G., Shi, Y., Chen, Zh.X.:A Systemic Framework for the Field of Data Mining and Knowl- edge Discovery,Workshop Proceedings Foundations of Data Mining ICDM2006,Hongkong,China,Dec,18- 22,2006.

[5] S.Ohsuga:Knowledge Discovery as Translation,Studies in Computational Intelligence,2005,6,3-19.

[6] T.Y.Lin, S. Ohsuga,Proceedings of IEEE ICDM’02 Workshop on Foundation of Data Mining and Knowledge Discoery,2002.

[7] T.Y.Lin, X.H.Hu, S.Ohsuga, C.J.Liau,Proceedings of IEEE ICDM’03 Workshop on Foundation of New Di- rections in Data Mining,2003.

[8] T.Y. Lin,Proceedings of IEEE ICDM’04 Workshop on Foundation of Data Mining,San Jose State,2004.

[9] W.Frawley,G.Piatetsky-Shapiro,C. Matheus:Knowledge Discovery in Databases: An Overviews,AI Magazine,1992,13,213-228.

[10] Cao, L., Lin, L., Zhang, C.Q.: Domain-driven In-depth Pattern Discovery, A practical methodology, [Re- search Report],Faculty of Information Technology, University of Technology, Sydney, Australia, June 2005.

[11] Zhang, C. Q., Cao, L.:Domain-Driven Data Mining: Methodologies and Applications,Advances in Intelligent IT – Active Media Technology 2006, IOS Press, 13-16.

[12] Zhao,Y., Y. Y. Yao:Interactive Classification Using a Granule Network,Proceedings of the 4th IEEE Interna- tion Conf on Cognitive Informatics, Irvine, USA, 2005, 250-259.

[13] P. Kuntz, F. Guillet, R. Lehn, H. Briand: A User-Driven Process for Mining Association Rules,LNCS1910, 2000, 483-489.

[14] J. Han, L. Lakshmanan, R. Ng: Constraint-Based, Multidimensional Data Mining,IEEE Computer, 1999, 32(8),46-50.

[15] J. Patrick, D. Palko, R. Munro,M. Zappavigna:User Driven Example-Based Training for Creating Lexical Knowledgebases,Australasian Natural Language Processing Workshop, Canberra, Australia, 2002, 17-24.

[16] A. Dorado, W. Pedrycz, E. Izquierdo:User-Driven FuzzyClustering: On the Road to Semantic Classification, LNCS3641, 2005, 421-430.

[17] Zhao, J., Wang, G. Y.:Research on System Uncertainty Measures Based on Rough Set Theory,Rough Set and Knowledge Technology, Volume 4062/2006, 227-232.

[18] T. Mollestad, A. Skowron.:A Rough Set Framework for Data Mining of Propositional Default Rules,IS- MIS’96,Zakopane, Poland, pp.9-13, June 1996.

[19] Wang, G.Y., He, X.:Initiative Learning Algorithm Based on Rough Set,Data Mining and Knowledge Discov- ery: Theory, Tools, and Technology,Volume 5098/2003, 94-102.

G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining 425

[20] Wang, G.Y., He, X.:A self-Learning Model under Uncertain Conditions,Journal of Software(in Chinese).2003,14(6),1096-1102.

[21] Pawlak,Z.:Rough set,International Journal of Computer and Information Sciences,1982,11(5),341-356.

[22] Wang, G.Y.: Rough set theory and knowledge acquisition, Xi’an: Press of Xi’an Jiaotong University, Xi’an, 2001.

[23] Tom M. Mitchell:Machine Learning,The McGraw-Hill Companies, Inc.1997.

[24] Hong J. R., Ding M. F., Li X. Y., Wang L. W.: A new algorithmof decision tree induction,Journal of Software, 1995, 18(6),470-474.

[25] Quinlan J. R., Discovering rules from large collections of examples: a case study,Expert systems in the Micro electronic Age, Edinburgh University Press, Michie D, ed.,1979.

[26] M. Bramer: Pre-pruning Classification Trees to Reduce Overfitting in Noisy Domains,Intelligent Data Engi- neering and Automated Learning – IDEAL 2002(eds. H.Yin et al.), Springer-Verlag, 2002, 7-12.

[27] Ganter B,Wille R: Formal concept analysisSpringer,1999.

[28] Fu, H.Y., Fu, H.G., et al.: A Comparative Study of FCA-Based Supervised Classification Algo- rithms,Proceedings of the Second International Conference on Formal Concept Analysis, Sydney, Australia, February 23-26,2004,313-320.

[29] Carpineto, C., Romano, G.: Galois: An order-theoreticapproach to conceptual clustering,Proceedings of the Tenth International Conference on Machine Learning, University of Massachusetts, Amherst, MA, USA, June 27-29,1993,33-40.

[30] Sahami, M.: Learning Classification Rules Using Lattices.Proceedings of the Eighth European Conference on Machine Learning, Heraclion, Crete, Greece, April 25-27,1995,343-346.

[31] Engelbert Mephu-Nguifo:Galois Lattice: A Framework for Concept Learning. Design, Evaluation and Re- finement,Proceedings of the Sixth International Conference on Toolswith Artificial Intelligence, New Or- leans, Louisiana, USA, November 6-9,1994,461-467.

[32] Xie, Zh.P., Liu Z.T.: Research on Classifier Based on Lattice Structure,Proceeding of conference on Intelli- gent Information Processing, 16# World Computer congress 2000, Aug 21-25,Beijing, China,333-338.

[33] Anamika G, Naveen K, Vasudha B: Incremental Classification Rules Based on Association Rules Using Formal Concept Analysis,Proceedings of the 4th International Conference on MachineLearning and Data Mining in Pattern Recognition, Leipzig, Germany, July 9-11, 2005,11-20.

[34] Blake C., Keogh E., Merz C:UCI repository of machine learning databases, Technical report, University of California, Department of Information and Computer Science, Irvine, CA.(1998) http://www.ics.uci.edy/ mlearn/MLRepository.html.

[35] Hou, L. J., Wang, G. Y., Wu, Y., Nie, N: Discretization inRough Set Theory,Chinese J. of Computer Science, 2000, 27(12), 89-94.

[36] Wang, G. Y., Liu, F., Wu, Y.: Generating Rules and Reasoning under Inconsistencies,2000 IEEE Interna- tional Conference on Industrial Electronics, Control and Instrumentation, Japan, 2000, 2536-2541.

[37] Zhang, W.X., Liao, X. F. Wu, Z. F.: An Incomplete Data analysis Approach Based on Rough set The- ory,Pattern Recognition and Artificial Intelligence, 2003, 16(2): 158-162.

[38] Nguyen H. S., Skowron A: Quantization of Real Value Attributes-Rough Set and Boolean reasoning Ap- proach,Proc of the Second Joint conference on Information sciences, 1995,34-37.

426 G. Wang and Y. Wang / 3DM: Domain-oriented Data-driven Data Mining

[39] Qinlan, J. R.: Simpliying decision trees,International Journal of Man- Machine studies, 1987, 27(3), 221- 234.

[40] Apio Elomma, Matti K: An Analysis of Reduced Error Pruning,Journal of Artificial Intelligence Research, 15, 2001,163-187.

[41] Windeatt T., Ardeshir G: Tree Pruning for Output Coded Ensembles,International Conference Pattern Recognition, ICPR02, 2002, 92-95.