TY - JOUR
T1 - Applying incremental tree induction to retrieval from manuals and medical texts
AU - White, Kieran J.
AU - Sutcliffe, Richard F.E.
PY - 2006/3
Y1 - 2006/3
N2 - The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67, respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
AB - The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67, respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
UR - http://www.scopus.com/inward/record.url?scp=33645022496&partnerID=8YFLogxK
U2 - 10.1002/asi.20321
DO - 10.1002/asi.20321
M3 - Article
AN - SCOPUS:33645022496
SN - 1532-2882
VL - 57
SP - 588
EP - 600
JO - Journal of the American Society for Information Science and Technology
JF - Journal of the American Society for Information Science and Technology
IS - 5
ER -