Information retrieval (IR) is the process of selecting relevant documents from a collection, known as a corpus, such as the web or a digital library [8, 12, 13], using certain IR strategies. Based on the match between the user's specified keywords or queries and the index terms of the corpus items, relevant documents are arranged in descending order of relevance for retrieval. "Precision" and "Recall" are the two measures used to assess the effectiveness of an IR scheme under these circumstances.
The performance of IR schemes varies across corpuses. To enhance it, IR schemes are usually combined, or fused, in a judicious way. The selected documents have been observed to exhibit three phenomena: 1) the skimming effect, 2) the chorus effect, and 3) the dark horse effect. The skimming effect is the selection of the top-ranking documents under each of the individual IR schemes participating in the fusion, while the retrieval of documents due to an unexpected keyword match, resulting in unusually accurate relevance-score estimation, is called the dark horse effect. The chorus effect assigns a high degree of relevance to documents found in a majority of the lists of relevant documents returned by the schemes; consequently, these are deemed to form the final relevant list retrieved by the fusion of IR schemes. The extent of chorus-effect amplification depends on the number of IR schemes returning low scores for a selected document; such schemes therefore need to be filtered out, treating them as noise.
The present paper examines chorus-effect amplification and the efficiency of retrieval schemes using filters of various sizes. The schemes returning low scores are treated as noise, since they create an illusion about the relevance of the documents and degrade performance. The IR schemes themselves may be considered symbols found in a "message" provided by the fusion function. These assumptions allow the study to be cast in the perspective of information theory, suggesting the use of information content and entropy as performance indicators for the IR schemes and the fusion function used. Both the chorus and skimming effects are effectively tapped by a newly proposed fusion function (F-CombMAX), resulting in a substantial performance improvement over the existing fusion functions found in the literature, which had always shown one of the aforesaid effects percolating into the retrieved pages. A paired student-t test applied to the relevant populations of retrieved documents obtained with and without filters shows that the proposed method is effective.
2. Prior Work in Data Fusion
Data fusion for information retrieval was employed by Fisher, who combined two Boolean searches, one on the title words and the other on manually generated index terms. A linear combination method for fusing multiple sources by assigning weights to the individual schemes was studied by Belkin and Croft [4, 2], with the limitation of requiring prior knowledge of the retrieval systems for assigning the weights.
The "Comb-functions" for combining scores, which treat all schemes equally, were proposed by Fox and Shaw [6, 7]. The impact of weights has also been analyzed and recorded. Extensive work on Comb-functions has been carried out by Lee [9, 10, 11], proposing new rationales and indicators for data fusion. Using a probabilistic approach, the training data for the fusion operation are used to select the best-functioning scheme with appropriate weights.
In that approach, the scheme with the best performance is selected automatically from the pool of schemes, despite the appreciable performance of the remaining ones. Bilhart overcame this limitation by proposing a heuristic data fusion algorithm that uses a Genetic Algorithm (GA) for combining the retrieval scores. The comb-functions used in the present study are shown in table 1.
3. Contribution of the Retrieval Schemes
The certainty about the relevance of the documents, as indicated by the score, may be analyzed using statistical information theory, as it is possible to establish an abstract correspondence between it and the comb-functions by considering the individual IR schemes participating in a selection to be symbols constituting a message whose source is the fusion function. Let 's' and 'p' be the sets of message symbols and their probabilities; if there are 'n' IR schemes, these become
Let the jth retrieval scheme assign a maximal score to a particular document, which means that the message symbol j has a high probability of occurrence. Further,
The desired condition of a high probability for a symbol leads to a very low information content. The entropy may be used as a performance indicator for analyzing the characteristics of the message source and is given by
When the occurrence of all message symbols is equally likely, the entropy can be written as H(j) = log(n). In view of statistical communication theory, the desired criteria for the fusion may be restated as
1. The information content of the message symbol should be minimum and
2. The entropy of the message source should be high.
Consider a situation where the probabilities of the symbols are unequal and the probability of one of them, say pj, is maximal.
Consequently, H(j) < log(n).
The desired condition may be achieved by increasing the probabilities of the message symbols by deleting the low relevance scores from the denominator of (2). If the sum of the 'm' low relevance scores to be deleted is removed from the denominator, the probability of the message symbol 'j' becomes
As the low relevance scores are discarded one by one, pj → 1, so I(j) → 0 and H(j) → 0, which is an unwanted side effect. The number 'm' of low relevance scores deleted, along with their corresponding IR schemes, may play a vital role in meeting the desired conditions, and the concept of a filter is used to determine them.
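As an illustration of the quantities above, the following sketch (hypothetical scores; base-2 logarithms and score-normalized probabilities assumed, since the paper's equations (1)-(3) are not reproduced verbatim here) computes the symbol probabilities, information content, and entropy before and after deleting low scores:

```python
import math

def probabilities(scores):
    """Normalize relevance scores into symbol probabilities p_j = s_j / sum(s)."""
    total = sum(scores)
    return [s / total for s in scores]

def information(p):
    """Information content I(j) = -log2 p_j of a symbol with probability p."""
    return -math.log2(p)

def entropy(probs):
    """Entropy H = -sum p_j log2 p_j of the message source."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

scores = [0.9, 0.8, 0.1, 0.05, 0.02]   # hypothetical scheme scores for one document
p = probabilities(scores)
print(information(p[0]), entropy(p))

# Deleting the two lowest scores raises p_j for the top scheme and lowers I(j),
# but the entropy of the source also falls -- the side effect noted above.
p_filtered = probabilities(scores[:3])
print(information(p_filtered[0]), entropy(p_filtered))
```

Deleting all but one score drives p_j to 1 and both I(j) and H to 0, which is why the number of deleted scores must be controlled by a filter rather than maximized.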
4. Selection of Retrieval Schemes
This paper focuses on the concept of a filter for selecting the best retrieval schemes. A filter passes signals above a fixed cut-off (or within a range of cut-offs). Signal levels are usually expressed in decibels; for a given signal of amplitude λ (the score returned by an IR strategy), the decibel equivalent is 20 · log10 λ. The filter's size can be varied by fixing one of its ends at the maximal score of a document and moving the other end to any specified level. The number of relevance scores falling inside the filter is treated as the overlap value (γ), and the scores that lie outside are deleted. A set of modified fusion functions that work within the filter, together with the criteria used for selection of documents, is defined as follows:
1. F-CombMAX: maximum relevance score × γ
2. F-CombSUM: sum of all relevance scores that lie inside the filter
3. F-CombMNZ: F-CombSUM × γ
The F-CombSUM and F-CombMNZ functions linearly combine the relevance scores and are influenced by the chorus effect, whereas F-CombMAX considers all schemes equally, manifesting the skimming effect.
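A minimal sketch of the filter-based fusion just described, assuming scores normalized to (0, 1] and a filter width specified in dB below the document's maximal score (function and variable names are illustrative, not from the paper):

```python
import math

def f_comb(scores, filter_db):
    """Apply a dB filter anchored at the maximal score, then fuse.

    A score lambda maps to 20*log10(lambda) dB; scores whose dB value falls
    more than `filter_db` below the maximum are deleted as noise.
    Returns (F-CombMAX, F-CombSUM, F-CombMNZ) as defined in the text.
    """
    top = max(scores)
    cutoff_db = 20 * math.log10(top) - filter_db
    inside = [s for s in scores if 20 * math.log10(s) >= cutoff_db]
    gamma = len(inside)        # overlap value: number of scores inside the filter
    f_max = top * gamma        # F-CombMAX = maximum relevance score x gamma
    f_sum = sum(inside)        # F-CombSUM = sum of scores inside the filter
    f_mnz = f_sum * gamma      # F-CombMNZ = F-CombSUM x gamma
    return f_max, f_sum, f_mnz

# Example: four schemes score one document; a 6 dB filter (roughly half
# amplitude) drops the two weak scores that would inflate the chorus effect.
print(f_comb([0.9, 0.8, 0.2, 0.1], filter_db=6.0))
```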
4.1. Data Collection and Schemes
The experiment is conducted over three benchmark test document collections, namely 1) MEDLARS, 2) CISI, and 3) ADI, under a uniform environment consisting of the same SMART stop-word list, Porter's stemming algorithm, and weight assignment. Table 2 shows the characteristics of these three sets.
The Term-Frequency and Inverse-Document-Frequency (TF-IDF) weight-assignment method is used; the corresponding term weight (wt) and document-term weight (wd,t) are given by
N = total number of documents in the corpus,
ft = number of documents containing the term t,
fd,t = frequency of the term t in document d.
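The paper's exact weighting formulas are not reproduced above; the sketch below uses one common TF-IDF variant (w_t = log(1 + N/f_t), w_{d,t} = f_{d,t} · w_t) purely for illustration:

```python
import math

def idf_weight(N, f_t):
    """Term weight w_t: a common TF-IDF variant, w_t = log(1 + N / f_t)."""
    return math.log(1 + N / f_t)

def doc_term_weight(f_dt, N, f_t):
    """Document-term weight w_{d,t} = f_{d,t} * w_t (raw TF times IDF)."""
    return f_dt * idf_weight(N, f_t)

# Hypothetical corpus of 1000 documents; the term occurs in 10 of them
# and 3 times in document d.
w = doc_term_weight(f_dt=3, N=1000, f_t=10)
print(w)
```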
The similarity measures of the Vector Space Model (VSM) and the P-Norm model, with three user-specified values of p, have been used for document retrieval in the experiments.
R – relevance score of document d with respect to query q,
wq,t – weight of the term t in the query q,
wd,t – weight of the term t in the document d,
Wq – weight of the query q, and
Wd – weight of the document d.
The conjunctive query form of the P-norm model is given by
wm – weight of the mth index term, with 1 ≤ p ≤ ∞.
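As an illustration, the following sketch implements one common form of the VSM cosine similarity and of the conjunctive P-norm query (the Salton-Fox-Wu formulation); the paper's equations (7)-(11) may differ in detail, so treat this as an assumption-laden sketch:

```python
import math

def vsm_score(wq, wd):
    """Cosine-style VSM relevance: R = sum(w_{q,t} * w_{d,t}) / (W_q * W_d),
    where W_q and W_d are the Euclidean norms of the query and document vectors."""
    num = sum(a * b for a, b in zip(wq, wd))
    Wq = math.sqrt(sum(a * a for a in wq))
    Wd = math.sqrt(sum(b * b for b in wd))
    return num / (Wq * Wd)

def pnorm_and(sims, weights, p):
    """Conjunctive P-norm query, a common form being
    sim_and = 1 - ( sum(w_m^p * (1 - s_m)^p) / sum(w_m^p) )^(1/p),  1 <= p <= inf,
    where s_m is the similarity on the mth term and w_m its index-term weight."""
    num = sum((w ** p) * ((1 - s) ** p) for s, w in zip(sims, weights))
    den = sum(w ** p for w in weights)
    return 1 - (num / den) ** (1 / p)
```

With p = 1 the P-norm AND reduces to a weighted average; as p grows it approaches the strict Boolean AND (the minimum of the term similarities), which is how varying p yields the distinct retrieval schemes used in the experiments.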
4.2. Effect of Filter Size
The effect of varying the filter size on the fusion functions, in steps of 0.5 dB, is analyzed using the 11-point interpolated precision. The average 11-point interpolated precision of F-CombMNZ over the three test document collections is shown in figure 1. The line marked A in the graph is the reference line (the relevance score at 0 dB; 100%) used for comparison. At 0 dB, the performance of each function is recorded as such, without imposing the filter. The precision value at 0 dB and at the flattening point is quantitatively the same: at 0 dB no filter is applied, and as the filter size is increased gradually, at the flattening point all retrieval schemes are included (equivalently, no filter is imposed). The behavior of the F-CombMAX and F-CombSUM functions is qualitatively the same and hence not shown separately.
4.3. Performance Comparison
The F-Comb functions of the proposed study are compared with the comb-functions, and the overall average precision values are given in table 3. The performance improves by a maximum of 13.2% and an average of 3.7%. The corresponding filter sizes vary from function to function and corpus to corpus; hence, determining an optimal filter size is attempted for performance enhancement.
5. Optimal Filter Size
A generalized curve enveloping the effects of filter sizes is shown in figure 2.
Since the precision values at 0 dB and at the flattening point are identical, the maximum difference among all relevance scores for any generic document at 0 dB gives the filter size at the flattening point. This observation is used to compute the size of the filter at the flattening point.
5.1. Computing the Value of OX
The point X at which the peak precision value occurs is computed as follows. The filter size is reduced in steps of multiples of 0.1 (0.9 times the filter size, 0.8 times the filter size, and so on), and the performance at each filter size is recorded. This is done while varying the number of schemes participating in the fusion (2 to 7). The average over all combinations is used to test whether the filter size has a significant impact on performance, using an ANOVA table. The hypotheses tested are:
Null hypothesis (H0): There is no significant difference among precision values at various filter sizes.
Alternative hypothesis (H1): The negation of the above.
The computed F value of the ANOVA table is shown in table 4.
The null hypothesis is rejected, proving that the filter size has an impact on the F-Comb functions. The filter size that gives the maximum average value is optimal. The scores are normalized to avoid domination by the data set with the higher relevance-score range. The average values of the normalized scores for the three functions over all data sets are given in table 5.
As the peak precision value occurs at a point (X) corresponding to 70% of the filter size at the flattening point, OX is fixed as the optimal filter size. The algorithm for determining the optimal filter size and the method of assigning the relevance score to the documents are given in Algorithm 1.
Algorithm 1 Document's Relevance Score
# d is the size of the filter at the flattening point
# γ is the overlap value (the number of relevance scores that lie inside the filter)
# ofs is the optimal filter size
# n is the number of documents in the corpus
# m is the number of relevance scores
for (i = 1 to n)
    calculate the absolute value of the maximum difference among all relevance scores (d)
for (i = 1 to n)
    d1 = 1 - d
    d1 (in dB) = 20 * log10(d1)
    ofs = 0.7 * d1
    for (j = 1 to m)
        if (relevance score < ofs)
            delete the relevance score
    F-CombMAX = maximum of all relevance scores × γ
    F-CombMNZ = sum of all relevance scores × γ
    F-CombSUM = sum of all relevance scores that lie inside the filter
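The following Python sketch of Algorithm 1 is offered under stated assumptions: scores lie in (0, 1], the optimal filter is taken as 70% of the flattening-point width expressed in dB, and only the three F-Comb values defined above are returned. The dB bookkeeping in the pseudocode is ambiguous, so this is an interpretation, not the authors' implementation:

```python
import math

def optimal_filter_fusion(score_lists):
    """Fuse per-document relevance scores from several schemes inside the
    optimal filter (70% of the flattening-point size, per the paper).

    score_lists: one list of scheme scores per document, each score in (0, 1].
    Returns a (F-CombMAX, F-CombSUM, F-CombMNZ) tuple per document.
    """
    results = []
    for scores in score_lists:
        # filter size at the flattening point: max difference among the scores
        d = max(scores) - min(scores)
        d1 = 1 - d
        d1_db = 20 * math.log10(d1) if d1 > 0 else float("-inf")
        ofs = 0.7 * abs(d1_db)          # optimal filter width in dB (assumption)
        top = max(scores)
        cutoff_db = 20 * math.log10(top) - ofs
        inside = [s for s in scores if 20 * math.log10(s) >= cutoff_db]
        gamma = len(inside)             # overlap value
        f_sum = sum(inside)
        results.append((top * gamma, f_sum, f_sum * gamma))
    return results

print(optimal_filter_fusion([[0.9, 0.8, 0.2]]))
```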
6. Experiment and Results
The benchmark test collections and the retrieval schemes mentioned in §4.1, given by (7)-(11), are used to test the effectiveness of the proposed functions. The 11-point interpolated precision measure is used to compare the performance of the newly defined filter-based fusion functions with the conventional comb-functions.
6.1. Number of Schemes to be Fused
In the experiment a total of seven retrieval schemes, (7)-(11), are used, and the performance of various combinations of them is tested. Hence, the number of schemes fused is varied from 2 to 7, resulting in several combinations (7Cr, r = 2, 3, ..., 7). The average 11-point interpolated precision over all combinations is used for comparison.
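The total number of combinations evaluated can be checked directly:

```python
from math import comb

# All ways of choosing r schemes out of the 7 available, for r = 2..7
total = sum(comb(7, r) for r in range(2, 8))
print(total)  # 120 combinations in all
```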
Table 6 shows the average 11-point interpolated precision of the F-Comb functions and the Comb-functions. The graph giving the 11-point interpolated precision for all functions over the three test data sets is shown in figure 5.
6.3. Performance Comparison and Test of Hypothesis
The performance of the F-Comb functions is compared with that of the Comb-functions, and the percentage of improvement for the F-Comb functions is computed. A paired student-t test is used for the comparison. The null and alternative hypotheses are shown below:
Null hypothesis H0: µ1 = µ2
Alternative hypothesis H1: µ1 < µ2
Here, µ1 and µ2 are the average precision values for the Comb-functions and the F-Comb functions, respectively. Table 7 gives the percentage of improvement and the 't' value for the F-Comb functions. It shows that the improvement in performance for the F-CombMAX function is significantly higher, as it exploits the advantages of both the skimming and chorus effects at the optimal filter.
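A minimal sketch of the paired student-t statistic used in this comparison (the precision values below are hypothetical, not the paper's data):

```python
import math

def paired_t(x, y):
    """Paired student-t statistic for matched samples x, y (n - 1 degrees of
    freedom). Here x would hold per-query precision under a Comb-function and
    y the precision under the matching F-Comb function."""
    n = len(x)
    diffs = [b - a for a, b in zip(x, y)]
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var / n)

# Hypothetical interpolated-precision values for one collection
comb_prec = [0.40, 0.35, 0.30, 0.28, 0.25]
f_comb_prec = [0.45, 0.39, 0.33, 0.30, 0.27]
t = paired_t(comb_prec, f_comb_prec)  # a large positive t favors rejecting H0
print(t)
```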
The algorithm for selecting the best retrieval schemes is derived from the concept of the filter. The effect of the filter is analyzed by treating low relevance scores as noise, and the results are used to find the optimal filter. The performance of the fusion functions within the optimal filter is better, as all ill-performing schemes are deleted. F-CombMAX achieves a significant improvement over the others and hence may be advantageously used for IR.