
Assignment for the Deep Learning course at UPM that aims to rank search results for a set of queries, based on the paper Optimizing Search Engines using Clickthrough Data.

angeligareta/machine-learned-ranking

Machine Learned Ranking

Project developed for the Deep Learning course of the EIT Digital Data Science Master's programme at UPM.

Problem statement

For this assignment, we were given a dataset of medical results from the LOINC (Logical Observation Identifiers Names and Codes) database, a public standard for identifying medical laboratory observations. The goal was to build a model capable of ranking these results for any given set of queries, specifically:

  • “Glucose in blood”
  • “Bilirubin in plasma”
  • “White blood cells count”

Implementation

For this purpose, we were given three different approaches to choose from: Pointwise, Pairwise and Listwise. We decided to go with the Pairwise approach, based on the paper Optimizing Search Engines using Clickthrough Data, by Thorsten Joachims.

In this paper, the author presents a method that utilizes clickthrough data for training, connecting the query-log of the search engine with the log of links the users clicked on in the presented ranking. The advantage of this method in the web context is the huge amount of cheap and easily accessible click data available for training.

Clickthrough data can be represented as triplets (q, r, c), consisting of a query q, the ranking r presented to the user, and the set c of links the user clicked on. To record this information, each query and each link is given a unique ID, which is logged along with the URL. It is important to note that this feedback is relative rather than absolute, since it depends on how far down the ranking the user scanned. For instance, if the user clicked on the links ranked 1, 3 and 7, we can infer that link 3 is more relevant than link 2 and that link 7 is more relevant than links 2, 4, 5 and 6, but we cannot tell whether link 7 is more relevant than the links ranked 8th onward. Unfortunately, this kind of relative information does not fit well with standard machine learning approaches such as classification or regression.
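The preference-extraction rule above can be sketched in a few lines of Python. This is a minimal illustration, not the project's code: the function name `click_preferences` and the link labels are hypothetical. It follows the rule described in the paper, where a clicked link is preferred over every non-clicked link ranked above it, and links below the last click yield no preferences.

```python
from typing import List, Set, Tuple


def click_preferences(ranking: List[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs inferred from clicks.

    A clicked link is assumed to be more relevant than every
    non-clicked link that was ranked above it (and so presumably
    seen and skipped by the user).
    """
    prefs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        # Only links ranked above the clicked one are compared against it.
        for skipped in ranking[:i]:
            if skipped not in clicked:
                prefs.append((doc, skipped))
    return prefs


# Hypothetical ranking of 10 links; the user clicked links 1, 3 and 7.
ranking = [f"link{k}" for k in range(1, 11)]
prefs = click_preferences(ranking, {"link1", "link3", "link7"})
print(prefs)
# [('link3', 'link2'), ('link7', 'link2'), ('link7', 'link4'), ('link7', 'link5'), ('link7', 'link6')]
```

Note that nothing is inferred about links 8 to 10, matching the observation that the user may never have looked at them.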

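Instead, the method trains a Support Vector Machine on these pairwise preferences (a Ranking SVM): each query-document pair is represented by a feature vector, and the SVM learns a weight vector that orders documents consistently with as many of the preferences inferred from historical click data as possible. This sidesteps classification, since documents are not assigned to classes, and regression, since no document has an absolute target score; only relative judgements derived from user behaviour are available.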
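A minimal linear Ranking SVM can be sketched with NumPy. This is an illustration under stated assumptions, not the project's implementation: the function names (`train_ranking_svm`, `rank`), learning rate, epoch count and toy features are all hypothetical, and the soft-margin objective is minimized with plain full-batch subgradient descent rather than a dedicated SVM solver. The key step is the pairwise transform: each preference "d_i above d_j" becomes the difference vector Φ(q, d_i) - Φ(q, d_j), which the model should score with a margin of at least 1.

```python
import numpy as np


def train_ranking_svm(X, pairs, C=1.0, lr=0.01, epochs=200):
    """Linear Ranking SVM trained by full-batch subgradient descent.

    X     -- (n_docs, n_features) feature vectors Phi(q, d) for one query
    pairs -- list of (i, j): document i should be ranked above document j
    Minimises 0.5 * ||w||^2 + C * sum(max(0, 1 - w . (X[i] - X[j]))).
    """
    # Pairwise transform: one difference vector per preference.
    diffs = np.array([X[i] - X[j] for i, j in pairs])
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = diffs @ w
        violated = margins < 1.0  # pairs with non-zero hinge loss
        # Subgradient: regulariser term minus C times the violated differences.
        grad = w - C * diffs[violated].sum(axis=0)
        w -= lr * grad
    return w


def rank(X, w):
    """Document indices sorted by descending score w . Phi(q, d)."""
    return np.argsort(-(X @ w))


# Hypothetical features for 3 documents and preferences doc0 > doc2 > doc1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
pairs = [(0, 2), (2, 1), (0, 1)]
w = train_ranking_svm(X, pairs)
print(rank(X, w))  # [0 2 1]
```

With several queries, the difference vectors from every query are simply stacked, since preferences are only defined between documents returned for the same query. The learned `w` can then rank new results for a query by sorting on w · Φ(q, d).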

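To define an objective function to maximize, the paper considers an unknown distribution Pr(q, r*) of queries q and target rankings r* over a document collection D with m documents, and looks for the retrieval function f whose rankings best agree, in expectation over this distribution, with the target rankings, measuring agreement with Kendall's τ.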
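As a reference sketch of the paper's formulation: Kendall's τ compares the ranking r_{f(q)} produced by f with the target ranking r*, where P and Q are the numbers of concordant and discordant document pairs (the right-hand form assumes strict orderings). The expected τ over Pr(q, r*) is the quantity to maximize, and the Ranking SVM approximates this by requiring, for each training query q_k and each preference (d_i, d_j) ∈ r*_k (d_i ranked above d_j), that the feature vectors Φ(q_k, d) be scored in the right order with a margin, up to slack variables ξ.

```latex
\tau(r_a, r_b) = \frac{P - Q}{P + Q} = 1 - \frac{2Q}{\binom{m}{2}}

\tau_P(f) = \int \tau\big(r_{f(q)}, r^*\big)\, d\Pr(q, r^*)

\min_{\vec{w},\, \xi}\; V(\vec{w}, \xi) = \tfrac{1}{2}\, \vec{w} \cdot \vec{w} + C \sum \xi_{i,j,k}
\quad \text{s.t.} \quad
\vec{w} \cdot \Phi(q_k, d_i) \ge \vec{w} \cdot \Phi(q_k, d_j) + 1 - \xi_{i,j,k}
\;\; \forall (d_i, d_j) \in r^*_k, \qquad \xi_{i,j,k} \ge 0
```

The constant C trades off margin size against training errors, and the sum runs over all preference pairs of all training queries.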

Proposed solution

The proposed solution and more details about the implementation can be found in the project report.

Results

For the given queries, we obtained the following results:

| Rank | Glucose in blood | Bilirubin in plasma | White blood cells count |
|---|---|---|---|
| #1 | Glucose [Moles/volume] in Urine | Bilirubin total [Mass/volume] in Synovial fluid | Bilirubin total [Presence] in Unspecified specimen |
| #2 | Glucose [Moles/volume] in Pleural fluid | Bilirubin indirect [Mass/volume] in Serum or Plasma | Nitrofurantoin [Susceptibility] |
| #3 | Glucose [Moles/volume] in Serum or Plasma | Bilirubin direct [Mass/volume] in Serum or Plasma | Cholesterol [Mass/volume] in Serum or Plasma |
| #4 | Glucose [Mass/volume] in Serum Plasma or Blood | Bilirubin total [Mass/volume] in Serum or Plasma | Trimethoprim+Sulfamethoxazole [Susceptibility] |
| #5 | Cholesterol in l-iDL [Mass/volume] in Serum or Plasma | Cholesterol in l-iDL [Mass/volume] in Serum or Plasma | Blood group antibody screen [Presence] in Serum or Plasma |

Authors

  • Angel Igareta
  • David Burrell
  • Miguel Pérez
  • Rodrigo Pueblas
