Ranking via User Performance Metrics
Ranking documents based on large data sets (original Patent filing)
Hello all… How are we today? We’re gonna try melting our brains once again with some notes from a June 2007 Google patent that is a thrilling tale of ranking and re-ranking documents… not a good bedtime story unless your tea is real hot. Last time we were looking at establishing relevance with ‘Learning a probabilistic generative model for text’, and this time we will look at some ways of ranking results based upon this model. And away we go.
Ranking and re-ranking from past user data
If you’re still awake after that… I think we’ll be ok. Once again we’re touching on prior searches (and likely user sessions) and probabilities, as with the recent review of ‘Method and apparatus for learning a probabilistic generative model for text’. Training data, and the creation of rules based upon it, come into play here too, much as in that document, so give it a read at some point for reference. Right away one can start to see how prior searches would be worked into the ranking process, through methods such as ‘determining a prior probability of selection corresponding to the search query and one of the documents, and generating a score for the one document based, at least in part, on the determined prior probability of selection’. If we extrapolate from the learning process in ‘Method and apparatus for learning a probabilistic generative model for text’, it is possible that the starting point for the ‘prior probability of selection’ could be the ‘scoring’ discussed in that particular filing.
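To make the ‘prior probability of selection’ idea a little less abstract, here is a minimal Python sketch of how such a number could be estimated from logged searches and then folded into a score. To be clear, this is my own illustration and not the method in the filing: the log format, the function names `selection_priors` and `score`, the smoothing pseudo-counts `alpha` and `beta`, and the blending `weight` are all assumptions.

```python
from collections import defaultdict


def selection_priors(log, alpha=1.0, beta=1.0):
    """Estimate P(selected | query, doc) from (query, doc, selected) records.

    alpha and beta are hypothetical smoothing pseudo-counts, so a pair that
    was shown only once does not end up with a probability of exactly 0 or 1.
    """
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for query, doc, selected in log:
        shown[(query, doc)] += 1
        if selected:
            clicked[(query, doc)] += 1
    return {
        pair: (clicked[pair] + alpha) / (shown[pair] + alpha + beta)
        for pair in shown
    }


def score(base_relevance, prior, weight=0.5):
    """Blend a base relevance score with the prior probability of selection."""
    return (1 - weight) * base_relevance + weight * prior


# Made-up log: (query, document, was the document selected?)
log = [
    ("jaguar", "cars.example", True),
    ("jaguar", "cars.example", True),
    ("jaguar", "cars.example", False),
    ("jaguar", "zoo.example", False),
]
priors = selection_priors(log)
cars_score = score(0.8, priors[("jaguar", "cars.example")])
```

The point of the blend is the ‘at least in part’ wording in the claim: the prior nudges the score rather than replacing whatever relevance signal is already there.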
No Magic bullets
As always when talking about ranking, I tend to think of the various methods as layers of an onion. There is no one secret to be had; there are a ‘plurality of factors’ involved in any scoring or ranking system. Much of this patent seems targeted at aspects relating to user data, as we would see in areas such as personalized search.
Then there is the problem, and one of my favourite catch words for 2007: relevance. So once more we are striving to get more relevance and tighten up the ranking system in general. The patent describes a method to ‘improve the determination of a document's relevance’ using factors including ‘prior information retrieval data, such as query data, user information, and document information’. The interesting one there is the ‘user information’, which further implies layers relating to personalized search type models. Earlier, in Item 20, they have ‘information corresponding to the user who provided the search query’, which further strengthens this. I have to think in terms of personalized search or user data aspects related to current offerings as well as those down the road (Google Phone, Google Desktop) and the variety of end user data they could have at their disposal.
Getting personal –
Some of the usages mentioned include;
There are even some hints towards storing click data, as the system can rank documents based on pre-existing retrieval data including ‘data relating to users, queries previously provided by these users, documents retrieved based on these queries, and documents that were selected and not selected in relation to these queries’. Not only can previously selected results be weighted, but so can results that weren’t selected in previous queries. This makes for a much more fluid world of ranking, at least for a particular user (personalized search and related programs again). Other client devices noted were ‘a wireless telephone, a personal computer, a personal digital assistant (PDA), a lap top, or another type of computation or communication device’… just to cover all the bases. There is also mention of the core method involving identifying phrases and synonyms (or other methods) to rank the document repository prior to scoring the results based on the user’s past actions. We could easily infer that larger sets of user data are also a layer in the process, though this is not directly stated. All of this can be used to create a new set of training documents for future reference and application.
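As a rough illustration of weighting both the selected and the not-selected results for one user, here is a small Python sketch. It is only a toy under my own assumptions: the function name `rerank`, the `history` layout of (selected, skipped) counts per document, and the `boost` and `penalty` values are hypothetical and do not come from the patent.

```python
def rerank(results, history, boost=0.2, penalty=0.1):
    """Re-rank (doc, base_score) pairs using one user's past behaviour.

    history maps a document to (times_selected, times_shown_but_skipped).
    boost and penalty are hypothetical per-event adjustments.
    """
    def adjusted(item):
        doc, base_score = item
        selected, skipped = history.get(doc, (0, 0))
        return base_score + boost * selected - penalty * skipped

    ordered = sorted(results, key=adjusted, reverse=True)
    return [doc for doc, _ in ordered]


# Made-up initial ranking and a made-up history for a single user.
results = [("a", 0.9), ("b", 0.8), ("c", 0.7)]
history = {"a": (0, 3), "c": (2, 0)}

print(rerank(results, history))  # ['c', 'b', 'a']
```

Document ‘a’ starts on top but was skipped three times, so it sinks, while ‘c’ was picked twice and climbs. That is the ‘fluid’ ranking in miniature.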
Tightening the screws
In summing up this one, I’d say the ranking/scoring methods are based on;
While it certainly seems aimed at the personalization of search results, there is no reason to believe that larger data sets, gathered over many users, could not also be implemented in such a way as to affect ranking, to some degree, in the regular search indexes.
(also see related parts; Part I, Part II and Part III and the summary on my Blog)
Resources
This is part of a 3 part series:
- Summary; Relevance through end user metrics
- Learning a probabilistic generative model for text
- Ranking documents based upon large data sets
- Using concepts for Ad Targeting
Original Patent:
- Ranking documents based on large data sets
Patents of further interest:
- Query revision using known highly-ranked queries
- User Distributed Search Results
- Systems and methods for analyzing a user's web history
- Systems and methods for modifying search results based on a user's history
- Methods and systems for opportunistic cookie caching
- 2002; Methods and apparatus for employing usage statistics in document retrieval