Your SEO Fix

 


 

Ranking via User Performance Metrics

Ranking documents based on large data sets - Click here for original Patent filing

 

Hello all… How are we today? We’re gonna try melting our brains once again with some notes from a June 2007 Google patent that is a thrilling tale of ranking and re-ranking documents… not a good bedtime story unless your tea is real hot. Last time we looked at establishing relevance with ‘Learning a probabilistic generative model for text’, and this time we will look at some ways of ranking results based upon this model.

…. And away.

 

Ranking and re-ranking from past user data

“a ranking model that predicts a likelihood that a document will be selected by: storing information associated with a plurality of prior searches, determining a prior probability of selection based, at least in part, on the information associated with the prior searches, and generating the ranking model based, at least in part on the prior probability of selection; training the ranking model using a data set that includes approximately tens of millions of instances; identifying documents relating to a search query; scoring the documents based, at least in part, on the ranking model; forming search results for the search query from the scored documents; and outputting the search results.”

If you’re still awake after that… I think we’ll be ok. Once again we’re touching on using prior searches (and likely user sessions) and probabilities, as with the recent review of ‘Method and apparatus for learning a probabilistic generative model for text’. This filing also uses training data and creates rules, much like the method in that document. Give it a read at some point for reference.

One can right away start to see how prior searches would be implemented into the ranking process, based on methods such as ‘determining a prior probability of selection corresponding to the search query and one of the documents, and generating a score for the one document based, at least in part, on the determined prior probability of selection’. If we extrapolate from the learning process in ‘Method and apparatus for learning a probabilistic generative model for text’, it is possible that the starting point for the ‘prior probability of selection’ could be the ‘scoring’ discussed in that particular filing.
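For the code-minded, here is a minimal sketch of what a ‘prior probability of selection’ could look like when estimated from logged searches. The patent does not give a formula in the passages quoted here, so the smoothed click rate below, the function name `prior_selection_probability`, the `alpha`/`beta` smoothing values and the sample log are all my own assumptions for illustration.

```python
from collections import defaultdict

def prior_selection_probability(log, alpha=1.0, beta=2.0):
    """Estimate P(selected | query, doc) from prior searches.

    log: iterable of (query, doc, selected) tuples (hypothetical format).
    alpha/beta: assumed smoothing constants so that rarely shown
    documents are not pushed to exactly 0 or 1.
    """
    shown = defaultdict(int)    # times the doc was returned for the query
    clicked = defaultdict(int)  # times it was actually selected
    for query, doc, selected in log:
        shown[(query, doc)] += 1
        if selected:
            clicked[(query, doc)] += 1
    return {
        key: (clicked[key] + alpha) / (shown[key] + beta)
        for key in shown
    }

# Made-up log of prior searches: (query, document, was it selected?)
sample_log = [
    ("seo tips", "a.com", True),
    ("seo tips", "a.com", True),
    ("seo tips", "a.com", False),
    ("seo tips", "b.com", False),
]
priors = prior_selection_probability(sample_log)
```

With this toy log, a.com (selected two times out of three) ends up with a higher prior than b.com (shown once, never selected), which is the kind of signal a scoring step could then build on.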

 

No Magic bullets

As always when talking about ranking, I tend to think of various methods as layers of an onion. There is no one secret to be had; there are a ‘plurality of factors’ involved in any scoring/ranking system. Much of this patent seems targeted at aspects relating to user data, as we would see in areas such as personalized search.

“The search engine oftentimes ranks the documents using a ranking function based on the documents' perceived relevance to the user's search terms. Determining a document's relevance can be a tricky problem.”

There is the problem and one of my favourite catch words for 2007 – relevance. So once more we are striving to get more relevance and tighten up the ranking system in general. The patent describes a method to ‘improve the determination of a document's relevance’ using factors including ‘prior information retrieval data, such as query data, user information, and document information’.

The interesting one there is the ‘user information’, which further implies layers relating to personalized search type models. Earlier, in Item 20, they have ‘information corresponding to the user who provided the search query’, which further strengthens this. I have to think in terms of personalized search or user data aspects related to current offerings as well as those down the road (Google Phone, Google Desktop) and the variety of end user data they could have at their disposal.

 

Getting personal

 

Some of the usages mentioned include:

“receive a search query from a user, identify documents corresponding to the search query, and rank the identified documents based, at least in part, on a ranking model that includes rules that maximize a likelihood of the repository.

selecting candidate conditions from training data, estimating weights for the candidate conditions, and forming new rules from the candidate conditions and corresponding ones of the weights

determining a score for each of the documents based, at least in part, on the prior probability of selecting the document and generating search results for the search query from the scored documents.”
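To make the ‘candidate conditions, weights and rules’ idea a bit more concrete, here is a rough sketch. It is not the patent's training procedure: I am swapping in a naive smoothed log-odds weight per condition and combining the matching rules through a logistic function. The condition names, feature keys and training instances are all hypothetical.

```python
import math

def estimate_rules(instances, conditions):
    """Form rules (condition name -> weight) from training data.

    instances: list of (features dict, selected bool) pairs.
    conditions: dict of name -> predicate over a features dict.
    Weight is an assumed smoothed log-odds of selection when the
    condition holds; a stand-in for the patent's weight estimation.
    """
    rules = {}
    for name, holds in conditions.items():
        pos = sum(1 for f, sel in instances if holds(f) and sel)
        neg = sum(1 for f, sel in instances if holds(f) and not sel)
        if pos + neg == 0:
            continue  # condition never fired in training; form no rule
        rules[name] = math.log((pos + 1) / (neg + 1))
    return rules

def selection_likelihood(features, rules, conditions):
    """Logistic of the summed weights of every rule that matches."""
    z = sum(w for name, w in rules.items() if conditions[name](features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical candidate conditions over made-up features.
conditions = {
    "term_in_title": lambda f: f["term_in_title"],
    "top3_position": lambda f: f["position"] <= 3,
    "lang_mismatch": lambda f: f.get("lang_mismatch", False),
}
training = [
    ({"term_in_title": True, "position": 1}, True),
    ({"term_in_title": True, "position": 5}, True),
    ({"term_in_title": False, "position": 2}, False),
    ({"term_in_title": False, "position": 8}, False),
    ({"term_in_title": True, "position": 9}, False),
]
rules = estimate_rules(training, conditions)
with_title = selection_likelihood(
    {"term_in_title": True, "position": 8}, rules, conditions)
without_title = selection_likelihood(
    {"term_in_title": False, "position": 8}, rules, conditions)
```

In this toy data a document with the query term in its title comes out with a higher predicted likelihood of selection than one without, and the condition that never fires in training simply produces no rule.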

There are even some hints towards storing click data as well, as the system can rank documents based on pre-existing retrieval data including ‘data relating to users, queries previously provided by these users, documents retrieved based on these queries, and documents that were selected and not selected in relation to these queries’.

Not only can the previously selected results be weighted, but so can results that weren’t selected in previous queries. This makes for a much more fluid world of ranking, at least for a particular user (personalized search and related programs again). Client devices noted were ‘a wireless telephone, a personal computer, a personal digital assistant (PDA), a lap top, or another type of computation or communication device’… just to cover all the bases.

There is also mention of the core method involving identifying phrases and synonyms (or other methods) to rank the document repository prior to scoring the results based on the user's past actions. We could easily infer that larger sets of user data are also a layer in the process, though this is not directly stated. All of this can be used to create a new set of training documents for future reference and application.
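The two-step flow described above (rank first, then rescore from the user's past actions) could be sketched roughly like this. The multiplicative combination, the function name `rerank` and all of the numbers are my own assumptions; the filing does not spell out how the two signals are blended.

```python
def rerank(base_scores, history):
    """Re-rank documents using a user's past selections and skips.

    base_scores: doc -> relevance score from the core ranking step.
    history: doc -> (times selected, times shown but not selected).
    Both structures are hypothetical. Docs with no history get a
    neutral prior of 0.5 thanks to the +1/+2 smoothing.
    """
    def prior(doc):
        selected, skipped = history.get(doc, (0, 0))
        return (selected + 1) / (selected + skipped + 2)

    rescored = {doc: score * prior(doc) for doc, score in base_scores.items()}
    return sorted(rescored, key=rescored.get, reverse=True)

# Made-up base ranking and user history.
base = {"a.com": 0.9, "b.com": 0.8, "c.com": 0.7}
user_history = {"a.com": (0, 4), "c.com": (3, 0)}
personalized = rerank(base, user_history)
```

Here a.com starts on top but has been skipped four times by this user, so it drops to the bottom, while c.com, which the user picked three times, moves to the top.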

 

Tightening the screws

To sum this one up, I’d say the ranking/scoring methods are based on:

“prior information retrieval data, such as data relating to users, queries previously provided by these users, documents retrieved based on these queries, and which of these documents were selected and not selected in relation to these queries.”

 

While it certainly seems aimed at personalization of search results, there is no reason to believe that larger data sets gathered over many users could not also be used in such a way as to affect ranking, to some degree, in the regular search indexes.

 

(also see related parts; Part I, Part II and Part III and the summary on my Blog)  

 

Resources - this is part of a 3 part series - Summary; Relevance through end user metrics - Learning a probabilistic generative model for text - Ranking documents based upon large data sets - Using concepts for Ad Targeting.

Original Patent - Ranking documents based on large data sets

Patents of further interest - Query revision using known highly-ranked queries - User Distributed Search Results - Systems and methods for analyzing a user's web history - Systems and methods for modifying search results based on a user's history - Methods and systems for opportunistic cookie caching - 2002; Methods and apparatus for employing usage statistics in document retrieval

 

