Knowledge Base

A Probabilistic Learning Model

Method and apparatus for learning a probabilistic generative model for text (original patent)

This is an interesting method that seeks to ‘teach’ the system how to relate various documents, or more appropriately, the TEXT within the documents, from semantics to link nodes. Or as stated at one point – “a system that learns concepts by learning an explanatory model of text”. This is something they have worked on for a while, as can be seen in the earlier related patents: ‘Test classification system and method’ and ‘Method and system for creating improved search queries’.

Phrase Based Personalization of Search

Continuing the journey into Phrase Based Optimization

One thing worth mentioning is that there is limited info relating to personalized search (PS) and PaIR. The material only scratches the surface of overall personalized search methodologies, which means PaIR would play just one role within the PS engine. There is much more to it, and the PaIR model aspects are by no means comprehensive. I simply wanted to give a quick breakdown of how a PaIR system would handle PS processes.

Spam detection in a PaIR system

Detecting spam documents in a phrase based information retrieval system 

This is a continuation of Phrase Based Optimization and Phrase Based Indexing and Retrieval II.

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.

At least that’s the opening folly of the document. As a basic refresher, the method looks not only at the search term but also at related phrases for a given topic, and at how many related-phrase occurrences are statistically expected to be present in a document. It is calculated over individual documents, multiple documents and collections of documents (web pages and web sites for our purposes).
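To make the idea concrete, here is a minimal sketch of flagging a page by its related-phrase count. It is only an illustration under stated assumptions, not the patent's actual method: the function names, the simple substring matching and the `expected_max` threshold are all hypothetical, and a real system would derive the expected count statistically from the document collection.

```python
def count_related_phrases(text, related_phrases):
    """Count how many of the related phrases appear at least once in the text."""
    lowered = text.lower()
    return sum(1 for phrase in related_phrases if phrase.lower() in lowered)


def looks_like_spam(text, related_phrases, expected_max=3):
    """Flag a document that contains more related phrases than expected.

    expected_max is a hypothetical threshold chosen for illustration;
    the real value is not public.
    """
    return count_related_phrases(text, related_phrases) > expected_max


# Hypothetical related phrases for one topic.
related = ["white house", "barack obama", "senate", "president", "election"]

normal_page = "The president spoke at the White House today."
stuffed_page = "president white house senate election barack obama"

normal_flag = looks_like_spam(normal_page, related)
stuffed_flag = looks_like_spam(stuffed_page, related)
```

The point of the sketch is only that a page packed with nearly every related phrase for a topic stands out against a page that uses a natural handful of them.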

 

 

Phrase Based Indexing and Retrieval 2


Picking up where we left off with the overview of Phrase Based Optimization – I wanted to scan over some relevant points from the other Phrase Based Indexing and Retrieval (IR) Patents. This time we'll step back from the algo-babble and explore the intricacies a little further.

As you (undoubtedly) remember, the core concept of the processing is to identify valid (actual/real) phrases in a given document collection (or web pages in our case). The goal is to classify each potential phrase as either “a good phrase or a bad phrase” depending on its usage and frequency, then to use those ‘good’ phrases to predict the usage of other ‘good’ phrases in the collection of web pages.


What’s a ‘Good Phrase’?

A possible phrase is classified as a good phrase when it ‘appears in a minimum number of documents, and appear a minimum number of instances in the document collection’; otherwise it is a bad phrase. What those numbers are, we don’t know. Those are the ‘dials’ that only the Search Gods themselves have access to. It is almost like looking at a Phrase Density over the aggregate of documents (the web site). Also, a BAD phrase is not one with dirty words; it is simply a phrase with too low a frequency count to make the ‘good’ list.
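The two-threshold test described above can be sketched in a few lines. This is a rough illustration only: the function name, the plain substring counting and the values of `min_docs` and `min_instances` are assumptions made for the example, since the real ‘dials’ are unknown.

```python
from collections import Counter


def classify_phrases(documents, candidates, min_docs=2, min_instances=3):
    """Split candidate phrases into 'good' and 'bad' sets by frequency.

    min_docs and min_instances are hypothetical thresholds; the actual
    values used by any search engine are not public.
    """
    doc_counts = Counter()       # number of documents containing the phrase
    instance_counts = Counter()  # total occurrences across the collection
    for text in documents:
        lowered = text.lower()
        for phrase in candidates:
            occurrences = lowered.count(phrase.lower())
            if occurrences:
                doc_counts[phrase] += 1
                instance_counts[phrase] += occurrences
    good = {
        phrase for phrase in candidates
        if doc_counts[phrase] >= min_docs
        and instance_counts[phrase] >= min_instances
    }
    bad = set(candidates) - good
    return good, bad


pages = [
    "Blue widgets are great. Buy blue widgets",
    "Cheap blue widgets here",
    "Red gadgets only",
]
good, bad = classify_phrases(pages, ["blue widgets", "red gadgets"])
```

In this toy collection ‘blue widgets’ appears in two pages and three times overall, so it clears both thresholds, while ‘red gadgets’ appears in only one page and lands on the bad list.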

Duplicate Content

– One more time

Why do engines care? - In order to make a search more relevant to a user, search engines use a filter that removes the duplicate content pages from the search results. Another reason is that they don’t want to spend resources indexing pages that are substantially similar.

That said, there still seems to be some confusion out in the SEO world over ‘duplicate content’ and how search engines treat and deal with it. Right away I would like to say - RELAX -. If you are doing sneaky things like filling up a site with dodgy content that YOU KNOW is duplicate, then worry. Most people who may have duplicate content issues are honest web site owners and aren’t at risk of any penalization.

Phrase Based Optimization

The main goal of this document is to give SEO enthusiasts a stronger grasp of how Phrasing is dealt with in Search Engines, in an effort to help you further target and optimize your web sites. The theories and information relate well to keyword/phrase research as well as content creation and, to a lesser extent, backlink text development.

The crux of the piece is based on analysis of an existing Google Patent on ‘Phrase based searching’ (see Resources at the end). That is as far as I shall go on the original Patent, since it can lead to assumptions about what may or may not be used in their indexing and retrieval processes (algorithms). Just because they filed the patent doesn’t necessarily mean they have implemented it. I feel the main point here is to get a better idea of HOW search engineers think and WHAT may possibly be in place now, or in future Search technologies.


All content and images © Verve Developments 2012