How To Properly Use A Search Engine For Genealogy
I. Which search engine is best?
There are dozens and dozens of search engines, but it isn't necessary to know
or use every one of them. A small handful of engines handle the vast
majority of all web traffic. Below are the Big Five along with the approximate
percentage of web queries each handles.
1. Google - 35%
2. Yahoo - 28%
3. MSN - 16%
4. AOL - 15%
5. Ask Jeeves - 3%
These five search engines account for approximately 97% of all web searches.
The dozens of remaining search engines are not worth your time. For the rest
of this search tutorial we will concern ourselves only with Google. You will
find links to these and the other relevant search engines by Clicking Here.
II. Why not just stick to Google alone?
Before a webpage shows up in a search engine, that engine must first go out
and find it. That takes time, and there are many webpages on the internet that
are not yet listed in any search engine. Some pages are listed in some search
engines but not others.
Currently, Google states that it lists four billion pages. Only three billion
of these have been sorted by topic (a task known as "indexing"), so
effectively we can say Google makes available about three billion webpages of
information to us. That's roughly half of all the webpages that are out there.
Yahoo also claims it has about three billion pages indexed. Using these two
together should provide us with access to most of the seven or eight billion
webpages that can be found on the internet.
III. Rules For Effective Searching.
Rule #1 - Choose search words carefully.
Rule #2 - Use short phrases in quotation marks.
Rule #3 - Limit responses using Boolean search delimiters.
Don't be frightened off by such words as "Boolean" or
"delimiter", and don't ignore advice like choosing search words
carefully because you think it's too simple. It has been estimated that 99% of
all web searches are performed using inappropriate words, too many words, or
too few words.
There are highly effective search techniques, but the vast majority of
internet users know nothing about them.
This tutorial will give
Rule #1 - Choose search words carefully.
Open up the Google search engine and type into the search
field the word "Leathers" without the quotation marks. This returns
approximately 553,000 entries. This is far more than you could ever look at,
and the vast majority of these entries deal with "leather" the material,
not with Leathers the family name.
Most people use search words that are too common,
meaning they apply to too many different things besides the actual object of
the search. Unfortunately for us, leather is a popular, therefore common
commodity, making our family name alone virtually useless as a search
word. It needs a lot of help from
other words and search techniques.
Typing the words Leathers Family together narrows down our search to
approximately 68,700. This doesn’t mean there are 68,700 websites that have
information about the Leathers family. It means that 68,700 sites contain both
the words "Leathers" and "family", but not necessarily together.
We need to add at least another word or two, but what words should we use?
This is where knowing something about our subject becomes crucial. For the
first additional search word, try the name of the subject itself. In this case
our subject is "genealogy". Putting that word into our search narrows down the
responses we get to only 2,860. To narrow it down more, use specialty words
related to your subject. The more specific and obscure the word, the better
our chances are of narrowing down our search.
Let’s try using "heraldry" or "ancestry". Adding the word ancestry to our search gives
us "Leathers family genealogy ancestry." But, hey! Instead of
narrowing down the search returns even further, using "ancestry" as
an additional search word brought back 10,100 returns! We're going in reverse.
What has happened?
Congratulations! You've just fallen victim to a quirk about search engines that
very few people even know exists. Search engines interpret the search words
you input using what is known as a "heuristic algorithm", which
is a fancy way of saying they use a mathematical formula to play with the
search words so as to give you back the greatest possible number of responses
related to your search terms. The more search terms you use, the more combinations
a search engine comes up with. Of course the programmers know you want to
zero in on your topic and not be bothered by unrelated information, so the
algorithm is written such that it uses combinations of words most frequently
found on the internet.
If you use only one word, it searches for only that word. If you use two words
it searches for all webpages that contain both of those words. If you use three
words it looks for pages that have all three words. But if you use four or
more search words the algorithm then begins playing what we might call the
"either/or game", meaning it will find pages that contain
combinations of your words but that do not necessarily have ALL of your
search words. It mixes and matches them in such a way so as to give you the
greatest possible number of responses.
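The behaviour described above can be pictured with a small toy model. This is only a sketch of the rule as this tutorial states it, not Google's real algorithm; the function names and the "any three words are enough" cut-off are assumptions made for illustration.

```python
# Toy model of the matching rule described in this tutorial.
# It is NOT Google's real algorithm.

def tokens(text):
    """Lower-case a page or query and split it into words."""
    return text.lower().split()

def page_matches(query, page_text, strict_limit=3):
    """Return True if the page would be returned for the query in this model."""
    wanted = set(tokens(query))
    found = set(tokens(page_text))
    if len(wanted) <= strict_limit:
        # Up to three words: every word must be on the page.
        return wanted <= found
    # Four or more words: the "either/or game". Here any three of the
    # words are enough (an arbitrary choice made for illustration).
    return len(wanted & found) >= strict_limit

page = "Leathers family genealogy records"
print(page_matches("Leathers family genealogy", page))           # True
print(page_matches("Leathers family heraldry", page))            # False
print(page_matches("Leathers family genealogy ancestry", page))  # True
```

Notice that the four-word query matches a page that never mentions "ancestry", which is how adding a word can make the number of returns go up instead of down.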
In fact even the order you put the words in affects the responses you get.
Take our words, "Leathers family genealogy ancestry", and reverse the
order of genealogy and ancestry. You will find that instead of getting 10,100
returns you now get only 8,100. Same words, different order, but it makes a
very big difference! If you are using more than three search words, try
arranging them in different orders.
You should also be aware that Google (and most other search engines) has a ten
word maximum limit, so no more than ten search words can be used at one time.
Let's summarize what we've covered under Rule #1 - Choose search words
carefully.
1. Our search should always start with our root word, in this case
"Leathers".
2. Add up to two additional words, being sure to select the more unusual ones
regarding our subject. In our case genealogy is our subject, so we used it and
ancestry as additional search terms.
3. If we use more than three search words, remember the order makes a
difference, so try rearranging the words in different combinations, always
keeping "Leathers" (or your key search term) as the first word.
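If rearranging words by hand gets tedious, a few lines of Python can list every ordering for us. This is just a convenience sketch; the function name query_orderings is made up for this example, and the limit of ten is the one mentioned earlier in this tutorial.

```python
from itertools import permutations

MAX_WORDS = 10  # the ten-word limit mentioned in this tutorial

def query_orderings(key_word, extra_words):
    """Return every ordering of the extra words, keeping key_word first."""
    if 1 + len(extra_words) > MAX_WORDS:
        raise ValueError("too many search words")
    return [" ".join((key_word,) + combo) for combo in permutations(extra_words)]

# Three extra words give six different queries to try.
for query in query_orderings("Leathers", ["family", "genealogy", "ancestry"]):
    print(query)
```

Each printed line can be pasted into the search field as-is, so we can compare the number of returns for every ordering.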
Rule #2 - Use short phrases in quotation marks.
Ordinarily, search engines treat each search word as an independent object. But
words enclosed by quotation marks are considered to be the same as a single
word. Let's go back to our original example. We found that using
"Leathers" by itself Google returned 553,000 possible responses for
us to look at. Adding the word "family" narrowed it down to 68,700
responses. Put these words into Google again only this time enclose them in
quotation marks, so that it looks like this: "Leathers family". The use of quotation marks has reduced the
number of Google search returns to a mere 166.
Why did this happen? Before, Google was looking for all webpages which had the
words Leathers and Family somewhere on the same page, though not necessarily
beside each other. The addition of quotation marks tells the Google search
algorithm to look only for instances in which the words Leathers and Family are
used together as a single term. Does this mean there are 166 webpages that talk
about the Leathers family? Yes!!
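The difference between the two kinds of search can be sketched in a few lines of Python. This is a simplified picture that assumes pages are plain text split on spaces (punctuation is ignored); the function names and the two sample pages are invented for illustration.

```python
def words(text):
    """Lower-case the text and split it into words."""
    return text.lower().split()

def has_all_words(query, page_text):
    """Unquoted search: every word appears somewhere on the page."""
    return set(words(query)) <= set(words(page_text))

def has_phrase(phrase, page_text):
    """Quoted search: the words appear side by side, in the same order."""
    target = words(phrase)
    page = words(page_text)
    return any(page[i:i + len(target)] == target
               for i in range(len(page) - len(target) + 1))

page_a = "the Leathers family settled here"
page_b = "family recipes for cleaning leathers"

print(has_all_words("Leathers family", page_b))  # True
print(has_phrase("Leathers family", page_b))     # False
print(has_phrase("Leathers family", page_a))     # True
```

The second page contains both words, so the unquoted search returns it, but the quoted search does not because the words are not together.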
We've come a long way towards locating the information we want, but 166 pages
is still a fair amount of material to sort through. Now is where we need to get
very specific about exactly what information we're searching for. As an
example, perhaps we are looking for Leathers family trees. We could include the
word "tree" within our search quotation marks. Let's try it and see
what happens. You will notice that Google returns only one instance of a
Leathers Family Tree. Putting everything
in quotation marks can narrow down the search too much.
This is an instance in which we want to use that heuristic algorithm those
Google programmers built in to our advantage. Let's use the word
"tree", but let's put it outside the quotation marks. There, that
gave us twenty returns, and that's a small enough number that we can easily
search each item. Sure enough, the thirteenth page listing entitled
"LEATHERS Genealogy" contains the phrase "Leathers family
line" which is very similar to "Leathers family tree", so it's
worth a look. After clicking on the link it takes us to a page with another
link at the top that says "View All LEATHERS family members", and if
we click on that link it takes us to a great site with a Leathers family tree
on it! Sometimes narrowing a search down too far causes us to miss webpages
that contain information we want, but that use slightly different words than
our search terms. In that case, try putting some search words outside the
quotation marks.
It is a good practice to have several search terms in separate quotations.
For example, let's go back to our term of "Leathers family" and add
to it the term "family history". It is perfectly acceptable to have
the word "family" in both terms, since the words are being viewed by
the algorithm as independent phrases and not as individual words. If we perform
this search we find that we come up with twenty-one returns, the first of which
is for a "Leather Family History Society" that it is likely many of
you never knew existed before! If you were so inclined you could drop the
second search phrase and just use the word "history" outside of the
quotation marks. This opens up more possibilities, giving about seventy returns
to look at. Mix and match your words using them in quotations and out of
quotations. If you find you have more than three total search words/phrases,
try using them in different orders.
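Mixing quoted phrases with loose words is easy to get wrong when typing, so here is a small helper that assembles the search text for us. It is only a sketch; build_query is a made-up name, and all it does is produce the text we would type into the search field.

```python
def build_query(phrases=(), loose_words=()):
    """Wrap each phrase in double quotes, then append the unquoted words."""
    parts = ['"{}"'.format(p) for p in phrases] + list(loose_words)
    return " ".join(parts)

# One phrase plus a word outside the quotation marks.
print(build_query(["Leathers family"], ["tree"]))
# Two separate phrases that share the word "family".
print(build_query(["Leathers family", "family history"]))
# One phrase plus the loose word "history".
print(build_query(["Leathers family"], ["history"]))
```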
OK, let's summarize what we learned about using Rule #2 - Use short phrases in
quotation marks.
1. Enclosing two or more words in quotation marks causes the search engine to
look for that exact phrase.
2. Narrowing down the search terms too much can cause too few search returns.
3. Try using a search term with another search word outside the quotation
marks.
4. Use multiple search terms in separate quotation marks.
5. When three or more search phrases/words are used, mix and match the order
they go in to get new results.
Rule #3 - Limit responses using Boolean search delimiters.
What is a Boolean search and what are delimiters? Strictly speaking, a Boolean
search is one that uses the logical operators AND, NOT, and OR, and those
operators are what this tutorial calls "delimiters". We're going to add a
couple more of these "delimiters" that are not, strictly speaking, Boolean,
but serve the same basic purpose, so we'll include them.
The easiest way to think of a Boolean search is to think of how a sculptor
works. He doesn't add anything to the stone he is working with. He simply chips
away the things he doesn't need and what's left behind is what he was looking
to reveal. That is precisely how Boolean searching works. As we saw with
Rule #2, sometimes being ultra-specific with your search terms limits the
responses you get down to nothing. In that case we will need to use more
general search words and phrases. This opens us up to the possibility of
getting so many returns that the search is of no use to us. What we need is a
way to specify to the search engine the things we DON'T want it to show
us so that it whittles away the junk and leaves us with the things we are
really interested in.
Now that we have an idea what a Boolean search does, let's see exactly what
these delimiters look like. Going back to our very first example, we put the
word "Leathers" into Google and got back 553,000 returns. We saw that
the vast majority of these returns had nothing to do with our family name, but
were related to the material "leather". We want to get rid of
these unrelated entries without getting rid of ones we want, and we will
accomplish this using the most basic Boolean delimiter of all: the word
"NOT", which must be written in all capital letters for
the search engine to recognize that we mean it as a Boolean term. A simpler
way is just to use the minus sign "-", placed directly in front of
the word we want our search engine to exclude from its search results.
Since "leather" was probably the most common word returned, we can
exclude it from our searches by adding to our search the word leather with a
minus sign in front of it (no space between the sign and the word), like this:
"-leather". This goes immediately after our search word of
"Leathers" (there is a space between Leathers and -leather). Try it
and see what happens.
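What the minus sign does can be pictured as a simple filter. The sketch below is a toy model, not how any real search engine is built; the search function and the three sample pages are invented for illustration.

```python
def search(pages, include, exclude=()):
    """Keep pages that contain every include word and none of the exclude words."""
    results = []
    for page in pages:
        found = set(page.lower().split())
        has_all = all(w.lower() in found for w in include)
        has_banned = any(w.lower() in found for w in exclude)
        if has_all and not has_banned:
            results.append(page)
    return results

pages = [
    "Leathers family reunion photos",
    "fine leather jackets by Leathers and Sons",
    "Leathers genealogy page",
]

print(len(search(pages, ["Leathers"])))                       # 3
print(len(search(pages, ["Leathers"], exclude=["leather"])))  # 2
```

Because "leather" and "Leathers" are different words, excluding one does not throw away pages that contain only the other.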
You will notice that by telling Google to eliminate that one word, our returns
have fallen from 553,000 to a mere 211,000. Better, but still far too many
returns to be of any real use to us. We need to chip away a few more terms. How
do we decide what terms to eliminate? Well, we could go down and look at
the returns to see what sorts of things are popping up. This might be feasible
at the very start of the process, but the more we refine our search the harder
it will be to spot words that are frequently found in our search results but
are unrelated to our intended target. A better way is to make a few educated
guesses.
We know that leather, as a material, is what the vast majority of
these erroneous search returns are related to. And leather is used largely in
the manufacture of clothing and furniture, and these two items are sold in
stores. So let's eliminate the words "clothing, furniture, store"
from the search results. Try it one word at a time so you can see the results.
Eliminating "clothing" brings our results down to 176,000.
Eliminating "furniture" brings them down to 167,000. And taking out the word
"store" drops them further to only 148,000. We could continue
eliminating words, like "sale", "price",
"quality", and so forth, getting rid of words which are directly
related to the retail sales industry. The thing we need to be careful of is
that we don't eliminate a word that is also common to the field of genealogy.
Some words apply to more than one subject, and if we get too crazy
we could lose returns that we really wanted to see. And remember, search
engines have a ten word maximum for search words, so we can't simply add more
"NOT" words indefinitely.
Obviously, just using the NOT delimiter alone isn't going to get us where we
want to be anytime soon. We will want to combine it with our other basic search
rules. For example, you might use the search words "Leathers" and
"family" without quotation marks, and after these use the
"-leather" delimiter. Try it and you will see that it drops the
returns from 68,700 to 36,700.
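To see how much each excluded word actually bought us, we can turn the return counts quoted above into percentages. The counts come straight from this tutorial; the helper name percent_removed is made up for this sketch.

```python
def percent_removed(before, after):
    """Percentage of returns eliminated by one refinement, to one decimal."""
    return round(100 * (before - after) / before, 1)

# (what we added, returns before, returns after), as quoted in this tutorial
steps = [
    ("-leather", 553_000, 211_000),
    ("-clothing", 211_000, 176_000),
    ("-furniture", 176_000, 167_000),
    ("-store", 167_000, 148_000),
    ("-leather added to Leathers family", 68_700, 36_700),
]

for added, before, after in steps:
    print(added, percent_removed(before, after), "% removed")
```

The first exclusion does most of the work, and each later one removes a smaller share, which is why NOT words alone will not get us down to a manageable number.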
This is a good place to introduce the "AND" delimiter, which can
also be stated as a "+" immediately preceding a word. What this tells
the search engine is that this word MUST be found in all webpages returned, and
to ignore any pages that do not have that particular word in them. Let's
keep using our example above and add the word "genealogy" without
using any delimiter on it. You will see that it returns about 2,180 webpages.
Now insert a "+" immediately in front of the word
"genealogy", and be sure that there is no space between the + and the
word. The returns decrease to about 1,850. This is because before we used the +
delimiter we had four search words, which meant Google's normal search
algorithm was making up its own word combinations. Some of them did not
include the word "genealogy". By specifying we wanted genealogy in
all search returns, Google eliminated those which its algorithm had come up
with that did not contain our required word. Pretty simple, huh?
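The reason the space matters is easier to see if we picture how a search string might be split apart. The sketch below is a hypothetical parser written for this tutorial, not the one any search engine really uses; parse_query and its three categories are invented names.

```python
def parse_query(query):
    """Split a query into plain, required (+) and excluded (-) words."""
    parsed = {"plain": [], "required": [], "excluded": []}
    for term in query.split():
        if term.startswith("+") and len(term) > 1:
            parsed["required"].append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            parsed["excluded"].append(term[1:])
        else:
            parsed["plain"].append(term)
    return parsed

print(parse_query("Leathers family +genealogy -leather"))
# With a space after the sign, the + is left on its own and
# "genealogy" is treated as an ordinary word.
print(parse_query("Leathers family + genealogy -leather"))
```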
On to the "OR" delimiter. Earlier we used the phrase in quotations,
"Leathers family tree" and found it gave us only one return.
Similarly, if we used the phrase "Leathers family line" we would get
only one return, but it is a different webpage. To get the best of both worlds,
put in both of these search phrases and between them insert the word
"OR" in all capital letters. You will see that you now get two
returns. The OR delimiter tells the search engine to give returns from
either one of the phrases, but from BOTH if it can find them! For fun, add
a third phrase, "Leathers family history" to the search, and put an
OR between it and the other search phrases. It now returns ten webpages to you,
all of which contain at least one of the search phrases you entered. The OR
delimiter is a powerful search tool that greatly speeds up our searching.
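In terms of results, OR simply pools the pages found for each phrase. Here is a minimal sketch of that idea; the page names and texts are made up, and phrases are matched as plain text, which is a simplification.

```python
def phrase_hits(phrase, pages):
    """Return the names of pages whose text contains the exact phrase."""
    return {name for name, text in pages.items() if phrase.lower() in text.lower()}

def or_search(phrases, pages):
    """Pool the hits for every phrase, the way OR does."""
    hits = set()
    for phrase in phrases:
        hits |= phrase_hits(phrase, pages)
    return hits

def or_query(phrases):
    """Build the text to type: each phrase in quotes, joined by OR."""
    return " OR ".join('"{}"'.format(p) for p in phrases)

pages = {
    "page1": "Our Leathers family tree",
    "page2": "The Leathers family line in Ohio",
    "page3": "Leather goods for sale",
}
phrases = ["Leathers family tree", "Leathers family line"]

print(or_query(phrases))
print(sorted(or_search(phrases, pages)))  # ['page1', 'page2']
```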
Here are a couple of bonus delimiters that are technically not Boolean, but
because they function in a similar way we'll use them. The first is what is
known as a "proximity search", which uses the word "NEAR"
in all capital letters. Going back to our original example, if we search
the words "Leathers" and "family" without using quotations,
we get back 68,700 returns. Putting the words in quotation marks drops it down
to 166. But what if we still aren't finding what we want? Perhaps the quotation
marks have narrowed our search down too far. We need a way to find
"Leathers" and "family" in a way that relates the two
words. So we insert the word "NEAR" between them. This tells the
search engine to return webpages that have those two words close together. They
may not be side by side but they are close, and that returns about 17,800
webpages. Still too many to deal with, but now we can begin using our other
delimiters and search word techniques to whittle it down.
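A proximity search can be pictured as measuring how many words apart two terms are. The sketch below is a toy version; the limit of five words is an assumption made for this example, since the tutorial does not say how close "close" is.

```python
def near(word_a, word_b, page_text, max_gap=5):
    """True if the two words occur within max_gap words of each other."""
    page = page_text.lower().split()
    pos_a = [i for i, w in enumerate(page) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(page) if w == word_b.lower()]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

close_page = "the family of John Leathers"
far_page = "Leathers " + "filler " * 20 + "family"

print(near("Leathers", "family", close_page))  # True
print(near("Leathers", "family", far_page))    # False
```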
Here's another great delimiter. Putting the tilde "~" in front of
a word tells the search engine to look for all synonyms of that word. For
example, if we put it in front of the word "genealogy", the search
engine will look for that word and for all synonyms of that word. This is a
great way to cheat that ten word limit most search engines have. Using just
one word and the ~ lets us search for multiple words without actually entering
them into the search engine.
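One way to picture the tilde is as a lookup in a list of synonyms. The table below is invented for illustration; a real search engine keeps its own, much larger list, and expand_term is a made-up name.

```python
# Hypothetical synonym table, invented for this example.
SYNONYMS = {"genealogy": ["ancestry", "lineage", "pedigree"]}

def expand_term(term):
    """Return the words a search term stands for in this toy model."""
    if term.startswith("~") and len(term) > 1:
        word = term[1:]
        return [word] + SYNONYMS.get(word.lower(), [])
    return [term]

def expand_query(query):
    """Expand every term of the query."""
    return [expand_term(t) for t in query.split()]

print(expand_query("Leathers ~genealogy"))
```

Two typed words end up standing for five, which is how the tilde helps with the ten-word limit.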
Yet another trick is to use the asterisk "*" as what is known as a
"wildcard". For instance, I might list a search word like this:
"educat*". That asterisk tells the search engine to use educat as
the root, and to find all words having that root, such as educate, education,
educating, educator, etc. This would be especially helpful if you are
searching for a particular person's first name but are not exactly certain of
the spelling or believe the person may have used some variation of their name.
For example, maybe you're looking for a "Ricardo" Leathers, but you
think he may have Americanized his first name when he entered the
Let's summarize what we've learned about Rule #3 - Limit responses using
Boolean search delimiters.
1. Use "NOT" or "-" to exclude words from search returns.
2. Use "AND" or "+" to specify words that MUST be in the search returns.
3. Use "OR" between two words or phrases to get either one or BOTH returned
in the search results.
4. Use "NEAR" between two words or phrases to find sites where those words or
phrases are close to each other but not necessarily side by side.
5. Use "~" to get synonyms of your search word returned in the search
results.
6. Use "*" as a wild card with the root of a word to have all variations of
that root returned in the search results.
7. Use of Boolean search delimiters is most effective when combined with
Rules #1 and #2.
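The wild card in item 6 can be pictured as a "starts with" test on the root. This is a sketch of the idea as this tutorial describes it, not of any particular search engine; wildcard_matches is a made-up name and the word list is just a sample.

```python
def wildcard_matches(pattern, candidates):
    """Return the words that start with the root before the trailing '*'."""
    if not pattern.endswith("*"):
        # No wild card: only an exact match counts.
        return [w for w in candidates if w.lower() == pattern.lower()]
    root = pattern[:-1].lower()
    return [w for w in candidates if w.lower().startswith(root)]

sample_words = ["educate", "education", "educating", "educator", "editor"]
print(wildcard_matches("educat*", sample_words))
```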
If you have followed this basic search tutorial and learned to apply the three
rules, you will uncover more results pertinent to your searches, and do so
faster, than 99% of the people using the internet. The rules are not
difficult. Once you have read through them you should be able to use them
referring only to the summaries.