How To Properly Use A Search Engine For Genealogy
I. Which search engine is best?
There are literally dozens and dozens of engines. It isn't necessary to know or use every search engine. A small handful of engines handle the vast majority of all web traffic. Below are the Big Five along with the approximate percentage of web queries each handles.
1. Google - 35%
2. Yahoo - 28%
3. MSN - 16%
4. AOL - 15%
5. Ask Jeeves - 3%
These five search engines account for approximately 97% of all web searches. The dozens of remaining search engines are not worth your time. For the rest of this search tutorial we will concern ourselves only with Google. You will find links to these and the other relevant search engines by Clicking Here.
I. Why not just stick to Google alone?
Before a webpage shows up in a search engine, that engine must first go out and find it. That takes time, and there are many webpages on the internet that are not listed yet in any search engine. Some pages are listed in some search engines but not others.
Currently, Google states that it
lists four billion pages. Only three billion of these have been sorted by
topic (a task known as "indexing"), so effectively we can say Google
makes available about three billion webpages of information to us. That's
roughly half of all the webpages that are out there. Yahoo also claims it has
about three billion pages indexed. Using these two together should provide us
with access to most of the seven or eight billion webpages that can be found on
II. Rules For Effective Searching.
Rule #1 - Choose search words carefully.
Rule #2 - Use short phrases in quotation marks.
Rule #3 - Limit responses using Boolean search delimiters.
Don't be frightened off by such words as "Boolean" or "delimiter", and don't ignore advice like choosing search words carefully because you think it's too simple. It has been estimated that 99% of all web searches use inappropriate words, too many words, or too few words. There are highly effective search techniques, but the vast majority of internet users know nothing about them. This tutorial will give
Rule #1 - Choose search words carefully.
Open up the Google search engine and type into the search field the word "Leathers" without the quotation marks. This returns approximately 553,000 entries. This is far more than you could ever look at, and the vast majority of these entries deal with the material "leather" as a textile, not as a family name.
Most people use search words that are too common, meaning they apply to too many different things besides the actual object of the search. Unfortunately for us, leather is a popular and therefore common commodity, making our family name alone virtually useless as a search word. It needs a lot of help from other words and search techniques.
Typing the words Leathers Family together narrows down our search to approximately 68,700. This doesn't mean there are 68,700 websites that have information about the Leathers family. It means that 68,700 sites contain both the words "Leathers" and "family", but not necessarily together.
We need to add at least another word or two, but what words should we use? This is where knowing something about our subject becomes crucial. For the first additional search word, try the subject itself. In this case our subject is "genealogy". Putting that word into our search narrows down the responses we get to only 2,860. To narrow it down more, use specialty words related to your subject. The more specific and obscure the word, the better our chances are of narrowing down our search.
Let's try using "heraldry" or "ancestry". Adding the word ancestry to our search gives us "Leathers family genealogy ancestry". But, hey! Instead of narrowing down the search returns even further, using "ancestry" as an additional search word brought back 10,100 returns! We're going in reverse. What has happened?
Congratulations! You've just fallen victim to a quirk about search engines that very few people even know exists. Search engines interpret the search words you input using what is known as a "heuristic algorithm", which is a fancy way of saying they use a mathematical formula to play with the search words so as to give you back the greatest possible number of responses related to your search terms. The more search terms you use, the more combinations a search engine comes up with. Of course the programmers know you want to zero in on your topic and not be bothered by unrelated information, so the algorithm is written such that it uses combinations of words most frequently found on the internet.
If you use only one word, it searches for only that word. If you use two words it searches for all webpages that contain both of those words. If you use three words it looks for pages that have all three words. But if you use four or more search words the algorithm then begins playing what we might call the "either/or game", meaning it will find pages that contain combinations of your words but that do not necessarily have ALL of your search words. It mixes and matches them in such a way so as to give you the greatest possible number of responses.
In fact even the order you put the words in affects the responses you get. Take our words, "Leathers family genealogy ancestry", and reverse the order of genealogy and ancestry. You will find that instead of getting 10,100 returns you now get only 8,100. Same words, different order, but it makes a very big difference! If you are using more than three search words, try arranging them in different orders. You should also be aware that Google (and most other search engines) has a ten-word limit, so no more than ten search words can be used at one time.
Let's summarize what we've covered under Rule #1 - Choose search words carefully.
1. Our search should always use our root word, in this case "Leathers", to start.
2. Add up to two additional words, being sure to select the more unusual ones regarding our subject. In our case genealogy is our subject, so we used it and ancestry as additional search terms.
3. If we use more than three search words, remember the order makes a difference, so try rearranging the words in different combinations, always keeping "Leathers" (or your key search term) as the first word.
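The reordering advice in point 3 can be sketched in a few lines of Python. This is only an illustration: the function name query_orderings is made up for this example, and the script just prints the word orders to try by hand. It does not contact any search engine.

```python
from itertools import permutations

def query_orderings(root, extra_words):
    """Return every ordering of extra_words, always keeping root first."""
    return [" ".join((root,) + combo) for combo in permutations(extra_words)]

# Hypothetical example using the words from this tutorial.
orderings = query_orderings("Leathers", ["family", "genealogy", "ancestry"])
for query in orderings:
    print(query)
```

Three extra words give six orderings, which is still few enough to try one at a time.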
Rule #2 - Use short phrases in quotation marks.
Ordinarily, search engines treat each search word as an independent object. But words enclosed by quotation marks are considered to be the same as a single word. Let's go back to our original example. We found that for "Leathers" by itself Google returned 553,000 possible responses for us to look at. Adding the word "family" narrowed it down to 68,700 responses. Put these words into Google again, only this time enclose them in quotation marks, so that it looks like this: "Leathers family". The use of quotation marks has reduced the number of Google search returns to a mere 166.
Why did this happen? Before, Google was looking for all webpages which had the words Leathers and Family somewhere on the same page, though not necessarily beside each other. The addition of quotation marks tells the Google search algorithm to look only for instances in which the words Leathers and Family are used together as a single term. Does this mean there are 166 webpages that talk about the Leathers family? Yes!!
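To see the difference on a small scale, here is a toy sketch in Python. The three sample "pages" are invented for illustration, and the two matching functions are simplified stand-ins for what a search engine does, not Google's actual method.

```python
import re

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z']+", text.lower())

def has_all_words(page, query):
    """True if every query word appears somewhere on the page."""
    page_words = set(words(page))
    return all(w in page_words for w in words(query))

def has_phrase(page, phrase):
    """True if the query words appear side by side, in order."""
    p, q = words(page), words(phrase)
    return any(p[i:i + len(q)] == q for i in range(len(p) - len(q) + 1))

# Made-up sample pages.
pages = [
    "The Leathers family settled in Ohio in 1820.",
    "Our family sells fine leathers and suede.",
    "Leathers are durable; every family needs a good jacket.",
]

loose = [p for p in pages if has_all_words(p, "Leathers family")]
exact = [p for p in pages if has_phrase(p, "Leathers family")]
print(len(loose), "pages contain both words")
print(len(exact), "page contains the exact phrase")
```

All three sample pages contain both words, but only the first contains them together as a phrase.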
We've come a long way towards locating the information we want, but 166 pages is still a fair amount of material to sort through. Now is where we need to get very specific about exactly what information we're searching for. As an example, perhaps we are looking for Leathers family trees. We could include the word "tree" within our search quotation marks. Let's try it and see what happens. You will notice that Google returns only one instance of a Leathers Family Tree. Putting everything in quotation marks can narrow down the search too much.
This is an instance in which we want to use that heuristic algorithm those Google programmers built in to our advantage. Let's use the word "tree", but let's put it outside the quotation marks. There, that gave us twenty returns, and that's a small enough number that we can easily search each item. Sure enough, the thirteenth page listing entitled "LEATHERS Genealogy" contains the phrase "Leathers family line" which is very similar to "Leathers family tree", so it's worth a look. After clicking on the link it takes us to a page with another link at the top that says "View All LEATHERS family members", and if we click on that link it takes us to a great site with a Leathers family tree on it! Sometimes narrowing a search too far causes us to miss webpages that contain information we want, but that use slightly different words than our search terms. In that case, try putting some search words outside the quotation marks.
It is a good practice to have several search terms in separate quotations. For example, let's go back to our term of "Leathers family" and add to it the term "family history". It is perfectly acceptable to have the word "family" in both terms, since the words are being viewed by the algorithm as independent phrases and not as individual words. If we perform this search we find that we come up with twenty-one returns, the first of which is for a "Leather Family History Society" that many of you likely never knew existed! If you were so inclined you could drop the second search phrase and just use the word "history" outside of the quotation marks. This opens up more possibilities, giving about seventy returns to look at. Mix and match your words, using them in quotations and out of quotations. If you find you have more than three total search words/phrases, try using them in different orders.
OK, let's summarize what we learned about using Rule #2 - Use short phrases in quotation marks.
1. Enclosing two or more words in quotation marks causes the search engine to look for that exact phrase.
2. Narrowing down the search terms too much can cause too few search returns.
3. Try using a search term with another search word outside the quotation marks.
4. Use multiple search terms in separate quotation marks.
5. When three or more search phrases/words are used, mix and match the order they go in to get new results.
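As a rough sketch of points 1 through 3, the short Python function below writes out the same words quoted three different ways, from narrowest to broadest. The name quote_variants is invented for this example, and the output is simply text to paste into a search box.

```python
def quote_variants(phrase, extra_word):
    """Return the same words quoted three ways, narrowest first."""
    return [
        f'"{phrase} {extra_word}"',  # everything must appear as one exact phrase
        f'"{phrase}" {extra_word}',  # exact phrase, extra word anywhere on the page
        f"{phrase} {extra_word}",    # no quotation marks at all
    ]

for query in quote_variants("Leathers family", "tree"):
    print(query)
```

Working down the list from the top matches the approach used above: start narrow, then loosen the quotation marks if too few pages come back.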
Rule #3 - Limit responses using Boolean search delimiters.
What is a Boolean search and what are delimiters? Literally speaking, it is searching using the logical phrases of AND, NOT, and OR. We're going to add a couple more of these "delimiters" that are not, strictly speaking, Boolean, but perform the same basic purpose, so we'll include them.
The easiest way to think of a Boolean search is to think of how a sculptor works. He doesn't add anything to the stone he is working with. He simply chips away the things he doesn't need and what's left behind is what he was looking to reveal. That is precisely how Boolean searching works. As we saw with Rule #2, sometimes being ultra-specific with your search terms limits the responses you get down to nothing. In that case we will need to use more general search words and phrases. This opens us up to the possibility of getting so many returns that the search is of no use to us. What we need is a way to specify to the search engine the things we DON'T want it to show us so that it whittles away the junk and leaves us with the things we are really interested in.
Now that we have an idea what a Boolean search does, let's see exactly what these delimiters look like. Going back to our very first example, we put the word "Leathers" into Google and got back 553,000 returns. We saw that the vast majority of these returns had nothing to do with our family name, but were related to the material "leather". We want to get rid of these unrelated entries without getting rid of ones we want, and we will accomplish this using the most basic Boolean delimiter of all: the word "NOT", which must be written in all capital letters for the search engine to recognize that we mean it as a Boolean term. A simpler way is just to use the minus sign "-", placed directly in front of the word we want our search engine to exclude from its search results.
Since "leather" was probably the most common word returned, we can exclude it from our searches by adding to our search the word leather with a minus sign in front of it (no space between the sign and the word), like this: "-leather". This goes immediately after our search word of "Leathers" (there is a space between Leathers and -leather). Try it and see what happens.
You will notice that by telling Google to eliminate that one word, our returns have fallen from 553,000 to a mere 211,000. Better, but still far too many returns to be of any real use to us. We need to chip away a few more terms. How do we decide what terms to eliminate? Well, we could go down and look at the returns to see what sorts of things are popping up. This might be feasible at the very start of the process, but the more we refine our search the harder it will be to spot words that are frequently found in our search results but are unrelated to our intended target. A better way is to make a few educated guesses.
We know that leather, as a textile or material, is what the vast majority of these erroneous search returns are related to. And leather is used largely in the manufacture of clothing and furniture, and these two items are sold in stores. So let's eliminate the words "clothing", "furniture", and "store" from the search results. Try it one word at a time so you can see the results. Eliminating "clothing" brings our results down to 176,000. Eliminating "furniture" brings them down to 167,000. And taking out the word "store" drops them further to only 148,000. We could continue eliminating words like "sale", "price", "quality", and so forth, getting rid of words which are directly related to the retail sales industry. The thing we need to be careful of is that we don't eliminate a word that is also common to the field of genealogy. Some words have applicability to more than one subject, and if we get too crazy we could lose returns that we really wanted to see. And remember, search engines have a ten word maximum for search words, so we can't simply add more "NOT" words indefinitely.
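Here is a minimal Python sketch of building such an exclusion search. The function name exclude is made up for this example, and MAX_TERMS simply encodes the ten word limit described in this tutorial, on the assumption that excluded words count toward it.

```python
MAX_TERMS = 10  # the ten word limit as stated in this tutorial (an assumption)

def exclude(base_terms, unwanted):
    """Append each unwanted word with a minus sign directly in front of it."""
    terms = list(base_terms) + [f"-{word}" for word in unwanted]
    if len(terms) > MAX_TERMS:
        raise ValueError(f"{len(terms)} terms is over the {MAX_TERMS} word limit")
    return " ".join(terms)

query = exclude(["Leathers"], ["leather", "clothing", "furniture", "store"])
print(query)
```

The check on the number of terms is there as a reminder that every excluded word uses up one of the available slots.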
Obviously, just using the NOT delimiter alone isn't going to get us where we want to be anytime soon. We will want to combine it with our other basic search rules. For example, you might use the search words "Leathers" and "family" without quotation marks, and after these use the "-leather" delimiter. Try it and you will see that it drops the returns from 68,700 to 36,700.
This is a good place to introduce the "AND" delimiter, which can also be stated as a "+" immediately preceding a word. What this tells the search engine is that this word MUST be found in all webpages returned, and to ignore any pages that do not have that particular word in them. Let's keep using our example above and add the word "genealogy" without using any delimiter on it. You will see that it returns about 2,180 webpages. Now insert a "+" immediately in front of the word "genealogy", and be sure that there is no space between the + and the word. The returns decrease to about 1,850. This is because before we used the + delimiter we had four search words, which meant Google's normal search algorithm was making up its own word combinations. Some of them did not include the word "genealogy". By specifying we wanted genealogy in all search returns, Google eliminated those which its algorithm had come up with that did not contain our required word. Pretty simple, huh?
On to the "OR" delimiter. Earlier we used the phrase in quotations, "Leathers family tree" and found it gave us only one return. Similarly, if we used the phrase "Leathers family line" we would get only one return, but it is a different webpage. To get the best of both worlds, put in both of these search phrases and between them insert the word "OR" in all capital letters. You will see that you now get two returns. The OR delimiter tells the search engine to give returns from either one of the phrases, but from BOTH if it can find them! For fun, add a third phrase, "Leathers family history" to the search, and put an OR between it and the other search phrases. It now returns ten webpages to you, all of which contain at least one of the search phrases you entered. The OR delimiter is a powerful search tool that greatly speeds up our searching.
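One way to picture what OR does is as a union of result sets. The Python sketch below uses made-up page names rather than real search results, purely to show that a page matching more than one phrase is only listed once.

```python
# Made-up result sets; the page names are placeholders, not real search results.
tree_hits = {"page-a"}
line_hits = {"page-b"}
history_hits = {"page-b", "page-c", "page-d"}

# OR behaves like a set union: a page qualifies if it matches any phrase.
combined = tree_hits | line_hits | history_hits
print(sorted(combined))
```

Here "page-b" matches two of the phrases but appears once in the combined list, which is why the total from an OR search can be smaller than the sum of the separate searches.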
Here are a couple of bonus delimiters that are technically not Boolean, but because they function in a similar way we'll use them. The first is what is known as a "proximity search", which uses the word "NEAR" in all capital letters. Going back to our original example, if we search the words "Leathers" and "family" without using quotations, we get back 68,700 returns. Putting the words in quotation marks drops it down to 166. But what if we still aren't finding what we want? Perhaps the quotation marks have narrowed our search down too far. We need a way to find "Leathers" and "family" in a way that relates the two words. So we insert the word "NEAR" between them. This tells the search engine to return webpages that have those two words close together. They may not be side by side but they are close, and that returns about 17,800 webpages. Still too many to deal with, but now we can begin using our other delimiters and search word techniques to whittle it down.
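The idea of a proximity search can be sketched as a toy Python function. How close counts as "near" is not stated in this tutorial, so the max_distance of five words below is purely an assumption, as are the function name and the sample sentences.

```python
import re

def near(text, first, second, max_distance=5):
    """True if the two words occur within max_distance words of each other."""
    tokens = re.findall(r"[a-z']+", text.lower())
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(abs(a - b) <= max_distance
               for a in first_positions for b in second_positions)

close_page = "The Leathers and Smith family reunion"
far_page = ("Leathers goods are sold here; bring the whole "
            "extended clan and your family")

print(near(close_page, "Leathers", "family"))
print(near(far_page, "Leathers", "family"))
```

The first sample has the two words three words apart, so it qualifies. The second has them twelve words apart, so it does not.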
Here's another great delimiter. Putting the tilde "~" in front of a word tells the search engine to look for all synonyms of that word. For example, if we put it in front of the word "genealogy", the search engine will look for that word and for all synonyms of that word. This is a great way to cheat that ten word limit most search engines have. Using just one word and the ~ lets us search for multiple words without actually entering them into the search engine.
Yet another trick is to use the asterisk "*" as what is known as a "wildcard". For instance, I might list a search word like this: "educat*". That asterisk tells the search engine to use educat as the root, and to find all words having that root, such as educate, education, educating, educator, etc. This would be especially helpful if you are searching for a particular person's first name but are not exactly certain of the spelling or believe the person may have used some variation of their name. For example, maybe you're looking for a "Ricardo" Leathers, but you think he may have Americanized his first name when he entered the
Let's summarize what we've learned about Rule #3 - Limit responses using Boolean search delimiters.
1. Use "NOT" or "-" to exclude words from search returns.
2. Use "AND" or "+" to specify words that MUST be in the search returns.
3. Use "OR" between two words or phrases to get either one or BOTH returned in the search results.
4. Use "NEAR" between two words or phrases to find sites where those words or phrases are close to each other but not necessarily side by side.
5. Use "~" to get synonyms of your search word returned in the search results.
6. Use "*" as a wild card with the root of a word to have all variations of that root returned in the search results.
7. Use of Boolean search delimiters is most effective when combined with Rules #1 and #2.
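To tie the rules together, here is a small Python sketch that assembles a search from quoted phrases, plain words, and the "+", "~", and "-" delimiters. The function compose and its parameter names are invented for this example. Support for these symbols varies between search engines and has changed over time, so treat the result as something to try rather than a guarantee.

```python
def compose(phrases=(), words=(), required=(), synonyms=(), excluded=()):
    """Assemble a search string from the pieces described in Rules #1 to #3."""
    parts = [f'"{p}"' for p in phrases]      # Rule #2: exact phrases
    parts += list(words)                     # Rule #1: plain search words
    parts += [f"+{w}" for w in required]     # word must appear
    parts += [f"~{w}" for w in synonyms]     # word or its synonyms
    parts += [f"-{w}" for w in excluded]     # word must not appear
    return " ".join(parts)

query = compose(words=["Leathers", "family"],
                required=["genealogy"],
                excluded=["leather"])
print(query)
```

Keeping each kind of term in its own list makes it easy to add or drop one delimiter at a time and compare the results.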
If you have followed this basic search tutorial and learned to apply the three rules you will uncover more results pertinent to your searches and do so faster than 99% of the people using the internet. The rules are not difficult. Once you have read through them you should be able to use them with only the summaries for reference.