Oakland.pm

Reviews

Review of "Google Hacks"

by Tara Calishain & Rael Dornfest

reviewed by George Woolley


book cover image

Note:

  • Click on the image above to go to the O'Reilly catalog page for this book.

Recommendation

Very good.

If you use Google a lot and haven't mastered its capabilities (such as those provided by the special syntaxes), you'll likely learn far more than enough from this book to justify the cost.

If you are a programmer and you are primarily interested in this book for what it can teach you about the Google Web API, this is likely a good book for you too. However, I suggest looking around for another review, as I didn't use the API or write any programs based on this book. You may also wish to take a look at the table of contents and browse the book on Safari.

Chapter Titles

  • 1. Searching Google
  • 2. Google Special Services and Collections
  • 3. Third-Party Google Services
  • 4. Non-API Google Applications
  • 5. Introducing the Google Web API
  • 6. Google Web API Applications
  • 7. Google Pranks and Games
  • 8. The Webmaster Side of Google

Some Special Syntaxes on Google

  • filetype: (to search for hits with a specific file extension)
  • inurl: (to search for hits with something specific in the URL)
  • intitle: (to search for hits with something specific in the title)
  • phonebook: (to search for a phone number or the corresponding person or business)
  • site: (to search within a specified domain)

Notes:

  • The above are some of the special syntaxes you can use on Google.
  • There is an example of each of these special syntaxes in this review.

An Amusing Law

"Ninety percent of everything is crud." -- Sturgeon's Law.

Note:

  • The book quotes this law in the chapter that contains the webmaster hacks, in the context of encouraging webmasters to develop good content.

Some Google Events

  • Sep. 1998: opens doors in Menlo Park, CA. (1)
  • ???? 1998: 10,000 queries a day. (1)
  • Feb. 1999: 0.5 million queries a day. (1)
  • Mar. 1999: 71 million pages indexed in data base. (2)
  • June 1999: large infusion of venture capital. (1)
  • ???? 1999: 3 million queries a day. (1)
  • Sep. 1999: beta label comes off website. (1)
  • Jul. 2000: 355 million pages indexed in data base. (2)
  • Mar. 2002: 968 million pages indexed in data base. (2)
  • Apr. 2002: offers Google Web API. (this book)
  • Dec. 2002: 3 billion pages indexed in data base. (2)
  • Feb. 2003: 250 million queries a day. (3)

What's Google?

Well, most obviously, Google is a search engine for identifying web pages of interest.

But you can find more than web pages. You can also find:

  • images
  • news
  • discussion group postings

Google is easily the leading search engine in terms of pages indexed (over 3 billion as of Dec. 31, 2002) and in terms of queries per day (250 million queries per day as of Feb. 2003). In my opinion, Google is also the best in terms of ease of use and the aesthetics of its user interface.

What are Hacks?

In the Preface of this book two definitions are given of what a hack is. A hack is said to be either:

  • a quick and dirty solution for a programming problem or
  • an interesting technique for getting a task done.

According to the first definition, a hack has to do with programming; but according to the second definition, a hack doesn't necessarily have to do with programming.

All the hacks in this book fit the second definition. Nearly half of the hacks in the book also fit the first definition.

What's the Google Web API?

API stands for application programming interface. Most users use a browser such as Internet Explorer or Netscape to do their searches on Google. But Google also provides an interface which allows you to write an application program (in Perl, Java, Python or whatever) to access its data base more directly.

Even if you are not interested in the Google Web API, there's plenty else in this book that may interest you.

About the Reviewer

I've been using search engines heavily for a number of years. Google has long been my search engine of choice. I do enough searching that it's been worth my while to do evaluations of major search engines from time to time to make me more aware of how they are changing. In my evaluations, Google has ranked #1 consistently starting with my 2001-09 evaluation.

Although I have used Google a great deal, my usage of it has been generally very simple. More specifically, until I read this book,

  • I didn't use the various special syntaxes at all.
  • I wasn't aware of the many special capabilities related to Google.

I have many years of experience designing and writing programs. I've been using Perl since 1994. So I can follow the parts of the book describing the Google Web API and how to write programs using it. However, at this time, I don't have any significant need that I'm aware of to automate my use of Google, so I only looked at the parts of the book related to that with a view to the future.

I am the webmaster of over a dozen websites, some of them for clubs and some of them personal. As a webmaster, I have many reasons to search my sites and also to search the web as a whole. I'm also concerned with having appropriate traffic come to my sites.

Limitations of the Review

I am reviewing this book primarily as a long time but unsophisticated user of Google. If you are already using most of Google's capabilities, my experiences with this book may not be useful for you.

And I'm reviewing this book primarily as a book for people who simply want to improve their use of Google and do not necessarily plan to write any programs using the Google API.

Who's the Book for?

There are three types of people who I think might benefit from this book:

  • users of the Google search engine from the web interface
  • people who are programmers and are interested in using the Google Web API
  • webmasters

Most of the hacks seem to be oriented towards one of these three interests. The way I see it, there are:

  • 44 hacks for users
  • 48 hacks specifically for programmers and automaters
  • 8 hacks specifically for webmasters

Of course, someone could be all of the above. And you don't need any special skills to use the hacks for users. And I'd say that webmasters in particular likely have a need to do searches about their websites that make most of the user hacks relevant to them.

who the book is not for: I don't recommend this book for people who are just starting to learn how to use search engines. Nor do I recommend it for people who don't have a strong interest in improving their use of Google as a user, unless they wish to automate their use of Google.

who the book is for: I recommend this book for:

  • users of Google who want to improve their use.

I believe the book would also be good for:

  • people who want to use the Google Web API.

However, as mentioned before, I didn't test out this part of the book by writing some programs using the Google Web API.

Hacks for Users

Well, in this context what I mean by a user is someone who uses the Google search engine through its web interface.

The way I see it, 44 of the hacks are relevant to such people. Below I touch on some of these hacks.

special syntaxes: From the book I learned a number of syntaxes that Google allows. The various syntaxes are indicated by a prefixed string in the form

  • <syntax-name>:

Below I give a few examples that I found useful to give you a feel for the "special syntaxes". How you wish to use them may, of course, be quite different.

searching my web domain: I have my own domain www.metaart.org. Within my domain, there are many distinct subdomains. For example, one of these subdomains is for Isaland, which is the name of the pages I put together for my granddaughter Isabelle and has the URL http://www.metaart.org/isaland.

First a very simple example. The introduction to Chapter 1 briefly describes each of the special syntaxes in isolation. I wanted to find all the pages in my domain relating to Martha Graham. So I entered:

  • site:www.metaart.org graham

The result showed that there were 6 pages in my domain in three different subdomains that referred to Martha Graham.

Hack #8 describes how you can use the various special syntaxes together, and Hack #14 contrasts the use of site: versus inurl:. Among other things, there are many of my children's stories in the subdomain I created for my granddaughter. One of the characters who is in some of those stories is a bear named Luv. To find out how many stories and other pages refer to Luv, I did the following search:

  • site:www.metaart.org inurl:isaland luv

The result showed that Luv was referred to on the index page for Isaland, in 7 stories and on 2 other pages.
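The way the special syntaxes combine can be sketched in a few lines of Python. This is only an illustration of how such a query string is put together; the helper name build_query is my own invention and is not something from the book or from Google.

```python
def build_query(terms, **operators):
    """Join special-syntax operators and plain search terms into one query string.

    build_query is a hypothetical helper for illustration only.
    Each keyword argument becomes a "<syntax-name>:value" prefix.
    """
    parts = [f"{name}:{value}" for name, value in operators.items()]
    parts.extend(terms)
    return " ".join(parts)


# Rebuild the two searches shown above.
graham_query = build_query(["graham"], site="www.metaart.org")
luv_query = build_query(["luv"], site="www.metaart.org", inurl="isaland")
print(graham_query)
print(luv_query)
```

The resulting strings are what you would type into the Google search box; the sketch does not send anything to Google.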

searching outside my domain: I also found the special syntaxes useful for searching outside my domain.

Hack #17 tells you how to use the special syntax phonebook: to consult the phonebook. Given a name, city and state, one can search for a phone number. What I found intriguing though was the reverse look up. Here's an example:

  • phonebook:(510)465-2948

The result showed this was the phone number of the Mimosa Cafe and also provided its address. From time to time I have a phone number and don't know what it goes with, which is annoying. The reverse look up could come in handy.

There's a part of my website that's devoted to bad web pages. I wanted to find some similar pages elsewhere on the web. One search that I did that got a number of useful results was:

  • intitle:bad intitle:html

searching for images in my domain: Above I was searching for text, but one can also search for images. Hack #31 describes how to do this. To begin with you select the Images Tab (instead of Web, Groups, Directory or News).

To determine what all the images in my domain were, I entered the following search.

  • site:www.metaart.org inurl:metaart

The inurl:metaart part of this is redundant except that site: can't be used by itself. The result showed 138 different images in my domain.

I wanted to find out if there were any gifs in the Isaland subdomain, so I did the following search.

  • site:www.metaart.org inurl:isaland filetype:gif

It turned out there was just one.

searching for images elsewhere: You can also search for images throughout the web.

I wanted to look at some images of rooks (i.e. the chess piece), so I did the following search, which worked out well:

  • inurl:rook chess

special capabilities: The book describes a number of capabilities that are related to Google but provided by a different group on a different site. Below I give two instances which I enjoyed as examples.

googlisms: I learned about Googlisms from Hack #39. The basic idea of Googlisms is that you enter a subject phrase representing a who, what, where or when and you get back opinions about that subject. For example, for myself, I got back simply

  • george woolley is a dance lover who advocates connection with your partner and the music

If the idea amuses you, you can try it out at the Googlism site.

google whacking: I learned about Google Whacking from Hack #87. The basic idea of Google Whacking is to find a simple two-word query that returns exactly one result. Each of the two words must be in the Dictionary.com Dictionary. You can read more about Google Whacking on the googlewhack site. Or see The Whack Stack for some recent examples.
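The rules of the game as described above are simple enough to write down as a check. The following is only a sketch under my own assumptions: the function name, the tiny word set standing in for the Dictionary.com dictionary, and the result count typed in by hand are all hypothetical. It does not query Google.

```python
def is_googlewhack(query, result_count, dictionary_words):
    """Check a candidate against the three rules described above.

    query            -- the search string that was tried
    result_count     -- number of results Google reported (entered by hand here)
    dictionary_words -- a set of accepted words (a stand-in for a real dictionary)
    """
    words = query.lower().split()
    return (
        len(words) == 2                                   # exactly two words
        and all(word in dictionary_words for word in words)  # both in the dictionary
        and result_count == 1                             # exactly one result
    )


# Hypothetical sample data for illustration only.
sample_words = {"ambidextrous", "scallywags", "rook", "chess"}
print(is_googlewhack("ambidextrous scallywags", 1, sample_words))
print(is_googlewhack("rook chess", 5000, sample_words))
```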

Hacks for Programmers and Automaters

The Google Web API allows us to query Google's data base directly. Scraping allows us to process HTML results that were returned from normal web searches in a browser and then saved. There are about 48 hacks related to using the API (directly or indirectly) or related to programming the processing of HTML results returned by Google.

restrictions: The book cautions us that automated use of the Google data using the API is restricted by Google. Some of the restrictions are:

  • you must identify yourself through a Google Web API developer key that you can obtain from Google.
  • you must use the API only for personal use.
  • you are limited to 1,000 queries per day.

Automating non-API accessing of the Google data base is prohibited by Google.
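A program that uses the API would presumably want to keep track of the daily limit itself. Below is a minimal sketch of one way to do that. The class name DailyQuota and its method are my own assumptions, not part of the Google Web API or of the book, and the counter lives only in memory.

```python
import datetime


class DailyQuota:
    """Count queries per calendar day and refuse any beyond the limit.

    Hypothetical helper for illustration; the default of 1,000 matches
    the per-day limit mentioned above.
    """

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None   # the day the current count belongs to
        self.used = 0     # queries made so far on that day

    def try_acquire(self, today=None):
        """Return True and count one query if the quota allows it, else False."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started, so the count starts over.
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True
```

A program would call try_acquire() before each API request and skip or postpone the request when it returns False.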

scraping and using the api: Chapter 4 deals with scraping and includes a number of instances where you might wish to scrape Google results as well as providing you with code to do it with. Chapters 5 and 6 deal with the Google Web API. Among other things, these chapters include:

  • a description of the API
  • a summary of the conditions of use
  • many ideas for using the API
  • many code examples (and in a number of languages) of using the API.

The chapters on scraping and on using the API are well written and clear. Because I didn't do any scraping and didn't write any code using the API, I'll refrain from commenting further on these chapters.

my use of these chapters: Before getting this book, I didn't feel any need to automate my use of Google or to scrape HTML results received from Google, and I still don't. Perhaps now that I'm aware of these possibilities some opportunity will occur to me.

Hacks for Webmasters

Chapter 8 of the book is focused specifically on webmasters, and it contains 8 hacks specifically for them. Most of the hacks for users, discussed earlier, are also relevant for webmasters.

webmaster world: The introduction to Chapter 8 includes a recommendation of Webmaster World for on-line discussions related to Google. Here's a small sample of Discussion Conferences you'll find there with a sample question asked in each.

Discussion Conference    Sample Question Asked There
Cloaking                 "What is the general consensus on the use of cloaking to hide meta tags?"
Domain Names             "does anyone know where to find when a company got their name copyrighted?"
Google News              "Does your virtual hosting company affect rank?"
Keyword Discussion       "How many key phrases per site?"

Note:

  • In case you don't know what cloaking is, this book explains what it is.

page rank: Google assigns each page a PageRank, which is a rough measure of its importance. PageRank is one of the factors that Google considers when ordering search results.

There are several hacks that are concerned with PageRank. The one most specifically focused on PageRank is Hack #95, which goes into how PageRank is calculated.

Here are a couple of insights from that hack:

  • incoming links are always good.
  • links from pages with a good PageRank and few outgoing links are especially good.
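To make the second insight concrete, here is a small sketch using the simplified PageRank formula that Google's founders published, PR(A) = (1 - d) + d * (sum of PR(T) / C(T) over the pages T linking to A), where C(T) is the number of outgoing links on T and d is a damping factor usually quoted as 0.85. I am assuming this is the form Hack #95 discusses; the tiny four-page web and its page names are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iterate the simplified PageRank formula over a small link graph.

    links maps each page name to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    rank = {page: 1.0 for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Each linking page passes on its rank divided by its number of outgoing links.
            incoming = sum(
                rank[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical web: "hub" spreads its vote over three pages, "focused" links to only one.
links = {
    "hub": ["a", "b", "c"],
    "focused": ["d"],
}
ranks = pagerank(links)
print(ranks["a"], ranks["d"])
```

In this made-up example "hub" and "focused" end up with the same rank, yet page "d" ends up ranked higher than page "a", because "focused" does not share its vote among several outgoing links.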

A useful link is provided to the pagerank calculator on

importance for me as a webmaster: What I've learned from this book has greatly enhanced my ability to search effectively. The book has also provided me with useful insights (and reminders) on how to best set up sites to get the most out of Google.

Last Updated: 2003-03-22