Oakland Perl Mongers

Reviews

"HTTP: The Definitive Guide" Review

by George Woolley


Recommendation

If you want to learn more about HTTP, its core technologies, and the context in which it exists, this book will likely serve you well.

[Image: cover of "HTTP: The Definitive Guide"]

Some Valuable Links for Bot Writers

Notes:

  • The links above are from the book.
  • This is a tiny fraction of links in the book.
  • Likely you'll focus on different links.

What's HTTP?

When I started this process, I knew a bit about HTTP, but not much.

HTTP stands for Hypertext Transfer Protocol. Typically a browser sends an HTTP request message to a web server for a particular web object (such as an HTML file or an image file), and the web server sends back a response message that includes the requested object. HTTP defines the format of these messages and specifies how they are to be used.
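
To make the request/response exchange concrete, here is a minimal sketch of the two message shapes. It is written in Python rather than Perl purely for illustration, and the helper names `build_request` and `parse_response`, the host `www.example.com`, and the sample messages are all assumptions, not anything taken from the book. No network access is involved; the functions only build and take apart message text.

```python
def build_request(method, path, host):
    """Return the text of a minimal HTTP/1.1 request message."""
    lines = [
        f"{method} {path} HTTP/1.1",  # start line: method, target, version
        f"Host: {host}",              # HTTP/1.1 requires a Host header
        "Connection: close",
        "",                           # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a raw response into (status_code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), reason, headers, body


# A made-up exchange: the request a client might send...
request_text = build_request("GET", "/index.html", "www.example.com")

# ...and a made-up response carrying the requested object in its body.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
code, reason, headers, body = parse_response(sample_response)
```

Both messages share the same layout: a start line, a block of headers, a blank line, and an optional body. The parser above is deliberately simplistic and would not handle every legal response.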

About the Reviewer

I'm a longtime webmaster, software developer, and training developer. I've written a number of CGI programs that are in production. For about a year I had a job where I did nothing but write small Perl programs to process web logs.

On the other hand, I'm not deep into HTTP. And my websites and the websites I've worked on have always been hosted by someone else. I've never been a system administrator and have not been deeply into any of the various protocols related to communications.

Also, I'm a big supporter of O'Reilly and of open source. I own lots of O'Reilly books too, especially Perl books.

How I Used this Book

I have used this book to:

  • increase my general awareness of various aspects of HTTP and its related technologies,
  • provide useful perspective and detail for writing web bots.

Later, there is a section on how each of these uses has worked out.

Having worked through the book, I believe it will now serve as an excellent reference. If the answers I seek are not in the book, I believe I'd find them in its various references, many of which are on the web.

Who's this Book for?

If you want to learn more about HTTP, its core technologies, and the context in which it exists, this book will likely serve you well.

This book is a guide that provides perspective and a considerable amount of detail. It can be used as a reference but it isn't a pure reference. If you want a pure reference which assumes you know the why of things, this book is likely not for you. This book is well written and quite readable, but there's a lot here. Absorbing it may take some time and hard work. If you want an easy read, this book may not be for you.

In my opinion you can make better use of this book if you follow some of the links (and other references) provided.

Learning More about HTTP

This section, and the one that follows, are included as example uses of this book. Your usage may be different.

My aim here was not very explicit. I just wanted to expand my general awareness of HTTP and related technologies. HTTP is so basic to the web that I'm convinced this will pay off many times over.

I read all 21 chapters of this book shortly after acquiring it, although not in order. I then worked through the 8 appendixes. By the end of this somewhat arduous endeavor, I felt that I had greatly increased my knowledge of HTTP. I now have a better sense of HTTP and the architecture of the web. And I have a much better idea of what to ask and where to look when I need to expand my knowledge in the future.

Some characteristics of the book that helped me in my endeavor were:

  • lots of perspective and explanations of why
  • lots of detail where needed
  • clear organization at every level
  • logical clustering of chapters into five coherent parts
  • manageably sized chapters (from 9 to 36 pages)
  • many helpful graphics
  • a large number of references, many of which are links, so I can have them before my eyes quickly

If you wish to expand your knowledge of HTTP, you may be encouraged by my experience. But you may not wish to read the book from cover to cover as soon as you acquire it (or perhaps ever). I didn't read the chapters in order myself; I jumped all over the place. My impression is that the chapters are sufficiently independent that you could read them over a longer period of time, leaving out the ones that don't interest you. Unless you are very knowledgeable about HTTP, though, you might be wise to begin by reading Part I, or at least the first three chapters.

Learning More about Bots

This section describes an example use of this book. Your usage may be different.

My intent here was fairly specific. I've been looking at writing some bots in Perl. By bots I simply mean automated user agents. By user agent I mean a mechanism that accesses web content on the user's behalf (for example, a browser or a search engine robot). The bots I'm interested in writing are similar to search engine robots in that they are automated and will examine HTML. But only one of the bots I have in mind does any kind of recursive searching, and even that one would do so only within a single website. And all of these bots would be searching for specific information.

For this purpose, the most relevant chapter was "Web Robots". Especially relevant were the information and perspective on:

  • recursively following web links
  • guidelines for robot operators
  • how to identify your user agent
  • logging the actions of your user agent
  • testing your user agent
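
As a rough illustration of three items from that list (identifying the user agent, following robot guidelines, and logging actions), here is a minimal sketch in Python rather than Perl. The bot name `ExampleBot`, its contact URL, the helper names, and the sample robots.txt content are all hypothetical. The sketch uses the standard library's `urllib.robotparser` and checks rules that have already been fetched, so nothing here touches the network.

```python
import logging
from urllib.robotparser import RobotFileParser

# Hypothetical bot name and contact URL; a real bot would use its own.
BOT_NAME = "ExampleBot"
USER_AGENT = BOT_NAME + "/0.1 (+http://www.example.com/bot.html)"

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger(BOT_NAME)


def load_rules(robots_text):
    """Parse the text of a robots.txt file that was fetched elsewhere."""
    rules = RobotFileParser()
    rules.parse(robots_text.splitlines())
    return rules


def may_fetch(rules, url):
    """Check the rules and log the decision before any request is sent."""
    allowed = rules.can_fetch(BOT_NAME, url)
    log.info("%s %s", "ALLOW" if allowed else "SKIP", url)
    return allowed


# Made-up robots.txt content for illustration.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /private/\n"
rules = load_rules(SAMPLE_ROBOTS)
```

A bot built along these lines would send `USER_AGENT` in the User-Agent header of every request and call `may_fetch` before each one, so the log doubles as a record of what the bot did and what it declined to do.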

Part I (which consists of four chapters) is also quite relevant. It introduces the HTTP protocol and describes its core technologies, giving valuable perspective as well as some specific detail. Some of the most relevant specifics had to do with:

  • the methods (esp. GET & HEAD) in request messages that indicate the particular action requested
  • the status codes returned in response messages that indicate the outcome (success, not found, and so on)
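
A small sketch of how a bot might use these two specifics, again in Python for illustration only. HEAD asks for the same headers that GET would return, but without the body. Status codes fall into classes by their first digit. The `next_step` policy is a made-up example, not a recommendation from the book.

```python
from http import HTTPStatus
from urllib.request import Request


def head_request(url):
    """Build (but do not send) a HEAD request: headers only, no body."""
    return Request(url, method="HEAD")


def status_class(code):
    """Name the class of a status code from its first digit."""
    names = {1: "informational", 2: "success", 3: "redirection",
             4: "client error", 5: "server error"}
    return names.get(code // 100, "unknown")


def next_step(code):
    """A hypothetical policy a bot might apply to each response."""
    if code == HTTPStatus.NOT_MODIFIED:  # 304: the cached copy is still good
        return "reuse cached copy"
    kind = status_class(code)
    if kind == "success":
        return "process the page"
    if kind == "redirection":
        return "follow the Location header"
    if kind == "client error":
        return "record and skip"
    if kind == "server error":
        return "retry later"
    return "ignore"
```

For example, `next_step(404)` yields "record and skip", while `head_request("http://www.example.com/")` produces a request whose method is HEAD, which is useful for checking whether a page exists or has changed without downloading it.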

While reading the whole book to expand my exposure to HTTP, I kept a checklist of insights and information that I thought would be useful in designing bots. Some of the items on it had to do with:

  • getting the most recent version of a page
  • using the various headers that occur in HTTP messages and affect their interpretation

For the latter, the "HTTP Header Reference" appendix was very useful.
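
One plausible reading of "getting the most recent version of a page" is asking the server, or any cache in between, whether a stored copy is still current. The sketch below (Python, illustrative only) builds the request headers for that. The function name `revalidation_headers` and its parameters are assumptions; the header names `If-Modified-Since`, `If-None-Match`, and `Cache-Control` are standard HTTP/1.1.

```python
from email.utils import formatdate


def revalidation_headers(last_fetch_epoch=None, etag=None, bypass_caches=False):
    """Build request headers that ask whether a stored copy is still current."""
    headers = {}
    if last_fetch_epoch is not None:
        # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        headers["If-Modified-Since"] = formatdate(last_fetch_epoch, usegmt=True)
    if etag is not None:
        # The entity tag the server sent with the copy we already hold.
        headers["If-None-Match"] = etag
    if bypass_caches:
        # Ask intermediate caches to check with the origin server first.
        headers["Cache-Control"] = "no-cache"
    return headers
```

If the page has not changed, a server that honors these headers answers with status 304 and no body, and the bot can keep using its stored copy.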

This book served my purpose well. I came to appreciate the mix of perspective and hard detail that this book provides.

You may not be that interested in bots. But you may be encouraged by how well this book served me in providing useful information about a specific concern.

The O'Reilly Page on the Book

This book links to the O'Reilly page about this book. The page is worth looking at.

There is a Safari search mechanism there that will search the book. I did several searches on the book and found the results useful. Note that the references are in terms of chapter and section rather than page number. Now that I've done a few searches, I prefer this because it provides me with information about the context of the use and is usually a smaller area of text than a page.

There are also some other worthwhile things on the page, especially if you haven't yet purchased the book.


Last Updated 2003-01-29