Monday, December 10, 2007

ALT Tags According To Google's Matt Cutts

I had been doing a lot of research about how to properly tag images since the summer, when I read an article in Digital Web about Google Universal Search. There was surprisingly little information available at the time, considering how big social photo sites like Flickr have become.

Finally, Google is speaking through Matt Cutts -- head of Google's webspam team. In Cutts' ALT tag video he talks about how you can use ALT text to tell Google more about what is in an image file and boost its ranking.

Here are some highlights:

  1. Including ALT text with your images will help Google better identify their contents.


  2. ALT tags do not need to be long. Cutts' example was seven words; even 25 words is too many.


  3. An ALT tag used in conjunction with a descriptive filename (cat.jpg versus ds134.jpg, for example) will improve the searchability of your images even more (see the markup sketch after this list).
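
To make that concrete, here is a minimal markup sketch of what Cutts is describing. The filenames and ALT text are invented for illustration; the point is simply a descriptive filename paired with a short, literal description of the image:

    <!-- Descriptive filename plus short, literal ALT text -->
    <img src="/images/black-cat-sleeping.jpg"
         alt="Black cat sleeping on a windowsill" width="400" height="300" />

    <!-- Without either, Google has very little to go on -->
    <img src="/images/ds134.jpg" alt="" />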

Thursday, November 8, 2007

Quality photos can be found on social bookmarking aggregator site PicURLs

I began work on a site earlier this year that has an excellent photo gallery, but I hadn't yet explored any SEO tactics to promote it. My first instinct was to look at getting the images onto one or more of the social bookmarking sites--specifically Flickr, because another site I had worked on in the past had gotten a lot of exposure when we posted our images there.



One of the questions I got was about the images users would be served if they typed in a keyword like "Wall Street" or "Microsoft Windows": "How do you know what kind of image is going to come up?" The truth is that there is a lot of randomness. A hundred images can be tagged "Wall Street," but 50 or 60 of those are probably not relevant to the user's search, and another 20 or so are of such poor quality that they shouldn't have been uploaded in the first place.



But now there is PicURLs, a site that aggregates photos from the nine most popular photo social bookmarking sites and displays the most popular images from each so that only quality photographs and well-tagged images are shown. This only works if users tag photos well and appropriately. But I have faith in the users of social media. While some tag photos indiscriminately to gain exposure, many understand how important good tags are to the user experience. And that is why I think we will be seeing a lot more from PicURLs, and other aggregators like it, in the days to come.

Saturday, November 3, 2007

Posting Duplicate Content Must Be Harmful To Our Google Ranking

I am responsible for content development for one Web site that has taken an inexplicable dive in traffic over the past two and a half months. After scores of conference calls with everyone in the company, from IT to SEO to the head of metrics, we still haven't been able to figure it out. One thing we have been able to isolate is that over the years the magazine changed its domain a couple of times before finally settling on its current URL a few years ago. I rediscovered the Google blog post "Dealing deftly with duplicate content" today, which has some helpful tips, but I am surprised by the writer's last comment:

Don't worry be happy: Don't fret too much about sites that scrape (misappropriate and republish) your content. Though annoying, it's highly unlikely that such sites can negatively impact your site's presence in Google. If you do spot a case that's particularly frustrating, you are welcome to file a DMCA request to claim ownership of the content and have us deal with the rogue site.


With Google cracking down on all the rogue linking schemes, one can't help but think that maybe this employee isn't in touch with reality. Maybe this post is just out of date. In any event, the other tips are very helpful. Here they are (I've sketched the first two in a short example after the list):

  • Block appropriately: Rather than letting our algorithms determine the
    "best" version of a document, you may wish to help guide us to your preferred
    version. For instance, if you don't want us to index the printer versions of
    your site's articles, disallow those directories or make use of regular
    expressions in your robots.txt file.
  • Use 301s: If you have restructured your
    site, use 301 redirects ("RedirectPermanent") in your .htaccess file to smartly
    redirect users, the Googlebot, and other spiders.
  • Be consistent: Endeavor to
    keep your internal linking consistent; don't link to /page/ and /page and
    /page/index.htm.
  • Use TLDs: To help us serve the most appropriate version of a document, use top level domains whenever possible to handle country-specific content. We're more likely to know that .de indicates Germany-focused content, for instance, than /de or de.example.com.
  • Syndicate carefully: If you syndicate your content on other sites, make sure they include a link back to the original article on each syndicated article. Even with that, note that we'll always show the (unblocked) version we think is most appropriate for users in each given search, which may or may not be the version you'd prefer.
  • Use the preferred domain feature of webmaster tools: If other sites link to yours using both the www and non-www version of your URLs, you can let us know which way you prefer your site to be indexed.
  • Minimize boilerplate repetition: For instance, instead of including lengthy copyright text on the bottom of every page, include a very brief summary and then link to a page with more details.
  • Avoid publishing stubs: Users don't like seeing "empty" pages, so avoid
    placeholders where possible. This means not publishing (or at least blocking)
    pages with zero reviews, no real estate listings, etc., so users (and bots)
    aren't subjected to a zillion instances of "Below you'll find a superb list of
    all the great rental opportunities in [insert cityname]..." with no actual
    listings.
  • Understand your CMS: Make sure you're familiar with how content is
    displayed on your Web site, particularly if it includes a blog, a forum, or
    related system that often shows the same content in multiple formats.
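
To make the first two tips concrete, here is a minimal sketch. The directory, file, and domain names are invented for illustration, and both snippets assume an Apache server, which is what the .htaccess reference in Google's post implies:

    # robots.txt -- keep crawlers out of the printer-friendly duplicates
    User-agent: *
    Disallow: /print/

    # .htaccess -- permanently (301) redirect a restructured URL to its new home
    RedirectPermanent /old-section/article.html http://www.example.com/articles/article.html

The robots.txt block covers "Block appropriately," and the RedirectPermanent line is the "Use 301s" tip.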

Sunday, October 21, 2007

Everything you need for Googlebot to work correctly

It seems that the bloggers at Google have been talking a lot about Googlebot lately, so I am going to do a little link roundup and maybe come back later to do a post or two about one aspect of Googlebot or another. Having read each of these entries in detail, it seems that, taken as a group, they give a Web developer or Webmaster just about everything they need to make sure Googlebot is crawling their site correctly.
  1. All about Googlebot - Q&As about robots.txt files and Googlebot's behavior, fielded by Google Webmaster Central founder Vanessa Fox.



  2. How to verify Googlebot - Google's Matt Cutts explains how to determine whether a bot claiming to be Googlebot is authentic (a rough sketch of the idea follows this list).



  3. Learn more about Googlebot's crawl of your site and more! - Vanessa Fox discusses new additions to Google Webmaster tools meant to help the Webmaster track the bot better.



  4. Googlebot activity reports - A Google blogger explains how the company tracks the amount of traffic between Google and a given site.



  5. Better details about when Googlebot last visited a page - Vanessa Fox breaks this very confusing subject down in excellent detail.
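
As promised in item 2, the check Cutts describes boils down to a reverse DNS lookup on the requesting IP, a sanity check on the host name, and a forward lookup to confirm it maps back to the same address. The Python below is only a rough sketch of that idea -- the IP address is a placeholder, and a production version would want stricter error handling:

    import socket

    def looks_like_googlebot(ip):
        """Reverse-resolve the IP, check the host name, then forward-resolve
        that name and make sure it maps back to the same IP."""
        try:
            host = socket.gethostbyaddr(ip)[0]   # e.g. crawl-a-b-c-d.googlebot.com
        except socket.herror:
            return False
        if not (host.endswith(".googlebot.com") or host.endswith(".google.com")):
            return False
        try:
            return ip in socket.gethostbyname_ex(host)[2]   # forward lookup must match
        except socket.gaierror:
            return False

    # Placeholder address -- substitute the IP you see in your access logs
    print(looks_like_googlebot("66.249.66.1"))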

Monday, August 20, 2007

Vanessa Fox Talks In Detail About Googlebot

As a Web developer I'll never cease my quest for the holy grail: understanding just how Google takes all those URLs it crawls and turns them into a tidy little list of search engine result pages (SERPs).

I know I have a pretty good understanding--relative to that of the layman--but the more I learn, the more I find that I don't know squat. But when Vanessa Fox, founder of Google Webmaster Central, authors a blog post entitled All About Googlebot, I know I am going to learn something valuable.

I haven't been reading Fox's blog for long. But when I have read something she has written or heard one of her speeches, I have found them to be refreshingly open and honest about Google's practices. And while Google may need to protect its "secret sauce," we Web developers know that a lot of the changes the company makes to its algorithms are in an effort to stop black-hat practices. So it's nice to have someone like Vanessa Fox out there to lend some insight to those of us who are doing things the right way and for the right reasons. Here are some Q&As from the recent Search Engine Strategies Conference that she shared in her blog:





  1. If my site is down for maintenance, how can I tell Googlebot to come back later rather than to index the "down for maintenance" page?


    You should configure your server to return a status of 503 (Service Unavailable) rather than 200 (successful). That lets Googlebot know to try the pages again later.



  2. What should I do if Googlebot is crawling my site too much?


    You can contact us -- we'll work with you to make sure we don't overwhelm your server's bandwidth. We're experimenting with a feature in our webmaster tools for you to provide input on your crawl rate, and have gotten great feedback so far, so we hope to offer it to everyone soon.




  3. Is it better to use the meta robots tag or a robots.txt file?


    Googlebot obeys either, but meta tags apply to single pages only. If you have a number of pages you want to exclude from crawling, you can structure your site in such a way that you can easily use a robots.txt file to block those pages (for instance, put the pages into a single directory).




  4. If my robots.txt file contains a directive for all bots as well as a specific directive for Googlebot, how does Googlebot interpret the line addressed to all bots?


    If your robots.txt file contains a generic or weak directive plus a directive specifically for Googlebot, Googlebot obeys the lines specifically directed at it.
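
To illustrate that last answer, here is a minimal robots.txt sketch (the directory names are hypothetical). Googlebot follows only the most specific group that matches it, so once a Googlebot group exists, the generic rules stop applying to Googlebot and must be repeated there if you still want them enforced:

    # Applies to crawlers that have no group of their own
    User-agent: *
    Disallow: /drafts/

    # Googlebot reads only this group and ignores the one above
    User-agent: Googlebot
    Disallow: /print/
    Disallow: /drafts/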

Saturday, July 14, 2007

Why The Web Can Never Replace Print Media

I read a shocking statistic in a book I have been reading lately, Clear Blogging: How People Are Changing the World and How You Can Join Them. Author Bob Walsh references the Web site SaveJournalism.org, which cited a reported 44,000 news industry employees who had lost their jobs to layoffs and cutbacks in recent years. I took a look at the site and wasn't able to find these figures, but I found another very informative site about the massive layoffs that have occurred in the U.S. since the dot-com bust. I Want Media's Layoffs page has numbers going all the way back to May 2000, when CBS laid off 24 employees in its CBS Internet Group.

I know there are a lot of reasons why the media business is in turmoil, and I won't try to pretend that I know even the half of it. Since this blog is dedicated to editorial content on the Web, I can't help but wonder why layoffs continue in the name of concentrating on the Web as the most viable business model. It is true that a great many people get their news primarily from the Web these days. But according to the
Pew Research Center for the People and the Press, the self-billed independent public opinion survey research project that studies attitudes toward the press, politics and public policy issues, only slightly more than 26% of the online news audience got their news on the Web every day in 2006, down from almost 35% in 2005 (see "The Percentage Who Get News Online Everyday, 1995-2006.")

So what about the other three-quarters? I would have to say that the people in this group are the individuals I sit next to on the subway every morning who put their BlackBerrys in their holsters while they read the day's edition of the New York Times. They are the people who read the cover story of the latest issue of Forbes to the end and rip it out (as much as it pains me, since I don't even dog-ear pages of my magazines) for a friend.

I saw a special on PBS (sorry, no links, guys) that talked about how some well-performing newspapers were able to figure out the balance between what content is best presented in print and what is best presented online. Newspapers really had trouble after the dot-com boom because all those classifieds that readers and newspapers counted on so dearly went straight to the Web--
Career Builder having the biggest draw for the employment section of the classifieds.

Doing classifieds on the Web makes sense. It makes searching for jobs, garage sales, etc. much easier for the reader. But there are some things that you just can't do well online. In one usability study, Jakob Nielsen and Hoa Loranger, co-authors of
Prioritizing Web Usability, found that participants with "high" Web experience spent an average of 25 seconds on a Web site's homepage and 45 seconds on a site's interior page. Given that users can theoretically read approximately 200 words per minute, they said, a user would spend more of that 25 to 45 seconds scanning the page's navigation than reading the text.
So where is the balance? Local newspapers are letting the national newspapers and wires tell the big stories and focusing on the really local news for their print editions. Hyperlocal is a term I have heard thrown around a couple of times, but I haven't been in the newspaper biz for a couple of years now. Moms and Dads will always go out and buy a dozen copies of the paper when little Billie's soccer team gets a spread on the cover of the sports section.


Magazines are not daily publications. Therefore it's natural for them to concentrate on the kind of trend reporting that makes readers take the staples out of the binding and tape a foldout to the wall of their cubicle.

There is some content that you would think is too sacred to get caught up in this whole print-versus-Web controversy. Former U.S. News & World Report chief political correspondent Roger Simon didn't think he was going to be asked to leave when the weekly news magazine cut 10 staffers in October 2005 to trim the fat and focus its coverage on the Web. In
U.S. News Gives a Top Political Writer the Pink Slip, Simon says, "There were rumors that we were going to have layoffs, but I really didn't think I'd be among them. I thought I had a terrific year."

Simon had just won a
National Headliner Award.

Wednesday, July 11, 2007

The Importance Of Clear Hyperlinks To SEO and Usability

When I took Web design 101 almost a decade ago, my professor told us nothing about how important it is to include authoritative and direct hyperlinks in the text on our sites (that is, links that point not just to a company's Web site, but to the area of a reputable entity's site where the product or service being discussed is displayed).

Today, users expect clear hyperlinks. "Click here" does nothing to:

  • Provide the user with information and/or a service in a timely manner
  • Increase the SEO of the site
  • Encourage the user to navigate to other areas on the site
But writing clear hyperlinks is no easy task. As writers, it requires us to step outside our creative thought process and go back to the research process--to think of the terms we used when we were researching that very same story.
So while I am speaking of writing text for hyperlinks, I am going to refer to the basic principle of writing for the Web, which I think is best explained by the great usability expert,
Jakob Nielsen. The first duty of writing for the Web, Nielsen says, is to write to be found.

Some of Nielsen's tips:

  • Precise words are often better than short words, which can be too broad to accurately describe the user's problem.
  • Use keywords that match users' search queries: Queries are typically 2 to 3 words long
  • Supplement made-up words with known words: It's tempting to coin new terms because you can own the positioning if the term catches on. But, more likely, people will continue to use their old terminology.
Getting back to hyperlinks: Jagdeep S. Pannu, manager of online marketing at SEORank, a leading search engine optimization services company, says that hyperlinked (anchor) text is actually searched by Google when its spiders crawl a site--which he puts at every 15 to 45 minutes or so. Here is what he had to say:

"The inclusion of important keywords in the anchor text can make a big difference in the final ranking of your site pages. All search engines that matter, give significant weight to the anchor text on your pages. In fact Google even has a special operator: ‘allinanchor:keyword’, which picks up text only from within the anchor text of indexed pages. This further implies that Google’s algorithm is configured to index anchor text as separate queryable data, thereby making it evident, that Google considers it an important pointer to page relevance. Our internal research leads us to believe that weight given to anchor text has been raised recently in the Google algorithm. With these changes, it is possible to enhance your website’s ranking by using the right keywords in anchor text."

For the purposes of news writing, I believe it's important to keep Nielsen's and Pannu's advice in mind, but here are some tips of my own (with a small markup example after the list).

  • When linking to another story, always use the headline. Topical headlines are searchable, so readers are more apt to find them in search engines. And by using the headline in the hyperlink, you will naturally describe the story before and after the link, thereby providing the user and search engines with keywords that will clue them in to the nature of the linked text.
  • When linking to a company drill down as close to the product or service as possible. If the company has released a product, link to the page for that product--not to the company's home page.
  • When referring to a report or white paper, link to it by name. Google the paper and find the proper name. Doing so will be of great use to the reader and to the site's SEO.
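
Here is a small before-and-after markup sketch of what I mean; the company, headline, and URLs are invented for illustration:

    <!-- Vague: tells neither the reader nor the search engine what is behind the link -->
    <p>Acme has released a new trading platform.
    <a href="http://www.example.com/">Click here</a> for more.</p>

    <!-- Clear: the story's headline is the anchor text, and the link points straight at that story -->
    <p>Acme has released a new trading platform; see
    <a href="http://www.example.com/news/acme-unveils-trading-platform">"Acme Unveils Trading
    Platform For Small Brokerages"</a> for details.</p>

The second version gives Google keyword-rich anchor text to index and tells the reader exactly what they will get before they click.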

Friday, July 6, 2007

Google Employee Gives Advice About Best Uses of Flash

We've all been taught that Google is, in essence, a "blind user," and I had heard that it couldn't search the content contained in Flash, so I have always recommended against using it in page designs. However, I am hearing that Google is making an effort to index Flash content (or at least the content surrounding the Flash design), so when I saw Mark Berghausen's post, "The Best Uses of Flash," I was intrigued. He says:


As many of you already know, Flash is inherently a visual medium, and Googlebot doesn't have eyes. Googlebot can typically read Flash files and extract the text and links in them, but the structure and context are missing. Moreover, textual contents are sometimes stored in Flash as graphics, and since Googlebot doesn't currently have the algorithmic eyes needed to read these graphics, these important keywords can be missed entirely. All of this means that even if your Flash content is in our index, it might be missing some text, content, or links. Worse, while Googlebot can understand some Flash files, not all Internet spiders can.


Berghausen recommends:

  1. Using Flash only where needed: This is a recommendation the great usability expert Jakob Nielsen has been touting for ages (check out his article "Flash: 99% Bad." I can't recommend his work enough.)

  2. Using sIFR to display headers, pull quotes, or other textual elements. I disagree here. As a strong advocate of usability, I don't think that bells and whistles like Flash (or counterparts like sIFR) should be used for textual elements, for a variety of reasons. One is the critical nature of those textual elements to search, especially the header: if a designer uses Flash or sIFR to display a header, it is not likely that they will repeat that element as plain text, because in most cases it will not be aesthetically appealing, yet that is what needs to happen for the element to be properly picked up for search. Another reason is that a Flash element slows down page loading. Visitors today have high demands when it comes to viewing pages, and when it takes even a couple of moments for a page to render--or worse, the page has loaded but one or more elements are still loading--visitors exit. Additionally, as more and more visitors "information snack," having content available in those first few seconds is critical, because those visitors in particular are likely to stay on your site for only a few moments before moving on to another domain.

  3. Non-Flash versions: Flash is used as a front-page "splash screen," where the root URL of a Web site has a Flash intro that links to HTML content deeper in the site. This recommendation seems to make sense for the designer who absolutely insists on using Flash and the developer who is confident that their audience has the hardware and the Internet connection to load the page quickly enough that they won't leave. And because the page links to HTML deeper on the site, SEO remains intact. (A minimal fallback sketch follows below.)
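
If a Flash intro is unavoidable, the least you can do for search engines (and for visitors without the plug-in) is leave real HTML behind the movie. This is only a rough sketch--the file names and copy are invented--but the idea is that the alternate content inside the object element is ordinary, crawlable HTML that is shown whenever the Flash movie cannot play:

    <object type="application/x-shockwave-flash" data="/flash/intro.swf"
            width="600" height="400">
      <!-- Alternate content: indexed by crawlers and shown when Flash is unavailable -->
      <h1>Spring 2007 Photo Essay</h1>
      <p>Browse the <a href="/photos/spring-2007/">full gallery</a> as regular HTML pages.</p>
    </object>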

Can Self-Promotional Comments Be As Harmful As Spam To Our Blogs?

Today I encountered the first piece of spam in Wall Street & Technology's WS&T Blog. I'm not going to post it here since I am having difficulty removing the comment, but the staff knows which post I am referring to.

The comment in question wasn't the sort of thing that I would have suspected as spam at first glance. It contained some information about a publishing company none of us had heard of and described their newsletters. At the end there was information for readers to subscribe. There wasn't one reference to an erectile dysfunction pill or an interest-free mortgage.

The author of the blog e-mailed the group to say that readers sometimes write partially self-promotional comments. She said she didn't object if they are relevant. But she questioned where the line is.

The writer's e-mail made me think. Should we be discouraging readers from engaging in dialogue with us for fear that their comments may be construed as too promotional or self serving? Here's how I responded:

"I am torn. If it were the sort of thing where someone had posted a thoughtful and authentic comment (even if it were self-promotional) and suggested at the end to subscribe to their newsletter I wouldn't think twice about taking their post down."


As soon as I pressed send, as I do with so many e-mails, I re-thought what I had written. Would I really have not given a thought to leaving the comment on the blog? I have been reading a great book lately called
Clear Blogging: How People Are Changing the World and How You Can Join Them. The author, Bob Walsh, talks about what he calls "the dirty little secret" in his blog, Clear Blogging. The dirty secret is basically that spam makes up 94% of the comments submitted in the blogosphere.

While a piece of promotional copy submitted as a comment on our site alone would not have been a problem, given the wording of the comment (and the fact that I have become so engrossed in Walsh's book that I haven't turned on my TV since Sunday) I decided to heed the expert's advice:

"React to spam on your blog like you would react to finding a cockroach on your arm - kill it instantly. Yes, I know stopping your work to kill spam crawling up your blog is terrible for productivity. I can't prove it, but I'm convinced that spammers track which blogs kill their spam comments and trackbacks and how quickly. If you don't react fast, they swarm you faster than you can say rabid rat attack."


I don't want to attack potential readers, but I certainly don't want our sites to be attacked by spam.