Calling @crtweet for clarification, please!

May 7, 2009  |  Michael Wurzer

There’s been quite a dustup over the decision reportedly made by the Metropolitan Indianapolis Board of REALTORS® (MIBOR) that their MLS IDX rules against “scraping” also prohibit Google from indexing an agent’s site showing IDX listings.

For a bit of background, indexing is what Google does — it crawls the web and creates indexes of as much of it as it can so that when people search on Google it can return relevant results quickly.  Here’s what Wikipedia has to say about scraping (with some emphases from me added):

Web scraping (or Web harvesting, Web data extraction) is a computer software technique of extracting information from websites. Usually, such software programs simulate human exploration of the Web by either implementing low-level Hypertext Transfer Protocol (HTTP), or embedding certain full-fledged Web browsers, such as the Internet Explorer (IE) and the Mozilla Web browser. Web scraping is closely related to Web indexing, which indexes Web content using a bot and is a universal technique adopted by most search engines. In contrast, Web scraping focuses more on the transformation of unstructured Web content, typically in HTML format, into structured data that can be stored and analyzed in a central local database or spreadsheet. Web scraping is also related to Web automation, which simulates human Web browsing using computer software.

The sentence about scraping being “closely related to Web indexing” is where the confusion begins on this issue.  Scraping and indexing are closely related, yes, but the very next words, “in contrast,” emphasize that they are different.  Put together: indexing is “closely related” to scraping, but it stands “in contrast” to it in what I think are important ways, namely what is ultimately done with the data.  I’ll expound on this more below, but, for now, back to the controversy at hand.

In responding to the post on Agent Genius, Hilary Marsh from NAR said:

. . . questions have arisen about the scope of the requirement that IDX site operators protect the listings of other participants displayed on their IDX sites from “scraping”. Specifically, whether the policy distinguishes between “malicious” scraping and what might be considered “good” or “benign” scraping. Also, whether “indexing” is a type of scraping. The Center for REALTOR® Technology (“CRT”) advised that while the intent of “scrapers” may be malicious, and the intent of “indexers” good, the two practices from the Web server’s view appear to be the same. Consequently, NAR staff responded to questioners that the requirement to prevent scraping includes indexing.

So, the rub of the issue is that MIBOR punted the ball back to NAR, which asked CRT, and CRT (as a technical body) said, technically, there’s no difference between scraping and indexing.  Of course, as is clear from the above Wikipedia definition, CRT is right — there really is no distinction from the perspective of the computer activity between scraping and indexing.  Both processes read the web site and do stuff with the data.

However, focusing on the technical process here is wrong.  The important distinction lies in the results of the activity.  Perhaps a compelling way to see the difference is this: when you visit a web site, your web browser reads the site and displays the information back to you.  In fact, most web browsers store a copy of the site on your computer so they can display it faster if you look at it again later.  From a technical perspective, your visit to the web site, and your browser caching the content locally, is not very different from what a scraper does.
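
To make that concrete, here is a minimal sketch (the URL is hypothetical, and real software is of course more involved) of the request a browser, Google’s crawler, and a scraper each send.  On the wire, and in the server’s logs, each one is the same HTTP GET; the only difference is the self-reported User-Agent header, which a scraper can set to anything it likes.

    import urllib.request

    URL = "http://example-idx-site.com/listings/12345"        # hypothetical IDX listing page

    user_agents = (
        "Mozilla/5.0 (Windows NT 5.1)",                        # a visitor's browser
        "Googlebot/2.1 (+http://www.google.com/bot.html)",     # Google's indexer
        "SomeScraper/0.1",                                     # a scraper, or whatever it chooses to claim
    )

    for agent in user_agents:
        request = urllib.request.Request(URL, headers={"User-Agent": agent})
        html = urllib.request.urlopen(request).read()          # the same bytes come back every time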

However, nobody is going to argue that web visitors are scrapers.  Why?  Because of their intent and what they are doing with the data.  A consumer looking at content is a good thing.  So, too, I would suggest, is Google indexing the web and its real estate content.  Google is not (at least today) taking the content and presenting it as its own creation.  Instead, it links back to the source of the data, which provides a critically important service to the web site being indexed.  This is what the web is all about, and so interpreting indexing and scraping as the same thing results in the leap backward the commenters on the Agent Genius post decry.  It’s an undoing of the web for IDX sites, which have become critically important to agents and brokers today.
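
To put the distinction in rough code (a sketch only; the markup, class names, and helper logic here are made up, and real indexers and scrapers are far more sophisticated): the indexer boils a page down to something searchable and keeps the URL so it can send visitors back to the source, while the scraper keeps the listing data itself, with nothing pointing back to where it came from.

    import re

    def index_page(url, html):
        """Roughly what an indexer keeps: the page's own words plus a link back to it."""
        title = re.search(r"<title>(.*?)</title>", html, re.S)
        text = re.sub(r"<[^>]+>", " ", html)                   # strip tags to get searchable text
        return {
            "url": url,                                        # the link back is the whole point
            "title": title.group(1).strip() if title else "",
            "snippet": " ".join(text.split())[:200],
        }

    def scrape_page(url, html):
        """Roughly what a scraper keeps: structured listing fields, detached from their source."""
        fields = dict(re.findall(r'<span class="(price|address|beds)">(.*?)</span>', html))
        return fields                                          # no url, no attribution -- just the data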

Before concluding this post, however, I also want to point out that not everyone agrees that Google’s indexes are positive or even benign.  In Belgium, a court has ruled that Google’s News service violates certain newspapers’ copyrights.  In hailing the opinion, the winning party was quoted by The New York Times as saying:

“Today we celebrate a victory for content producers,” said Margaret Boribon, secretary-general of Copiepresse. “We showed that Google cannot make profit for free from the credibility of our newspaper brands, hard work of our journalists and skill of our photographers.”

Could a similar argument be made by MLSs or listing agents about Google indexing listing data?  Possibly.  However, I think getting a similar ruling from a US court is unlikely.  (Any lawyers out there who know the law on this, please comment to clarify, because I’m definitely no expert here.)

More importantly, our industry has accepted the web as its friend, and Google is accepted as a critical part of the web.  To many, in fact, Google is the web.  What’s wrong with the MIBOR decision, and with the narrow, technical interpretation from CRT that led to it, is that it goes against the many decisions already made that the web is the real estate industry’s friend.  That decision cannot be unmade.  It’s done.  Rule interpretations like the one CRT provided, however, do leave NAR members unable to compete.  As many on Agent Genius have commented, Trulia, Zillow, and others are not hamstrung by this same interpretation of the IDX policy, which only hinders and restricts NAR’s members.  That’s wrong.

Fortunately, we live in a web world and, for many, that means we know each other personally. Most of those commenting over at Agent Genius have met, know, and greatly respect Chris McKeever (@crtweet on Twitter), who now heads up CRT.  My hope is that Chris can join the conversation and either clarify CRT’s interpretation or let us know why the current interpretation is best.  I’m asking for this conversation with the greatest respect for Chris and everyone at CRT.  MIBOR put them on the hot seat, but perhaps the conversation can result in greater understanding for everyone, and hopefully a quick clarification on this critically important matter for the MLS organizations that haven’t yet interpreted the policy on this issue.

16 Responses to “Calling @crtweet for clarification, please!”

  1. Mike,

    (Since I often say that I’m not speaking for my employer…I’ll say the opposite now. I am speaking for my employer, or at least in my capacity there.)

    I think you’ll be unsurprised to learn I was involved in these discussions as well. Actually, it was more CTO/ITS than CRT, but people sometimes forget I have a new job, and Mark is still technically part of CRT as the CTO, but that’s all semantics. (And one hell of a poorly written sentence.)

    In any case, our role in this was to explain the technical details to the rest of staff to help them interpret the ruling. All the arguments you bring forth here we argued as well, including the browser being a scraper technically. (I really like that we think alike there.)

    However, the intent of the paragraph was that the REALTOR® with an IDX site must take steps to prevent scraping. Since scraping and indexing look the same without putting up a lot of additional (potentially expensive and irritating) technological hurdles, which the black hats can circumvent easily anyway, it was NAR’s decision to rule the way it did. As it was explained to me, it comes down to the intent of whoever “took” the data, and that’s not something we can affect with policy. I won’t fully go into how the sausage is made, but the outcome is what we’ve seen.
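
    (For what it’s worth, the standard signal on this front is robots.txt: a well-behaved indexer like Googlebot checks it and stays out when asked, while a scraper is free to skip the check entirely. A rough sketch, with a made-up site and made-up rules, of what that check looks like:)

        from urllib.robotparser import RobotFileParser

        # Suppose http://example-idx-site.com/robots.txt (hypothetical) says:
        #   User-agent: *
        #   Disallow: /listings/
        rp = RobotFileParser()
        rp.set_url("http://example-idx-site.com/robots.txt")
        rp.read()                                              # fetch and parse the rules

        # Googlebot honors the answer; a scraper can ignore it and fetch anyway.
        print(rp.can_fetch("Googlebot", "http://example-idx-site.com/listings/12345"))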

    Anyway, my recollection, which may be incorrect (Cliff is the expert here), is that a complainant at the local board said that Google was a scraper and that a peer was not taking steps to stop it. That, as NAR has stated, is a valid interpretation of the policy.

    Is it the right thing? I have my feelings, which you can guess, but sadly it’s not my call. (But when I take over the world…MUH-HAHA…)

    Should the policy be changed? That’s up to the members. Again, we’ve been accused of punting with statements like that, but that’s really where the power lies.

    I feel bad that it came out as though CRT alone made this call, as that certainly was not the case. Chris has been doing an excellent job since he took over CRT, and I’m afraid of any negative feelings he gains from this. They probably really belong to me.


  2. Thanks a lot for clarifying that this is a policy interpretation, Keith. As I mentioned in the post, I have a great deal of respect for everyone who is and has been involved with CRT. You all have contributed an enormous amount to the real estate technology community, and continue to do so, and I don’t want anything I’ve written to detract from that. Certainly, everyone involved in the real estate community knows, or should know, that CRT is not anti-Google or anti-web and certainly wants to help brokers and agents compete on the web. In fact, that’s been CRT’s reason for being.

    Because this is an MLS policy interpretation, I think Jim Duncan’s comments over at Agent Genius are very important. Most of the members concerned about this issue have no idea how to go about addressing it. Hilary referenced an email address for Cliff to suggest changes to the policy. Perhaps someone else at NAR who knows the process better can answer whether these emailed suggestions will be considered at the MLS Policy Committee meeting Thursday next week in Washington at 10 a.m. That would be a great way to bring this issue up, if it’s possible at this time.

    My two cents are that the policy does not need changing or clarifying. It’s already clear to me that scraping is not indexing for all the reasons above. If I’m correct, then the policy does not need to change but the interpretation offered does. I think this difference is critically important, because any change to the policy will take a long time to work its way through the process but the interpretation can be clarified immediately.

  3. […] Read the original here:  FBS Blog » Blog Archive » Calling @crtweet for clarification, please! […]

  4. Jay Thompson says:

    Keith – I won’t hold anything against Chris, or the CRT. Now you on the other hand… 😉 Kidding. I appreciate your candid response.

    Mike – THANKS for posting this and pulling Keith into it. Todd Carpenter managed to convince some high-level folks to fly me out to DC for the meeting, where I’ll have an opportunity to express my thoughts (and hopefully make some connections to effect real change in this policy interpretation). And given my, uhm, let’s just say “vocal opinion” on this matter, I have to hand it to the NAR for being willing to listen (unless the real plan is to meet me at the airport and dump me into the Potomac…).

    I emailed Cliff as well, and got a swift response where he promised to deliver my message to the committee (and in fact he copied them on it). I’ll post Cliff’s response when/if I get permission from Cliff to do so.

    I think I’ve made my thoughts clear — in short, the NAR blew this call. That they are at least willing to listen is a step forward. A big step.

    (If I’m not back by next Friday night, someone call the Coast Guard to dredge the river).

  5. Thanks for the update, Jay! I look forward to seeing you in DC. The community is lucky to have you representing them at the event — you’re knowledgeable and passionate, which is the best combination.

  6. Mark Flavin says:


    I think you hit the nail on the head. The analogy you drew between the ambivalent browser, Google indexing, and evil scraping is similar to discussions we have had within our own MLS meetings.

    It’s unfortunate that oftentimes policy and the intent behind it seem to clash. I for one can empathize with Keith and Chris’s position in this case, wherein the stated policy was in direct contrast to its intended effect, depending on how it was read.

    Ultimately I am sure this will get worked out, as MIBOR is in direct violation of their own policy: they provide a feed via ListHub that does not prevent indexing by Google.

    If this does not get worked out, then a lot of engineering effort will be needed to resolve the issue for all IDX service providers.

    Keith, Mark, and Chris, beyond being very knowledgeable, are extremely approachable, as you well know. It is truly unfortunate that in the fallout CRT, arguably one of the more valuable services provided by NAR, is getting painted with a bad brush.

    Excellent blog entry, Mike. I love reading your stuff, and I cannot wait to see how this all evolves.

  7. Duane Sauke says:

    I want to thank all of you in the conversation about this issue. As most of us cannot participate directly, you all serve us with your high-level thinking and actions. Please know that there are many agents and brokers who know enough about this to be concerned, and our only solace is that great thought and practical analysis will true the bearing of our path.

  8. […] MLS meetings. With short sale issues burning and some MLS providers wanting to classify Google as a “scraper” this all adds up to nothing but […]

  9. […] Did Google scrape my website? MIBOR and NAR have no grasp of internet technology and how it works! FBS Blog Cliff Niersbach NAR response to Google spidering listings in Mibor MIBOR and NAR ReTechulous – NAR […]

  10. […] has been all the rage the past week since it hit Agent Genius (370 comments now and coverage on a variety of blogs and forums). The issue was discussed at NAR midyear (Jay Thompson’s coverage), and […]

  11. Mary Englund says:

    The idea that Google’s indexing of an IDX site links back to the source of the data isn’t exactly correct. It may link back to the IDX site Google got it from, but the source of the data is the listing broker, and they are actually the owner of the listing Google is displaying under another company’s name. As part of IDX, we’ve agreed to share listing data with each other on our websites, but we did not agree to allow competing brokers to display that data on Google using their brokerage name as the source (or “linkback,” as you call it).

    I found my listings all over Google under a competitor’s name and was shocked. Because my IDX comes in a frame through flexmls, Google doesn’t pick up the data, and I don’t have the same opportunity to display this competitor’s listings on Google. This particular broker spends a lot more money and has an IT department that was able to display the data in a way that Google could easily pick up all the IDX listings and link back to their company’s website as the source.

    Whatever the technical issues are, it’s a business decision that Realtors should make based on the business impact of what’s happening. You may feel that customers should have access to everything they want whenever they want, but as a business owner I totally disagree. No other business would even think of publicly displaying the details of a contract they have with a specific client. If I acquire the listing (a business contract), it should be my decision where, when, and how it is displayed, and I should be able to keep some details accessible only through me. If it goes on Google, I want it to be under my company’s name and not that of my competitor. If ReMax wants their listings to show up on Google under my company’s name, they too should have that choice if they feel it’s in their company’s best interest. It’s easy to look from the outside in and say that giving consumers access to everything should be the goal. But after losing nearly 40% of our local brokers in this new financial downturn, many Realtors should be looking at what’s best for our industry’s business model first, since without brokers working there will be no IDX to discuss.

  12. Jay Thompson says:

    Mary wrote: “If I acquire the listing (a business contract) it should be my decision where, when and how it is displayed, and keep some details accessible only through me.”

    Personally, I think it should be my clients’ decision as to where, when, and how THEIR listing is displayed.

    And without fail, 100% of the time, my clients want the fact that their home is listed for sale to be “anywhere and everywhere” possible.

    “If it goes on Google, I want it to be under my company’s name and not that of my competitor.”

    I could not care less under what name my client’s listing appears. I have ONE goal in taking a listing: to get it sold. I *wish* it would appear on Google one time for each of the 35,000 agents in Phoenix.

    “many Realtors should be looking at what’s best for our industry’s business model first”

    I think many Realtors should be looking at what’s best for their clients.

  13. Mike (and others): I think NAR and MIBOR’s current approach might be at least somewhat justified. I’ve begun a series of posts at MLSTesseract to discuss this issue in more detail.