Saturday 10 November 2007

Blogger Magic: Where does all the data go?

Try this simple trick.

Take a recent article in CairnsBlog – pick any article... then select a couple of sentences. Pick any sentences!

Choose sentences unlikely to have been written elsewhere. Then copy that text – and paste it into a Google search box.

Put the whole block of text inside quotes (inverted commas at each end). That tells Google you’re looking for the exact phrase – not just all those words in some kind of combination on a webpage.

Now click ‘Search’!

What happens? If your reality is anything like mine, you'll see a single listing for the CairnsBlog article from which you copied the text.

A couple of qualifications. Firstly, for some reason, the trick doesn’t seem to work if you begin or end with half a word. Hence “Blah, CairnsBlog, blah orange and ap” won’t work, if it was extracted from “Blah, CairnsBlog, blah orange and apple”. Google seems to read (potentially meaningful) whole words – not meaningless single characters. Store that thought.

The second qualification is that if the article was just posted, it may not yet be in Google’s publicly-accessible database. Nevertheless, I think you’ll find it’s very fast! Try a section of text from the most recently-posted article. A few hours and the data is retrievable from Google.


Now try a similar experiment.

Open the comments on a recent article. Copy a chunk of text – be fair, whole words in the given sequence, just as before. Now paste that into Google and click Search!

What do you see? If your reality is anything like mine, you’ll see this message “Your search - "Blah, blah, blah… etc" - did not match any documents.”

More Magic!

Now you see it (via Google)... now you don’t!

So, what’s the difference? The difference is that while Google spiders the web for its public database content, it does not spider everything. It’s possible to block search engine spiders. Information can be kept ‘private’, by design.

Blogger clearly seems to do this. For whatever reason, it chooses not to make comment data public. What reason can it have? I could guess a few. Perhaps you can too?

Blogger is, actually, quite a remarkable service. Check out the Blogger homepage. I see very few, if any, advertisements. Sign-up is free. Somehow the company behind this extraordinary service to humanity gives anyone, anywhere a free blog, as long as they agree to the Terms and Conditions. Who needs a Welfare State when the corporate sector can be so beneficent!

Cutting to the chase, Blogger is a Google service. So hey, you may think, no mystery, after all! Google can afford it! Indeed it can.

But it seems Google is uncharacteristically stingy when it comes to making this fascinating, huge store of comments available to the world. No company is better placed than Google to spider, index and make publicly available the comments posted to Blogger! But it chooses not to do so.

I wonder if the data is trashed? :-)

Who else (outside Google) has access to it? That would be interesting to know. Perhaps a letter to Google from a Blogger blog such as CairnsBlog might elicit an unequivocal answer? Would local politicians such as Jason O’Brien or Steve Wettenhall – or our next Federal MP – like to take it up with Google on behalf of their constituents?

I’ve ‘Googled around’ a little, but I can’t find much about this particular topic on the web.

To be clear, this is not an anti-Google article. I like Google. I use it more than I use almost any other tool. I even use the word ‘Google’ too. Google is really quite ubiquitous. In future years, historians may call ours the ‘Age of Google’. Perhaps we should start doing that now?

Trouble is, I did get out of kindy a while back. I read '1984' long before it happened.

I have passing familiarity with the various anti-civil liberties laws passed around the world during the Age of Google, not least the thoroughly Orwellian PATRIOT Act enacted in the USA, where Google is headquartered. If “the authorities” “believe” they are looking for “terrorists”, they can access pretty much any data they like. I suspect they do.

I understand and appreciate the concern of some contributors to this Blog, who like commenting but don’t feel they can use their own name. My earlier remarks arguing against ‘anonymous’ comments herein (a position I still hold) could be taken as sanctimonious. I hope they weren’t. I appreciate some folk feel able to speak openly about some things - and others don’t, at present.

But let's shift gear and look at the bigger picture. What about the control of data by organisations and agencies far distant from our community, for no good reason? I don’t like electronic snooping – but I probably can’t stop it. What thoroughly infuriates me is information withheld from the public domain, which they – and they alone – can store, analyse and do heaven knows what else with.

If I post comments under my own name, I do so consciously. I’m aware that they can be found by other people. I choose that. It’s a conscious act to speak under one’s own name about political issues. Some can’t afford this ‘luxury’ (I would call it a right). That’s too bad! Let's change it! But those of us who judge we can share our thoughts with the world want to share them with anyone who might be interested, not just allow Google and whoever accesses the data to scan them at leisure.

If I sign up to comment under my own name on Mike’s website, it’s because I trust him. I may not have met the guy, but I’ve had enough virtual contact to feel comfortable with sharing my basic data with him (phone, email, maybe address if he wants that).

I choose to share it with Mike and CairnsBlog, but I’d prefer not to share it with Google. Sorry guys! Don’t fret! You'll get whatever I post on Mike’s site. You don’t need long-term access to my thoughts when the rest of humanity doesn’t have it. You don’t need that information in a form easily linked to other uses of Google by the same person, such as searches, email, use of Google Maps, etc.

I want a safe world too – but Google was never appointed to look after it for humanity – not through any democratic process, in any event. No-one elected Google for the job. Whatever hash they make of it, it’s a task for our politicians, working with the whole community. At least it should be.

In response to Bryan Law’s recent article, local State MP Jason O’Brien posted an interesting reply under his own name. Did he know that Google would have ongoing access to that data – but not his constituents? Is that what he’d have freely chosen?

At absolute minimum, if we’re to live in the ‘Age of Google’, we’d better try to understand it.

Article by Sid Walker


Anonymous said...

Hi Sid
No, I didn't know Google kept my pearls of wisdom, but I am happy that they will take up some memory space in their database somewhere. I try not to get too paranoid about these things, but I am happy to listen to why you think I should be. They can sell it to the CIA or whoever for all I care. It's all part of the territory of living a public life. Contrary to what you say, my constituents do get access to the information; they just have to go to this site. I liken it to going to a public space where you can be legally photographed. Blogs like this one are a public space, and you would hope that people will behave in public. Which brings us back to the original point of anonymous posts. It's upsetting that people do not feel secure enough to show their face in the public domain of the internet, which is why they won't post under their real name (this excludes the trolls that seem to plague these types of sites). As you point out, how the law protects people's freedom of speech and political association with regard to the internet is probably something that hasn't been explored in many jurisdictions. I suggest that the same principles would apply on the net as outside. If you do find out what other authorities are doing to protect people's privacy on the net, I'd be happy to learn about it.

Anonymous said...

Hi Jason,

Instilling paranoia about "these things", as you describe them, was not my goal in this particular article.

I simply wanted to share a few insights into what may be happening below the surface in this amazing 'data revolution' that we are living through.

Most people, understandably, just want to use tools and not think too much about them. I'm much the same. If I need to use a screwdriver, I don't want a dissertation on the history of screwdrivers and detailed info about how they are made and marketed. I just want to pick up the tool and use it.

I'm trying to point out that we now have a ubiquitous tool (Google) that unlike most tools, gives its manufacturer more utility - in the form of information - than it gives its hundreds of millions of individual users.

The tool in question is made and owned by a multinational corporation over which people in this community have no effective control whatsoever.

At the very least, IMO, that is a matter worth considering. This is an unprecedented situation in human history. It behooves us to be aware of it - and reflect on some of the implications.

Many Australian politicians seem to take the view that these topics are beyond their ken and scope... so best leave understanding and regulation to others beyond our shores. I think we deserve better from our elected representatives.

The example of broadband merits reflection. IMO, it's great the Federal ALP has at last jumped on the "bring broadband to the people" bandwagon. But had it listened carefully to savvy people in the community (as opposed to vested interests in the corporate sector - especially the mass media), this could have been ALP policy 15 years ago. Other countries committed to high-speed mass access to the internet at that time. The results are now in operation.

IMO, the civil liberties implications of the data revolution should be a central issue in Australia's contemporary political debate. That it is not is testament, yet again, to how poorly the public interest is served by our agenda-setting mass media and largely conformist political caste.

As for paranoia, if I'd wished to instill that among readers of this blog, I'd have written a different article.

I may do that in due course, subject to Michael's approval.