Web clips collected from the web

Tuesday, October 25, 2005

Semantic Google Base

So I was wrong about the name and scope of my GoogleHosting prediction: they are working on a hosting project, but their plan is bigger. They are going to let content owners submit their content to Google to build an enormous web database.

This got me thinking about how they are going to handle the huge amount of information that will be coming in. It looks like they are using a distributed structured storage system called BigTable, which was presented last week in a talk at the University of Washington, where they mentioned it is in use in Google Print, My Search History, Orkut, the crawling/indexing pipeline, Google Maps/Google Earth, Blogger, Google Reader and others.

According to this screenshot and these others, the service will have these features:

  • An option to advertise a service you offer or an item you sell.

  • An option for webmasters to upload articles.

  • An option for researchers to publish their work.


They are going to organize the information with labels and attributes, which means more metadata to filter and rank the search results. This move is really interesting: it is not only a move to compete against Craigslist or eBay, it is also the RDF store that Adam Bosworth was talking about at the MySQL Users Conference 2005. If they succeed with this project, they will have taken one more step in their pursuit of knowledge. I hope that the automatic filter that may be in place when an item goes from the "Processing - will publish soon" status to "Published" will be good enough to handle spam.

Wednesday, October 19, 2005

Agile and innovative Microsoft at Gartner ITxpo

Steve Ballmer was interviewed this Wednesday at Gartner's Symposium/ITxpo 2005 in Orlando. For those who could not attend, there is blog coverage, a copy of the interview and a transcript. Here are some things to point out about this reorganized Microsoft.

Strategy in the changing world

Windows brings a massive community of developers together with users, similar to what eBay does with buyers and sellers and what online advertising does with advertisers, content producers and end users. There are a lot of opportunities to deliver new value on the client, the server and the internet cloud.

Innovation is a top priority

There are three time horizons for innovation: short-term (MSN), mid-term (Office) and long-term (Windows). The "greatest innovation pipeline" includes:


Agility in the engineering system

To improve integration and catch issues such as security problems earlier, Microsoft changed its development methodology toward agile practices and component reuse, focusing on customer interaction through betas of high-impact products like WinFS.

The web as the end-user environment

MSN is the global leader in time spent online, thanks to the number of users of MSN Messenger, Hotmail and MSN Spaces. To enhance web service experiences, the areas to work on are browser security, search integration and extending local desktop components.

Innovation in search as the strategy

There are plenty of opportunities to innovate and compete in the search market, especially for business end users. Relevance will improve when the search engine remembers the things I do, while respecting my privacy, and uses technologies like natural language processing.

On Virtualization

Virtual Server 2005 R2's features can compete with upcoming open source competitors until the hypervisor is released in 2008.

A better Unix than Linux

More Unix users are migrating to Linux, and Microsoft wants to be more competitive with the Compute Cluster Edition of Windows Server High-Performance Computing, lightweight Windows-based hosting and security in appliances.

Ajax convergence with the rich client

Storage, processing and visualization are powerful in a rich local environment, and they will converge with service-oriented technologies like Atlas in applications like the new Hotmail Kahuna.

Service Oriented Architecture Interoperability

Microsoft's focus is to bring Web Services interoperability standards into Windows products like ASP.NET and BizTalk, and toward better use of RSS-like protocols.

Management tools and Total Cost of Ownership

TCO is a high priority and tools like the System Center Capacity Planner are part of the Dynamic Systems and Infrastructure Optimization initiatives.

Simplifying licenses

The simplest license Microsoft offers is the Enterprise Agreement, and there have been announcements about licensing for virtualization scenarios, multicore processors and shared source.

Watch this in the future



This is where Microsoft is heading, but I did not get the tip I was looking for. There was some hype before the Web 2.0 conference about which company Scoble wanted Microsoft to acquire, and he is still waiting for the check. I now agree it could be Newsgator, but Sphere looks interesting too. I'll definitely get back to this.

Tuesday, October 18, 2005

Improve usefulness of the blogosphere

Apparently Blogger was hit by a spam-blog bomb this weekend, which set off a lot of posts about it on Memeorandum. Reactions included IceRocket's decision to stop indexing new Blogspot posts, Jeff Jarvis' request that people share their secrets for ignoring spam blogs and Chris Pirillo's call to kill Blogspot. Google is already taking some action by adding spam barriers.

The discussion is still going on at Memeorandum, with Randy Charles stating that if not on Blogspot, spam bloggers will simply try somewhere else, and with Chris Pirillo's suggestions for Blogspot. I especially like the Probationary Period suggestion, which could be implemented similarly to how Gmail used to increase the number of invites with usage. Other actions should be coordinated with tools like the Splog Reporter and browser/toolbar extensions to report spam blogs.
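The Probationary Period idea can be sketched as a posting quota that grows while a new blog builds a clean track record, much like Gmail gradually handed out more invites. This is only a minimal sketch under assumptions: the `Blog` record, the quota numbers and the reset-on-report rule are all hypothetical, not anything Blogger has announced.

```python
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
BASE_DAILY_POSTS = 3       # posts per day allowed for a brand-new blog
POSTS_PER_GOOD_WEEK = 2    # extra daily posts earned per week without spam reports
MAX_DAILY_POSTS = 50       # ceiling reached once probation is effectively over


@dataclass
class Blog:
    weeks_active: int
    spam_reports: int


def daily_post_quota(blog: Blog) -> int:
    """Posts the blog may publish today under a hypothetical probationary period."""
    if blog.spam_reports > 0:
        # Any confirmed spam report sends the blog back to the starting quota.
        return BASE_DAILY_POSTS
    earned = blog.weeks_active * POSTS_PER_GOOD_WEEK
    return min(BASE_DAILY_POSTS + earned, MAX_DAILY_POSTS)


def can_publish(blog: Blog, posts_today: int) -> bool:
    """True if one more post fits within today's quota."""
    return posts_today < daily_post_quota(blog)
```

A spam-blog bomb depends on publishing thousands of posts from fresh accounts, so throttling only young or reported blogs hurts spammers far more than regular bloggers.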

Cleaning up the blogosphere is not the only way to improve usefulness; Ajax could help solve some blog usability issues. There is an Ajax WordPress theme coming soon, and this Flash-based tech demo shows some of its functionality. It looks promising.

More things to consider on Blog Usability

Yesterday Jakob Nielsen posted an article about weblog usability with some interesting suggestions for bloggers interested in building readership; you can follow the discussion on Memeorandum. I am already trying to correct the "Nondescript Posting Titles" and "Links Don't Say Where They Go" design mistakes in this blog, but I disagree with Jakob on:

  • Irregular publishing frequency. Readers usually discover new posts through subscriptions or search engines.

  • Mixing topics. You can provide subscriptions for each topic you post about.


I have been thinking about what I need for a blog to be useful, and I am considering the following factors to find the right hosting provider/blog engine:

  • easily discovered RSS/Atom feeds

  • visible and simple search

  • simple tagging for a lightweight cognitive process

  • simple comments/trackback navigation

  • full text RSS/Atom feed or customized excerpt feed



I am still trying to find the best way to quote from other blogs, any thoughts about this?

Monday, October 17, 2005

Cunningham leaving Microsoft going to Eclipse

Ward Cunningham, the inventor of the wiki, is leaving Microsoft to join the Eclipse Foundation, as Mike Milinkovich reports. Peter Provost, a co-worker on Microsoft's Patterns and Practices team, shares his thoughts.

I love agile development, so I bet this is a great gain for Eclipse. You can watch a video of him with Sam Gentile on Channel 9 to hear some of their opinions.

Blog mining in the spotlight

According to Technorati's State of the Blogosphere, the number of blogs doubles every 5 months, but around 8% of them are spam blogs, and around 6% of daily posts come from them. These numbers are scary, so something has to be fixed. Chris Pirillo wants Google to fix Blogspot (check the comments via Memeorandum), and the conversation is heating up.
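To see why these numbers are scary, here is the arithmetic behind "doubles every 5 months": after t months the blog count is multiplied by 2^(t/5), so a single year means roughly 5.3 times as many blogs. The 8% share comes from the report; holding it constant and the 1,000-blog figure are my own simplifying assumptions for the example.

```python
def growth_factor(months: float, doubling_period_months: float = 5.0) -> float:
    """Multiplier after `months` for a quantity that doubles every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


def spam_blogs(total_blogs: int, spam_share: float = 0.08) -> int:
    """Estimated spam blogs, assuming the spam share stays constant as the total grows."""
    return round(total_blogs * spam_share)


print(f"Growth over 12 months: {growth_factor(12):.2f}x")  # prints "Growth over 12 months: 5.28x"
print(spam_blogs(1000))  # prints 80: about 80 spam blogs per 1,000 tracked
```

If the spam share holds steady, the absolute number of spam blogs grows just as fast as the blogosphere itself, which is why filtering can't wait.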

On getting better information out of the blogosphere: as reported by Steve Rubel, AOL signed a deal with Intelliseek's BlogPulse; Trendum recently merged with BuzzMetrics, so they are working on something too; and, as reported by Data Mining, there is a new player coming into the consumer-generated media market, SonicHealth from Yoogli. We also have to pay attention to what Cymfony, Umbria and maybe Google's Urchin will come up with.

If you are trying to do something with blog mining, do not forget to review the information discussed at the Information Intelligence 2005 Summit, and check out MIT's ConceptNet project for a knowledge base toolkit.

Sunday, October 16, 2005

Is GoogleHosting coming?

While searching for blog hosting providers today, I went to the WebHostingTalk forums, where I found a discussion about who got the domain name googlehosting.com: the winner of the backorder process is Jim Yoon from Dotstar Inc. The domain is not in SearchEngineWatch's Google Domains List, so I wanted to check whether this is similar to what Google Addiction is reporting about DataDocket and MarkMonitor. So far I have only found some relationship with therightplan.com and newscorporation.com, and since there is no clear indication in Tony Ruscoe's Google Subdomains list that a hosting service is coming, it is just a rumour for now.

But the discussion got me thinking, so I did a blog search for GoogleHosting. The only result was about GoogleNet speculation (you can read my Google Global Net prediction too) and reducing bandwidth costs to improve searching and indexing.

As Sergey Brin stated at the Web 2.0 conference, Google is focused not on producing content but on giving access to it. Looking at their business ecosystem, they have commoditized storage for users via Gmail and they are constantly adding new features for advertisers, like Flash-based ads, but publishers are not getting enough love (unless you are Jason Calacanis), so a hosting service from Google could make web publishers happy too.

Google already hosts blogs like this one and videos, and it is a domain registrar, so they know enough about the business. Yahoo and Microsoft already offer hosting as part of their small business offerings, and since there are some reports of hosting providers inserting links into hosted content, this is a chance to stop that and expand Google AdSense's reach. This looks like a natural evolution of Google's existing online products, so here is one use for Google Purchases. I am still wondering how the ad-free subset and micro-payments mentioned by Eric Schmidt will fit in.

Update (20051017): Gary from ResourceShelf was kind enough to do some additional research, and it seems that Dotstar is squatting on the domain. So let's go back to searching for a new host.

Monday, October 10, 2005

Tags and labels going mainstream

Browsing Memeorandum, I noticed that everyone was talking about the release of gada.be, a meta-search engine with an interesting query mechanism that allows searching and filtering via the URL. This is really useful for devices with limited display and keyboard capabilities, like mobile phones. For more information you can follow the discussion via Memeorandum or read their about page. I don't know how fresh their results are, because a search for gada brings me no results, and like any new service they are having some problems handling requests, but I am sure they will fix that fast.

What interests me most is that they retrieve tagged results using API calls and give back resources like this OPML list for Web 2.0, so alternative user interfaces and ranking mechanisms can be built on top of this. I am already looking forward to an API for the recently launched Google Reader, which lets me label (Google's word for tagging) my subscriptions and each post, and to seeing how Yahoo's blog search results evolve. Here are the blog results for Web 2.0, which, as expected, include Flickr and My Web 2.0 results.

With tags and labels going mainstream, will they enable the KISS semantic web?

Thursday, October 06, 2005

About tagging in Web 2.0 conference

Blog posts about the Web 2.0 conference keep multiplying, so let's add one more, this one about the interesting technology we are using to organize information: tags.

In the tagging workshop they talked about interestingness, a new metric used by Flickr; a preliminary comparison with meta keywords on web pages; private and public tagging; and the ongoing research into automatic tagging. This phenomenon looks interesting, and we will get a better idea of what it means when more tagging studies are available. This is definitely the KISS approach to what RDF is trying to achieve for the semantic web.

In the Open Source Infrastructure workshop, open media concepts dominated the discussion among the panelists. Worth mentioning are microformats, an attempt to organize blog content. Combined with the launch of Wink, which integrates tagging and search engines, and the release of the Attention Trust Recorder Firefox extension, which logs a user's clickstream, online ads will get better, as suggested in the Web 2.0 Ad Models panel.

Tag-driven ads are the next big thing on the web

Wednesday, October 05, 2005

Personal, specialized and general search engines

Another interesting discussion about search that I missed over the weekend was started by Scoble and relates to the "Perfect Search" chapter of John Battelle's book The Search. The follow-ups mention concepts like query refinement, tags, freshness, clusters, the semantic web and natural language interaction. I believe the perfect search depends on what you are really looking for, so I'll share what I need right now.

Things you know or you are interested in

My computer is my personal digital repository of things I've seen, read or browsed, and desktop search engines already help me find this stuff, though I need query refinement tools because I don't always remember the exact keywords in the documents. Some tools are also trying to find things that may interest me based on my browsing history patterns.

But what is really needed is some way to categorize this information, similar to what we do with folders in the file system but with the flexibility of tagging: something like personal tagging that clusters query results and identifies similar interests using these tags. The work in progress at the recently introduced Wink, the personal search engine Rollyo and the Structured Blogging organization is worth watching in this area.
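Here is a minimal sketch of what personal tagging could do for desktop search results, assuming the search engine already returns a list of document ids and the user's tags are stored as a dictionary of sets (both hypothetical structures). Results are grouped under every tag they carry, a document can sit in several groups (unlike folders), and tag co-occurrence stands in, very simply, for "identify similar interests".

```python
from collections import defaultdict


def cluster_by_tag(results, tags_for):
    """Group search results under each personal tag attached to them.

    results: document ids returned by a (hypothetical) desktop search.
    tags_for: dict mapping a document id to the set of tags the user gave it.
    Documents without tags land in an "untagged" bucket.
    """
    clusters = defaultdict(list)
    for doc in results:
        tags = tags_for.get(doc) or {"untagged"}
        for tag in sorted(tags):
            clusters[tag].append(doc)  # unlike folders, one doc can appear in many clusters
    return dict(clusters)


def related_tags(tag, tags_for):
    """Tags that co-occur with `tag` on the same documents, most frequent first."""
    counts = defaultdict(int)
    for tags in tags_for.values():
        if tag in tags:
            for other in tags - {tag}:
                counts[other] += 1
    return sorted(counts, key=lambda t: (-counts[t], t))


tags_for = {"a.txt": {"ajax", "web2.0"}, "b.txt": {"web2.0"}, "c.txt": set()}
print(cluster_by_tag(["a.txt", "b.txt", "c.txt"], tags_for))
print(related_tags("web2.0", tags_for))  # prints ['ajax']
```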

Things you don't know and you work with

When you need something for work or school that you don't know, the best places to find it are the intranet (enterprise search), the databases and digital libraries you have access to (vertical search) or online communities (social search). In this area, query refinement and clustering, natural language work, simple semantic web implementations like the taghop category relater and alternative interfaces for displaying results like SearchIris will make it easier to meet our information needs.

Things you just don't know

This is where Scoble's example fits, and it is the main interest of the big players in the search engine market. My search pattern is different: I would search for new york hotels and visit some of the sites suggested on the results page to find the ones with "free wifi and good view". What would really help me is being able to enter new york hotels free wifi good view and get back a list of hotels, obtained by querying the specialized databases provided by those sites, along with some user opinions about their services, probably extracted from blogs or forums.
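The difference between the two search patterns is that the second one turns parts of the query into structured filters instead of plain keywords. This is a minimal sketch under loud assumptions: the `Hotel` records, the phrase-to-attribute table and the "city + hotels" parsing rule are hypothetical, and a real engine would need much more robust query understanding.

```python
from dataclasses import dataclass, field


@dataclass
class Hotel:
    name: str
    city: str
    amenities: set = field(default_factory=set)
    opinions: list = field(default_factory=list)  # e.g. snippets pulled from blogs or forums


# Hypothetical mapping from query phrases to structured attributes.
AMENITY_PHRASES = {"free wifi": "free_wifi", "good view": "good_view"}


def parse_query(query):
    """Split a free-text query into leftover keywords and required amenities."""
    text = query.lower()
    required = set()
    for phrase, attribute in AMENITY_PHRASES.items():
        if phrase in text:
            required.add(attribute)
            text = text.replace(phrase, " ")
    return " ".join(text.split()), required


def search_hotels(query, hotels):
    """Return hotels in the queried city that have every required amenity."""
    keywords, required = parse_query(query)
    city = keywords.replace("hotels", "").strip()  # assumes "<city> hotels ..." queries
    return [h for h in hotels if h.city.lower() == city and required <= h.amenities]


hotels = [
    Hotel("Hotel A", "New York", {"free_wifi", "good_view"}, ["Great skyline view"]),
    Hotel("Hotel B", "New York", {"free_wifi"}),
    Hotel("Hotel C", "Boston", {"free_wifi", "good_view"}),
]
for h in search_hotels("new york hotels free wifi good view", hotels):
    print(h.name, h.opinions)  # prints: Hotel A ['Great skyline view']
```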

How far are we from it? To get started, let's pay attention to what the blogosphere says about the Web 2.0 workshop New Ideas in Search and to posts tagged Web 2.0. If only I could search for "search" within these results, that would be great...

Tuesday, October 04, 2005

What are Google and Sun working on

Today's Sun and Google news conference was widely anticipated. Some were predicting an Ajax/XUL Google OpenOffice suite, which would have been a real surprise because there was no indication that related work was being done. The official agreement covers the distribution of their software technologies, and although some are not excited about the announcement, there are important things to consider:

  • Last month Microsoft's PDC conference courted developers for upcoming .NET technologies. Java needed something like this to remain a big player in the developer wars, and Google could definitely enhance their APIs to bring more developers aboard.

  • As Dana Gardner states, nowadays it's good to be Google, and voice and data network providers should be concerned because this could lead to an efficient VoIP competitor.

  • Java, OpenOffice and Solaris working with Google's technologies is more food for my Google Mobile Computer prediction. In fact, I believe both companies will work on something similar to Morfik's JST technology to finally get the software-as-a-service and the-network-is-the-computer concepts where they need to be.



Elsewhere, David Berlind and Om Malik give their opinions about the announcement.

Monday, October 03, 2005

What company does Scoble want to acquire?

The Web 2.0 conference starts this Wednesday, and I decided to stay offline this weekend to prepare for the coming overload of announcements; that's why I missed Scoble's kick-off, which was brilliant. He kept us busy wondering why he posted the request on his blog, got the blogosphere to suggest interesting startups to acquire, and gathered more reasons to convince Microsoft execs that the acquisition is important, showing the power of blogs.

He states that the company he wants Microsoft to acquire is working in areas where Vista is not strong, will make an announcement at the Web 2.0 conference and is not cheaper than Flickr. He has already said it's not the breaking-news digest Memeorandum, nor 37signals, the developer of simple web-based tools, nor Morfik, the innovative newcomer in unplugged web applications, so let's review the other suggestions:

  • Darwin Productions. Suggested in Scoble's Mudpit, but they don't look like a company that will help Microsoft make a significant entry into the Web 2.0 market.

  • Flock. The social web browser will launch this week, but its Firefox and open source roots are important obstacles.

  • Feedburner. They have enough subscribers to get a good price and monetize feeds with Google AdSense, definitely something Microsoft could use; only the Vista reference does not fit.

  • Technorati. It won't be cheap, and Sifry's Linux roots won't make it easier.

  • JotSpot. The business model would be more appealing to Microsoft, and application wikis are important in Web 2.0, although it won't be cheap either.

  • Zimbra. An interesting collaboration suite, but SharePoint, Microsoft Mail and Kahuna should take care of this.



No winner yet, so let's check the Launch Pad workshops to find the company.


  • Socialtext recently got SAP funding

  • Rollyo is a personal search engine, but Scoble didn't know about them on Friday

  • Joyent sells a box with open source software

  • Bunchball is building a platform for Flash-based social applications

  • RealTravel focuses on sharing travel experiences

  • Zvents looks a lot like Eventful

  • KnowNow is a company exploring corporate RSS with enterprise syndication solutions for knowledge management (KM)

  • Orb Networks is a developer of media delivery software

  • Wink is using nanotechnology to provide a social search service with tagging

  • AllPeers is a company focused on P2P applications

  • PubSub is a persistent query matching engine
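A persistent query matching engine flips normal search around: instead of running one query over many stored documents, it stores many queries and runs each new document past all of them as it arrives, notifying the matching subscribers. The sketch below illustrates that idea only; the class, its methods and the simple "every term must appear" rule are hypothetical and not PubSub's actual design.

```python
class PersistentQueryMatcher:
    """Match each incoming item against stored subscriptions."""

    def __init__(self):
        self.subscriptions = {}  # subscriber -> set of required lowercase terms

    def subscribe(self, subscriber, query):
        """Store a query that stays active until replaced."""
        self.subscriptions[subscriber] = set(query.lower().split())

    def publish(self, text):
        """Return the subscribers whose every query term appears in the new item."""
        words = set(text.lower().split())  # naive whitespace tokenization
        return sorted(s for s, terms in self.subscriptions.items() if terms <= words)


matcher = PersistentQueryMatcher()
matcher.subscribe("alice", "web 2.0")
matcher.subscribe("bob", "ajax wordpress")
print(matcher.publish("New Ajax WordPress theme coming soon"))  # prints ['bob']
```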



I have no definitive winner yet, so let's wait and see if Scoble pulls it off. I'd put some money on Feedburner and Orb Networks.