Someone forwarded me this article from the BBC Online: BBC NEWS | Technology | World’s poor to get own search engine

Researchers at the Massachusetts Institute of Technology (MIT) are developing a search engine designed for people with a slow net connection.

Someone using the software would e-mail a query to a central server in Boston. The program would search the net, choose the most suitable webpages, compress them and e-mail the results a day later.

The project website is here:
TEK Homepage

My question is: Why would anyone want that?

It’s a 1.3Mb download if you already have Sun’s Java installed; if not, you’ll have to add at least 8Mb to that. For people with unstable, low-bandwidth connections, that sounds like the definition of the word impossible. And their suggestion of distributing this on CD via local libraries sounds a little utopian as well.

So let’s look at what this actually does. The way I see it, there are 2 things this solution delivers:

Asynchronous, off-line search results. Send in your query and the results will be delivered to you by e-mail.

But that exists already, in many different guises, ranging from CapeClear’s Google by eMail, based on the Google APIs, to www4mail, an open-source mail gateway for web access.

Neither of these requires a download, and both deliver off-line, asynchronous access to web content. There are many more solutions of this type out there.

That leaves us with the second part of what TEK delivers:
Delivering relevant search results

According to the TEK website:
The email delivery of the search results introduces a time delay. TEK uses that time to our advantage: the extra time will allow us to filter and refine the search results. We are working on clustering the found pages into similar concepts and then select examples of only the “strongest” to return to the user.

All pages downloaded to the client computer are stored in the local cache, and are there for other users to search at a later date.

So, they can use the fact that we are dealing with off-line asynchronous communication to deliver better and more relevant search results. If they succeed there, yeah, then we might have a search tool that will grow from low-bandwidth users to the rest of us, because everyone is always looking for fewer, more relevant search results. Unfortunately this part of the project is still a work in progress, although there are some indications on the site that they are indeed cooking up some interesting tech based on a way of clustering similar pages into groups, and delivering only “best of breed” results.
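To make the “cluster, then return only the strongest” idea concrete, here is a minimal sketch. It is not TEK's algorithm, which the site does not spell out. I am assuming a simple greedy grouping by word overlap (Jaccard similarity) and a relevance score per page that some ranker has already supplied. The names `cluster_pages`, `best_of_breed`, the `score` field and the 0.5 threshold are all made up for illustration.

```python
def jaccard(a, b):
    """Word-set overlap between two pages, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_pages(pages, threshold=0.5):
    """Greedily group pages whose word sets overlap enough.

    Each page is compared with the first member of every existing cluster,
    which acts as that cluster's representative (an assumption of this sketch).
    """
    clusters = []
    for page in pages:
        words = set(page["text"].lower().split())
        for cluster in clusters:
            if jaccard(words, cluster["words"]) >= threshold:
                cluster["members"].append(page)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append({"words": words, "members": [page]})
    return [cluster["members"] for cluster in clusters]


def best_of_breed(pages, threshold=0.5):
    """Return only the highest-scoring page from each cluster."""
    return [
        max(members, key=lambda page: page["score"])
        for members in cluster_pages(pages, threshold)
    ]


pages = [
    {"url": "a", "text": "cheap solar water pump design", "score": 0.4},
    {"url": "b", "text": "cheap solar water pump plans", "score": 0.9},
    {"url": "c", "text": "malaria prevention with bed nets", "score": 0.7},
]
strongest = best_of_breed(pages)
```

With this input the two pump pages fall into one group and only one of them is sent, which is exactly the bandwidth saving a low-connectivity user would care about.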

That of course opens them up to criticism for limiting access to information based on rules defined at a single centralized location, which is a subtle form of censorship.

In any case, even if they deliver on the promise of a better search engine, why would I want a 10Mb download when I already have a mail client? If you’re going to deliver search results via mail, why not let people actually enter the search queries that way too? The complexity of a local proxy seems wasteful in this situation.
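A rough sketch of what that mail-only route could look like on the server side, using nothing but Python's standard `email` package. The convention that the query sits in the subject line (falling back to the first non-empty line of the body) is my own assumption, as are the function names `extract_query` and `build_reply`; none of this is taken from TEK or www4mail.

```python
from email import message_from_string
from email.message import EmailMessage


def extract_query(raw_message):
    """Pull a search query out of a raw e-mail.

    Assumed convention: the subject line is the query; if it is empty,
    use the first non-empty line of the first text/plain part.
    """
    msg = message_from_string(raw_message)
    subject = (msg["Subject"] or "").strip()
    if subject:
        return subject
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            text = payload.decode(charset, errors="replace")
            for line in text.splitlines():
                if line.strip():
                    return line.strip()
    return ""


def build_reply(to_addr, query, results):
    """Build a plain-text reply listing (title, url) pairs."""
    reply = EmailMessage()
    reply["To"] = to_addr
    reply["Subject"] = f"Results for: {query}"
    reply.set_content("\n".join(f"{title}\n{url}\n" for title, url in results))
    return reply


raw = "From: user@example.org\nSubject: malaria treatment\n\nbody is ignored here\n"
query = extract_query(raw)
reply = build_reply("user@example.org", query, [("Example page", "http://example.org/")])
```

Any mail client can produce the request and read the reply, so nothing has to be installed on the user's side.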

2 thoughts on “Search Engine for low-bandwidth connections”

  1. makes you wonder how they got covered by the BBC at all

    simon

  2. Hi,

    As a graduate student on the TEK project at MIT, let me try to answer some of your questions.

    It’s definitely the case that there are other Web-over-email utilities, such as www4mail that you mention. While these are great services, there are several benefits that are added by having a client program on the user side. First, it greatly improves usability: users have a standard web browser interface that hides the details of search queries and email formatting. Also, the results are returned as bona fide web pages, complete with colors and special formatting. When/if a community ever gains full Internet connectivity, it can seamlessly integrate into the new environment with the browser / navigation skills that were learned using TEK.

    Also, sometimes it is important to have a shared installation on a public kiosk, school, or library machine. In these cases, the TEK Client provides basic user management so that several people can share a single email account (which typically belongs to the site administrator.) The administrator can oversee the usage of the system and authorize the sending and receiving of mails.

    Finally, the client program provides a local search utility for indexing pages that have been downloaded previously. In this sense, it serves to build up a local information cache that is likely to have relevant information for the community.

    Regarding the distribution of the large client program, we have found so far that it’s feasible to distribute on CD, especially for shared machines where there is an organization or administrator that is actively installing new software across several related sites. Sometimes it is also possible to package TEK with other systems (e.g., those provided by FirstMileSolutions, http://www.firstmilesolutions.com ) that are providing the first sources of connectivity for a region.

    On the server side, there are benefits added besides the advanced searching technologies (which, as you correctly state, are still under development.) For example, the server keeps track of each client as well as the set of URLs that they have previously downloaded; in this way, the server can avoid wasting bandwidth on duplicate pages, and can always send new pages that add information to a Client’s cache.

    We are very grateful for the recent interest in TEK, and we are continuing to improve and extend the system so that it can best meet the needs of low-connectivity users. Please check our website at http://tek.sourceforge.net for the latest information on TEK. Thank you.
