Diego has been writing about a project of his, the “Conversation Engine”. I thought I would add a few thoughts to the fray that might help.
For discovery, a simple RPC-Ping can be used; each ping adds that site to the spidering queue. To keep the initial test of the engine small, start with a whitelist of sites that are allowed to ping. Each page should be spidered several times over a period of time, to pick up trackbacks and pings that arrive after the first crawl. This could be coupled with trying to extract the date/time from the published posts. That may be a little difficult, but it is still possible.
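A minimal sketch of how the ping, whitelist, and repeat spidering could fit together. Everything named here is an assumption for illustration: the hosts in `ALLOWED_HOSTS`, the revisit delays, and the `SpiderQueue` class are hypothetical, and the RPC transport itself is left out so that only the queueing logic is shown.

```python
import heapq
from urllib.parse import urlparse

# Hypothetical whitelist of hosts allowed to ping during the small-scale test.
ALLOWED_HOSTS = {"example.org", "blog.example.com"}

# Hypothetical revisit delays in seconds: one hour, one day, one week.
REVISIT_DELAYS = (3600, 86400, 604800)


class SpiderQueue:
    """Priority queue of (due_time, url) pairs, earliest due time first."""

    def __init__(self):
        self._heap = []

    def handle_ping(self, url, now):
        """Accept a ping for url at time now; return True if it was queued."""
        host = urlparse(url).hostname
        if host not in ALLOWED_HOSTS:
            return False
        # Crawl right away, then again after each delay so that
        # trackbacks and pings arriving later are also picked up.
        heapq.heappush(self._heap, (now, url))
        for delay in REVISIT_DELAYS:
            heapq.heappush(self._heap, (now + delay, url))
        return True

    def pop_due(self, now):
        """Return the URLs whose crawl time has arrived, in due order."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due
```

The spider would call `pop_due` with the current time on each pass. Pings from hosts outside the whitelist are simply dropped, which keeps the test small.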
The way Google (and, I’m sure, most other search engines) uses the ordering of the search terms is quite interesting. I have a feeling it may be a side effect of the way they implement the filtering of results within their databases. To improve results, a person can reorder their search terms, but does the search engine do this itself?
The search engine could use the most common usage of each word (noun/verb) to decide whether it is the subject of the search or the action being performed on that subject, and reorder the terms accordingly.
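As a rough sketch of that idea, assuming a lookup table of each word's most common part of speech is available: the `MOST_COMMON_POS` table below is a tiny hypothetical stand-in for a real lexicon, and putting subjects before actions is my own guess at a useful order, not something any search engine is known to do.

```python
# Hypothetical lexicon: the most common part of speech for each word.
MOST_COMMON_POS = {
    "install": "verb",
    "repair": "verb",
    "printer": "noun",
    "driver": "noun",
}


def reorder_terms(query):
    """Put likely subjects (nouns) first, then actions (verbs), then unknown words."""
    rank = {"noun": 0, "verb": 1}
    terms = query.lower().split()
    # sorted() is stable, so terms with the same rank keep their original order.
    return sorted(terms, key=lambda t: rank.get(MOST_COMMON_POS.get(t), 2))
```

With this table, `reorder_terms("install printer driver")` gives `["printer", "driver", "install"]`. Words missing from the table are left at the end in their original order.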
There has been talk (here and here) about fixing the standards support in IE. Microsoft thinks it has a lock-in through its ActiveX support and the particular way it supports web standards, and it doesn’t want to compromise that.
I think the opposite is true: if they don’t fix it, customers (enterprise and consumer) will move to the standards-compliant options because development is cheaper. Two years is a long time, and if enterprises have to upgrade their intranets anyway, they just might ditch ActiveX and go to an alternative. We have already seen that consumers have no problem with switching.