bit.ly
If someone (say, Google) wants to acquire Twitter but balks at its fair market value, a cheaper alternative may be to buy bit.ly, assuming the goal of the acquisition is to get crowdsourced links in real time for building a better search engine. Bit.ly is the default link-shortening service for Twitter, and most outgoing links on Twitter are shortened by it. My experience on Twitter tells me that the most valuable tweets have links to useful sites, so bit.ly aggregates most of the useful sites shared by the community. More importantly, bit.ly is a redirection service: it monitors the click rate of each link, which indicates the link's value.
However, TradeVibes shows that bit.ly is brought to you by the same people who built, acquired, or invested in Twitter, so it may not be so easy to hijack Twitter this way. Still, bit.ly is a really useful link aggregation service. In fact, bit.ly already provides a search engine for discovering tweeted links. Bit.ly may be even more useful than social bookmarking sites, since people use shortened URLs not only publicly but also in private emails.
So why do people use bit.ly? Twitter's 140-character limit is one reason, but that's not all. Tracking is another important reason. If you host a blog yourself, you may have some way to track the click rate. But if you share a photo on Flickr, or an article you happened to find, how do you know how many people really clicked on what you shared? Even if you host your own blog, how do you know whether people really click the third-party links on it? Bit.ly is the answer. The dashboard is what distinguishes bit.ly from other URL shortening services. It tracks the entire click history, including a histogram of clicks over time, referrers, and the locations of clicks. That's more advanced than what some virtual hosting or blog services offer.
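To make the redirect-and-count idea concrete, here is a minimal in-memory sketch. It is not how bit.ly is actually built; the class name ClickTracker, its methods, and the base-36 counter used as the short key are all assumptions made for illustration. A real service would also record the time, referrer, and location of each click.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical in-memory model of a redirect-and-count service (not bit.ly's real design). */
final class ClickTracker {
    private final Map<String, String> targets = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> clicks = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Registers a long URL and returns a short key (a base-36 counter, chosen only for brevity). */
    String shorten(String longUrl) {
        String key = Long.toString(nextId.getAndIncrement(), 36);
        targets.put(key, longUrl);
        return key;
    }

    /**
     * Called when a visitor follows the short link. Counts the click and returns
     * the URL to redirect to, or null if the key is unknown.
     */
    String resolve(String key) {
        String target = targets.get(key);
        if (target != null) {
            clicks.computeIfAbsent(key, k -> new AtomicLong()).incrementAndGet();
        }
        return target;
    }

    /** Number of times the short link has been followed so far. */
    long clickCount(String key) {
        AtomicLong count = clicks.get(key);
        return count == null ? 0 : count.get();
    }
}
```

Because every visitor passes through resolve before reaching the target, the service sees each click even when the target site belongs to someone else. That is exactly what a blog author cannot observe for third-party links.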
What inspires me more is this. Large Internet services all have their own private redirection and tracking services. Now this kind of component, one that users never perceive, can become an independent startup that other services build on top of. So the question is: what other components like this are missing on the Internet? A lot.
Char By Char Synchronization
Today Google is going to launch Google Wave to 100K users. Wave is a new kind of communication channel that makes group email more like a wiki plus an instant-message conference. One important feature is character-by-character synchronization among all participants. An interview with Wave core engineer Dhanji Prasanna shows that the synchronization part can be traced back to a 1995 paper, High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System.
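To give a feel for what character-by-character synchronization involves, here is a toy sketch of the general idea usually called operational transformation: when two participants insert a character concurrently, each side shifts the other's insert so that both end up with the same text. This is not Wave's or Jupiter's actual code. The Insert class, the insert-only scope, and the site-id tie-break are assumptions made for illustration.

```java
/** Hypothetical insert-only operation: put character ch at index pos, issued by a numbered site. */
final class Insert {
    final int pos;
    final char ch;
    final int site;

    Insert(int pos, char ch, int site) {
        this.pos = pos;
        this.ch = ch;
        this.site = site;
    }

    /**
     * Rewrites this operation so it can be applied to a document on which
     * the concurrent operation `other` has already been applied.
     */
    Insert transformAgainst(Insert other) {
        // An earlier insert shifts us one place to the right. On a tie,
        // the lower site id goes first (an arbitrary but consistent rule).
        boolean otherGoesFirst =
                other.pos < pos || (other.pos == pos && other.site < site);
        return otherGoesFirst ? new Insert(pos + 1, ch, site) : this;
    }

    /** Applies the insert to an immutable string and returns the new text. */
    String applyTo(String doc) {
        return doc.substring(0, pos) + ch + doc.substring(pos);
    }
}
```

For example, starting from "abc", suppose site 1 inserts 'X' at index 1 while site 2 inserts 'Y' at index 1. Site 1 applies its own insert and then the transformed remote one; site 2 does the reverse. Both arrive at "aXYbc". A real system also has to handle deletions and long chains of concurrent operations, which this sketch ignores.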
I am not sure char-by-char synchronization is really useful for email. I suspect it would be the first thing I disable in my email client if it existed. I prefer the opposite direction: giving a second thought to everything I send. Isn't that another Google project, undo send? (Actually, that dream may come true by replacing distributed email with a central wave.) But I do see other scenarios where char-by-char synchronization is very interesting. Live collaboration in Google Docs gave it a big edge over Microsoft. A new startup in social answers, flusher, distinguishes itself from other answer sites with instant answers. Also, some professional traders really want to share their trading actions in a more live way. Social needs to go real time.
Connection Close in HttpClient
I recently made a mistake using Java Jakarta Commons HttpClient. I decided to dig deeper into the issue.
My code uses HttpClient to send HTTP requests to another machine, and the load is very high. Over time, I often saw “Too many open files” exceptions in the logs, although the problem would sometimes recover on its own. Using netstat, I found a lot of TCP connections in the CLOSE_WAIT state on the machine. So the problem was that the application did not close its connections.
My code is very similar to the example in HttpClient tutorial.
HttpClient client = new HttpClient();
GetMethod httpget = new GetMethod("http://www.whatever.com/");
try {
    client.executeMethod(httpget);
    ...
} finally {
    httpget.releaseConnection();
}
The code calls releaseConnection at the end, as the tutorial specifies. But what does this method do? To understand it, we need to understand what is behind the HttpClient object. Each HttpClient has an HttpConnectionManager responsible for maintaining connections. If we don't pass an HttpConnectionManager to the constructor, HttpClient instantiates a SimpleHttpConnectionManager by default. SimpleHttpConnectionManager maintains a single connection and can only be used by a single thread. Its main job is to keep the connection alive in case the next request goes to the same host. So the releaseConnection call above does not close the socket. If the next method to be executed goes to a different host, the manager closes the prior connection at that time. Otherwise, it may reuse the connection.
The mistake I made was creating HttpClient objects on demand instead of reusing a single instance as documented here. If the peer closes the socket first (by sending FIN), the connection stays in the CLOSE_WAIT state on my side until my application closes the socket. CLOSE_WAIT is a state that does not time out. (It is not TIME_WAIT.) It is the application's responsibility to close the socket. So how do we force HttpClient to close it? The HttpConnectionManager interface does not define a way to close the socket, but SimpleHttpConnectionManager has provided a shutdown method since 3.1. So one possible way to close the connection is as follows.
HttpConnectionManager mgr = client.getHttpConnectionManager();
if (mgr instanceof SimpleHttpConnectionManager) {
    ((SimpleHttpConnectionManager)mgr).shutdown();
}
But why isn't the problem deterministic? Shouldn't it never recover once it starts to happen? The magic is Java garbage collection. I reproduced the effect by forcing a garbage collection, which cleaned up the CLOSE_WAIT connections. To be accurate, though, JVM garbage collection does not close sockets by itself. It only frees memory. It is the Socket object that closes the socket in its finalize method, as discussed here.
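The CLOSE_WAIT behavior can be observed with plain java.net sockets, without HttpClient. The sketch below is only an illustration: the class name CloseWaitDemo and its Result holder are made up for this example. A local server accepts a connection and closes it immediately, and the client then checks what it sees before and after calling close itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical demo: the peer closes first, and our side stays open until we call close(). */
final class CloseWaitDemo {
    static final class Result {
        final int readAfterPeerClose;         // value returned by read() after the peer closed
        final boolean closedBeforeLocalClose; // isClosed() before we call close()
        final boolean closedAfterLocalClose;  // isClosed() after we call close()

        Result(int read, boolean before, boolean after) {
            this.readAfterPeerClose = read;
            this.closedBeforeLocalClose = before;
            this.closedAfterLocalClose = after;
        }
    }

    static Result run() {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try {
                // The peer accepts and closes right away, which sends FIN to the client.
                server.accept().close();
                // read() returns -1 at end of stream. On a typical TCP stack the client
                // side of the connection now sits in CLOSE_WAIT (check with netstat).
                int read = client.getInputStream().read();
                boolean before = client.isClosed();
                // Only this call releases the file descriptor on our side.
                client.close();
                return new Result(read, before, client.isClosed());
            } finally {
                client.close(); // harmless if already closed
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the experiment is that seeing end of stream does not close anything locally. If the client object were simply dropped instead of closed, the descriptor would stay open until a finalizer happened to run, which matches the nondeterministic recovery described above.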
SimpleHttpConnectionManager is not thread safe. If you need a reusable HttpClient instance shared by multiple threads, you should use MultiThreadedHttpConnectionManager. For example,
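protected static HttpClient m_client = null;
static {
    // One thread-safe connection manager shared by every request in the process.
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    // Raise the pool limits for a high-load client; tune the value 1000 to your own load.
    mgr.getParams().setDefaultMaxConnectionsPerHost(1000);
    mgr.getParams().setMaxTotalConnections(1000);
    m_client = new HttpClient(mgr);
}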
To win a game that is impossible to win
To win a game that is impossible to win, you first need to change the rules.
When Microsoft was busy fixing IE's security problems, Google introduced a browser that was the fastest at executing JavaScript. Who cared about differences in JavaScript execution speed at that time? But that was a new rule for comparing browsers. Gradually Chrome gave me the impression that it is “fast”.
After I upgraded to Firefox 3.5 beta, a browser with an unbelievably long startup time, I started to look for alternatives and vaguely remembered that Chrome was somehow “fast”. So I started to use Chrome and did feel that it was fast. All the web sites I regularly visit work very well in it.
Now Google is starting to push HTML5 through its strategic products. IE, once the king of browsers, needs Google Chrome Frame to fully support HTML5.
Whether or not Google wins this game, what impressed me is the consistency of its strategy: attacking the game from every angle, patiently, year after year.
Cannot tell which browser is faster? Test this!
JSNES is a Nintendo game emulator written in JavaScript. You can play Contra on it.
http://benfirshman.com/projects/jsnes/
It is really only playable on Chrome. Firefox is too slow, and my IE 6 cannot even display it.