Sunday, May 12, 2013

hash bang URL fragments and http redirects

First there were static pages.
Then came javascript and html forms.
Next came ajax.
Now we are talking about websockets and HTML5.

I remember writing JSP where the server would dynamically generate the content for a client. Today we still do some of this at the server, but as our web applications have become more sophisticated, it has become imperative to do more of it at the client. The RequireJS text plugin is of great help for this. Similarly, for a web application that is dynamic, it would be great if it were bookmarkable too.

Open GMail and then open a particular email, you will notice that the URL looks something like https://mail.google.com/mail/u/0/#inbox/13e9f320434b6c82. Now if you bookmarked that url, what do you think will happen the next time you open it?

The answer depends upon at least two factors - the browser you are using and whether you have automatic sign-in enabled.

If you logged in automatically (because of a saved cookie), you will see the proper email open in almost all browsers.

If you get redirected to the sign-in page, then it's a different story. On IE9, the URL of the sign-in page no longer carries the #inbox/13e9f320434b6c82 fragment: it is lost! So after you sign in you will NOT see the email.

In Chrome, you would instead see: https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2#inbox/13e9f320434b6c82

While browsers do NOT send the '#' and the string following it to the server (otherwise every change of the fragment would mean a page reload), most modern browsers re-attach the fragment to the redirected URL. Is this the right thing to do? It is at least no worse than dropping the fragment in most cases. When could it be bad? If the redirected page also interprets the fragment, but in a different way. That would be an accidental collision of semantics.

Getting back to the question though, Chrome still does not open the email after I sign in! This is because the login page doesn't handle the hashchange event (window.onhashchange) or otherwise carry the fragment forward. If it did (e.g. by calling history.pushState()), the #fragment would be relayed to the next redirect too and we would be happy to see the email directly.
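As a minimal sketch of what relaying could look like on a sign-in page that we control: the page reads its own fragment and re-attaches it to the URL it redirects to after a successful login. The helper name relayFragment and the variable continueUrl are hypothetical, made up for illustration; this is not a description of how Google's login page works.

```javascript
// Hypothetical helper: carry the current page's fragment over to the next URL.
function relayFragment(nextUrl, currentHash) {
  // Nothing to relay (location.hash is '' when there is no fragment).
  if (!currentHash || currentHash === '#') return nextUrl;
  // Do not clobber a fragment that the target URL already defines.
  if (nextUrl.indexOf('#') !== -1) return nextUrl;
  var fragment = currentHash.charAt(0) === '#' ? currentHash : '#' + currentHash;
  return nextUrl + fragment;
}

// In a browser, after a successful sign-in (continueUrl is an assumed variable):
//   window.location.replace(relayFragment(continueUrl, window.location.hash));
```

Leaving an existing fragment on the target URL untouched is a design choice here; it avoids the accidental collision of semantics mentioned above.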

Three pieces of advice -
  1. Do not use the hashchange event directly; instead use something like jQuery, which can simulate hashchange events (by a polling mechanism) for browsers like IE7
  2. Your sign-in page should relay hash fragments
  3. If the sign-in page is NOT under your control, you are mostly out of luck. Let's see how we can defend against that below.

We could encourage users to not share the URL they see in their address bar directly but instead provide a "SHARE" button, which would generate another URL that captures, as a query-string parameter, the same information that would otherwise be in the #fragment. E.g. http://myapp.com#mypage is converted to http://myapp.com?hash=mypage

Next you would write your app to interpret the "hash" parameter as the hash fragment. But is that good enough? What happens with a URL like http://myapp.com?hash=mypage#mycontacts ? Obviously, the fragment must take precedence over the query string. (You see why?)
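Here is a rough sketch of both halves of that idea, assuming a runtime that has the standard URL class (modern browsers, Node.js; older IE would need a hand-rolled parser). The function names toShareUrl and effectiveFragment are hypothetical; only the "hash" parameter name comes from the example above.

```javascript
// Hypothetical helper for the "SHARE" button: move the fragment into ?hash=...
function toShareUrl(href) {
  var url = new URL(href);
  var fragment = url.hash.replace(/^#/, '');
  if (fragment) {
    url.searchParams.set('hash', fragment);
    url.hash = '';
  }
  return url.toString();
}

// Hypothetical helper for the app: a real fragment wins over the "hash" parameter.
function effectiveFragment(href) {
  var url = new URL(href);
  var fragment = url.hash.replace(/^#/, '');
  if (fragment) return fragment;
  return url.searchParams.get('hash') || '';
}

var shared = toShareUrl('http://myapp.com#mypage');
var fromQuery = effectiveFragment(shared);
var fromFragment = effectiveFragment('http://myapp.com?hash=mypage#mycontacts');
```

In this sketch fromQuery is "mypage", while fromFragment is "mycontacts": a user who opened a shared link and then navigated inside the app has a newer fragment than the one frozen in the query string.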

Wednesday, May 8, 2013

Stupid download of browser self-generated content

HTML5 is here, but we have IE 8 to support, and IE 9 is going to be around for a while too. Oftentimes we generate data in the browser that the server is not aware of and need not be aware of, e.g. we might be drawing some doodles and want to save a copy locally on our desktop. Since the content is already in the browser, why should we not be able to write it to a file? Well, that's why we have data URIs (also see Data_URI_scheme).
But tragically, it doesn't work with most contemporary browsers. We could then use something like Downloadify (which uses Flash, though).
So, is there a JavaScript/HTML-only solution? None so far that works perfectly. Data URIs on Safari and on Firefox 19 and earlier require the user to enter a filename, and there is no support for a default name. On IE there is document.execCommand, which you could use, but it requires pop-ups to be enabled and once again you can't suggest a default file name.
Well, it might seem very dumb, but sacrificing latency and server CPU cycles by doing a round-trip to the server and using an anchor element seems to be the safest bet:
  var linkEl = this.$('a.export-csv');
  linkEl.click( function _onExportCsv (evt) {
      var csvText = $('.scratch-pad').text();//get the content
      var filename = "mydata"+JSON.stringify(new Date()).replace(/"/g,'');//get a default filename
      if (/chrome/i.test(navigator.userAgent) || /Mozilla\/5\.0.+\srv:2\d\.\d/i.test(navigator.userAgent)) {
        //this works only on Chrome and FF20+
        linkEl.attr('href',"data:text/plain;charset=utf-8," + encodeURIComponent(csvText));
        linkEl.attr('download',filename);
        return true; //DO follow the link NOW
      } else {
        var formTemplate = _.template(
        '<form action="<%=downloadEchoUrl%>" method="post">\
          <input type="hidden" name="filename" value="<%-filename%>" />\
          <input type="hidden" name="contentToEcho" value="<%-contentToEcho%>" />\
        </form>');
        //using POST for safety - the payload could be large and IE doesn't support long URLs
        $(formTemplate({
          downloadEchoUrl: "http://ustaman.com/echo",
          filename:filename,
          contentToEcho: encodeURIComponent(csvText)
          //note the template will do an escape: use of <%- instead of <%= with underscore
        })).appendTo('body').submit().remove();
        return false; //do NOT follow the link
      }
  });
And here is a sample "echo" server in Java (one could write a simpler version using node or python or what not).
package com.ustaman;

import java.io.IOException;
import java.net.URLDecoder;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@WebServlet(name = "FileDownloadEchoServlet", asyncSupported = false, urlPatterns = { "/echo/*" }, loadOnStartup = 4)
public class FileDownloadEchoServlet extends HttpServlet {

    private static final long serialVersionUID = 1343508390612726798L;

    private static Logger logger = LoggerFactory
            .getLogger(FileDownloadEchoServlet.class);

    @Override
    /* We could use Spring MVC with @RequestBody for production code */
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        this.doPost(req, resp);
    }

    @Override
    /* We could use Spring MVC with @RequestBody for production code */
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
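        resp.setContentType("text/plain");
        resp.setCharacterEncoding("UTF-8");
        // Quote the filename and strip quotes and line breaks so that the
        // client-supplied value cannot break or inject response headers.
        String filename = String.valueOf(req.getParameter("filename"))
                .replaceAll("[\"\\r\\n]", "");
        resp.setHeader(
                "Content-Disposition",
                String.format("attachment; filename=\"%s\"", filename));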
        resp.getOutputStream().write(
                URLDecoder.decode(req.getParameter("contentToEcho"), "UTF-8")
                        .getBytes("UTF-8"));
        resp.getOutputStream().close();
        logger.info("File download echoed");
    }

}

Wednesday, April 24, 2013

Preempting Asynchronous Callbacks

With the single-threaded nature of JavaScript, the use of callbacks is innate to it. Promises (e.g. jQuery Deferred) are a nicer standardized way of making asynchronous function calls while at the same time tying up dependencies between different async tasks (e.g. $.when(), Promise.then(), etc.).
In the examples I present here I will use jQuery Deferreds/Promises.
[The difference between Deferred and Promise is one of perspective. A Deferred is seen from the perspective of the "producer", which will resolve or reject it. A Promise is seen from the perspective of the "consumer", which attaches callbacks to the promise to be called upon failure or success. See jQuery Deferred]

The Problem

Most often we won't need to have multiple outstanding promises of the same type - e.g. let's say we have an input box for a search phrase and when one presses a submit button, the search phrase is submitted to a remote server and the result is displayed when the response comes back. Our code would look something like:
button.click( function() {
  var text = inputBox.val(); // .val() reads an input's value; .text() would be empty here
  $.post("test.php", { search: text } )
  .done(function (data) {
     //display the data
   });
});
Now let's say you typed something and clicked the submit button. Then, before the server responded, you edited the text and clicked submit again. What would happen?
Whichever response comes first will be displayed and then overwritten by the response that comes second. Note: can you assume that the responses will arrive in the same order as the requests? Usually NOT.

Let's prevent multiple outstanding requests

So, how would you solve this? [If you are wondering why would this be a problem in the first place, maybe in a real application, we perform some expensive operations on the response.] How about disabling the submit button as soon as one clicks on it and enabling it when the response comes back?
button.click( function() {
  button.prop('disabled', true); // disable the submit button while a request is outstanding
  var text = inputBox.val();
  $.post("test.php", { search: text } )
  .done(function (data) {
     //display the data
  })
  .fail(function (err) {
     //display err
  })
  .always(function () {
    button.prop('disabled', false);
  });
});
The reason we re-enabled the button in the always callback instead of the done callback is because we want to re-enable the button in case of either success(done) or failure(fail), so instead of repeating it in those two callbacks, always gives us a convenient shortcut to stay DRY.

But why make the user wait?

But let's say we are not happy with having to WAIT for success/failure. If we would like to search for something else and don't care about the ongoing search, what should we do?

We preempt the obsolete callbacks. But how?
Would a boolean flag work, as follows?
button.click( function() {
  var text = inputBox.val();
  button.__preempt = false;
  $.post("test.php", { search: text } )
  .done(function (data) {
      if(button.__preempt) return;
      //display the data
      button.__preempt = true;
  });
});
No, the above is no different from the previous example, since we haven't gained any more expressive power. The disabled attribute could be thought of as the equivalent of the __preempt variable!
Actually, now it's worse: we allow the user to hit the submit button multiple times, but we display only the first result to arrive instead of the last.

How about a counter?

button.click( function() {
  var text = inputBox.val();
  if(button.__clickCount) button.__clickCount++;
  else button.__clickCount = 1;
  $.post("test.php", { search: text } )
  .done(function (data) {
      if(--button.__clickCount) return;
      //display the data
  })
  .fail(function () {
      button.__clickCount--;
  });
});
While the above works, we are assuming that the responses come in the same order as the requests. If that is not the case, we might need to have the request (the search criterion) itself embedded in the response, which would enable us to place the related search phrase back in the input box whenever we receive a response. But chances are we cannot dictate the response payload (e.g. we don't control that server or it is shared by other consumers and we can't change the api).

Any other way to relax on response ordering?

How about we remember the request?
button.click( function() {
  var text = inputBox.val();
  button.__unresolved = text;
  $.post("test.php", { search: text } )
  .done(function (data) {
      if(button.__unresolved !== text) return;
      //display the data
  });
});
This works because the callback closes over two variables: button and text.
Since button is the same for all instances of the callback, it lets them communicate, and button.__unresolved is the means of that communication. text, on the other hand, is unique to each instance.

A slightly different and more flexible approach.
Instead of using "text" we should generalize by use of a GUID.
And how about preempting based not just on multiple click of the submit button but also another event (e.g. a cancel button)?
button.click( function() {
  var thisCallId = guid(); // guid() stands for any function that returns a unique id
  button.__validCallId = thisCallId;
  $.post("test.php", { search: inputBox.val() } )
  .done(function (data) {
      if(button.__validCallId !== thisCallId) return;
      //display the data
  });
});
cancelButton.click( function() {
  delete button.__validCallId;
});
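The same idea can be pulled out of the click handler into a small reusable helper. The sketch below is only one possible shape for it: the name latestOnly and its guard/cancel methods are made up for this illustration, and a plain incrementing counter stands in for the GUID. The responses are simulated by calling the guarded callbacks by hand, out of order.

```javascript
// Hypothetical helper: only the most recently guarded callback may run.
function latestOnly() {
  var validCallId = 0;
  return {
    // Wrap a callback; the wrapper becomes a no-op once a newer call exists.
    guard: function (callback) {
      var thisCallId = ++validCallId;
      return function () {
        if (thisCallId !== validCallId) return; // preempted
        return callback.apply(this, arguments);
      };
    },
    // Invalidate every outstanding callback (e.g. from a cancel button).
    cancel: function () {
      validCallId++;
    }
  };
}

// Simulated usage: two searches in flight, responses arriving in reverse order.
var displayed = [];
var tracker = latestOnly();
var onLondon = tracker.guard(function (data) { displayed.push(data); });
var onParis = tracker.guard(function (data) { displayed.push(data); });
onParis('results for Paris');   // latest request: displayed
onLondon('results for London'); // stale request: ignored

var onRome = tracker.guard(function (data) { displayed.push(data); });
tracker.cancel();
onRome('results for Rome');     // cancelled: ignored
```

With jQuery this would be used as $.post(...).done(tracker.guard(function (data) { ... })), keeping the preemption state out of the button object.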

Sunday, April 14, 2013

Scala and the price of not being purely functional

A functional programming language (e.g. Scala or SQL for that matter) encourages one to think in the problem domain and write solutions that are declarative and accurate (provably so indeed). However, that is usually a lie! After coming up with a solution, one then has to worry about performance and see how to tweak it. And to be able to do so one has to know how the language works - which contradicts the whole premise of being declarative in the first place.
With something like SQL, we have query optimizers and I believe they do a pretty good job.
Now the same could be true in a pure functional language like Haskell, which is free to reorder all our expressions as long as the dependencies are maintained. However, with Scala, expressions are allowed to have side effects (of course one should strive not to have those), and thus, in general, reordering of expressions is something the compiler probably won't do for us. And thus we have (lazy) val vs. def. E.g.
abstract class MyList[T] {
    def size: Int
    def head: T
    def tail: MyList[T]
}

class MyEmptyList[T] extends MyList[T] {
    val size = 0
    def head = throw new NoSuchElementException
    def tail = throw new NoSuchElementException
}

class MyNonEmptyList[T](val head: T, val tail: MyList[T]) extends MyList[T] {
    def size = {
        println("MyNonEmptyList.size()")
        1 + tail.size
    }
}

object Main extends App {
  val list:MyList[Int] = new MyNonEmptyList(7, new MyEmptyList);
  list.size
  list.size
}

This outputs:

MyNonEmptyList.size()
MyNonEmptyList.size()

If MyNonEmptyList#size was declared val instead, the output would be:

MyNonEmptyList.size()

However memoization of methods that have input parameters is not supported, e.g.

class MyClass {
    def get(in:Int):Int = {
        println("MyClass.get()")
        in * 2
    }
}

object Main extends App {
  val x  = new MyClass
  x.get(5)
  x.get(5)
}
The output:

MyClass.get()
MyClass.get()

If we wanted to memoize "get", we can't declare it as val but instead we'd have to keep a map ourselves.

Sunday, March 17, 2013

JAVASCRIPT setTimeout(fn, 0) for batching requests to a server

I was working on a web application where a user would subscribe for ticket prices for air-fares. She would enter the source and the destination and get live updates to prices and options.
Now, every time she enters a new pair, a new panel/tile would be added to her page that started ticking. Those panels were persistent; she could logout and when she logged back in later, the panels would load and start the subscriptions anew.
The browsers we needed to support included IE8. We were using socket.io for making the requests and receiving subscriptions, but on such browsers the library falls back to transports like JSONP long polling or XHR streaming. Now, as we know, IE8 limits the number of concurrent HTTP connections to 6 per host. This meant that if we made too many requests concurrently, we would experience delays as the requests got queued up. This was the case when loading a page with numerous persisted panels.
We thus allowed a batched request API, a simple array, e.g.
[ "NewYork-London", "NewYork-Paris", "NewYork-Jakarta" ]
It would be nicer to not have to differentiate between the user entering 3 individual pairs or a batched request being made on her behalf upon page load.
This is where setTimeout(..., 0) becomes useful. [Note: Underscore.js's _.defer() does the same]
function Service (socket) {
  this.socket = socket;
  this.liveStreams = {};
  this.pendingSubscribe = [];
  socket.on('response', function (data) {
    // Assume response is a json object,e.g.
    //  {
    //    pair:'NewYork-London',
    //    data: {
    //      cost: '560',
    //      class: 'Economy',
    //      departing: 'EWR',
    //      arriving: 'LHR'
    //    }
    //  }
    data = JSON.parse(data);
    this.liveStreams[data.pair].call(null, data.data);
  }.bind(this));
}
   
function flush (service) {
  console.log('FLUSHING ' + service.pendingSubscribe.join(','));
  service.socket.emit('request', service.pendingSubscribe);
  // reset for the next flush
  service.pendingSubscribe = [];
}

Service.prototype.subscribe = function (pair, callback) {
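  // in case of initial page load, this "deferring" will force batching
  // (IE8/IE9 ignore extra setTimeout arguments, hence the closure)
  var self = this;
  if(this.pendingSubscribe.length === 0) setTimeout(function () { flush(self); }, 0);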

  this.pendingSubscribe.push(pair);
  this.liveStreams[pair] = callback;
};
When a number of requests are made in a loop (that is, in the same call stack), the FIRST subscribe() call schedules a call to flush() to be made when the current stack unwinds (not exactly! See the note at the bottom of this post). All subsequent subscribe() calls in the loop run before the stack unwinds and thus simply add to the payload, namely pendingSubscribe.

Now, if you are wondering why we don't just prepare the batched request where we loop, it is because when you are using a framework like Backbone, you want to have a layered architecture and that might not be easy. In my case, the panel was designed to be editable and would issue its own request; consequently, it would be unaware of any batching strategy.

Note: setTimeout(fn, 0) does not run fn the moment the stack unwinds; it simply appends fn to the event queue, behind any callbacks already queued there (including functions scheduled earlier with setTimeout(otherFn, 0)). This doesn't affect the batching logic above per se. However, if we had an unsubscribe() call associated with a button click, then we'd need to be a bit craftier, as we don't want to send an unsubscribe request BEFORE the corresponding subscribe request. If anyone needs help with that, drop me a comment and I can update this post with a more complete example. But in brief, the unsubscribe request would check whether the pair is still in pendingSubscribe and, if so, simply do an "early cancellation", i.e. remove the pair from that array.
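A rough sketch of that "early cancellation" could look as follows. It is a simplified stand-in, not the code from the application: the constructor name BatchingService, the 'unsubscribe' event name and the fake socket are assumptions made for illustration (and Array.prototype.indexOf would need a shim on IE8).

```javascript
// Simplified stand-in for the Service above; all names here are illustrative.
function BatchingService(socket) {
  this.socket = socket;
  this.liveStreams = {};
  this.pendingSubscribe = [];
}

BatchingService.prototype.subscribe = function (pair, callback) {
  var self = this;
  if (this.pendingSubscribe.length === 0) {
    setTimeout(function () {
      // Everything may have been cancelled before this flush got to run.
      if (self.pendingSubscribe.length > 0) {
        self.socket.emit('request', self.pendingSubscribe);
        self.pendingSubscribe = [];
      }
    }, 0);
  }
  this.pendingSubscribe.push(pair);
  this.liveStreams[pair] = callback;
};

BatchingService.prototype.unsubscribe = function (pair) {
  delete this.liveStreams[pair];
  var index = this.pendingSubscribe.indexOf(pair);
  if (index !== -1) {
    // Early cancellation: the subscribe was never sent, so there is
    // nothing to undo on the server.
    this.pendingSubscribe.splice(index, 1);
    return;
  }
  // 'unsubscribe' is an assumed event name.
  this.socket.emit('unsubscribe', [pair]);
};

// A fake socket that records what would have been sent.
var sent = [];
var fakeSocket = {
  emit: function (event, payload) { sent.push([event, payload.slice()]); }
};
var service = new BatchingService(fakeSocket);
service.subscribe('NewYork-London', function () {});
service.subscribe('NewYork-Paris', function () {});
service.unsubscribe('NewYork-London'); // cancelled before the batch is flushed
```

When the deferred flush eventually runs, only NewYork-Paris is left in the batch, and no unsubscribe request is ever sent for NewYork-London.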

Wednesday, February 8, 2012

Reflection on not using Reflection

SLF4J has done a good job by allowing users to continue using whatever logging framework they currently use and then, in time, move on to Logback. I like the way they require the adapter jar (e.g. slf4j-jcl.jar or slf4j-log4j12.jar) to have the org.slf4j.impl.StaticLoggerBinder and a few other classes for static binding. Those classes are supposed to implement interfaces from the aptly named org.slf4j.spi package.

I thought they would load that class via reflection (=dynamically bind) but looking at the code in org.slf4j.LoggerFactory, it is not the case:
StaticLoggerBinder.getSingleton().getLoggerFactory();
So that might suggest that when they packaged slf4j-api.jar, they had either slf4j-simple.jar or slf4j-nop.jar on the classpath. However, that would result in a circular dependency, as the ILoggerFactory interface that the above method returns is defined in slf4j-api.jar. So, I think that when they package slf4j-api.jar, they compile against a stub implementation of StaticLoggerBinder (and other similar classes) but then remove the .class files from the jar. Sounds a bit unkosher, doesn't it?


I was comparing this approach to how JDBC registers drivers:
java.sql.DriverManager.registerDriver(java.sql.Driver driver);
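But this would be called from a static initialization block in the driver implementation, e.g.
package com.usta.MyDatabase;
public class MyDriver implements Driver {
  static {
    try {
      Driver SINGLETON = new MyDriver(...);
      // registerDriver() throws the checked SQLException
      DriverManager.registerDriver(SINGLETON);
    } catch (SQLException e) {
      throw new ExceptionInInitializerError(e);
    }
  }
...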
And that is why we have to do the following:
Class.forName("com.usta.MyDatabase.MyDriver");
Connection con = DriverManager.getConnection("jdbc:mydatabase:dbname","guest","");
And getConnection() will ask the registered drivers if they can handle the url provided.

As we see with JDBC the Driver(the service provider) has to register itself, whereas, with SLF4J ILoggerFactory (the service provider), the registration (actually binding) is done automatically. This makes sense because with JDBC, we might want to use multiple databases at the same time.

Tuesday, January 3, 2012

String.intern() and concurrency

Have there been times when you needed some form of locking but could not find an easy way to make it granular? For example, let's say you have

Map<Person, Double> salariesMap = Collections.synchronizedMap(new HashMap<Person, Double>());//BigDecimal might be a better value type

void giveRaise(final Person person, double raisePercentage){
    synchronized(salariesMap){
        Double oldSalary = salariesMap.get(person);
        Double newSalary = (1 + raisePercentage/100.0) * oldSalary;
        salariesMap.put(person, newSalary);
    }
}

Our first attempt at making the locking granular could be:

void giveRaise(final Person person, double raisePercentage){
    synchronized(person){
        Double oldSalary = salariesMap.get(person);
        Double newSalary = (1 + raisePercentage/100.0) * oldSalary;
        salariesMap.put(person, newSalary);
    }
}

Unless there is a guarantee that for any two Person instances p1 and p2, p1.equals(p2) implies p1 == p2, our locking on the Person object (the map key) would be wrong.

Such a guarantee can be obtained if all Person instances are looked up from a pool of unique Persons.

But then again what if an instance was obtained from the pool and a reference to it held long after it was removed from the pool? To avoid such a stale reference, would one now lock the pool itself too?

Map<String, Person> persons = Collections.synchronizedMap(new HashMap<String, Person>());

void giveRaise(String pName, double raisePercentage){
    synchronized(persons){
        final Person p = persons.get(pName);
        synchronized(p) {
            Double oldSalary = salariesMap.get(p);
            Double newSalary = (1 + raisePercentage/100.0) * oldSalary;
            salariesMap.put(p, newSalary);
        }
    }
}

But this beats the whole purpose of locking only on one Person key for granularity!

Luckily such a unique pool is maintained by the JVM itself for strings and the unique instances in the pool can be accessed via String.intern()

Thus we can obtain a granular locking as follows:
void giveRaise(final Person p, double raisePercentage){
    synchronized(p.getName().intern()){
        Double oldSalary = salariesMap.get(p);
        Double newSalary = (1 + raisePercentage/100.0) * oldSalary;
        salariesMap.put(p, newSalary);
    }
}

Note that the use of a ConcurrentMap instead of Collections.synchronizedMap in the above might lead to better parallelism in the calls to put above. Also, we could probably use replace() instead of simple put():
ConcurrentMap<Person, Double> salariesMap = new ConcurrentHashMap<Person, Double>();

void giveRaise(final Person person, double raisePercentage){
    synchronized(person.getName().intern()){
        Double oldSalary = salariesMap.get(person);
        Double newSalary = (1 + raisePercentage/100.0) * oldSalary;
        if(!salariesMap.replace(person, oldSalary, newSalary)) {
            logger.error("Salary was modified concurrently");
        }
    }
}

Also see comments on http://stackoverflow.com/questions/348985/deadlock-on-synchronized-string-intern

If the "key" is likely to collide with the key of another class of objects, then synchronized(key.intern()) will introduce an unnecessary mutual exclusion among completely unrelated code. In such a case one might instead use a WeakHashMap<String, WeakReference<Object>> as a mapping from a string to a lock object. Then, by having one such map for each class of objects, the accidental mutual exclusion caused by shared keys can be avoided.