“Must scale” millstone around the neck

The API has recently been redesigned and restructured to fix some flaws it used to have. Things tend to look simple when we demonstrate simple examples, but that is not what really matters. Retrieving one record or a few of them is easy. However, if we say the system is supposed to scale, we may want to store a billion of them (one day ;)). That makes it pointless to have something like a List or an array in the API for retrieving records. One problem is the capacity of the Java heap (always limited) and element addressing (most often limited to int size, since arrays and List are indexed by int); the other is how to pull records via REST efficiently (one by one, in chunks, all combined, etc.).

The solution I have come up with seems to be … scalable at least :) From the API perspective we have a method that loads all records from a particular table (domain/table) and takes a callback object as a parameter. This way it can load any number of records sequentially and pass each one to the given callback:

api.get(new RecordCallback() {
    @Override
    public void recordLoaded(Record record) {
        System.out.println("Record: " + record + " retrieved");
    }
}, "myCoolApp1", "logs");

This code gets all records from the “logs” table under the “myCoolApp1” domain. If we assume this “get” may be further sliced by offset/num, we could use it e.g. in a view that renders records as they are loaded (a long scrollable list or just a web page). It could also be used in reporting/processing code: “process a million records from table X”. Looking deeper into the code, the result is a single big streaming JSON response. Internally iterators are used: the REST resource logic iterates through the set of record keys and streams record by record (JSON expressions, line by line). This means the number of returned records is practically unlimited. Even for a huge volume of data, the compressed JSON payload may still be small enough to send from the cloud to the end client. Retrieval like this is not the simplest, but it meets the “must scale” requirement, which is so essential in any cloud-computing software and sometimes so hard to accomplish. Of course, once this is implemented it is easy to wrap it in other methods that slice the result set from i to j and return a collection.
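To make the line-by-line streaming and the "slice from i to j" wrapper more concrete, here is a minimal client-side sketch. It is not the actual implementation: the class RecordStreaming, the LineCallback interface and the stream/slice methods are hypothetical names made up for illustration, records are kept as raw JSON lines instead of Record objects, and a StringReader stands in for the HTTP response body.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the real API.
class RecordStreaming {

    /** Stand-in for RecordCallback: receives one raw JSON line per record. */
    interface LineCallback {
        void recordLoaded(String jsonLine);
    }

    /**
     * Reads the response one line at a time and hands each line to the callback,
     * so only a single record is held in memory at once. Returns the record count.
     */
    static long stream(Reader response, LineCallback callback) {
        long count = 0;
        try {
            BufferedReader reader = new BufferedReader(response);
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines between records
                }
                callback.recordLoaded(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    /** Convenience wrapper: collects the records with index in [from, to) into a list. */
    static List<String> slice(Reader response, long from, long to) {
        List<String> result = new ArrayList<>();
        long[] index = {0}; // mutable counter usable from the lambda
        stream(response, line -> {
            long i = index[0]++;
            if (i >= from && i < to) {
                result.add(line);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Assumed payload format: one JSON expression per line.
        String body = "{\"id\":1}\n{\"id\":2}\n{\"id\":3}\n{\"id\":4}\n";
        stream(new StringReader(body),
               line -> System.out.println("Record: " + line + " retrieved"));
        System.out.println(slice(new StringReader(body), 1, 3));
    }
}
```

Note that this naive slice still reads the whole stream and only discards what falls outside the range; slicing by offset/num on the server side would avoid transferring the unwanted records in the first place.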

The latest alpha snapshot is available in the Download section. Sample usage of that method is here.
