Thursday, April 24, 2008

OGSA-DAI over large sets

I had a few discussions with Alastair about how we think OGSA-DAI would work over large sets... like hundreds of thousands of distributed nodes.

Some queries would be able to just crawl... namely things like aggregations (counts of an infection by zip code), where the result set wouldn't get large, just the counts would increase... but if you were doing joins on national or global data... it would probably need some sort of tiered architecture.
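To make the aggregation case concrete, here is a rough Python sketch of how per-node counts could be merged up through tiers. It is only an illustration under my own assumptions: the record layout (`{"zip": ...}`), the function names, and the `fan_in` parameter are all hypothetical, and nothing here uses the actual OGSA-DAI API.

```python
from collections import Counter


def local_counts(records):
    """Count cases per zip code on a single node (hypothetical record layout)."""
    return Counter(r["zip"] for r in records)


def merge_counts(partials):
    """Merge partial counts; the result grows with distinct zip codes, not with rows."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def tiered_aggregate(node_records, fan_in=100):
    """Merge per-node counts upward, at most `fan_in` children per collector."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = [local_counts(records) for records in node_records]
    while len(level) > 1:
        # Each pass stands in for one tier of collectors merging their children.
        level = [
            merge_counts(level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0] if level else Counter()
```

The point of the sketch is that each tier only passes counts upward, so the payload stays small no matter how many nodes sit underneath. A join would have to ship the rows themselves, which is why it would need more thought.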

In my mind I see a MonALISA extension being built to monitor and manage OGSA-DAI instances... handling the delegation of which instance collects data from 100 or so nodes and then propagates it up... Alastair pointed out that there were already plans for OGSA-DAI (OD) to manage itself in an asymmetrical tree concept. The two would probably be merged... with MonALISA providing feedback about which nodes were reachable and their lag times... and OD then selecting to pull the queries through its more efficient channels.
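As a very rough sketch of the monitoring-feedback idea, assuming the monitor can report reachability and lag per node: drop the unreachable nodes, order the rest by lag, and hand them out in groups of 100 or so per collector. `NodeStatus` and `plan_collection` are names I made up for illustration; this is not the MonALISA or OGSA-DAI API, and the real tree-building logic would surely be more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStatus:
    """Hypothetical monitoring report for one node."""
    name: str
    reachable: bool
    lag_ms: Optional[float] = None  # None when no lag measurement is available


def plan_collection(statuses, group_size=100):
    """Group reachable nodes, lowest lag first, into lists of at most `group_size`."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    live = sorted(
        (s for s in statuses if s.reachable and s.lag_ms is not None),
        key=lambda s: s.lag_ms,
    )
    # Each group would be assigned to one collector in the tier above.
    return [
        [s.name for s in live[i:i + group_size]]
        for i in range(0, len(live), group_size)
    ]
```

Grouping by lag keeps the fast nodes together, so one slow node does not hold up a collector that could otherwise report early.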

Otherwise, I have since started to focus on discovery concepts for my demonstrations: being able to use the APIs of Globus and OGSA-DAI to figure out what data resources are available and at what locations.


Jeremy Espino with RODS Laboratory said...

Peter, excellent forward thinking. The ability to "reconfigure" the grid architecture (maybe Gnutella style) in a smart way for optimal processing will be important for scalable performance. I know that there has also been a good amount of work done in smart IP routing using network stats, as you suggest.

Tom Savel, MD said...

Amazing work Peter!!!!!

John Stinn said...

Peter --

Adding to the kudos, these are the kinds of thoughts that will really move us forward. After deployability (addressed thanks to Dan) and security (much of which will be addressed in the PHIN MS evaluation), the questions from those not intimately familiar with the ongoing work center on these types of performance issues. Any exploratory research in that arena will pay huge dividends...