Friday, March 6, 2009

More PHGrid Architecture

On Monday, February 9th, GB held a session to review the progress of the PHIN-SRM proof of concept project and discuss potential GAARDS-related PoC next steps. During this meeting he requested an enterprise architecture review of how PHGrid will look and what components are necessary for deploying services throughout a grid to benefit public health informatics.

Over the past three weeks, Moses (PHIN contractor), Vaughn (PHINMS contractor), Charlie (PHIN-SRM contractor) and I (PHGrid contractor) have met to draft an initial release of four diagrams that seek to address this need for an architecture. Our goal was to create an architecture that is meaningful to business stewards, project managers, architects and developers. Statistician George Box once said "All models are wrong; some models are useful". This set of diagrams is meant to be useful.

All four models are available on the wiki at:

PHGrid Architecture - This model shows the basic components of the PHGrid and where they fit with CDC's partner structure. Some components must be hosted at the CDC (such as CDC application data), some components must be hosted by our partners (such as State application data) and some components lie in between (collaboration portals, registries, etc.).

PHGrid Service Stack - This model shows the services available within a specific node. It is separated into a small list of program services that different partners may wish to install and infrastructure services that are required for minimal functionality.

PHGrid Node Functional Deployment - This model shows the software components within a specific node.

PHGrid Roadmap - This model is perhaps the most arbitrary, as it requires more input from the relevant program and system stewards. It attempts to show a timeline for how the services, components and infrastructure can be built out. Each timeline is marked with what has been completed or is currently in progress and shows the required dependencies. To the right of this marker are the items that require prioritization based on your decisions. For example, along the Infrastructure Timeline,

Please post your feedback on these models and how we can change and improve them. Based on the feedback gathered in response to this email I may schedule a time for us all to meet again and review potential next steps.

Here is the Visio file in case anyone would like to submit their comments within the models.

1 comment:

Ron Price said...

Using GAARDS allows the security to be much more scalable, maintainable and flexible.
I've been involved with many grid projects over the years and raw GSI killed or nearly killed many of
them. I think GAARDS is an excellent choice.

Another challenge I've run into over and over again is that GridFTP and RFT are tied to a Unix/Linux service, so making them work
on Windows is not easily accomplished. In the domain-specific services section of the PHGrid service stack I see "Transfer" and "SecureReliableFileTransfer". The caGRID
transfer service is not tied to any specific operating system and, last I knew, reliable file transfer (RFT) was dependent on Unix/Linux.
If RFT now works on any operating system, then I highly recommend RFT. The reliable functionality of RFT would
make the PHGrid data transfer mechanism much more robust.

It seems there may be a need for semantics regarding the UDDI registry. For instance, the registry could contain
two services with the name "agent". Just as an example, what if one is a viral agent and one is a real estate agent?
Using ISO 11179 with an ontology is one option to achieve semantic meaning, but maybe the wiki list of services
is enough. However, fully automated discovery and use of a service would not be possible with just the wiki, because
a human would probably have to figure out which agent they want and then use the one they need. I think it may be the case
that semantics will be necessary in other scenarios as well, but maybe it is best to see how the caBIG semantics project goes.
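
A minimal sketch of the naming ambiguity described above, assuming a toy in-memory registry rather than a real UDDI or ISO 11179 API. The ServiceEntry type, the concept_id values and the example.org endpoints are all hypothetical and only illustrate why a name alone is not enough for automated discovery:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceEntry:
    name: str        # human-readable service name, as listed in a registry
    concept_id: str  # hypothetical identifier for what the name means
    endpoint: str    # placeholder URL, not a real service


# Toy registry contents (assumed for illustration only).
REGISTRY = [
    ServiceEntry("agent", "example:viral-agent", "https://example.org/viral"),
    ServiceEntry("agent", "example:real-estate-agent", "https://example.org/realty"),
]


def find_by_name(name):
    """Name-only lookup: may return several unrelated services."""
    return [entry for entry in REGISTRY if entry.name == name]


def find_by_concept(name, concept_id):
    """Lookup by name plus concept identifier: expects exactly one match."""
    matches = [
        entry for entry in REGISTRY
        if entry.name == name and entry.concept_id == concept_id
    ]
    if len(matches) != 1:
        raise LookupError(f"expected one match for {name!r}, found {len(matches)}")
    return matches[0]


if __name__ == "__main__":
    # A name-only search leaves a human to choose between the two "agent" services.
    print(len(find_by_name("agent")), "services are named 'agent'")
    # Adding a concept identifier lets a program pick one without human help.
    print(find_by_concept("agent", "example:viral-agent").endpoint)
```

With only the name, a client gets both entries back and someone has to pick; with a shared concept identifier the lookup can be automated, which is the role an ontology-backed registry would play.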

I think a delegation service needs to be added to the list of services. Without a delegation service you cannot execute a federated query with both
authentication and authorization; you can only do it with authentication. When caGRID 1.3 comes out later this year (hopefully) the delegation service will be complete.

I noticed a portal coming up on the applications timeline. For the EPHTN project at Utah we used the caGRID Portal, which ties into GAARDS, and you get a grid service mashup and some querying tools with it. The caGRID portal can be modified and extended as needed since it is open source.
Here is an example of the caGrid portal:
The link above is for the training grid, and you can quickly set up an account to log in, as they are operating at LOA1 for now.
Using the caGRID portal should save some time.

Lastly, it may make sense to add a "Partner Service" to the grid service list. It seems that allowing advanced IT personnel to create their own grid service is a real possibility, especially given this thread on the blog: