Saturday, August 30, 2008

PHIN Impressions - from the Savel

Greetings all...

Those who know me...know that I am a profound PHIN evangelist. This year's conference had a significantly different feel from the previous years - and was...a great conference.

I moderated many sessions - and all were very strong and well attended. Of course, I thought that the GRID-related sessions were outstanding.

Take-away issues: There was a lot of buzz around "grid" - and it's clear that over the next few months, a lot of coordination and communication needs to take place.

From the sessions and ad-hoc meetings I attended, it is very clear to me that we have just cracked open the door into the world of grid / distributed computing. What's behind that door? Something amazing... of this I have no doubt. What will it look like? Time will tell - and that's the beauty of research and discovery.

We are discovering the pieces of the puzzle - and with the amazing collaborative effort of our national and international partners....it's clear that we will be making amazing progress in the months ahead. It's a true privilege to be part of this effort.

Just think for a moment about a few of the pieces of the puzzle we have discovered:

Taverna (essentially Yahoo Pipes for distributed computing)
DCQL
VBrowser
Globus
MonALISA
OGSA-DAI
GridGrouper
Gridifying Services via caGRID tools

As we work closely with our public health partners, we look forward to providing the public health community with powerful new tools and resources.

I can only imagine the progress we will have made before the 2009 PHIN Conference.

Friday, August 29, 2008

PHIN Impressions

Pros:
I found the PHIN conference to be a valuable learning experience. I particularly enjoyed visiting the vendors in the exhibit hall and viewing the various Public Health technology solutions.

The session that I enjoyed the most was the Natural Language Processor presentation. The speaker was enthusiastic about the technology and provided insight into real-world use cases. It was also nice to learn more about the technology that we are currently integrating with the grid.

Last but not least, I would like to thank the person who came up with the PHIN bag idea. It works great as a laptop bag!

Cons:
I didn't enjoy the lunch sessions because I had a difficult time hearing the speakers over the noise level in the room. A separate networking lunch would have helped reduce the amount of chatter during those sessions.

My First PHIN Conference Experience

My first PHIN conference experience was awesome. I started my conference experience on Sunday by attending two pre-conference sessions, Web 2.0 for Public Health and PHIN 101; both sessions were very good and provided a wealth of useful information. The next three days were packed with many interesting presentations covering everything from GIS applications to vocabulary. I have included links to some of the tools and projects that were mentioned during the sessions I attended. Enjoy.


Tools/Projects:

Eclipse Open Health Framework - http://www.eclipse.org/ohf/

Cobertura - http://cobertura.sourceforge.net/

JFreeChart - http://www.jfree.org/jfreechart/

Selenium - http://selenium.openqa.org/

Taverna - http://taverna.sourceforge.net/

ConceptHub - http://www.concepthub.org/wiki/Main_Page

Creative Commons - http://creativecommons.org/

Thin Wire - http://sourceforge.net/projects/thinwire/

Open ID - http://openid.net/

Cruise Control - http://cruisecontrol.sourceforge.net/

Ivy - http://ant.apache.org/ivy/

Public Health Partners - http://phpartners.org/

*** Orange Data Mining - http://www.ailab.si/orange - a very interesting GUI programming tool that uses Python scripts

Thursday, August 28, 2008

Grid UI Update

Here are a couple more UIs for the grid that I found:
1) http://wiki.arcs.org.au/bin/view/Main/FileTransferSGGCUsageGuide
from the Australian Research Collaboration Service, which has the capability for GridFTP and third-party transfers, plus a cool multi-replica transfer feature. It's sort of like BitTorrent: when a large file has copies in several locations, we can specify multiple locations to get the file from, which can potentially create a local copy of the file faster.

2) The other one is http://commons-vfs-ui.sourceforge.net/, which is a Commons VFS implementation from Apache. It provides support for Storage Resource Broker and GridFTP.

Face to face collaboration

Due to people being in town for the PHIN conference, we had a face to face meeting this morning with many of the phGrid collaborators.

It's nice to finally put faces to voices and email addresses, and even though telepresence is useful, there's still no beating flesh and blood for easy communication. We still had a dial-in line, though, since some of us couldn't be there in person.

We talked with Pittsburgh, WashingtonU, UtahU, Johns Hopkins APL, Ohio State BMI, Columbia University and NCPHI Lab.

We recapped the services built and demonstrated and also talked about next steps.

The immediate agreed-upon next steps are to plan out how to connect the services developed by different groups, set up some sort of simple description for each service to let others know about it, link to each other's public source code repositories, find specific public health use cases to drive future development, and extend our current set of services.

There's actually a lot more around next steps but it is too much to describe in this post. Over the next week, we're collecting the next steps together and will post some more ideas for future proof of concept projects.

We also talked about some ways to approach collaboration and source control. Ohio State has two really good examples of how they do it:
caGrid Incubator
caGrid GForge

Tuesday, August 26, 2008

PHIN Conference

The blog has been quiescent this week because the entire team has been at the 2008 PHIN Conference in Atlanta, GA demoing and talking about grid, as well as learning more about public health informatics.

Friday, August 22, 2008

I wasn't able to get the demo to ncphi.phgrid.net yet.

While the demo is set up and running on our internal servers, and we have a way to get into the internal servers for the PHIN conference, and while I was able to get the databases and OGSA-DAI bits installed on Dallas (I think) and NCPHI (I know), I will not be able to get JBoss onto NCPHI and make the demo available for public consumption.

I faced a lot of hurdles that delayed me, including limited server access due to higher-priority testing and a 4:00 PM brownout/surge that killed all of our virtual machines. I was tooling along despite that when, at 5:30, I discovered that Dallas wasn't accessible on port 8443. That is the port OGSA-DAI operates on, and without it I can't see one of the databases needed for the demo... and a RODSAdai demo isn't that impressive with only one server.

I also tried getting JBoss installed on the NCPHI node, only to find that it generated lots of errors on startup and that port 8080 is blocked for incoming traffic.

Thus, we are facing a big case of interrupted ports, and no one is in the building who can un-interrupt them.
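For anyone retracing this kind of troubleshooting, here is a small, generic reachability check (not part of the demo code) that tries to open a TCP connection to a host and port such as 8443 or 8080 and reports whether it got through.

    // Generic TCP reachability check: attempts a socket connection with a
    // short timeout and prints whether the host:port answered.
    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class PortCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            int port = args.length > 1 ? Integer.parseInt(args[1]) : 8443;
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), 5000); // 5-second timeout
                System.out.println(host + ":" + port + " is reachable");
            } catch (Exception e) {
                System.out.println(host + ":" + port + " is NOT reachable: " + e.getMessage());
            } finally {
                socket.close();
            }
        }
    }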

At least the internal demo is working, and very neat I think.

Thursday, August 21, 2008

Remaining SRGM Test Cases

The latest version of the SRGM_POC test case document is on the wiki now. Test case 4G will be revisited with either a little more elaboration or a change of approach; I am currently working with Vaughn Mcmullin on that. Beyond that, I am looking into case 4J: Evaluate user interface projects for RFT management and operation (including, but not limited to, job starting/stopping/resuming, route management) (Req. No. 16). Early in our work we came across VBrowser, which covered a lot of ground in terms of functionality, as one of the possible UIs. So before we move further in this area, on Brian's recommendation, I am posting a few features that we would want to zero in on while evaluating the various UIs:
1) Support for manual/interactive as well as automatic transfers. Although, as discussed, automatic transfers could be more of an added feature than a basic requirement.
2) The ability to see the status of jobs running at any particular time, by providing a way to query the persistent storage.
3) The capability to stop jobs manually in the middle of a transfer, which would remove the parts of the payload already transferred.
4) VBrowser is applet based, so we need to decide whether we want a web-based UI for our purposes.
5) Raj (from Globus) mentioned on one of the calls that they have a UI implementation as well, so I will follow up with him on whether their implementation already covers some of these.

I also found another GridFTP UI implementation from WISENT: an Eclipse-based, cross-platform, open-source GridFTP client that lets you transfer files, view progress, and delete files. It would be interesting to know whether they provide support for RFT. Please feel free to check out the URL (a Flash presentation of the application):
http://bi.offis.de/gridftp/screens/gridftp.htm
Please feel free to add anything you think could be a feature of the GridFTP UI.

VBrowser

I communicated through email with Piter T. de Boer from the Informatics Instituut at the University of Amsterdam. We plan to discuss our use of VBrowser as an application and as a plugin development framework after the PHIN Conference next week.

Piter is our first contact with the team developing VBrowser.

Wednesday, August 20, 2008

caGrid: DCQL Federated Query Processor News

One thing that Ron just learned regarding DCQL is that the Federated Query Processor (FQP) in caGrid will not support delegation until caGrid 1.3, which should be out in early 2009. So, currently, one can do a federated query with authentication (https only), but authorization does not work.

For instance, when we tried this at Utah, our data grid services rejected the query from the FQP because the query came in under the user 'workflow', and 'workflow' is not authorized.

More successful testing of secure grid services!

Ron Price, with the University of Utah CoE, sent over a new demo project this morning to test out the secure invocation of his grid services.

Utah is using caGrid's GridGrouper to manage security configuration (phirg uses manual maintenance of certs at each node), so I created a userid with caGrid (https://cagrid-portal.nci.nih.gov/web/guest/register) and Ron added my caGrid user to the PH Grid group.

Ron made it pretty easy to test by providing an Eclipse project export and the required certificates to authenticate with cagrid's Dorian services.

I copied the certificates to the ~/.globus folder on my workstation, specified my caGrid-provided userid and password, and ran the service from inside Eclipse. Pretty straightforward.

When I tried with an invalid password, I got an exception thrown back from Dorian (gov.nih.nci.cagrid.authentication.stubs.types.InvalidCredentialFault).

So we tested securely invoking a Utah phGrid service over SSL, using caGrid's authentication management, from the NCPHI Research Lab grid.

Next steps will be to test out invoking the service using phirg specific credentials.

Demodemodemo

Greetings all.

I was out last week, and this week I was putting my nose to the grindstone... and I got some pretty neat things done with the RODSAdai demo.

I cleaned it up, so it has a simple look with only intuitive controls.
I managed to get a server-merge in the demo app, thus allowing for data to be spread across different servers and aggregated when multiple servers are selected.
I managed to get some geocode caching set up, so the page loads a lot faster after the initial runs.
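To make the caching idea concrete, here's a minimal sketch of what I mean: keep the zip-code-to-coordinates lookups around between requests so they aren't recomputed on every page load. The class and method names are made up for illustration, not the demo's actual code.

    // Minimal sketch of caching zip-code geocode lookups between requests.
    // Names are illustrative; the real lookup source is left unspecified.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class GeocodeCache {
        // zip code -> {latitude, longitude}
        private final Map<String, double[]> cache = new ConcurrentHashMap<String, double[]>();

        public double[] latLonFor(String zip) {
            double[] latLon = cache.get(zip);
            if (latLon == null) {
                // Only the first request for a zip code pays the lookup cost;
                // later page loads reuse the cached coordinates.
                latLon = lookup(zip);
                cache.put(zip, latLon);
            }
            return latLon;
        }

        private double[] lookup(String zip) {
            // Placeholder for the real source (e.g., a local zip-code table).
            throw new UnsupportedOperationException("wire up a zip-code table here");
        }
    }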

I owe many thanks to Brian for being a great sounding board. He was the one that turned major redesigns in my head into minor UI tweaks.

Tuesday, August 19, 2008

Distributed Query Diagram

Some important stakeholders had questions about how the distributed query built for the RODSA-DAI demo works, so Tom and I came up with some diagrams that should help describe it more clearly.

The demo basically shows how you can run a distributed query on multiple nodes to perform an aggregate count analysis (by syndrome) and then return the results to a coordinator node that combines the aggregated results into a single collection and displays it in an application (such as Google Maps).

Also, while we're currently only querying by syndrome and returning counts, you could easily add more dimensions to the criteria and the results (age, gender, date range, region, etc.). Link to the wiki entry with images and the Visio diagram.
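Here's a rough sketch of the coordinator-node merge step, just to make the idea concrete; the types and class names are made up for illustration and are not the demo's actual code. Each node returns its own syndrome-to-count map, and the coordinator sums them into a single collection for display.

    // Sketch of the coordinator-node merge: each node returns its own
    // syndrome -> count map, and the coordinator sums them into one collection.
    // Names and types are illustrative, not the RODSA-DAI demo's actual classes.
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CoordinatorMerge {
        public static Map<String, Integer> merge(List<Map<String, Integer>> perNodeCounts) {
            Map<String, Integer> combined = new HashMap<String, Integer>();
            for (Map<String, Integer> nodeResult : perNodeCounts) {
                for (Map.Entry<String, Integer> e : nodeResult.entrySet()) {
                    Integer current = combined.get(e.getKey());
                    combined.put(e.getKey(), (current == null ? 0 : current) + e.getValue());
                }
            }
            return combined; // ready to hand off to the display layer (e.g., Google Maps)
        }
    }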

In case you wonder why the blog is a bit quiet lately, it is because we're all preparing for the PHIN Conference's grid related presentations and demos.

Friday, August 15, 2008

SRGM Progress

Here's the update on the SRGM test cases. With the combined efforts of the past couple of weeks, we have been able to cover a lot of ground on these test cases. The ones that are still left are:
1) Transfer HL7 file from partner node (node B) to CDC lab node (node A) using PHINMS issued digital certificates.
2) Evaluate user interface projects for RFT management and operation (including, but not limited to, job starting/stopping/resuming, route management).
and the ones that have been knocked off the list are:
1) Evaluate reliable GRIDftp and WS-RM against OCISO guidelines and requirements
2) Evaluate reliable GRIDftp and WS-RM against NIST guidelines and requirements (specifically, but not limited to, the FIPS 140-2 Cryptographic Module Validation Program; FIPS 200 Minimum Security Requirements for Federal Information and Information Systems).
3) Configure lab node and partner node to use PHINMS digital certificate.
4) Transfer HL7 file from partner node B to partner node C using stored at CDC lab node (node A).
5) Test ability of Globus components to integrate into existing PHINMS infrastructure (transfer of program payloads through both PHINMS and Globus).
6) Evaluate capabilities for unattended retries/delivery of payload file.
7) Evaluate capabilities for end to end payload level encryption (Done thanks to the efforts of Vaughn Mcmullin; the document still needs to be updated with terminal snapshots)
8) Evaluate capabilities for guaranteed once and only once delivery of file and the robustness of its duplicate file detection
9) Evaluate capabilities for large payload (> 100 MB) reliable delivery.
10) Evaluate capabilities for file payload encryption for strength and validity.
11) Evaluate capabilities for reliable data exchange for once-and-only-once transport of payload data.
12) Evaluate reliable GRIDftp for reliable messaging for reliability due to node availability

So, all in all, of the 14 cases we have finished 12, with 2 more to go; we will be working on completing those next week.

Thursday, August 14, 2008

PH case reporting scenario development project

An update on a project started a couple of months back for developing a PH case reporting scenario

The purpose of this activity:
To better understand the epidemiologists' workflow so we can align the PHI research grid activities with public health practice as well as the AHIC case reporting use case.

Work done so far:
  • Abstracted public health related information transaction steps from the AHIC public health case reporting use case
  • An epidemiologist mapped the AHIC use case to a scenario in her daily activities
  • The PHI research grid project team mapped the draft scenario activities to the PHI research grid activities (i.e., current Proof of Concept (PoC) projects and planned PoCs)

Plans:

  • Select 3-4 consecutive steps from the use case scenario to work out in more detail, i.e., describe them in terms of who, what, when, etc. (as described in the scenario draft).
  • Develop context flow diagrams and task flow diagrams for the 3-4 selected steps
  • The PHI grid developers' team will use the above mentioned diagrams to better align the technical use cases
  • The PHI grid project design team will use the above diagrams (and the full scenario) to plan further proof of concept projects

Ref material:

Comments and suggestions will be greatly appreciated!

Wednesday, August 13, 2008

Axis problem has functioning workaround

To address the Axis mismatch problem (described here), we developed a separate RODSAdai download servlet that can be called by the RODS application to return either a TimeSeries or SpatialSeries object serialized over HTTP.
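As a rough illustration of the kind of download servlet described above (not the actual RODSAdai code), here is what the server-side half can look like, assuming the series classes are Serializable; the servlet and helper names are made up.

    // Sketch of a download servlet that runs the query inside the war that
    // carries the Globus/Axis 1.2RC2 jars and streams the result back as a
    // serialized Java object, so the RODS war never loads the conflicting Axis.
    // Class and helper names are illustrative only.
    import java.io.IOException;
    import java.io.ObjectOutputStream;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    public class SeriesDownloadServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            Object series = buildSeriesFor(req.getParameter("syndrome")); // hypothetical helper
            resp.setContentType("application/x-java-serialized-object");
            ObjectOutputStream out = new ObjectOutputStream(resp.getOutputStream());
            out.writeObject(series); // the series class must be Serializable
            out.flush();
        }

        private Object buildSeriesFor(String syndrome) {
            // Placeholder for the OGSA-DAI call that produces a TimeSeries
            // or SpatialSeries for the requested syndrome.
            throw new UnsupportedOperationException("wire up the OGSA-DAI query here");
        }
    }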

Jeremy has tested this in his Pittsburgh environment and it is working.

Mario also mentioned a tool called Jar Jar Links (http://code.google.com/p/jarjar/) that may be able to help. We'll investigate this once we find the time.

So the good news is that we're no longer blocked on the custom Axis 1.2RC2 that Globus uses.

Tuesday, August 12, 2008

GridMedlee Progress

I'm making progress on GridMedlee testing. I have altered the initial client code to add security options and have gotten it to the point where I am receiving an 'Unknown CA' error. In speaking with my team members, it sounds like this issue should be resolvable by running the code from a controlled node rather than my local laptop. I am in the process of moving the code over to the controlled node and will (hopefully) get it running from there.

Along the way I picked up some knowledge about the GlobusCredential class. When the Introduce framework created the test client for GridMedlee, it generated several versions of the GridMedleeClient constructor, some of which included a parameter for a GlobusCredential object. Upon further investigation I found that the GlobusCredential class can be used to create secure credentials.
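Here is a small sketch of the pattern, under the assumption that the Introduce-generated client accepts a GlobusCredential in its constructor; the service URL and the commented-out client call are illustrative, not the actual GridMedlee code.

    // Sketch: load a Globus proxy credential and hand it to a generated client
    // so calls are made with GSI security. The GridMedleeClient usage is an
    // assumption based on the Introduce-generated constructors mentioned above.
    import org.globus.gsi.GlobusCredential;

    public class SecureClientSketch {
        public static void main(String[] args) throws Exception {
            // Loads the default proxy credential (e.g., one created earlier
            // with grid-proxy-init) from its standard location.
            GlobusCredential credential = GlobusCredential.getDefaultCredential();
            System.out.println("Using identity: " + credential.getIdentity());

            // Hypothetical: pass the credential to the generated client.
            // GridMedleeClient client = new GridMedleeClient(serviceUrl, credential);
            // client.someOperation(...);
        }
    }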

I will have another update later today after I get the GridMedlee client built on the controlled node.

Axis mismatch is hurting RODSAdai

Dr. Espino discovered a pretty serious bug with how we are planning on using Globus that is preventing the RODSAdai code from running within the same war as RODS.

It comes down to Globus 4.0.5 using a customized version of Apache Axis 1.2RC2 (modified to specifically support Globus services). RODS (like many modern Java applications) also uses Axis. RODS specifically uses version 1.4 of Axis. The version of Axis used by Globus conflicts with the version of Axis used by RODS and is causing the RODSAdai code to fail.

The idea of RODSAdai is that it is a jar that can be included in RODS (or other Java applications) that calls out to OGSA-DAI services on the Public Health Research Grid. So it is beneficial to have the RODSAdai code be as drop-in and easy to use as possible.

Jeremy and I had a call with the OGSA-DAI folks (Alistair and Mario) where they basically helped us work out that this is a limitation of Globus, not necessarily of OGSA-DAI. Mario is taking this question to the Globus community to ask how they typically deal with this situation.

In the meantime, Jeremy and I came up with a hack that should work for demo purposes until we get a proper solution from the Globus folks: we're running separate wars for RODS and RODSAdai. This allows RODS to use the Axis 1.4 jar and RODSAdai to use the Axis 1.2RC2 jar that Globus/OGSA-DAI requires. The catch is that the two wars need to communicate over HTTP-based IPC. Not beautiful, but we'll see if it works.
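For illustration, here is a sketch of the RODS-side half of that HTTP IPC: fetch the serialized series object from the RODSAdai war and deserialize it. The URL pattern and parameter name are assumptions.

    // Sketch of the calling side of the HTTP-based IPC between the two wars.
    // The caller would cast the returned object to TimeSeries or SpatialSeries,
    // so both wars must share the jar that defines those classes.
    import java.io.ObjectInputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class SeriesFetcher {
        public static Object fetchSeries(String baseUrl, String syndrome) throws Exception {
            URL url = new URL(baseUrl + "/download?syndrome=" + syndrome); // hypothetical path
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            ObjectInputStream in = new ObjectInputStream(conn.getInputStream());
            try {
                return in.readObject();
            } finally {
                in.close();
                conn.disconnect();
            }
        }
    }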

I added a new servlet to RODSAdai to support this and Jeremy will test it out in his environment later today. Updates will be posted here.

We also found out from OGSA-DAI that they are working on a version that will work with Globus 4.2, which should come in handy as Dan begins our 4.2 transition testing and planning.

Friday, August 8, 2008

Different Databases

Today, I got the RODSAdai test data split across different databases. I still have a few things to work on before the next demo... and I am going to be on vacation next week so I spent a lot of today documenting steps and locations of things and fearing the fact that people would be looking at my "gotta get this working" code.

Otherwise, everyone seemed to like the demos we did this morning. So that makes me happy... and I will still have a week to polish things up and help externalize the demo before the PHIN conference... and I feel good that I got the basis of the tests I wanted working before I left.

See you in a week!

Testing, Testing and more testing

  • I spent the day working with the GridMedLee services and I feel I have made a lot of progress. I modified the client code sent to me to invoke the GridMedLee service. I am currently working through what appears to be a security issue with getting a status back from the GridMedLeeContext service. I have a few ideas of things to try on my end to make sure my setup is functioning properly.
  • I also went back and re-ran the client for the ZipCode service for Utah and I am getting the same message from their service that I am receiving with GridMedlee.
  • I also installed WireShark locally to be able to actually view the contents of data being sent between the client and the server.

Stay tuned, friends; more to come as I delve deeper into Globus security.

Thursday, August 7, 2008

We've got dots, and they are indicating levels.

After a lot of tinkering with the Google Maps API and a lot of things that theoretically should work but didn't... I managed to get dots showing up on a Google map and to color-code them based on the level of incidents indicated by the query. They load a bit slowly because of how I had to encode the data series, but it works, and that is better than before. I also have lots of ideas for how to make things faster and prettier, given time, thanks to Alastair and Mario of OGSA-DAI, who helped me kick around ideas.
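For the curious, here is roughly how color-coded dots can be produced as KML placemarks; the thresholds, style ids, and data shapes are made up for illustration and are not the demo's code (the Style definitions referenced by the style ids are omitted).

    // Sketch of turning aggregate counts into color-coded KML placemarks.
    // Thresholds, style ids, and input maps are illustrative assumptions.
    import java.util.Map;

    public class KmlSketch {
        static String styleFor(int count) {
            if (count > 100) return "#red";    // high incident level
            if (count > 20)  return "#yellow"; // medium
            return "#green";                   // low
        }

        static String toKml(Map<String, Integer> countsByZip, Map<String, double[]> zipLatLon) {
            StringBuilder kml = new StringBuilder(
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>\n");
            for (Map.Entry<String, Integer> e : countsByZip.entrySet()) {
                double[] loc = zipLatLon.get(e.getKey());
                kml.append("<Placemark><name>").append(e.getKey())
                   .append(" (").append(e.getValue()).append(")</name>")
                   .append("<styleUrl>").append(styleFor(e.getValue())).append("</styleUrl>")
                   .append("<Point><coordinates>")
                   .append(loc[1]).append(",").append(loc[0]) // KML wants lon,lat
                   .append("</coordinates></Point></Placemark>\n");
            }
            return kml.append("</Document></kml>").toString();
        }
    }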

Next steps are to bisect the databases and put them on different database servers... wire up such nifty bisections into the demo app, and if time allows, start integrating the results.

I am happy. I got dots.

Globus 4.2 & OGSA-DAI News

Learned the following through the grapevine:

OGSA-DAI will not be part of GT4.2, although it will be possible to download, install and configure it separately.

In short, OGSA-DAI can be added as an extension to the core GT4.2 installation.

Globus thinks XQuery may bridge federated queries across OGSA-DAI and DCQL (caGrid) federated query services.

Tuesday, August 5, 2008

Building Demos

I got some very cool news from Dr. Espino today: he has gotten RODSAdai building and happy on his end, and he will probably be able to integrate all the code within a week. Furthermore, he added much better logging code (log4j, complete with properties files), some null checking, and some defaults. I have successfully updated everything and adjusted it to fit my own environment.

On the demo end, I have discovered many things that will not work with Google Maps the way I envisioned them. I was thinking of doing real-time geocoding of zip codes, but that would turn one JavaScript request to Google into about 30, and there are already free databases of zip-code geospatial locations. Furthermore, I am not sure of the best way to just lay out all the data given Google's finicky timeouts (thus, if your application takes a while to generate a KML file, you will need to generate the file, place it somewhere accessible, and then tell Google to find it there).

Considering I am not exactly working on an accessible node, I am beginning to think it might just be easier to use JSP loops to build an array and then place that array into one of Google's marker manager objects (see the sketch below). Perhaps there is a way to pass KML into a Google map directly? I will probably tinker with this at home.
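As a sketch of the "build the array server-side" option (field names and format are assumptions, not the demo's code), a JSP page could call a helper like this to emit a JavaScript array literal that the page then feeds to a Google Maps marker manager.

    // Sketch: turn server-side points into a JavaScript array literal, e.g.
    // var markers = [[33.7,-84.4,12],[40.4,-80.0,3]];
    // Field names and the array layout are illustrative assumptions.
    import java.util.List;

    public class MarkerArrayBuilder {
        public static class Point {
            public final double lat, lon;
            public final int count;
            public Point(double lat, double lon, int count) {
                this.lat = lat; this.lon = lon; this.count = count;
            }
        }

        public static String toJsArray(List<Point> points) {
            StringBuilder js = new StringBuilder("var markers = [");
            for (int i = 0; i < points.size(); i++) {
                Point p = points.get(i);
                if (i > 0) js.append(",");
                js.append("[").append(p.lat).append(",")
                  .append(p.lon).append(",").append(p.count).append("]");
            }
            return js.append("];").toString();
        }
    }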

Friday, August 1, 2008

Preppin' for Demo's

So, today we had the initial big discussion of how we want to demonstrate the neat things that have been coded for the upcoming PHIN conference.

My first priority is helping Dr. Espino try to get RODSAdai working with RODS so he can show RODS pulling info from multiple data sources.

Otherwise, when I am not helping him, I have the following bits in my stable:

  • RODSAdai through google maps
    • Split data sources to demonstrate different data in different places.
    • Set up a check-box "one or many" database location selection.
    • Enable a spatial series that can display through google maps and/or google earth.
  • Server Location Admin Screen
    • Have the ability to declare a new server/resource location and modify existing locations.

I'm looking forward to it. I have also reached out to the OGSA-DAI folks for help with the Google Maps portion of the demo since they already did it once with their demo.

SRGM Progress

I talked to Raj (at Globus) about providing more details for the RFT flow diagram he provided. I will update the version on the wiki as soon as I get it.

On the test cases, the PHINMS certificate has been installed on the nodes, so now a successful transfer will knock off another test case, 4E: Transfer HL7 file from partner node (node B) to CDC lab node (node A) using PHINMS issued digital certificates (Req. No. 05 & 06).
The test case document on the wiki will be updated as soon as the test is finished.

Also, as Brian, Dan, Vaughn, and I discussed, one of the enhancements mentioned was having the option to set flags for optional parameters on the file being transferred. One criterion that came up was a flag for time-sensitive data (as in, if the file does not arrive within a specified time frame, discard it). The Globus folks agreed that it would be a good feature to have and that they can include it in the next version. Please feel free to add any further enhancements you think of, and we can convey them to the Globus folks as well.

Local Globus Install Completed

I completed the local installation of Globus with a Simple CA (using section A.3 from the Globus 4 Programming book as a guide) on my laptop yesterday afternoon. This will allow me to deploy the Columbia medLee project locally for client coding and testing purposes. It took a couple of tries to get the Simple CA installation to function correctly with Globus.

Some of the lessons I learned when installing Globus:

  1. Certificates -- always select root install for certificates (option r) rather than the local install (option l)
  2. Manually copy the *.pem files from globus/etc directory to /etc/grid-certificates directory
  3. Edit the system bash resource file to source both the VDT setup and Globus user env scripts so that the necessary environment variables are available after logging into the system