Monday, June 30, 2008

Rodsadai compiling again, server move giving familiar errors

This morning I tried adding the client-settings.wsdd file (not the server-settings.wsdd) to the resources directory, and the RODSAdai tests magically started working again (yay!).

The rest of the afternoon was spent setting up OGSA-DAI in its new home and running into the same cryptic errors the first time I attempted to connect. Tomorrow I shall poke at both some more.

Friday, June 27, 2008

Jar hunt finished, maven repository updated, musical servers commencing

So, the jars have been included in the new repository (I had to spend some extra time generating additional files that are necessary for remote repositories).
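The post doesn't list which extra files were needed. Maven normally publishes `.sha1` and `.md5` checksum files next to each artifact in a remote repository and checks them on download, so, assuming checksums were among those files, here is a minimal Java sketch for generating them by hand. The `Checksums` class and its methods are hypothetical helpers, not part of Maven.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: writes Maven-style checksum files next to an artifact. */
class Checksums {

    /** Lowercase hex digest of the data, for an algorithm such as "SHA-1" or "MD5". */
    static String hex(String algorithm, byte[] data) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : MessageDigest.getInstance(algorithm).digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(algorithm + " is not available", e);
        }
    }

    /** Writes e.g. junit-3.8.1.jar.sha1 and junit-3.8.1.jar.md5, each containing only the hex digest. */
    static void writeChecksums(Path artifact) throws IOException {
        byte[] data = Files.readAllBytes(artifact);
        Files.write(Paths.get(artifact + ".sha1"), hex("SHA-1", data).getBytes(StandardCharsets.US_ASCII));
        Files.write(Paths.get(artifact + ".md5"), hex("MD5", data).getBytes(StandardCharsets.US_ASCII));
    }
}
```

For example, `Checksums.writeChecksums(Paths.get("junit-3.8.1.jar"))` would be run for the jar and again for its `.pom` before copying them into the repository layout.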

Right now, we are moving a server between VMs; at the end we hope to have an Ubuntu Globus server (the previous server was corrupted by attempts to install two MySQL databases side by side).

Also, for some odd reason, the tests I was so elated to have working yesterday are now failing with an "OGSA-DAI resource null is unknown" error. I think it might be some sort of "the IP has changed" conflict, but at the same time the other OGSA-DAI examples still work. Either way, I don't think I'll be able to diagnose it today, and I'm out of ideas about what to try until we get the other VM set up to match this one.

Thursday, June 26, 2008

File Transfer on the grid

Today, with Dan's help, we tried to reproduce some of the functionality the grid promises, and it's safe to say the results were quite acceptable. First we transferred a file larger than 100 MB over the grid using plain GridFTP, and it went through fine. The next step was to test Reliable File Transfer (RFT). To check reliability, we powered down one node while the file was transferring, to simulate a node failure: Dan started the transfer, pulled the power cord for a minute, and as soon as he plugged it back in, voila, the transfer resumed as if nothing had happened.

Dan also told me that we can set the timeout, i.e. how long a file stays in the queue before the transfer is terminated. Right now it's set to 60 minutes, so we decided to increase it; Dan will pull the cord before he leaves and plug it back in tomorrow morning. That will be a really extreme test of one facet of reliability. We also noticed that when the node powered back up, it did not ask whether we wanted to resume; it just picked up automatically. That works for us, because it means that once a transfer starts, a user doesn't have to monitor whether the file arrived, and I assume that's how PHINMS works too.
Currently I am working on a test case document to record all these cases and to outline what we are going to achieve based on the requirements laid down in the project charter.
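To make the observed behavior concrete, here is a toy Java model; it is not RFT's implementation or API, and every name in it is hypothetical. The count of bytes already received serves as the restart point (roughly the role GridFTP restart markers play), so after an outage the copy continues from there rather than from zero, and it resumes automatically only if the outage fits within the queue timeout (60 minutes in our current setting).

```java
import java.io.ByteArrayOutputStream;
import java.time.Duration;

/** Toy model of a restartable transfer with a queue timeout (hypothetical, not RFT's API). */
class ResumableTransfer {
    private final byte[] source;
    private final ByteArrayOutputStream received = new ByteArrayOutputStream();
    private final Duration queueTimeout;

    ResumableTransfer(byte[] source, Duration queueTimeout) {
        this.source = source;
        this.queueTimeout = queueTimeout;
    }

    /** Bytes already delivered; a restart resumes from this offset. */
    int restartOffset() {
        return received.size();
    }

    /** Sends up to maxBytes from the restart offset; returns true once the whole file has arrived. */
    boolean sendChunk(int maxBytes) {
        int offset = restartOffset();
        int n = Math.min(maxBytes, source.length - offset);
        received.write(source, offset, n);
        return restartOffset() == source.length;
    }

    /** When a node comes back, resume without asking unless the outage outlasted the queue timeout. */
    boolean shouldResume(Duration outage) {
        return outage.compareTo(queueTimeout) <= 0;
    }

    byte[] receivedBytes() {
        return received.toByteArray();
    }
}
```

For example, after `sendChunk` has delivered part of the file and the node has been down for a minute, `shouldResume(Duration.ofMinutes(1))` is true and further `sendChunk` calls pick up at `restartOffset()`; an overnight outage with a 60-minute timeout would not resume.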

13 new jars

So... 13 new jars were added to the RODSAdai project to get the simple client test working (as opposed to just compiling). Two of them were additional OGSA-DAI jars, and the other 11 came from Globus; it took me all day to ferret them out.

I tried deploying the jars to our repository, but SourceForge doesn't seem to like something: I kept getting 405 (Method Not Allowed) errors. So I have emailed Anurag to work through the process with him, since he set up the repository initially.

The other reason I sent him the jars and dependency info is that my main workstation is in the process of being cloned, and I wanted an email-based backup.

Tomorrow will most likely be spent creating a working Ubuntu globus node, as my current one got corrupted.

Postgres Errors

The Postgres database on the staging node was failing to start the postmaster because of a recent IP address change. I updated the postgresql.conf file to reflect the new IP, and Postgres started fine after the change.
Postgres version: 8.1
Original Postgres Error: autovacuum not started because of misconfiguration
Fix: Modified /var/lib/pgsql/data/postgresql.conf

Tested GridFTP and RFT transfers of 100+ MB files with Anurag.

Wednesday, June 25, 2008

Going on a JAR hunt...

Today, I started beefing up the standard JUnit tests in RODSAdai... and remembered that a LOT more code is needed to run a client that uses OGSA-DAI than to just build one.

So, since the new test code runs an OGSA-DAI test, I am making a lot of attempts at "mvn compile": a test error or failure results, so I check the log, find the class it needs, add a reference to it (installing it in the Maven repository if it is not in the global one), and repeat.
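The loop above is mostly mechanical, so part of it can be scripted. A minimal Java sketch, assuming the failures show up in the logs as `NoClassDefFoundError` or `ClassNotFoundException` lines: it pulls out the missing class names and turns each into the `.class` entry to look for inside candidate Globus or OGSA-DAI jars. The `MissingClassScanner` class is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.jar.JarFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for the "check the log, find the class, find its jar" loop. */
class MissingClassScanner {
    // NoClassDefFoundError usually reports slash form (org/example/Foo),
    // ClassNotFoundException dotted form (org.example.Foo); accept both.
    private static final Pattern MISSING =
            Pattern.compile("(?:NoClassDefFoundError|ClassNotFoundException):\\s*([\\w$./]+)");

    /** Distinct missing class names, in dotted form, in the order they first appear. */
    static Set<String> missingClasses(String log) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = MISSING.matcher(log);
        while (m.find()) {
            names.add(m.group(1).replace('/', '.'));
        }
        return names;
    }

    /** Entry name the class would have inside a jar, e.g. org/example/Foo.class. */
    static String jarEntryFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** True if the given jar contains the class; useful for checking candidate jars one by one. */
    static boolean jarContains(File jar, String className) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(jarEntryFor(className)) != null;
        }
    }
}
```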

This will probably consume tomorrow and maybe a bit of Friday. I mean it when I say a lot of jars.

Tuesday, June 24, 2008

Spatial Series despite ambivalence

Today I put down some thoughts on creating a research-only node housed in places like developers' spare rooms, meant to be purely experimental but also to serve as a place to work on things when the CDC lab is inaccessible. I also found that the Ubuntu Globus node was corrupted for whatever reason (I think it came from trying to get an independent MySQL server and VDT-installed Globus to coexist), so I started the process of creating a new node to rebuild on and then transition to.

I also worked out how a spatial series could be loaded from HL7 table data, and some of the constraints that would have to be defined (column order/naming, whether certain values will be parsed from the query, etc.).
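The post doesn't spell out the constraints in code, so here is one possible shape, sketched in Java under stated assumptions: rows arrive as string arrays with a header row, the region and count columns are found by name (the names come from properties, so column order in the query doesn't matter), and counts for the same region are summed. The property keys `spatial.column.region` and `spatial.column.count` and the class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical loader: builds a spatial series (count per region) from tabular rows. */
class SpatialSeriesLoader {
    private final String regionColumn;
    private final String countColumn;

    SpatialSeriesLoader(Properties props) {
        // Property keys and defaults are illustrative assumptions.
        this.regionColumn = props.getProperty("spatial.column.region", "region");
        this.countColumn = props.getProperty("spatial.column.count", "count");
    }

    /** Looks columns up by name, so the query can return them in any order. */
    Map<String, Integer> load(String[] header, List<String[]> rows) {
        int regionIdx = indexOf(header, regionColumn);
        int countIdx = indexOf(header, countColumn);
        Map<String, Integer> series = new TreeMap<>();
        for (String[] row : rows) {
            int count = Integer.parseInt(row[countIdx].trim());
            series.merge(row[regionIdx], count, Integer::sum); // several rows for one region are summed
        }
        return series;
    }

    private static int indexOf(String[] header, String name) {
        int idx = Arrays.asList(header).indexOf(name);
        if (idx < 0) {
            throw new IllegalArgumentException("missing column '" + name + "' in " + Arrays.toString(header));
        }
        return idx;
    }
}
```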

Tomorrow I hope to get some sample queries, test my spatial series processors, write a time series processor, and get some feedback on the constraints I have come up with (and on how to make them flexible using properties).

Container issues on 1005

Globus node 1005 is failing to start the Globus container. I am currently troubleshooting the following error on node 1005:

: 55: ==: unexpected operator
.................................. *
### 2008-06-24 10:09:56 vdt-control(do_init) enabling the init service 'globus-ws'
### 2008-06-24 10:09:56 vdt-control(do_init) starting globus-ws: /etc/init.d/globus-ws start
### 2008-06-24 10:09:56 vdt-control(system) /etc/init.d/globus-ws start
[: 55: ==: unexpected operator
WARNING: It seems like the container died directly
Please see $GLOBUS_LOCATION/var/container.log for more information
Starting Globus container. PID: 7047
### 2008-06-24 10:10:06 vdt-control(do_init) starting 'globus-ws' failed: 1024
### 2008-06-24 10:10:06 vdt-control(clean_up) all done

Monday, June 23, 2008

medLEE NLP Grid Service Progress

Dr. Albert Lai of Columbia CoE, on June 18, wrote...
Yes, we were able to develop a functioning wrapper service for MedLEE. We
ended up wrapping a batch job version of MedLEE instead of the client
server version.

We used Introduce, a piece of software developed by Shannon, which created
most of the stubs necessary to deploy software onto the grid. However,
this used the Java core of Globus and not the C core currently being
used in the PHGrid efforts.

This leaves the current service unsecured. One of the steps we still need
to do is to somehow figure out how to integrate the certificates and all of
the authentication/authorization into the Java core.

There were a number of problems launching MedLEE from Java. However, I
have hacked together a (suboptimal) solution to make it run. There are
some other limitations of the version that we have put together, but it
seems to be sufficient for any demo needs. Right now, we package files and
transfer them to the server in a way that would limit the size of the data
transfer to available memory of the process as it is my understanding that
the files are serialized and then transferred. There are also some other
specific options that we currently do not support via the grid.


Now we just need to work on getting the grid service deployed on the public health research grid.

Friday, June 20, 2008

Project Management update

We held a project kickoff meeting for the CDC and contractor resources who will be working on the Secure Reliable Grid Messaging PoC. We'll be meeting over the next few days to put together a set of detailed requirements and tasks based on the project charter. From these will follow timelines and resource assignments for the tasks, so we can establish more fine-grained expectations.

I met with Tom Brinks this afternoon and we started working on a work breakdown structure to define the functional areas within the project. We'll post a draft on the wiki as soon as something digital exists.

Thursday, June 19, 2008

RODSAdai now working more like it is supposed to

So, today I moved a lot of the argument building (so that you only need two arguments instead of six) and the query invocation logic into the RODSAdai superstructure. Things now work more as I envisioned them, as a whole, as opposed to being pieces tested independently of each other. And I did this despite a power outage in the lab that reset everything in the middle.

Tomorrow I move the test into the automated structure, get the secure pieces working, and then delve into translating a tuplerowset into a spatial or time series. I will also be meeting with Dr. Espino, and it's nice to have some pretty good progress to show.

Reading and validating further documentation

Read through the FIPS 140-2 document sent by Alison regarding acceptable cryptographic components and algorithms. Here's what I've found so far. GridFTP uses X.509 certificates, which have 3 main fields: the certificate, the certificate signature algorithm and the certificate signature. The certificate has attributes such as version, algorithm ID, serial number, issuer, subject, validity, subject public key info, extensions, and several optional ones like the subject and issuer unique identifiers. The subject public key info attribute is further broken down into the public key algorithm and the subject public key, while the validity attribute holds a lower and an upper date limit (notBefore and notAfter), which determine the life of the certificate. The public key algorithm used is RSA, which, according to the same email from Alison, is on the list of FIPS-approved algorithms. So it looks like we are on the right track so far. Comments welcome!
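The fields described above map directly onto `java.security.cert.X509Certificate` (`getSigAlgName()`, `getPublicKey()`, `checkValidity(Date)`). A minimal Java sketch of a first-pass check, assuming we want to confirm that the signature and public-key algorithms appear on an approved list and that the certificate is inside its validity window. The approved sets and the minimum RSA size are supplied by the caller and are placeholders, not the FIPS 140-2 list itself; they should be filled in from the document Alison sent. The class name is hypothetical.

```java
import java.security.PublicKey;
import java.security.cert.CertificateExpiredException;
import java.security.cert.CertificateNotYetValidException;
import java.security.cert.X509Certificate;
import java.security.interfaces.RSAPublicKey;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical first-pass check of a grid certificate's algorithms and validity window. */
class CertAlgorithmCheck {
    private final Set<String> approvedSignatureAlgs; // names as returned by getSigAlgName()
    private final Set<String> approvedKeyAlgs;       // e.g. "RSA"
    private final int minRsaBits;                    // placeholder threshold, not a FIPS figure

    CertAlgorithmCheck(Set<String> approvedSignatureAlgs, Set<String> approvedKeyAlgs, int minRsaBits) {
        this.approvedSignatureAlgs = new HashSet<>(approvedSignatureAlgs);
        this.approvedKeyAlgs = new HashSet<>(approvedKeyAlgs);
        this.minRsaBits = minRsaBits;
    }

    /** Core rules on plain values, so they can be exercised without a real certificate. */
    List<String> problems(String sigAlgName, String keyAlg, int keyBits, boolean withinValidity) {
        List<String> found = new ArrayList<>();
        if (!approvedSignatureAlgs.contains(sigAlgName)) {
            found.add("signature algorithm not on list: " + sigAlgName);
        }
        if (!approvedKeyAlgs.contains(keyAlg)) {
            found.add("public key algorithm not on list: " + keyAlg);
        }
        if ("RSA".equals(keyAlg) && keyBits < minRsaBits) {
            found.add("RSA modulus too short: " + keyBits + " bits");
        }
        if (!withinValidity) {
            found.add("date is outside the notBefore/notAfter window");
        }
        return found;
    }

    /** Reads the same fields from a real certificate (e.g. one loaded with CertificateFactory). */
    List<String> problems(X509Certificate cert, Date when) {
        PublicKey key = cert.getPublicKey();
        int bits = (key instanceof RSAPublicKey) ? ((RSAPublicKey) key).getModulus().bitLength() : -1;
        boolean valid = true;
        try {
            cert.checkValidity(when);
        } catch (CertificateExpiredException | CertificateNotYetValidException e) {
            valid = false;
        }
        return problems(cert.getSigAlgName(), key.getAlgorithm(), bits, valid);
    }
}
```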

OGSA-DAI license

Peter had sent an email earlier to Mario Antonioletti about putting the OGSA-DAI jars in our local Maven repository, to make it easy to run a Maven build on any machine for projects that reference the OGSA-DAI jars. After some discussion between Mario and Brian, it was agreed that, for now, it would suffice for the OGSA-DAI license to reside in the repository alongside the jars. So, in conformance with that, the OGSA-DAI license is now part of the repository.

IHTC Grid Node Installation

I assisted IHTC with a VDT grid node installation. The install errors IHTC experienced were due to missing prerequisite software (patch 2.5.9 or higher). The IHTC engineer installed the missing package and restarted the VDT installation.

Previous error message: sh:patch: not found
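Since the failure came down to a missing prerequisite with a minimum version, a pre-flight version comparison is easy to sketch. A minimal Java example, assuming dotted numeric versions like `2.5.9` (reading the installed version from the system is left out, and non-numeric parts such as `2.5.9a` would throw); the class name is hypothetical.

```java
/** Hypothetical minimum-version check for dotted numeric versions such as "2.5.9". */
class VersionCheck {

    /** Negative, zero or positive as a is older than, equal to or newer than b; missing parts count as 0. */
    static int compare(String a, String b) {
        String[] pa = a.trim().split("\\.");
        String[] pb = b.trim().split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    static boolean atLeast(String installed, String minimum) {
        return compare(installed, minimum) >= 0;
    }
}
```

Comparing part by part as numbers matters here: a plain string comparison would rank `2.5.10` below `2.5.9`.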

Wednesday, June 18, 2008

First Weekly Status Report & New Wiki

The first weekly status report is up on the new wiki. These reports will be made available some time between COB Friday and SOB Monday each week.

The new wiki is up and is available for sharing documents and posts and will be a store for information until an official collaboration portal is made available by CDC resources. Our initial cap for storage space is 100MB, but that should be fine for a little while.

Making some progress.

Today I was trying to get my bearings after my main test server was moved. I was able to confirm that I could get data from that test server via non-secure methods, but for some reason connecting securely resulted in a lot of connection-refused errors (it wasn't doing this yesterday).

That, and there are still some integration issues with this server's VM in its new home that are being sorted out.

In the meantime, I will start working on the spatial series parsing and dynamic resource loading; then I can work on the security once things stabilize.

Otherwise, I had jury duty today, and that ate up a bunch of my time. I'm happy I made it in, however.

Tuesday, June 17, 2008

Progress Report

I read through some more VBrowser documentation; it looks like it'll help us do the job. I also got some screenshots of the current PHINMS application from Dan. Brian told me he would try to arrange for me to attend one of their demo presentations, which will be a great help in outlining what's expected of our application in terms of both usage and functionality. I'm looking forward to the meeting on Thursday; hopefully it will also shed more light on the requirements. I also have the guidelines for secure message transport from Brian. I'm still reading through them, and they should help me take in more of the information in the Thursday presentation. I also talked to Brian on the phone, and the way we want to proceed is to check off items from the project charter so we can track what's left to accomplish. As of now, since I have secure access to the Dallas node with Dan's help, we have accomplished

No. 7: Transfer HL7 file from partner node B to partner node C using stored at CDC lab node (node A) and right now I am working on

No. 2: Evaluate reliable GridFTP and WS-RM against NIST guidelines and requirements (specifically, but not limited to, the FIPS 140-2 Cryptographic Module Validation Program; FIPS 200 Minimum Security Requirements for Federal Information and Information Systems).

Moving all that NCPHI example code into RODSAdai

So, today was spent testing all the links between servers, and then a lot of it was spent creating new classes and pulling much of the implementation from the old NCPHI examples I was working on previously, so that the data-pulling classes take the query and display the results through a text processor.

Now, the lab is undergoing a bit of musical VMs, but then it will be back to testing and enhancing.

Monday, June 16, 2008

Security Evaluations

I had a call with an NCPHI security steward to ask about some of the security evaluations we'll be performing for the Secure Reliable Grid Messaging PoC project.

She pointed us specifically to the FIPS 140-2, NIST SP 800-53 and eAuthentication guidelines.

It's a lot of reading, but we'll need to evaluate the algorithm type and strength used by Reliable GridFTP, integrity checking of the message payload, and certificate usage.

MySQL to PostgreSQL

Today I spent a majority of my time setting up a PostgreSQL resource for the RODS sample database Dr. Espino sent me last week.

PostgreSQL versions before 8.2 don't support the SQL syntax for inserting multiple rows in a single INSERT statement. But I found a way around it by writing an insert function and invoking it repeatedly (which is apparently faster anyway). Thank you, Internet.
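The insert function itself isn't shown here. A different way around the same limitation, sketched in Java under the assumption that the sample data comes as MySQL-dump-style `INSERT ... VALUES (...),(...);` statements: split each one into single-row INSERTs that an older PostgreSQL accepts. The class is hypothetical; it copes with commas and parentheses inside quoted strings (including `''` and backslash escapes), but not with SQL comments or identifiers containing the word `values`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter from a multi-row INSERT into single-row INSERTs for older PostgreSQL. */
class MultiRowInsertSplitter {

    static List<String> split(String sql) {
        int valuesAt = indexOfIgnoreCase(sql, "VALUES");
        if (valuesAt < 0) {
            throw new IllegalArgumentException("no VALUES clause: " + sql);
        }
        int scanFrom = valuesAt + "VALUES".length();
        String prefix = sql.substring(0, scanFrom); // e.g. "INSERT INTO t (cols) VALUES"
        List<String> statements = new ArrayList<>();
        boolean inQuote = false;
        int depth = 0;
        int rowStart = -1;
        for (int i = scanFrom; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (inQuote) {
                if (c == '\\') {
                    i++; // skip the character after a backslash escape (MySQL dump style)
                } else if (c == '\'') {
                    if (i + 1 < sql.length() && sql.charAt(i + 1) == '\'') {
                        i++; // '' is an escaped quote; still inside the string
                    } else {
                        inQuote = false;
                    }
                }
            } else if (c == '\'') {
                inQuote = true;
            } else if (c == '(') {
                if (depth == 0) {
                    rowStart = i;
                }
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth == 0) {
                    statements.add(prefix + " " + sql.substring(rowStart, i + 1) + ";");
                }
            }
        }
        return statements;
    }

    private static int indexOfIgnoreCase(String s, String word) {
        for (int i = 0; i + word.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, word, 0, word.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

Each resulting statement can then be executed one at a time, or grouped with JDBC `Statement.addBatch`.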

Tomorrow, I set up the test pulls, and if that goes quickly, I set up the converter and start producing time and spatial series.

Thursday, June 12, 2008

Test cases for Globus GridFTP

First draft of Test cases for Secure Reliable Grid Messaging

Here's how we plan to proceed

A couple of days ago there was a blog entry about VBrowser, and I've had a chance to go through some of its documentation. It does provide a pretty neat GUI for testing the grid messaging system. At present, PHINMS provides
* Encryption
* Authentication
* Transport

So these are the test cases we'll run while transferring some data files:
1) The transfer must support the Grid Security Infrastructure (GSI) and authentication
2) Test authenticated third-party data transfers between two nodes (Dallas and Tarrant) controlled by a third node (SUSE)
3) Test authenticated data transfers between two nodes (SUSE and Dallas)
4) Check for support of reliable and restartable data transfer (by bringing down a node during transfer)

New Potential Collaboration

Yesterday, I had a productive call with Shannon Hastings and Philip Payne from Ohio State University's caGrid grid architecture group, which has extensive experience (over 5 years) designing, developing and implementing grid architecture for caBIG, NIH-UK and several grid projects in Ohio. Ohio State is also working with Columbia and Albert Lai to develop a grid service wrapper for medLEE. Albert is in Columbus today and tomorrow, working with Shannon's team on the Globus service for medLEE. Based on these results, the PHRG may be able to deploy the service developed by Ohio State in addition to using our own wrapper (or, hopefully, instead of developing one).

Shannon also brainstormed some potential future scenarios that involve parallelizing the processing done by medLEE to allow for much quicker performance.

I will set up a call with Tom, Ken, Philip and Shannon once schedules allow to further explore collaboration opportunities.

Shannon also specifically mentioned the security architectures that they have worked with over the past five years and pointed us to GAARDS as a potential way to improve the management of security for nodes and services.

A quick draft set of test cases I am working towards

As recommended by Brian

RODSadaiProp test cases

  • Load the first server location from the properties file and poke the server for its list of data resources. This checks connectivity and proper properties file formatting.

RODSadaiDataProcessor test cases

  • Pass in a set of data rows, process them into data classes, and make sure the data meets expectations.

RODSadaiQueryClient test cases

  • Query a known test server with known test parameters and make sure that it returns the proper data.

  • Query a known RODS server with known test parameters (querying a constants table) and verify the results.

Now loading server lists from properties files

I hunkered down to do some serious coding of RODSAdai, and I learned a lot more about how Maven likes to test things. The most important lesson: if you are running a test, the properties files (and any other resources you put under the resources or META-INF directory) apparently need to be duplicated in a resources directory under the test piece too.

It makes sense when you think about it: that way you can put in test data you know is there, without risking the configurable data being changed or reserving names just for the sake of testing. But it was one of those painful "what is going on!?" types of learning.
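A related detail worth knowing: in the standard Maven layout, `target/test-classes` (built from `src/test/resources`) comes before `target/classes` on the test classpath, so a test copy of a properties file shadows the main one. A minimal Java sketch of a loader that makes a missing file fail loudly, instead of surfacing later as a null stream; the class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical factory: loads a properties file from the classpath, failing loudly if it is absent. */
class ClasspathProperties {

    static Properties load(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathProperties.class.getClassLoader();
        }
        // getResourceAsStream returns null (rather than throwing) when the file is not on the classpath.
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IllegalStateException("'" + resourceName + "' is not on the classpath; for tests, "
                        + "check src/test/resources as well as src/main/resources");
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test would then call something like `ClasspathProperties.load("rodsadai.properties")` (a hypothetical file name) and assert on a known test property.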

That being said, the RODSAdai class can now load a server list from the properties file in the format I laid out. Tomorrow comes setting up the RODS database and architecting how GTSecureClient will be modified to best pass results to TimeSeries and SpatialSeries.
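The post doesn't spell out the properties format, so here is a sketch under an assumed one: `server.<key>.url` and `server.<key>.resource` entries, parsed into a map from server key to URL and default data resource. The property names, the `ServerList` class, and its methods are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical server list read from properties like server.<key>.url and server.<key>.resource. */
class ServerList {

    /** One OGSA-DAI server and the data resource to use by default. */
    static final class Entry {
        final String url;
        final String defaultResource;

        Entry(String url, String defaultResource) {
            this.url = url;
            this.defaultResource = defaultResource;
        }
    }

    private static final Pattern URL_KEY = Pattern.compile("server\\.([^.]+)\\.url");
    private final Map<String, Entry> servers = new LinkedHashMap<>();

    ServerList(Properties props) {
        // Sorted names give a stable order regardless of how the file was written.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            Matcher m = URL_KEY.matcher(name);
            if (!m.matches()) {
                continue;
            }
            String key = m.group(1);
            String resource = props.getProperty("server." + key + ".resource");
            if (resource == null) {
                throw new IllegalArgumentException("server '" + key + "' has a url but no resource");
            }
            servers.put(key, new Entry(props.getProperty(name), resource));
        }
    }

    Entry get(String key) {
        Entry e = servers.get(key);
        if (e == null) {
            throw new IllegalArgumentException("unknown server key: " + key);
        }
        return e;
    }

    Map<String, Entry> all() {
        return Collections.unmodifiableMap(servers);
    }
}
```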

Daily Lab / POC Activities


  • Configured Postgres on lab 1005
  • Created littleblackbook and rods_ogsadai databases for OGSA-DAI development
  • Tested GridFTP transfer with Anurag. I will meet with Anurag on Friday at 8:30am to discuss grid functionality.
  • Mapped grid credentials on lab 1002 to a generic grid account
  • Reviewed VBrowser documentation

Wednesday, June 11, 2008

More code and a teensy unit test

After talking with Dr. Espino some more via IM, I showed him some of the code he entered and he updated the files for the time series and the spatial series.

I also created a factory for loading the test properties, updated a starting properties file, and wrote a test case that checks that the properties file is loaded and that a simple test property shows up. Tomorrow I will implement the property parsing and server list building, and then it is on to modifying the GTSecureClient from OGSA-DAI.

I checked in all the new stuff, and for those of you looking for the rods-test directory... it is there, but you will have to email Jeremy Espino for access.

Status Reporting

In order to better track the three proof of concept projects, I'll be posting status reports to this blog weekly by close of business each Friday. These status reports will be fully compliant with the CDC Unified Process.

This should provide a good level of transparency to the community on how the PHRGrid projects are progressing. It is also a further test of how open source development practices can be applied to projects.

Tuesday, June 10, 2008

ESP Syndrome Definitions

I received an email from Dr. Tokars (CDC) that he reviewed the ESP syndrome definitions and that they are fine to use in the ESP syndromic surveillance module for the Grid Biosurveillance Capacity PoC project.

Of course, this doesn't mean that BioSense is changing how it defines syndromes.

Why Maven?

Dr. Savel asked me a very valid question (offline) of why we are looking at Maven. I'm posting my answer here as it affects our programming method and others may ask the same question.

Maven is a build and configuration tool that lets programmers describe a software project in XML in such a way that any programmer anywhere can use the Maven tool to compile, build, deploy and document the project in a single, common, automated manner.

Without maven, Peter (until recently the only programmer working on the PHIRG) would need to spend hours helping you configure your environment, compile your changes and deploy your changes for testing. This is difficult as new programmers trickle onto the project, but impossible in a distributed & open source project.

So maven makes open source, distributed projects possible. Now that we have a build and a repository (thanks Anurag and Peter), any developer in the world can access the source code, make some changes and test them out. I hope that this will lead developers to send in change candidate snippets of code for a committer to evaluate and commit, but my hopes aren't too high.


I spent a lot of today drawing out how I wanted the interfaces to look for the new RODS over OGSA-DAI application, RODSAdai. I also talked over some of the specifics and test cases with Brian (who reminded me of the beauty of factories).

I also mavened up an Eclipse project, tinkered with it a bit, got some preliminary Java files set up, and checked in the stubs to the RODS-NCPHI test server.

Tomorrow, expansions, filling in the code stubs, lots and lots of changes, and maybe some test cases run.

Update of maven local repository

Finally, we have an internal repository for Maven at . For the code to use this repository, here is what needs to be done on the local machine.

I am assuming that Maven has already been set up on your machine. Navigate to the [Maven_Home]/conf folder and rename your settings.xml to settings.xml.bak. Copy settings.xml from to your [Maven_Home]/conf folder, and that's it, you're good to go. Just download the projects from, run mvn install, and it should work.

Monday, June 9, 2008

A possible GUI to the grid...VBrowser (very cool)

At this past HealthGrid conference, I had the fortune to moderate a presentation by Silvia D. Olabarriaga, PhD, from the University of Amsterdam. Among other things, she presented issues around the use of the tool - VBrowser.

What's so great about VBrowser? It is very elegant, and may provide our team with an open-source solution to a key requirement for the Public Health Informatics Research Grid: a robust and intuitive GUI for working with Grid services, both local and remote. VBrowser is part of the VL-e Toolkit.

The VL-e Toolkit (VLET) is intended to assimilate all useful tools developed in the VL-e project in a structured Software Development Environment (SDE). The main development platforms are Java 1.5 and the Globus Toolkit 4.0. (Also Eclipse is recommended as main IDE). All software developed in this project is compatible with these platforms.

Keep in mind that there are other initiatives: P-GRADE, g-Eclipse, Gridbus, and virolab.

Release Information
Download from gforge

Hope you all find this of interest!

Maven Local Repository

I worked on creating a Maven repository. After reading a lot of material on the web, it looked like a simple deal, but it looks like I've hit a wall for now. I created a repository in the existing project and tried to deploy a sample JUnit jar to the folder with the proper structure. It failed, citing an authentication failure. Maven is supposed to pick up authentication info from the settings.xml file, so I put it in there, but it still ignores it. I then tried deploying it to my local machine:

D:\>mvn deploy:deploy-file -DgroupId=junit -DartifactId=junit -Dversion=3.8.1 -DgeneratePom=true -Dpackaging=jar -Dfile=junit-3.8.1.jar -DrepositoryId=local-phgrid-repository -Durl=file://
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'deploy'.
[INFO] -------------------------------------------------------------------------
[INFO] Building Maven Default Project
[INFO] task-segment: [deploy:deploy-file] (aggregator-style)
[INFO] -------------------------------------------------------------------------
[INFO] [deploy:deploy-file]
Uploading: file://
118K uploaded
[INFO] Retrieving previous metadata from local-phgrid-repository
[INFO] Uploading repository metadata for: 'artifact junit:junit'
[INFO] Retrieving previous metadata from local-phgrid-repository
[INFO] Uploading project information for junit 3.8.1
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 second
[INFO] Finished at: Mon Jun 09 16:53:21 EDT 2008
[INFO] Final Memory: 2M/4M
[INFO] ------------------------------------------------------------------------

It works perfectly. I am sure there's some very small thing I am missing. I also need to get with Dan to set up my machine to start testing the grid nodes for messaging. Maybe I'll ping Peter too if I don't see any light by tomorrow afternoon.

Updates on ESP

I updated the charter for the Leveraging Grid to Enhance Biosurveillance Capacity PoC project to add goals and requirements covering the Harvard Center of Excellence Electronic Support for Public Health (ESP) syndromic surveillance module. The charters are under review now, but I will post them on the site once they are out of the draft stage.

Regarding the ESP project, Ken and I had a call with Dr. Ross Lazarus about two weeks ago (blogged about previously). During this meeting Ross mentioned that he is looking to design: 1) the syndrome definitions used, 2) the data field format for the generated reports and 3) the means of data transport.

We've been exchanging emails regarding item #1 and now have a description of the syndrome definitions that ESP has used for 12 syndromes over the past 18+ months. I've sent the definitions on to Dr. Tokars in the Biosurveillance activity at the CDC for review. Based on his comments, this should move us toward determining which set of syndrome definitions ESP can use for reporting syndromes.

Related to this is BioPortal, a repository of open medical ontologies, which Dr. Espino mentioned as an example of an ontology repository. One approach to handling multiple syndrome definitions is to tag syndromic surveillance data with a pointer to the syndrome definitions that were used in preparing the data. For technologists out there, this is very similar to XML Schema and namespaces.
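A minimal Java sketch of the tagging idea, in the spirit of an XML namespace: each count carries a URI identifying the syndrome definition set used to produce it, and counts are only summed when those URIs match. The URIs and class name are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical syndrome count tagged with the definition set used to classify it. */
class TaggedCount {
    final String definitionUri; // identifies the syndrome definitions applied, much like an XML namespace URI
    final String syndrome;
    final int count;

    TaggedCount(String definitionUri, String syndrome, int count) {
        this.definitionUri = definitionUri;
        this.syndrome = syndrome;
        this.count = count;
    }

    /** Sums counts per syndrome, refusing to mix counts classified under different definitions. */
    static Map<String, Integer> aggregate(List<TaggedCount> counts, String requiredDefinitionUri) {
        Map<String, Integer> totals = new TreeMap<>();
        for (TaggedCount c : counts) {
            if (!c.definitionUri.equals(requiredDefinitionUri)) {
                throw new IllegalArgumentException(c.syndrome + " was classified under " + c.definitionUri
                        + ", not " + requiredDefinitionUri + "; reclassify before aggregating");
            }
            totals.merge(c.syndrome, c.count, Integer::sum);
        }
        return totals;
    }
}
```

Rejecting mismatched tags, rather than silently adding them, keeps an aggregate view from mixing counts that were classified under different definitions.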

JUnit and Use Cases

Today I spent some time researching how JUnit works (JUnit is the preferred testing framework of maven, which is nice, considering that whenever you build to deploy, it will run the test cases and let you know if something you adjusted or refactored broke).

The next thing I did was start drawing out the application by its top-level use cases (the use cases will derive test cases, which will derive code). The apps I build for the RODS interface will basically be an extension of OGSA-DAI's discovery and query algorithms. A server list stored in a properties file (which can later be upgraded to a database of some sort) will be used to pull up the appropriate server (by key) and the default OGSA-DAI data resource to hit on that server; the query gets passed on, and then the specialized class structure gets returned.
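A minimal Java sketch of that flow, with hypothetical names throughout: `DataClient` stands in for the OGSA-DAI query call (not reproduced here), the server map plays the role of the properties-file server list, and a `RowProcessor` turns the returned rows into the specialized result class. Keeping the client behind an interface also makes the flow testable without a live server.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shape of the RODSAdai flow: server key -> server + default resource -> query -> typed result. */
class RodsAdaiFlow {

    /** Stand-in for the OGSA-DAI client call; the real client code is not shown here. */
    interface DataClient {
        List<String[]> query(String serverUrl, String dataResource, String sql);
    }

    /** Turns raw rows into a specialized result such as a time series or spatial series. */
    interface RowProcessor<T> {
        T process(List<String[]> rows);
    }

    private final Map<String, String[]> servers; // key -> {url, defaultResource}
    private final DataClient client;

    RodsAdaiFlow(Map<String, String[]> servers, DataClient client) {
        this.servers = new HashMap<>(servers);
        this.client = client;
    }

    /** Callers supply only a server key and a query, plus how they want the rows shaped. */
    <T> T run(String serverKey, String sql, RowProcessor<T> processor) {
        String[] target = servers.get(serverKey);
        if (target == null) {
            throw new IllegalArgumentException("unknown server key: " + serverKey);
        }
        List<String[]> rows = client.query(target[0], target[1], sql);
        return processor.process(rows);
    }
}
```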

Now I am thinking through the smaller cases and how all the classes will fit around each other. After that I'll start actually modifying OGSA-DAI example classes and writing wrapper classes and interfaces (and their test cases). Right now I just want to get it all on paper so I have a good reference for which piece I am working on and where it fits.

Saturday, June 7, 2008

Syndrome Definitions

Ken asked...

So in a distributed system doesn't it make sense that these definitions
should be regionally (and/or organizationally) defined and maintained? A
truly distributed system should be able to handle this. Am I crazy?

Brian answered...

No, you're not crazy. This is actually the combination of the
distributed query use case with the distributed computing use case.
Because in order to get a valid aggregate view of combined regions you
will need to use distributed computing to reclassify syndromes using a
shared definition.

For now, we are assuming a shared definition (i.e. only using RODS nodes
for the distributed query). The next stage is to keep a registry of how
syndromes are classified (like the Stanford ontology registry) and query
based on systems that match particular classification schemes. The end
goal is to be able to send out classification algorithms and get a
single response back with each syndrome count classified in the same

Friday, June 6, 2008

Repositories and Builds and Tomcat, Oh My

Yesterday I was here for a while trying to get SVN to behave. I found that the "gov/cdc/phgrid/ogsadai/example" path was stuck in the root directory, and since I was trying to create the same path inside the Maven directory structure (under src/main/java, I think it was), it refused to add it there, even after I removed the old gov structure. I thought about scouring the old repository, had Brian try a few things, and still hit roadblocks.

Then I had the late-arriving epiphany to just move the tree, and that worked, and it was good.

Then I gave Anurag a zip of my local repository files, so that he could create a public repository.

I also had a wonderful discussion with Jeremy about a RODS-OGSA-DAI interface. I had a bit of a paradigm shift, but in the meantime I have a good idea of what sorts of use cases, test cases, and code I will need to develop. I have also taken on the task of finding out why Tomcat doesn't find the same security credentials that work fine from the command line, as that hurdle will need to be crossed eventually for happy JSP-based pages pulling data from the grid.

Cheers, I hope everyone has a wonderful weekend!

Creation of a local repository

After the meeting with Brian, Dan and Peter, I was assigned to figure out how we can create a local repository, so that anybody can download the code and run one command to build the project automatically. I got the latest and greatest copy from Subversion, and a list of jars to be put in the local repo. The next step is to find out how to direct Maven to look in our repository; once that's done, another blog entry will follow with directions for building the project. Good material to think about over the weekend.

MyProxy / PURSE moved to lab 1001

The MyProxy installation has been moved to Lab 1001. During the installation process I noticed that the setup under $GLOBUS_LOCATION/share/myproxy/ required a 7512/TCP entry in the /etc/services file. This port was not on the list of ports requested to be opened for lab 1002. MyProxy/PURSE will be evaluated on the internal grid before it is tested on PHgrid.
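Since a missing `/etc/services` entry is easy to overlook when MyProxy moves to another node, a quick pre-flight check can help. A minimal Java sketch, assuming the usual `/etc/services` line format (`name port/protocol [aliases]`, with `#` comments); the class name is hypothetical.

```java
import java.util.List;

/** Hypothetical pre-flight check that /etc/services declares a given port/protocol. */
class ServicesCheck {

    /** True if any non-comment line maps some service name to the port/protocol, e.g. 7512 and "tcp". */
    static boolean declares(List<String> servicesLines, int port, String protocol) {
        String wanted = port + "/" + protocol;
        for (String raw : servicesLines) {
            String line = raw.replaceFirst("#.*", "").trim(); // strip comments
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");
            if (fields.length >= 2 && fields[1].equalsIgnoreCase(wanted)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `ServicesCheck.declares(Files.readAllLines(Paths.get("/etc/services")), 7512, "tcp")` on the target node before running the MyProxy setup.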

Thursday, June 5, 2008

Programming Playground Rules

The Lab Team (Dan, Peter, Anurag) met this afternoon to discuss some development methodology as we begin to develop more code and scripts to support the PoC Projects. The idea is to stay agile and lightweight enough to continue the rate of innovation, but to add some consistency that will make it easier to scale.

Some recommendations that we've made and decided to follow are:

  • Use the SourceForge Subversion site to store changes - All changes are checked into the Subversion repository of the SourceForge project. The only exceptions are security-related items (passwords, etc.), which are not checked in and are instead factored out into a property or configuration file.

  • Everything builds - Maven is used to build (compile and package) and deploy changes. This will allow for changes to be made on different desktops without spending time trying to manually configure a new environment. This also means that if the build breaks then the developer who broke it needs to fix it as soon as possible.

  • Follow the Sun Code Conventions for Java. These are old, but still applicable to what we're trying to do. We picked them as a default, and this can change based on feedback and practice, but we want a uniform style across the project.

  • Document code - document code to improve clarity. Use Javadoc where necessary. This is not documentation for documentation's sake, but enough that someone coming across the code will be able to follow it. This doesn't absolve a programmer from writing clean, concise code, but it should improve the clarity of a source file.

  • Write a Use Case first - the first step in development should be to make a post on this blog describing the use case. Nothing too formal, just a description of the steps, who will perform them, alternate flows and error handling. This will provide a way to capture the requirements before any code is written.

  • Next, write a Test Case before coding - once the use case is posted and there's some sort of agreement on it, a test case is written alongside the code. This speeds up development by institutionalizing testing in a standard manner.

  • Each project will have a separate folder in the SourceForge root with a Maven pom.xml file for the build (including dependencies).

We'll add to or subtract from this list as we get going. Once we're comfortable with these rules, we can put them in a proper document and link to them from the left navigation bar.
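
As an illustration of the first rule (and of the Javadoc the documentation rule asks for), here is a minimal sketch of keeping a password out of Subversion: the code reads it from properties-format text that would come from a local, un-versioned file. The class name, the file name lab.properties and the key db.password are all hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Holds security-related settings read from a local properties file that is
 * kept out of Subversion. Hypothetical example; key names are illustrative.
 */
class LocalSecrets {

    private final Properties props = new Properties();

    /**
     * Loads settings from properties-format text.
     *
     * @param source a reader over the properties, e.g. an un-versioned lab.properties file
     * @throws UncheckedIOException if the source cannot be read
     */
    LocalSecrets(Reader source) {
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns a required setting.
     *
     * @param key the property name, e.g. "db.password"
     * @return the configured value
     * @throws IllegalStateException if the key is missing, so a misconfigured node fails fast
     */
    String require(String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of an un-versioned lab.properties file.
        LocalSecrets secrets = new LocalSecrets(new StringReader("db.password=changeme\n"));
        System.out.println("db.password loaded: " + !secrets.require("db.password").isEmpty());
    }
}
```

In practice the reader would wrap a file such as lab.properties that is listed in the directory's svn:ignore property, with a checked-in template showing which keys are expected.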

Tagging Posts by Proof of Concept Project

During today's PHGrid call, someone suggested (I cannot remember precisely who) that we add tags for each of the Proof of Concept Projects running through the summer. This will allow a reader to quickly filter and sort posts by the area they are interested in. Some posters are already tagging their writings to allow for this. I want to propose the following tags that writers will assign whenever they write regarding a PoC:

  • Grid PoC Phase II PoC - CoE PoC

  • Secure Reliable Grid Messaging - SRGM PoC

  • Leveraging Grid to Enhance Biosurveillance Capacity - GBC PoC

I've linked to the searches in the list above. In the future you can click on these tags to search for posts specific to a PoC project.

HealthGrid 2008 Papers

The HealthGrid 2008 Programme site has links to the related papers. You can reach this site at:

The conference organizers have said they will post the conference presentations in the near future. When they release the presentations, I'll make a post here.

Programming Java Services

I came across the "Globus Toolkit 4: Programming Java Services" book while searching for Globus deployment best practices information. It appears that this book is a great reference for building Globus Web services.

A preview of this book is available on The preview allows you to get a general idea of the contents of each chapter. Click here to view the preview.

There is also a companion site for the book that has code and commands available for cutting and pasting.

Wednesday, June 4, 2008

Maven and its lovely repository

So, I spent a good part of the day paring down how many libraries my example client code actually needed from Globus and OGSA-DAI respectively. The list dropped from 40 to about 8.

Then I spent a lot of time putting one of those 8 into the local Maven repository in a way that Maven seems to recognize... and adjusting the pom file to point to it as a dependency. I ran "mvn package" (i.e., build) and got a long, thoughtful message about how the dependency wasn't found and how I could add the file with the helpful "mvn install:install-file" command. It also suggested a "mvn deploy" command for deploying code to a commonly owned repository... showing that deploying to a central and accessible location is

Then I proceeded to add all 8 files in about 30 minutes, along with the corresponding dependencies in the pom file, and then ran mvn compile successfully.

I am really beginning to like maven!
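
For reference, Maven's default repository layout places an artifact by its coordinates: the groupId with dots turned into directories, then the artifactId, then the version, and finally a file named artifactId-version.jar. That is where install:install-file puts a jar in the local repository. A minimal sketch; the class name and the coordinates in the example are hypothetical placeholders, not the ones used for the Globus and OGSA-DAI jars:

```java
/** Hypothetical helper: computes where Maven's default repository layout stores a jar. */
class RepoLayout {

    /**
     * Returns the repository-relative path of a jar artifact, for example
     * ("org.example", "client", "1.0") maps to "org/example/client/1.0/client-1.0.jar".
     */
    static String jarPath(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Placeholder coordinates, for illustration only.
        System.out.println(jarPath("org.example", "client", "1.0"));
    }
}
```

The same coordinates are what "mvn install:install-file" takes through its -Dfile, -DgroupId, -DartifactId, -Dversion and -Dpackaging=jar parameters, and what the pom file's dependency entries must match.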

Tomorrow, I am going to put some of the interface and repository plans for the RODS<->OD interface on paper... perhaps start some of the code for simple things like resource discovery. Then I will test whether the jar produced runs properly when put into the proper environment... then I will start reading up on where to put the JSP files. I will also make an Eclipse project file and import it into Eclipse.

There is also a meeting to discuss package layouts and plans and the like. I am looking forward to it.

Some code review

I checked out the code from Subversion, set it up in Eclipse and started reviewing it. I used TortoiseSVN to do so; I am going to miss it when I finally reach the CDC lab, because TortoiseSVN is Windows-only. Anyhow, I'm looking forward to meeting Peter and Brian tomorrow to discuss where we are, and also to get a perspective on what Peter is doing so that we are not banging our heads on the same wall. I guess we can pick 2 different walls to bang our heads on after tomorrow's meeting.

Daily Lab / POC Activities


  • Reviewed the Portal-based User Registration for grids document (PURSE)
  • Reviewed the MyProxy Admin guide
  • Updated the myproxy-server.config file on lab1002
  • Moved a PURSE/MyProxy installation to lab1001 for internal grid testing
  • Tarrant County node back online

The next important question: what is the Public Health Grid?

Forgive me, technologists on this blog, because I am going to abuse terms that hold specific meaning in your world. With that, another important question -- what do we mean when we say "Public Health Grid"?

To some, it's the specific technologies. To others, it's a conceptual technical architecture. To me, it's the business of public health.

For us to develop a technical framework that meets the needs of the public health community, we need to understand the public health community, and how it works.

In my mind the Public Health Community is already organized as a distributed set of social relationships and funding mechanisms. The gaping hole is access to information, and an information framework to support that distribution. This is the potential value of grid technologies -- they are the framework that fits. We just need to align them to the reality (and perhaps work on realigning some of the details of the social and funding frameworks, without altering the mission).

So, as a non-techie, in my world, the Public Health Grid is:

1. The Social Network of Public Health. This includes public health departments, clinical partners, academics, and industry. Note: Epidemiologists are one very small -- albeit important -- part of that social network.

2. The Funding Network. This is what pays for the Social Network to exist. One open question: is it aligned to where it needs to be?

3. The Technical Network. What we are trying to build now IS realigning the legacy to better meet the Social Network.

Bear in mind, the numerical order of these is intentional. Work needs to be done in all three, but we are at a point where the potential of 3 can be explored; that potential will only reach its maximum when 1 & 2 are understood.

Is this the real Public Health Use Case?

The HealthGrid conference has been very good for me to stew on a number of important questions spawned from my favorite: "so what?"

What struck me more than anything in listening to the various folks from the clinical, research, and bioinformatics worlds (the primary attendees of this conference) is that the real problem they are trying to solve is getting access to data from trustworthy sources so that each can do their job. So, in the world of public health, would it follow that the use case that matters most can be generalized as "Data Access"?

If we solve that -- and then use the public health programs (biosurveillance, reportable disease surveillance) as the jargon to describe that -- do we solve 80% of the problem?

2008 HealthGrid Conference (Chicago)

Met twice with Ian Foster, Jonathan Silverstein and their Argonne Globus colleagues. Discussed public health use cases. We also had an excellent meeting with the following Globus team members:

  • Ravi Madduri, closely involved in caBIG
  • Raj Kettimuthu, GridFTP project lead
  • Frank Siebenlist, security architect
Everyone was extremely generous with their time and support for CDC's nascent public health grid community.

It is obvious that working directly with the Globus team will give us a much higher chance of success in achieving our vision of a national public health grid.

Tuesday, June 3, 2008

SVN and Maven

I have checked most of my code into the SourceForge SVN, and I just got the note that we should be using the package gov.cdc.ncphi..... and not org.cdc...

I guess I get to learn about subversion directory changes next :).
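
The gov.cdc prefix follows Java's convention of starting package names with the organization's Internet domain in reverse order, so cdc.gov becomes gov.cdc. A tiny sketch of that rule (the class and method names are hypothetical; "ncphi" is just the sub-package from the note above):

```java
import java.util.Locale;

/** Hypothetical helper illustrating Java's reverse-domain package naming convention. */
class PackageNames {

    /**
     * Builds a package prefix from an Internet domain plus optional sub-packages,
     * for example ("cdc.gov", "ncphi") gives "gov.cdc.ncphi".
     */
    static String fromDomain(String domain, String... subPackages) {
        String[] labels = domain.toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder();
        // Walk the domain labels from last to first: cdc.gov -> gov.cdc
        for (int i = labels.length - 1; i >= 0; i--) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append(labels[i]);
        }
        for (String sub : subPackages) {
            sb.append('.').append(sub);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(fromDomain("cdc.gov", "ncphi"));
    }
}
```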

Otherwise, I have hit a challenge in the Maven-ification of the code. Maven prefers jar files stored in remote repositories somewhere. All the OGSA-DAI and Globus code is set up with a series of build files (especially Globus, considering it has a lot of C code in addition to Java code), but there is no Maven repository for it. I experimented with moving jar files into a resource location, but I think I am just going to have to store the jar files in the local repository with the appropriate Maven metadata. I am not sure of the best way to do that, whether there is some sort of tool, or whether I should try to get Maven building the OGSA-DAI code.

I guess what I am looking for now is a resource for taking projects that already exist and getting them to a point where Maven manages them, because a lot of the stuff that was previously handled by changing the classpath is now going to be handled by formatting the myriad dependencies into an XML file.

I feel like in the end we will have the local repository, and we will be able to publish it to our own repository so that other development groups can reference our files with the appropriate sets of OGSA-DAI and Globus libraries.
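
The "appropriate Maven metadata" for a jar that ships without a POM of its own can be as small as a POM declaring its coordinates (install:install-file can also generate one through its generatePom option). The sketch below just builds such a minimal POM as a string; the class name and coordinates are hypothetical, and it assumes the coordinates contain no characters that need XML escaping.

```java
/** Hypothetical helper: builds the minimal POM describing a third-party jar. */
class MinimalPom {

    /**
     * Returns a POM with only the required coordinates and jar packaging.
     * Assumes the arguments contain no XML special characters such as '<' or '&'.
     */
    static String forJar(String groupId, String artifactId, String version) {
        return String.join("\n",
                "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">",
                "  <modelVersion>4.0.0</modelVersion>",
                "  <groupId>" + groupId + "</groupId>",
                "  <artifactId>" + artifactId + "</artifactId>",
                "  <version>" + version + "</version>",
                "  <packaging>jar</packaging>",
                "</project>");
    }

    public static void main(String[] args) {
        // Placeholder coordinates, for illustration only.
        System.out.println(forJar("org.example", "client", "1.0"));
    }
}
```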

I also need to start researching how to implement a Globus WS extension, and looking into the best ways to get data back and forth between RODS and OGSA-DAI.

medLEE grid service

I met some of the grid architects from Ohio State University who were also attending the HealthGrid 2008 conference. OSU is working on grid services involving Columbia's Natural Language Processor, medLEE. I scheduled a call for next week to talk with some of the OSU team working with medLEE.

Since this is one of the activities we're working on for the Center of Excellence proof of concept project, it will be helpful to see what Ohio State's perspective is.

Medicus Install on Lab 1002

The attempt to build Medicus is failing due to the following missing project targets:

  • Target `install' does not exist in this project.
  • Target `deployService' does not exist in this project.
  • Target `deployResource' does not exist in this project.
  • Target `exposeResource' does not exist in this project.

I am currently researching the issue and searching for additional documentation.
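
Ant reports "Target ... does not exist in this project" when the build file it loaded defines no target with that name, so a first check is to list which targets the build.xml actually in use defines. A minimal sketch using the JDK's built-in DOM parser (the class name is hypothetical); it only reads top-level target elements and ignores anything pulled in through import or include, so an empty result does not rule out targets defined elsewhere:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical diagnostic: lists the targets a build.xml defines directly. */
class AntTargets {

    /** Returns the names of the target elements that are direct children of the project root. */
    static List<String> targetNames(InputStream buildXml) {
        try {
            Element project = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(buildXml).getDocumentElement();
            List<String> names = new ArrayList<>();
            NodeList children = project.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() == Node.ELEMENT_NODE && "target".equals(node.getNodeName())) {
                    names.add(((Element) node).getAttribute("name"));
                }
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse build file", e);
        }
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "build.xml";
        try (InputStream in = new FileInputStream(path)) {
            List<String> defined = targetNames(in);
            // The targets the Medicus build reported as missing.
            for (String wanted : new String[] {"install", "deployService", "deployResource", "exposeResource"}) {
                System.out.println(wanted + (defined.contains(wanted) ? ": defined" : ": not defined here"));
            }
        }
    }
}
```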

Monday, June 2, 2008

High-Yield Public Health-Related Grid Services-Food for Thought

Had a very productive meeting with my colleagues today around defining a handful of very high-yield and valuable services to be added to the core Globus services, which might be included with each Grid node.

The conclusion was the development of 3 very elegant, distinct and lightweight services that would be able to perform the following - given successful node installation:

1.  Alerting to all (or a subset) of the members of the grid community.  To improve the granularity of the notification, this service could leverage some form of the existing PHIN Directory developed by CDC.

2.  Instant Messaging / Grid Chat - for those who want rapid communication / collaboration functionality between members of the PH Grid community.

3.  A vocabulary service to facilitate the rapid standardization of local data sets, to facilitate data sharing and integration.  As the use case for this service is refined, it is our hope that it can remain a very streamlined and lightweight service.
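
To make the third idea a little more concrete: at its simplest, a vocabulary service maps codes used locally at a site onto a shared standard code. The sketch below is purely hypothetical; the class, the in-memory table, and the example codes are illustrative assumptions, not a design anyone has agreed on.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory sketch of a vocabulary (code translation) service. */
class VocabularyService {

    // Key is "<localSystem>|<normalized local code>", value is the standard code.
    private final Map<String, String> mappings = new HashMap<>();

    /** Normalizes a local code so minor formatting differences still match. */
    private static String key(String localSystem, String localCode) {
        return localSystem + "|" + localCode.trim().toUpperCase(Locale.ROOT);
    }

    /** Registers a mapping from a site's local code to a standard code. */
    void addMapping(String localSystem, String localCode, String standardCode) {
        mappings.put(key(localSystem, localCode), standardCode);
    }

    /** Translates a local code, or returns empty if no mapping is known. */
    Optional<String> toStandard(String localSystem, String localCode) {
        return Optional.ofNullable(mappings.get(key(localSystem, localCode)));
    }

    public static void main(String[] args) {
        VocabularyService vocab = new VocabularyService();
        // Made-up site name and codes, for illustration only.
        vocab.addMapping("countyA", "flu-like", "ILI");
        System.out.println(vocab.toStandard("countyA", " FLU-LIKE ").orElse("unmapped"));
    }
}
```

Keeping mappings per local system keeps the service lightweight: each node contributes only its own table, and unmapped codes surface as empty results rather than guesses.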

Given the availability of these services, a Grid portal must be created as well, to provide a user interface for all nodes/users to turn on/off the activity of these services.  

Of course, all these services would only be available to those with digital certificates. And yes, the issue of scaling digital certificates must be addressed, regardless of the other issues brought up in this post.

Look forward to further discussion on this.

Also working with Maven

Having just read the post below mine, I see that Anurag and I are on the same track.

I have created a "ncphi-examples" project to store all the little scripts and pages I have created already, and I have now moved in one of the pieces of my modified OGSA-DAI code and already have it throwing compilation errors. I will probably fire off a quick email to Anurag about how best to import the OGSA-DAI code... at this point I am leaning towards just including it in the source tree, so that Maven will compile it locally and bundle it all into one big OGSA-DAI jar along with NCPHI-specific modifications. By the end I imagine this particular Maven project will have the OGSA-DAI code, references to the jar repositories needed for builds provided by the Maven site, and all the web code too.

As I play with Maven more and more, I find it perfect for just forcing you to have a sensible, realistic build structure. It will make you put your code in one place, your resources in another place, and show you the beauty of unit tests. In the end you get the deployable Jars and Wars that just make life that much easier... and I get the impression that your work will just be that much more legitimate when you go "oh, just install this maven tool, sync to the SVN, and then run 'mvn package' and you should be able to verify the compilation."

If anything, this initial build will be a wonderful starting package for any future major NCPHI Globus OGSA-DAI code collaborations, and that is fortuitous considering I am also looking into defining the interface between RODS and OGSA-DAI for an outbreak detection solution that would use remote database access. One interesting use of the extensible functionality that I can think of adding is the ability to deploy OGSA-DAI resources on the fly after filling out a simple form.

Otherwise, my vacation was lovely and personally productive, and it seems that a lot of documentation was completed after Memorial Day; this gives me new solid directions and a lot of excitement.

Starting with maven

Worked on a sample project to illustrate how we can move toward a structured project that can be compiled with Maven. Hopefully this will help standardize the application a little. I still have to meet Brian and Peter to discuss and finalize the project structure, as I have tried to make it as general as possible. Once it's done, it should be as simple as dropping your JSP or Java file in the designated folder and compiling the whole app in one go without worrying about the dependencies. We also have to decide on the URL of the local repository where we can store the jars relevant to our projects.
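
The "designated folder" idea lines up with Maven's standard directory layout, where Java sources, resources, web content such as JSPs (for a war project), and tests each have a conventional home. A minimal sketch of that mapping; the helper is hypothetical and decides by file extension only, while the directory names are Maven's defaults:

```java
import java.util.Locale;

/** Hypothetical helper: suggests a directory in Maven's standard layout for a given file. */
class LayoutAdvisor {

    /**
     * Returns Maven's conventional directory for the file, based only on its extension.
     * Java sources still need their package subdirectories beneath the returned root.
     */
    static String directoryFor(String fileName, boolean isTest) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".java")) {
            return isTest ? "src/test/java" : "src/main/java";
        }
        if (lower.endsWith(".jsp") || lower.endsWith(".html")) {
            // Web content, for a project packaged as a war.
            return "src/main/webapp";
        }
        // Anything else (properties, .wsdd and other config files) is treated as a resource.
        return isTest ? "src/test/resources" : "src/main/resources";
    }

    public static void main(String[] args) {
        String[] files = {"ExampleClient.java", "index.jsp", "client-settings.wsdd"};
        for (String f : files) {
            System.out.println(f + " -> " + directoryFor(f, false));
        }
    }
}
```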

For more information on Maven:

NCPHI GRID Research: Value to DiSTRIBuTE

I presented to members of the DiSTRIBuTE initiative at the Markle Foundation in NYC.  Key points that were made during the presentation:

Steps to join the PHI Research Grid:

Installation of the Globus Toolkit (software) on a Linux operating system (installed on an operating system virtual machine, e.g., VMware). Required time: 30-45 minutes

NCPHI Digital Certificate installation

Opening specific ports on the existing firewall (24 hours to 24 days, depending on existing local policy)

Validate connection via GridFTP

(ta da!)
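
Before trying an actual GridFTP transfer, it can help to confirm that the firewall change took effect, i.e. that a TCP connection to the node's GridFTP port succeeds. A minimal sketch, assuming GridFTP's default control port 2811; the class name and host name are placeholders, and a successful connect only shows that the port is open, not that certificates are set up correctly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a TCP port on a host accepts connections. */
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, filtered (timed out), or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "gridnode.example.org"; // placeholder host
        System.out.println(host + ":2811 reachable: " + isReachable(host, 2811, 5000));
    }
}
```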

Advantages of the grid:  

Non-centralization of data

Data can remain on local node

GridFTP (multipoint)

Access to non-centralized Grid services:  

All nodes have the capability to run distributed analytics (customized) on demand

e.g., decentralize and open a DiSTRIBuTE analytics service 

Advantages in eventually leveraging Grid:

Implementation of Additional Use Cases

Extensibility & Flexibility

Can leverage the use of Intelligent Agents

Afford new degrees of redundancy

For DiSTRIBuTE, connecting to the overall public health grid:

Can put DiSTRIBuTE services on the overall grid

Can leverage other services and data on the overall grid for DiSTRIBuTE

Can develop a “DiSTRIBuTE summary data processing service” on the grid

Significantly accelerate the growth of the DiSTRIBuTE network