Thursday, December 20, 2007

Daily Lab / POC Activities

  • Successfully annotated test UML model and validated against Semantic Integration Workbench (without errors)
  • Sent EAP & XMI files to SCI for submission to caBIG integration group for manual validation
  • Hopefully, they'll be able to expedite the process


  • Status call
  • University of Pittsburgh installed VMware on the Tarrant County node computer and plans to install SUSE within the VM today
  • Tarrant County placed HL7 files on their node computer
  • CDC successfully connected via VPN to Tarrant County node computer
  • NCPHI grid node operational in lab
  • NCPHI lab firewall changes have been submitted to lab management
  • Grid node firewall requirements posted to this blog in order that other nodes may request similar changes to their networked firewalls
  • OGSA-DAI call scheduled for first or second week in January, 2008, to discuss federating OpenMRS data model for public health use
  • Two hours scheduled for Thursday, January 4th status call to walk through node setup on Tarrant County grid node
  • We should be able to begin exchanging HL7 files the second week in January

Wednesday, December 19, 2007

Daily Lab / POC Activities

  • Annotated test Enterprise Architect model
  • Had errors attempting to run it through caAdapter & Semantic Integration Workbench
  • Plan to call help desk and revisit tomorrow
  • Successfully connected to Tarrant County VPN, then used Windows Remote Desktop to connect to their soon-to-be-built grid node
  • Ken plans to install the Tarrant County grid node online through WebEx so that everybody can see how it's done
  • Rescheduled development meeting to first week in January, 2008
  • Realized that we were all looking at the OpenMRS software backwards. Decided to federate a view into the OpenMRS data model so that anybody may query multiple OpenMRS systems in a distributed fashion. This is more in line with original OGSA-DAI design principles. After this we'll look at gridifying OpenMRS.
  • Reported progress on both PoCs to CDC PMO lead
  • Attended meeting to discuss new Diginet PoC in lab

Tuesday, December 18, 2007

Node Firewall Configuration

NAT from firewall/router to x.x.x.x all incoming traffic as specified below:
  • Open up port 2811/tcp (for GridFTP control channel connection) for incoming connections to x.x.x.x
  • Open up the range of ports 50000-51000 (for GridFTP data channel connections) for incoming connections to x.x.x.x
  • Open up port 22/tcp (for GSI-Enabled SSH) for incoming connections to x.x.x.x
  • Open up port 7512/tcp (for MyProxy) for incoming connections to x.x.x.x
  • Open up port 2119/tcp (for GRAM) for incoming connections to x.x.x.x
  • Open up port 2135/tcp (for MDS) for incoming connections to x.x.x.x

x.x.x.x = node server ip address behind firewall
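
For nodes whose firewall/router is a Linux box, the list above could be sketched as iptables DNAT rules. This is only an illustration, not the configuration submitted to lab management: the function name print_nat_rules and the 192.0.2.10 address are made up, the data channel range is assumed to be TCP, and the script only prints the rules rather than applying them. A matching FORWARD rule may also be needed depending on the firewall's default policy.

```shell
#!/bin/sh
# Sketch: print (do not apply) one iptables DNAT rule per port in the list above.
# The argument stands in for the x.x.x.x placeholder (node server behind the firewall).
print_nat_rules() {
    node_ip="$1"
    # Each line below is "<port or first:last> <service name>".
    while read -r ports service; do
        echo "iptables -t nat -A PREROUTING -p tcp --dport $ports -j DNAT --to-destination $node_ip  # $service"
    done <<'EOF'
2811 GridFTP control channel
50000:51000 GridFTP data channels
22 GSI-enabled SSH
7512 MyProxy
2119 GRAM
2135 MDS
EOF
}

# Example with a documentation-only address; review the output before running it as root.
print_nat_rules "${NODE_IP:-192.0.2.10}"
```

On the node itself, GridFTP is usually told to stay inside the same data channel range by setting GLOBUS_TCP_PORT_RANGE=50000,51000 in the server's environment.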


The first grid service to be tested on the extramural research grid is GridFTP and the ability to move files between each node.

The document being used to set up GridFTP on the NCPHI node, including firewall restrictions, is located here.
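
As a rough idea of what the first file-movement test might look like, the sketch below builds a globus-url-copy command line for pushing a local file to another node. The helper name gridftp_push_cmd, the host node.example.org and the file paths are placeholders, not values from the setup document, and the command is only printed. Actually running it assumes a valid proxy credential (for example from grid-proxy-init) and a GridFTP server listening on the remote node.

```shell
#!/bin/sh
# Sketch: build the command for copying a local file to a remote GridFTP server.
# Arguments: absolute local path, remote host name, absolute remote path.
gridftp_push_cmd() {
    local_file="$1"
    remote_host="$2"
    remote_path="$3"
    # -vb asks globus-url-copy to report bytes transferred and transfer rate.
    echo "globus-url-copy -vb file://$local_file gsiftp://$remote_host$remote_path"
}

# Placeholder host and paths, for illustration only.
gridftp_push_cmd /tmp/sample.hl7 node.example.org /tmp/sample.hl7
```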

Monday, December 17, 2007

Daily Lab / POC Activities

  • Requested & received user certificate under 'ken' user name
  • Set up time on Tuesday morning to go over VPN access to Tarrant County


  • Attended Semantic Integration Workbench training
  • Attended caCORE SDK training
  • Satish Patel (NIH/NCI) recommended the following:

Use SDK 4.0 version if you are planning to use it for a longer term. If you intend to use the SDK with grid in shorter timeframe then you would have to use SDK 3.2.1. You can create grid service 1.1 from it.

SDK is available for download here
SDK documentation is available here

Friday, December 14, 2007

OGSA-DAI Conference Call (Upcoming)

Conference call on Wednesday of next week to discuss potential applications of the OGSA-DAI team's distributed data model use case and to offer them any help as they code to grid-enable OpenMRS. The distributed data model is their stronger, more extensible use case.

Please let Ken know if you want to participate.

VM Report

Twice in the last few days, the network interface within the VM disappeared. The first time this happened, a simple stop and restart of the VM fixed the problem. The second time, the network adapter had to be re-created for the VM.

  • Is this a known bug in VMware? It is being investigated by DISS.
  • How stable is the VM for a future production environment of grid? Further research needs to occur here, especially since the model of building a public health grid using a VM node environment has many advantages.
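
While the VMware question is open, a small check like the one below could be run from cron inside the guest to notice when the interface vanishes. It is a sketch under assumptions: a Linux guest that exposes interfaces under /sys/class/net, an interface named eth0, and made-up function names iface_present and check_iface. It only reports; it does not try to repair the adapter.

```shell
#!/bin/sh
# Sketch: report whether a network interface is visible to the guest kernel.
# iface_present NAME [SYSFS_NET_DIR]; the directory defaults to /sys/class/net.
iface_present() {
    [ -e "${2:-/sys/class/net}/$1" ]
}

# check_iface NAME [SYSFS_NET_DIR] prints a one-line status suitable for a log.
check_iface() {
    if iface_present "$1" "$2"; then
        echo "OK: $1 present"
    else
        echo "MISSING: $1"
    fi
}

# eth0 is an assumed interface name; adjust to match the VM.
check_iface eth0
```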

Wednesday, December 12, 2007

Accidental Interesting Find - Grid Australia

Link here. Potentially useful resources - to be added to stack.

Daily Lab / POC Activities

  • Received certificate instruction from Laura
  • Ran the following commands successfully:

$GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_ae084d2a_setup-0.19.tar.gz


$GLOBUS_LOCATION/setup/globus_simple_ca_ae084d2a_setup/setup-gsi -default

$GLOBUS_LOCATION/bin/grid-cert-request -host 'hostname'

  • Used unique domain name for grid node (not for public consumption)
  • This command created host certificate file
  • Mailed host public key to
  • Awaiting certificate from HealthGrid.US


  • This command created client certificate file
  • Mailed client public key to
  • Should have used account on server ('ken', for example) instead of 'root'
  • Will redo this Monday, send public key to address then await client certificate
Discussed differences between client and host certificates with Laura Pearlman
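
To avoid repeating the 'root' mistake noted above, the two kinds of request could be wrapped in a small helper like this one. It is only a sketch: the name cert_request_cmd and the host node.example.org are invented for illustration, and the helper prints the grid-cert-request command instead of running it. The one rule it encodes is the one from the notes: a client (user) certificate request should come from a regular account, while a host request names the node with -host.

```shell
#!/bin/sh
# Sketch: choose the grid-cert-request invocation for a host or user certificate.
# Arguments: kind (host|user), numeric uid of the requesting account, FQDN (host only).
cert_request_cmd() {
    kind="$1"
    uid="$2"
    fqdn="$3"
    case "$kind" in
        host)
            echo "grid-cert-request -host $fqdn"
            ;;
        user)
            # A user certificate belongs to a person, so refuse to build it for root.
            if [ "$uid" -eq 0 ]; then
                echo "refusing: request user certificates from a regular account, not root" >&2
                return 1
            fi
            echo "grid-cert-request"
            ;;
        *)
            echo "usage: cert_request_cmd host|user UID [FQDN]" >&2
            return 2
            ;;
    esac
}

# Placeholder host name, for illustration only.
cert_request_cmd host "$(id -u)" node.example.org
```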

To do Monday

  • Dave at Tarrant to send VPN and remote terminal instructions to be able to connect to their server
  • Connect to Tarrant server, download sample HL7 file. Stephan is ready to "load" file into Medicus
  • Create client certificate request and email to HealthGrid folks
Conference call

  • Decided to walk through grid creation process using WebEx & conference call so that all can watch and ask questions
  • Ken will do this on Tarrant County server to create their node in a VM
  • Jeremy will watch, take notes and ask questions then follow procedure to create his own node
  • Ken will also make VM image of NCPHI node, ship to Jeremy to try that method of node creation

  • Exported XMI file into caAdapter
  • Created mapping in caAdapter
  • Imported mapped XML back into EA
  • Exported into new combined XMI file
  • We are ready to use SDK to create test application
  • Set up two conference calls for training on how to use SDK (Monday @ 10 AM & 3 PM)
Next Steps

  • Load caCore SDK in lab
  • Train on SDK
  • Create test tables in Oracle

Monday, December 10, 2007

Daily Lab / POC Activities


  • Reviewed & validated process to create model to run through caCORE SDK with Raghu Narasimha & Scott Halpine (SCI)
  • Next steps: 1) Conference call with Claire to double validate the process, 2) Load Oracle & SDK on lab machine, 3) Create tables then 4) run model through SDK to create test application
  • Enrolled at NCI web site to take Metadata Curator - Using UML Models training
  • Sent questions to Scott to send to Claire prior to conference call


  • Discussed Globus / Medicus / HealthGrid Live connectivity & installation strategy with Laura Pearlman and decided to get base Globus working with GridFTP followed by RLS (to set logical names for files) followed by metadata catalog.
  • Will run "gridcertrequest" to request certificate from HealthGrid.US Live
  • Note: a host certificate's private key is stored unencrypted, while a user certificate's private key is encrypted with a passphrase
  • Plan to simultaneously look at client version of Medicus on Mac using OsiriX client
  • Laura to send links for downloads

Ally Hume's Summary of last week's OGSA-DAI Discussion

We plan to try to use OGSA-DAI's schema mapping to map a database in some schema X to look like a database in the schema used by OpenMRS. One possible suggestion for X is a Harvard system and Jeremy is looking into this. From our point of view it could be great to get this schema, some data in this schema and hints as to how this would map to the HL7 schema used by OpenMRS.

You may read the complete conference call notes at this link.

Friday, December 7, 2007

Daily Lab / POC Activities

  • Stephan uploaded latest version of Medicus to FTP server
  • Stephan asked Laura Pearlman to help us connect to HealthGrid Live Certificate Authority Server
  • Laura provided the following information: 1. Load Medicus, 2. Run GridCertRequest to generate certificate, 3. Make sure email address points to, 4. Each grid node should have its own certificate, 5. Each grid client should have its own certificate, 6. Grid map file contains connection/naming information of each grid node server
  • Conference call with OGSA-DAI, U. of Pittsburgh, Tarrant County and two CDC EA team members to discuss potential Public Health use cases.
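
For reference on the grid map file mentioned in Laura's list: in a standard Globus install, each line of the grid-mapfile (commonly /etc/grid-security/grid-mapfile) pairs a quoted certificate subject DN with a local account name. The sketch below formats one such line. The helper name gridmap_entry, the DN and the account are invented examples, not real HealthGrid identities.

```shell
#!/bin/sh
# Sketch: format one grid-mapfile line, "<certificate subject DN>" <local account>.
gridmap_entry() {
    printf '"%s" %s\n' "$1" "$2"
}

# Invented DN and account, for illustration only.
gridmap_entry "/O=Example Grid/OU=Example Org/CN=Ken Example" ken
```

The DN is quoted because it usually contains spaces; the account after it is the local user that requests from that certificate are mapped to.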

Tuesday, December 4, 2007

Daily Lab / POC Activities


  • Installed NFS server on NCPHI grid node
  • Attached Dallas County grid node to NCPHI grid node through NFS client to access Globus installation files (removes download step)
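
The NFS arrangement above boils down to one export line on the server and one mount command on the client. The sketch below prints both. The helper names, the /opt/globus-install directory and the host names are assumptions for illustration; the actual paths used in the lab are not recorded here. The export is shown read-only since the client only needs to read the installation files.

```shell
#!/bin/sh
# Sketch: print the /etc/exports line for the server side.
# Arguments: exported directory, client host allowed to mount it.
nfs_export_line() {
    printf '%s %s(ro,sync)\n' "$1" "$2"
}

# Sketch: print the mount command for the client side.
# Arguments: server host, exported directory, local mount point.
nfs_mount_cmd() {
    echo "mount -t nfs $1:$2 $3"
}

# Placeholder directory and host names, for illustration only.
nfs_export_line /opt/globus-install client.example.org
nfs_mount_cmd server.example.org /opt/globus-install /mnt/globus-install
```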


  • Conference call with SCI to map out next steps in process. Finalized test model for proof-of-concept.
  • Downloaded and installed on office desktop caAdapter (Windows version) from NCI web site
  • Next steps: Load EA test model into caAdapter. Map object model to data model map specification. Import result back into EA. Re-export. Prepare questions for Claire Wolfe, expert on semantic integration at NCI. Take caCORE training for the Metadata Curator - Using UML Models curriculum to get background on the EVS and caDSR.
  • Conference call with Claire at end of week.
  • Recommendation to use 3.2.1 caCORE SDK when developing "application"

Saturday, December 1, 2007

OGSA-DAI & Public Health Data Grid Architecture

Discussed potential architectural models for data grids for public health and existing applications with EA (Moses & Brian). Forwarded security and data taxonomy articles to them and collectively decided to engage the OGSA-DAI folks in England to drill down into deeper detail.

Sent email to Ally Hume to schedule a conference call.

Ally's Contact information
Software Architect
EPCC, The University of Edinburgh
Tel: +44 131 651 3397

Interesting Grid Software Stack

Tom found an interesting grid software stack which claims to have a quick installation pathway here.

It describes itself as a computational grid, but all grids are computational grids at heart.

Friday, November 30, 2007

Daily Lab / POC Activities

  • Requested lab team to install Oracle on one Linux VM
  • Tom & caBIG team created preliminary data model to test
  • Set up meeting with Austin Kreisler (HL7 / Data Modeler) to verify test data model
  • Set up meeting with the caBIG team to review EA data model
  • Set up meeting with caBIG vocabulary team to validate data model against caBIG vocabulary


  • Globus install complete
  • Next step: connect to HealthGrid security domain, then build other nodes

Thursday, November 29, 2007

Daily Lab / POC Activities

  • Error in attempt to copy SUSE install disks to hard drive (/home/labuser/SUSE). VM locked up after second disk copy. Deleted Disk01 and abandoned attempt
  • Internet access in VM not working. Informed IT team

Wednesday, November 28, 2007

Presentation from caBIG

Click here for presentation notes

Key points:
Make sure to use SDK version 3.2.1
For the DB, make sure to select either Oracle or MySQL

Essentially, the steps begin with creating a domain model/object model,
then creating the data model (make sure to drag from foreign to primary to build connections),
then mapping those two models together using a tool called caAdapter.
To prepare for that, you need to export your models into XMI.
The key is to make sure that you export at the highest level of the logical view
and that you have only one check box marked in the upper right-hand section.
Once you run caAdapter, it will modify your XMI file.
You then export it in order to run the caCORE SDK.
The caCORE SDK will generate the web front end to your data service.

There are other steps to gridify this data service; however, they are not required for this proof of concept

Conceptually, if everything above is step one, then steps 2, 3 and 4 involve the Semantic Integration Workbench, the UML Loader (this step registers the XMI with the caDSR), and the caCORE SDK.

A tool called “Introduce” can gridify the application

You can access the SDK 3.2.1 presentation from the following link.
SDK 3.2.1 Presentation:

The caAdapter tool that was mentioned earlier as being a web tool is actually a standalone application in the version compatible with SDK 3.2.1. The web version should be released in two to three weeks. You can download the standalone version from


Tuesday, November 27, 2007

Daily Lab / POC Activities


  • Talked to Scott Halpine & team at Software Consultants about next steps
  • His team and I have played with the EA tool
  • His team will set up conference call this week with NCBI folks to help build UML model for PoC
  • Essentially, we need to understand which EVS attributes to use in model creation
  • Scheduled Enterprise Architect tool demonstration to go over the creation of our UML model [See next post]
  • Successfully loaded production version of EA tool with key (no longer under 30-day trial period)
Contact Information
Scott A. Halpine
Software Consultants, Inc.
4601 Presidents Drive, Suite 250
Lanham, MD, 20706
Tel: 301-306-5104
Fax: 240-260-0197
Cell: 301-996-3077

Raghu Narasimha (301-306-5103 (O) & 240-355-4799 (C)) provided the following links for reference:

Here’s that link to NCI EVS – caCore 3.2 sequence diagrams and logical view.

This second link downloads an EA format.

This third link downloads another caCore component.

Click here for UML Example

Monday, November 26, 2007

Daily Lab / POC Activities

Continuation of prerequisite software installation for Globus

  • Began to install Ant
  • Ant installation failed because C compiler and associated dependencies were not installed in original SUSE install
  • Had initial difficulty installing C compiler, etc.
  • Figured out how to install software from CDs
  • Completed compiler installation
  • Plan to copy all 4 SUSE installation disks onto hard drive for future potential software needs

Friday, November 23, 2007

Daily Lab / POC Activities

Further setup of Globus environment
Prior to installation

  • Discussed installation pathway with Nigel at Argonne
  • Successfully installed jdk1.5.0_14
  • Downloaded apache-ant-1.6.5 and dpkg-1.14.7
  • Globus build can take up to 4 hours; if longer then something is wrong

Tuesday, November 20, 2007

Update to older blog item - with link to pdf document

Put up link to excellent Data Grid Taxonomy manuscript - in older blog item here.

Daily Lab / POC Activities

At AMIA in Chicago last week, Jonathan Silverstein, president of the HealthGrid.US Alliance, recommended that we tap Nigel Rarsad. Nigel is a few weeks ahead of us in setting up Globus and may be a good technical resource going forward.

Ken emailed Nigel this morning.

This afternoon, spoke to Nigel via mobile phone (773) 524-1681.

Nigel recommended installing Java 1.5.0_14 and Ant 1.6.5 into the /usr/local directory on SUSE, after which all should go well. He also recommended using the quickstart document.
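
With both packages under /usr/local, the build environment might be set up along these lines before starting the Globus install. This is a sketch only: the function name globus_build_env is made up, and the directory names jdk1.5.0_14 and apache-ant-1.6.5 are assumed to be the default names the packages unpack to, which should be checked against what is actually on disk.

```shell
#!/bin/sh
# Sketch: point JAVA_HOME and ANT_HOME at installs under a prefix (default /usr/local)
# and put both bin directories at the front of PATH.
globus_build_env() {
    prefix="${1:-/usr/local}"
    JAVA_HOME="$prefix/jdk1.5.0_14"        # assumed unpack directory name
    ANT_HOME="$prefix/apache-ant-1.6.5"    # assumed unpack directory name
    PATH="$JAVA_HOME/bin:$ANT_HOME/bin:$PATH"
    export JAVA_HOME ANT_HOME PATH
}

globus_build_env /usr/local
echo "JAVA_HOME=$JAVA_HOME"
echo "ANT_HOME=$ANT_HOME"
```

Sourcing this from the build user's shell profile keeps the same settings in place for later rebuilds.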

Interesting Sites/Books/Concepts/Etc

X.509 Certificate & Grid

By Stephan Erberich

X.509 certificates do not use IP information. The cryptographic key of the certificate authenticates the sender as being part of the same security domain. That's why you can log on to your bank account from any computer: it uses single sign-on via SSL, and cert creation is based on your username/password.

In your case, the HealthGrid security domain will issue you an end-entity X.509 certificate, which you can deposit in a cert bank, the MyProxy service (also on HealthGrid). Then you can check out proxy certificates of limited lifetime (e.g. tokens) to authenticate yourself or a MEDICUS service to access the HealthGrid resources.

One more word on this: authentication does not equal authorization. Thus, knowing that a service request comes from "Ken Hall" does not mean that you will gain access to all services. This will depend on your role in the Grid. Actually, roles are becoming obsolete with SAML assertions, which use attributes instead of roles.

Now, what you need in order to provide services on a server is a host certificate, which allows the host to validate the authenticity of the user cert presented to services running on the host. These certificates use the FQHN as the distinguished name (DN) in the certificate. Thus a host cert is not bound to an IP address, but to the DNS FQHN.
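
To make the last point concrete: Globus host certificates conventionally carry the host name in the subject as CN=host/<fully qualified name>, and the subject can be printed with openssl x509 -in hostcert.pem -noout -subject. The sketch below pulls the host name out of such a subject string. The function name host_from_dn and the sample DN are invented for illustration.

```shell
#!/bin/sh
# Sketch: extract the host name from a host certificate subject DN
# of the form .../CN=host/<fully qualified host name>.
host_from_dn() {
    case "$1" in
        */CN=host/*)
            # Strip everything up to and including the last "/CN=host/".
            echo "${1##*/CN=host/}"
            ;;
        *)
            # Not a host certificate subject (for example, a user DN).
            return 1
            ;;
    esac
}

# Invented DN, for illustration only.
host_from_dn "/O=Example Grid/OU=Services/CN=host/node.example.org"
```

Nothing in the subject refers to an IP address, which is why the node can change addresses without needing a new certificate as long as its DNS name stays the same.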

Hope this untangled the matter a bit.

Monday, November 19, 2007


I am going to be heading the OGSA-DAI development team that will start to work to add the global schema to local schema mapping functionality into OGSA-DAI. To ensure we meet your requirements it would be great if you could get some data to design and test with.

Would it be possible for you to send me:
- a description of your global schema
- for two or three of your datasets:
* the local schema
* a sample of the data (just enough rows to cover various cases and experiment with)
* DBMS type (SQLServer, MySQL, Access, excel spreadsheet, flat file etc)
* a description of how the global schema should be mapped to this local schema

We plan to have some initial design sessions on this work in the week commencing 26 November. If it is possible to have some data before this it would really allow us to focus on your requirements.

I hope I am not asking for too much. Please let me know if this will be possible.



Ally Hume
Software Architect
EPCC, The University of Edinburgh
Tel: +44 131 651 3397

Daily Lab / POC Activities

  • Researched Biosurveillance Minimum Dataset
  • Researched caCore
  • Identified key documents within caCore
  • Confirmed that we must use existing vocabulary items in caDSR/caVocabulary (sp?) in our test UML model
  • Downloaded and installed EA tool (used predominantly by caBIG folks). The tool is a Windows application used for UML modelling, so it was installed on Ken's CDC computer. Unsure at this point how much information caCORE requires for the UML model. The tool is only usable for 30 days without a license. SCI guys are trying to obtain a license.


  • Testing two models for Globus installation
  • Model #1 uses VDT
  • Model #2 uses the Globus installation binary
  • Also have questions out to Stephan, Victor and fellow at Argonne for best practice

Useful caBIG Resources


HITSP - Minimum Data Set for Biosurveillance

caCORE Tech Guide

caCORE: A semantically integrated bioinformatics software system

caDSR UML Modeler Browser

caDSR CDE Browser

Sunday, November 18, 2007

caBIG and OGSA-DAI

If needed, we may find it interesting to hear the caBIG perspective on why they chose not to use OGSA-DAI.

Excellent Data Grid Taxonomy Paper

I have shared the paper with Les and others in NCPHI. The link to the paper is here.


ACM Computing Surveys (CSUR) Volume 38 , Issue 1
Article No. 3
Year of Publication: 2006

Thursday, November 15, 2007

caBIG Software Correction

Scott (SCI) called to tell me that they misspoke about the EA software package that caBIG folks use. The software is called Enterprise Architect, but it's sold by Sparx Systems. Mike Keller has the software. Scott will talk to Mike about getting me a copy so that I can begin creating our UML model.

Scott's telephone number is 301-306-5104.

Wednesday, November 14, 2007

Tom's thoughts back from AMIA

Tom's Tasks:

1. Need to write paper on Envisioning a Public Health Grid - using AMIG PHWG presentation
2. Need to read the Stephan and data grid papers
3. Learn a ton from OGSA-DAI (data access integration)

Welcome to PHGrid Research Blog

We are working on building a national public health grid from the ground floor up. We have two proofs-of-concept:

  1. caBIG
  2. Tarrant County (TX), Dallas County (TX), CDC NCPHI Lab, University of Pittsburgh, Argonne National Laboratory, USC and SuraGrid

caBIG will test basic setup and establishment of node connection to caBIG

Tarrant County will test distributed data grid capability and grid services