Friday, November 30, 2007

Daily Lab / POC Activities

  • Requested lab team to install Oracle on one Linux VM
  • Tom & caBIG team created preliminary data model to test
  • Set up meeting with Austin Kreisler (HL7 / Data Modeler) to verify test data model
  • Set up meeting with the caBIG team to review EA data model
  • Set up meeting with caBIG vocabulary team to validate data model against caBIG vocabulary


  • Globus install complete
  • Next step: connect to HealthGrid security domain, then build other nodes

Thursday, November 29, 2007

Daily Lab / POC Activities

  • Error in attempt to copy SUSE install disks to hard drive (/home/labuser/SUSE). VM locked up after second disk copy. Deleted Disk01 and abandoned attempt
  • Internet access in VM not working. Informed IT team

Wednesday, November 28, 2007

Presentation from caBIG

Click here for presentation notes

Key points:
Make sure to use SDK version 3.2.1
For the DB – make sure to select either Oracle or MySQL

Essentially the steps include creating a domain model/object model
Then creating the data model (make sure to drag from foreign to primary to build connections)
Then map those two models together using a tool called caAdapter
In order to prep for seed after you need to export your models into XMI
The key is to make sure that you export at the highest level of the logical view
And make sure that you only have one check box marked in the upper right hand section
Once you run caAdapter, it will modify your XMI file
And then export it to then run the caCORE SDK
The caCORE SDK will generate the web front end to your data service
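
Before handing an exported XMI file to caAdapter or the caCORE SDK, a quick check that the export is well-formed XML may save a failed run. The sketch below is not part of the caBIG tooling: the function name `xmi_root_info` and the sample document are made up for illustration, and it assumes the export has an `XMI` root element carrying an `xmi.version` attribute (as in XMI 1.x).

```python
import xml.etree.ElementTree as ET


def xmi_root_info(xmi_text):
    """Parse an XMI export and return (root tag without namespace, version or None).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed XML.
    """
    root = ET.fromstring(xmi_text)
    # Drop a "{namespace}" prefix if the parser attached one to the tag.
    tag = root.tag.rsplit("}", 1)[-1]
    version = None
    for name, value in root.attrib.items():
        local = name.rsplit("}", 1)[-1]
        if local in ("xmi.version", "version"):
            version = value
            break
    return tag, version


# Hypothetical, heavily trimmed stand-in for a real export.
SAMPLE_XMI = '<XMI xmi.version="1.1"><XMI.content/></XMI>'

if __name__ == "__main__":
    print(xmi_root_info(SAMPLE_XMI))
```

This only confirms that the file parses and looks like XMI at the top level; it says nothing about whether the export was taken at the right level of the logical view.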

There are other steps to gridify this data service; however, they are not required for this proof of concept.

Conceptually, if everything above was step one, steps 2, 3 and 4 involve the “Semantic Integration Workbench”, the UML Loader (this step registers the XMI with the caDSR), and the caCORE SDK.

A tool called “Introduce” can gridify the application.

You can access the SDK 3.2.1 presentation from the following link.
SDK 3.2.1 Presentation:

The caAdapter tool that was mentioned earlier as being a web tool is actually a standalone application in the version compatible with SDK 3.2.1. The web version should be released in two to three weeks. You can download the standalone version from


Tuesday, November 27, 2007

Daily Lab / POC Activities


  • Talked to Scott Halpine & team at Software Consultants about next steps
  • His team and I have played with the EA tool
  • His team will set up conference call this week with NCBI folks to help build UML model for PoC
  • Essentially, we need to understand which EVS attributes to use in model creation
  • Scheduled Enterprise Architect tool demonstration to go over the creation of our UML model [See next post]
  • Successfully loaded production version of EA tool with key (no longer under 30-day trial period)
Contact Information
Scott A. Halpine
Software Consultants, Inc.
4601 Presidents Drive, Suite 250
Lanham, MD, 20706
Tel: 301-306-5104
Fax: 240-260-0197
Cell: 301-996-3077

Raghu Narasimha (301-306-5103 (O) & 240-355-4799 (C)) provided the following links for reference:

Here’s that link to NCI EVS – caCore 3.2 sequence diagrams and logical view.

This second link downloads an EA format.

This third link downloads another caCore component.

Click here for UML Example

Monday, November 26, 2007

Daily Lab / POC Activities

Continuation of prerequisite software installation for Globus

  • Began to install Ant
  • Ant installation failed because C compiler and associated dependencies were not installed in original SUSE install
  • Had initial difficulty installing C compiler, etc.
  • Figured out how to install software from CDs
  • Completed compiler installation
  • Plan to copy all 4 SUSE installation disks onto hard drive for future potential software needs
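
Since the Ant installation failed on a missing compiler, it may help to check for prerequisites before starting a long build. This is a minimal sketch, not part of the Globus instructions: the tool names `java`, `ant` and `gcc` are assumptions about what should be on the PATH, and the function names are made up for illustration.

```python
import shutil


def find_prerequisites(tools=("java", "ant", "gcc")):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_prerequisites(tools=("java", "ant", "gcc")):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool, path in find_prerequisites(tools).items() if path is None]


if __name__ == "__main__":
    for tool, path in find_prerequisites().items():
        print(f"{tool}: {path or 'NOT FOUND'}")
```

Running the script on the VM lists each tool with its location, so a missing compiler shows up before the installer does.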

Friday, November 23, 2007

Daily Lab / POC Activities

Further setup of Globus environment
Prior to installation

  • Discussed installation pathway with Nigel at Argonne
  • Successfully installed jdk1.5.0.14
  • Downloaded apache-ant-1.6.5 and dpkg-1.14.7
  • Globus build can take up to 4 hours; if longer then something is wrong

Tuesday, November 20, 2007

Update to older blog item - with link to pdf document

Put up link to excellent Data Grid Taxonomy manuscript - in older blog item here.

Daily Lab / POC Activities

At AMIA in Chicago last week, Jonathan Silverstein, president of the HealthGrid.US Alliance, recommended that we tap Nigel Rarsad. Nigel is a few weeks ahead of us in setting up Globus and may be a good technical resource going forward.

Ken emailed Nigel this morning.

This afternoon, spoke to Nigel via mobile phone (773) 524-1681.

He recommended installing Java 1.5.14 and Ant 1.6.5 into the /usr/local directory on SUSE, after which all should go well. He also recommended using the quickstart document.

Interesting Sites/Books/Concepts/Etc

X.509 Certificate & Grid

By Stephan Erberich

X.509 certificates do not use IP information. The cryptographic key of the certificate authenticates the sender as being part of the same security domain. That's why you can log on to your bank account from any computer: it uses single sign-on via SSL, and cert creation is based on your username/password.

In your case, the HealthGrid security domain will issue you an end-entity X.509 certificate, which you can deposit in a cert bank, the MyProxy service (also on HealthGrid). Then you can check out proxy certificates of limited lifetime (e.g. tokens) to authenticate yourself or a MEDICUS service to access the HealthGrid resources.
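
The "limited lifetime" part of a proxy credential can be pictured with a small sketch. This is not the MyProxy or Globus API; the class `ProxyCredential`, its fields, and the 12-hour lifetime are assumptions chosen only to show that a short-lived token stops being accepted once its lifetime has passed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ProxyCredential:
    """Hypothetical stand-in for a short-lived proxy derived from a long-lived cert."""
    subject: str
    issued_at: datetime
    lifetime: timedelta

    def is_valid(self, now):
        """True only while `now` falls inside the credential's lifetime window."""
        return self.issued_at <= now < self.issued_at + self.lifetime


if __name__ == "__main__":
    issued = datetime(2007, 11, 20, 9, 0, tzinfo=timezone.utc)
    proxy = ProxyCredential("Ken Hall", issued, timedelta(hours=12))
    print(proxy.is_valid(issued + timedelta(hours=1)))   # inside the window
    print(proxy.is_valid(issued + timedelta(hours=13)))  # after expiry
```

The long-lived end-entity certificate stays in the cert bank; only the short-lived proxy travels with requests, which limits the damage if it leaks.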

One more word on this: Authentication does not equal authorization. Thus knowing that a service request comes from "Ken Hall" does not mean that you will gain access to all services. This will depend on your role in the Grid. Actually, roles are becoming obsolete with SAML assertions, which use attributes instead of roles.
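
The difference between knowing who is calling and deciding what they may do can be sketched as follows. This is an illustration only, not a SAML implementation: the `Caller` class, the service names, and the attribute names in `SERVICE_POLICY` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Caller:
    """An already-authenticated identity plus the attributes asserted about it."""
    name: str
    attributes: dict = field(default_factory=dict)


# Hypothetical policy: each service lists the attribute values it requires.
SERVICE_POLICY = {
    "lab-results-query": {"organization": "CDC", "purpose": "biosurveillance"},
    "public-directory": {},
}


def is_authorized(caller, service):
    """Allow the call only if every attribute the service requires matches."""
    required = SERVICE_POLICY.get(service)
    if required is None:
        return False  # unknown service: deny by default
    return all(caller.attributes.get(k) == v for k, v in required.items())


if __name__ == "__main__":
    ken = Caller("Ken Hall", {"organization": "CDC"})
    print(is_authorized(ken, "public-directory"))
    print(is_authorized(ken, "lab-results-query"))
```

Here the caller's name never enters the decision: authentication has already established it, and authorization looks only at the attributes.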

Now what you need in order to provide services on a server is a host certificate, which allows the host to validate the authenticity of the user cert presented to services running on the host. These certificates use the FQHN as the distinguished name (DN) in the certificate. Thus a host cert is not bound to an IP address, but to the DNS FQHN.
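
The point that a host cert is tied to a name rather than an address can be shown with a simplified check. This sketch assumes a comma-separated DN with a plain `CN=hostname` component; the function names and the example DN are made up, and real TLS and Globus stacks do considerably more than this comparison.

```python
def common_name(distinguished_name):
    """Return the CN value from a DN like 'CN=node1.example.org,OU=Lab,O=Example Grid'."""
    for part in distinguished_name.split(","):
        key, _, value = part.strip().partition("=")
        if key.upper() == "CN":
            return value
    return None


def host_matches_certificate(connected_hostname, certificate_dn):
    """Compare the name the client connected to with the CN; DNS names are case-insensitive."""
    cn = common_name(certificate_dn)
    return cn is not None and cn.lower() == connected_hostname.lower()


if __name__ == "__main__":
    dn = "CN=node1.example.org,OU=Lab,O=Example Grid"
    print(host_matches_certificate("node1.example.org", dn))
    print(host_matches_certificate("192.0.2.10", dn))
```

Connecting by IP address fails the check even when the address belongs to the same machine, which is why the host needs a stable DNS name before a host certificate is requested.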

Hope this untangled the matter a bit.

Monday, November 19, 2007


I am going to be heading the OGSA-DAI development team that will start to work to add the global schema to local schema mapping functionality into OGSA-DAI. To ensure we meet your requirements it would be great if you could get some data to design and test with.

Would it be possible for you to send me:
- a description of your global schema
- for two or three of your datasets:
* the local schema
* a sample of the data (just enough rows to cover various cases and experiment with)
* DBMS type (SQLServer, MySQL, Access, excel spreadsheet, flat file etc)
* a description of how the global schema should be mapped to this local schema

We plan to have some initial design sessions on this work in the week commencing 26 November. If it is possible to have some data before this it would really allow us to focus on your requirements.

I hope I am not asking for too much. Please let me know if this will be possible.



Ally Hume
Software Architect
EPCC, The University of Edinburgh
Tel: +44 131 651 3397
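
The global-to-local schema mapping requested in the email above can be pictured with a small sketch. Nothing here is OGSA-DAI's API: the column names, the dataset names `county_a` and `county_b`, and the function `to_global` are invented for illustration, and the real mapping would also have to handle type and unit conversions.

```python
# Hypothetical global schema and per-dataset mappings (global column -> local column).
GLOBAL_COLUMNS = ("patient_age", "visit_date", "chief_complaint")

LOCAL_MAPPINGS = {
    "county_a": {"patient_age": "AGE", "visit_date": "VISIT_DT", "chief_complaint": "CC_TEXT"},
    "county_b": {"patient_age": "age_years", "visit_date": "date_seen", "chief_complaint": "complaint"},
}


def to_global(dataset, local_row):
    """Rename one local row's columns to the global schema; missing columns become None."""
    mapping = LOCAL_MAPPINGS[dataset]
    return {g: local_row.get(mapping[g]) for g in GLOBAL_COLUMNS}


if __name__ == "__main__":
    row_a = {"AGE": 42, "VISIT_DT": "2007-11-19", "CC_TEXT": "fever"}
    row_b = {"age_years": 7, "date_seen": "2007-11-18", "complaint": "cough"}
    print(to_global("county_a", row_a))
    print(to_global("county_b", row_b))
```

A query written against the global column names could then be answered from either dataset without the caller knowing the local names.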

Daily Lab / POC Activities

  • Researched Biosurveillance Minimum Dataset
  • Researched caCore
  • Identified key documents within caCore
  • Confirmed that we must use existing vocabulary items in caDSR/caVocabulary (sp?) in our test UML model
  • Downloaded and installed EA tool (used predominantly by caBIG folks). Tool is a Windows application used for UML modelling, so it was installed on Ken's CDC computer. Unsure at this point how much information caCORE requires for the UML model. Tool is only usable for 30 days without a license. SCI guys are trying to obtain a license.


  • Testing two models for Globus installation
  • Model #1 uses VDT
  • Model #2 uses the Globus installation binary
  • Also have questions out to Stephan, Victor and fellow at Argonne for best practice

Useful caBIG Resources


HITSP - Minimum Data Set for Biosurveillance

caCORE Tech Guide

caCORE: A semantically integrated bioinformatics software system

caDSR UML Modeler Browser

caDSR CDE Browser

Sunday, November 18, 2007

caBIG and OGSA - DAI

If needed, we may find it interesting to hear the caBIG perspective on why they chose not to use OGSA-DAI.

Excellent Data Grid Taxonomy Paper

I have shared the paper with Les and others in NCPHI. The link to the paper is here.


ACM Computing Surveys (CSUR) Volume 38 , Issue 1
Article No. 3
Year of Publication: 2006

Thursday, November 15, 2007

caBIG Software Correction

Scott (SCI) called to tell me that they misspoke about the EA software package that caBIG folks use. The software is called Enterprise Architect, but it's sold by Sparx Systems. Mike Keller has the software. Scott will talk to Mike about getting me a copy of the software so that I can begin creating our UML model.

Scott's telephone number is 301-306-5104.

Wednesday, November 14, 2007

Tom's thoughts back from AMIA

Tom's Tasks:

1. Need to write paper on Envisioning a Public Health Grid - using AMIA PHWG presentation
2. Need to read the Stephan and data grid papers
3. Learn a ton from OGSA-DAI (Data Access and Integration)

Welcome to PHGrid Research Blog

We are working on building a national public health grid from the ground floor up. We have two proofs-of-concept:

  1. caBIG
  2. Tarrant County (TX), Dallas County (TX), CDC NCPHI Lab, University of Pittsburgh, Argonne National Laboratory, USC and SuraGrid

caBIG will test basic setup and establishment of node connection to caBIG

Tarrant County will test distributed data grid capability and grid services