Friday, May 30, 2008
Integration Architecture
As we develop these services we have to keep in mind how to create metadata to describe the function, use and test in addition to the standard service definitions of syntax, vocabulary and grammar.
We don't have anything yet, but the end goal of working with the three Proof of Concept projects (charters linked to on the left) is that we will have an architecture for service interoperability. This will be documented here as well as in the service metadata. Suggestions are welcome and as the group develops more, results will be posted.
Syndrome Definition Discussion (Food for Thought & Discussion)
HARVARD ESP Program / U. of Pittsburgh / NCPHI Lab
Brian Lee wrote,
I mentioned the Harvard ESP CoE project's need for syndrome definition to Jeremy Espino during our call this morning. Jeremy told me about an effort that Wendy Chapman, also from Pitt, is working on to develop syndrome definitions with participation across the US, Japan and Canada.
Jeremy explained that there can't really be one set of definitions for everyone to use. There is really a need for a definition repository and then programs can pick what they are using to define syndromes.
So it looks like there will be multiple choices that Ross Lazarus (Harvard) can pick from to classify his signs and symptoms into syndromes. I don't know how he can pick the "best" one, but there are others working on the same problem.
Sourceforge Project
If you go to this page, you will see very little as it is brand new. As the lab team develops software, it will be tracked and stored on this site until we can migrate over to dedicated infrastructure.
For developers out there, the (empty for now) SVN repository is at phgrid.svn.sourceforge.net.
To answer John Stinn's valid call for the "public health so what" of this post: this project gives us a location, available to all the participants in the research grid, to post, review and comment on software issues (bugs, change requests, enhancements), store related documents and directly collaborate over how best to build public health informatics (mission) related services on top of the grid connectivity in place. This project on SourceForge is also the first step along the "open source way", as all software on it is available via the Apache license.
Thursday, May 29, 2008
Future Medicus Node @ University of Chicago
...here is a little update. Dan, VDT's Globus install and your rapid certificate authorization worked as advertised. Really a fantastic time saver!
...as I mentioned to Dan, I have the medicus code and old install info...I pointed Dan to what I have. I've installed what I could, but now have to configure a gateway and test it.
...I am in the midst of setting up a PACS server on a machine in our lab. Once done, I want to be able to test Medicus's ability to do basic DIMSE-C services (e.g. query, store, and retrieve medical images on a basic level).
...If this succeeds (and I should know by tomorrow), I can test Medicus's SSP functionality between a PACS server and an SSP as well as a PACS client and an SSP utilizing DGIS. Actually, I'd just be happy with a DGIS Gateway but why stop now ....
...I need to go through these steps to better understand what Medicus can do. If I am reinventing the wheel, let me know - I need to reinvent it anyway for my own edification but still, any pointers are most appreciated!
Potential New Node & Preliminary Source Control
Dan's sending over the install guide to Craig so he can review it with his staff.
An interesting point is that since IHTC primarily uses Windows they may want to look at running a grid instance through a Linux image running in VMWare on top of a Windows machine.
Also, to save time I've applied for a sourceforge.net project for the software components that we will be developing shortly. This is a temporary measure until we can get a proper subversion, collaboration and issue tracking server in place that the public can hit. For now the license is Apache v2 (since that's what Globus is using). But as the CDC develops an official open source licensing policy, this project will migrate the source to licenses as appropriate.
I'll post the URL to the sourceforge project as soon as it is approved (assuming it is approved).
The Public Health Part of The Informatics Research Grid
And now I raise a challenge: with all the cool techie stuff that is going on, with all the innovation and collaboration we are fostering, with all the teaching and learning from each other, I still am not sure we have made the case that this is the right thing for Public Health (as an industry) or the Public's Health (as a population). "So what" is a question that we will soon have to answer - constantly.
I am optimistic that we can answer this challenge. The next PoC activities are much more programmatic in nature. We are going to investigate tools that could make public health information management more secure, more reliable, and easier than it is today (via the PHIN MS PoC). We are going to examine how to connect instances of a widely used biosurveillance application to other biosurveillance tools so that state and local public health departments can maintain data for their purposes, but also share information efficiently across jurisdictions, borders, and political boundaries (the RODS PoC). Finally, we will investigate how services can serve other programmatic interests from integration / interoperability (via natural language processing) to specific public health case reporting (via specialized services). It is through these projects that "so what" becomes "ah-ha"!
I encourage blog posters - from developers to programmatic folks like myself - to keep "so what" in mind, and to address that question specifically in each blog post, no matter how small or technically deep the post may be. What will the problem you are solving mean to public health? What does that tweak of the integration service or deployment script or presentation layer mean to a public health user or system administrator? Those are the people we need to help, and I know we can.
Getting Started
- Got to know the project I will be working on
- Started to read documentation about PHIN Messaging System
- Brian told me Dan has already done some groundwork on this, so looking forward to picking his brain
Wednesday, May 28, 2008
Federated Query
Interestingly, the thread includes responses from both the OGSA-DAI and the caBIG teams.
More thoughts from the University of Utah (Ron Price)
Five Distributed Query Initiatives (that we know of)
1) ONC use case
2) Barry Rhodes' experimentation with PostgreSQL
3) University of Utah's (NCPHI COE) use of DCQL (caBIG / Globus) component
4) OGSA-DAI (specifically, NCPHI lab's work with RODS & general federated query use case)
5) electronic Primary Care Research Network (ePCRN) at the University of Minnesota (led by Dr. Peterson)
Daily Lab / POC Activities
- Issued the University of Chicago host and user certificates
- Worked with JHUAPL on troubleshooting GridFTP
- Work on Subversion configurations
- Tarrant node offline due to T1 issues
I heard an old Song today...
Sung to the Beatles Tune "Let it Be"
"When I find my Code has got in trouble, my Supervisor comes to me.
Speaking words of wisdom Write in C..."
I decided I should cover my Pickle with a different flavor of Jam. It turns out the Globus Alliance foresaw a partial solution to my NLP grid project. It's called GT 4.0: C WS Core. So I will not be using gSOAP or Axis C++ to grid-enable the Medley NLP. I will also explore using libxml2, an XML parsing library for C that can help when creating web services from C.
Once I have wrapped the API and deployed it as a web service, The WSDL should be callable in a myriad of ways, including technologies yet to be created.
I can't wait to see the documentation on the Medley API. If my suspicions prove correct, we should be able to create a web service from the native API (I was told it was written in C). I now have a sweeter jam and a larger pickle to spread it upon.
I gotta go download some more tools and start setting them up on my Fedora.
I just love old songs don't you? They bring back memories.
"... and when my program is clobbered by sub-routines unknown to me,
I recite those words of wisdom, Write in C..."
Tuesday, May 27, 2008
Example-Land: More tasty than Jam on Pickles
To my delight, I discovered that web-services support is included in the overarching architecture for Globus. (For this example, call this a Pickle.) This suggests that we may be able to wrap the Medley NLP API, hosted at Columbia, with a WSDL using bindings created by an open source toolkit such as gSOAP or Axis C++. (Henceforth call this Jam.) I need to point out to the newbies that C is, for the most part, a subset of C++. Once we have a WSDL, we might be able to deploy Medley as a web service to the Globus container. Once deployed, we should be able to call the NLP in a variety of ways: Java/JSP, another web service, a Linux/UNIX command line, etc.
This week I intend to spread jam on pickles, and taste sweet success. Here is the baby "use case scenario"
The Goal: Submit a file of Free Text information, such as an admissions record residing on a remote hospital management system. Receive an HL7 or ICD9 encoded file in XML format on your local node/file-system.
Preconditions: A web-service enabled Globus container is installed on All nodes in the system, security privileges for the user have been setup to allow the operation. A set of free text admissions records in ASCII.
Primary Flow
1. User opens a browser.
2. System prompts user to specify a file to be scanned by Medley-NLP.
3. User selects the desired file from the local file system and submits it.
4. Remote Medley-NLP parses the file
5. Medley-NLP emits an encoded file in XML format.
6. Globus transfers the file to the local file system.
7. The file is displayed in the Browser.
Alt. Flow 1.
A. In step 2, the user may select the encoding to receive, HL7 or ICD9
Alt Flow 2.
A. The user can select multiple files for NLP processing
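The primary flow and both alternate flows can be sketched in a few lines. This is only an illustration under stated assumptions: `process_files` and `run_nlp` are hypothetical names, `run_nlp` stands in for whatever call eventually reaches the remote Medley service, and the Globus transfer step is reduced to writing the result to a local directory.

```python
from pathlib import Path
from typing import Callable, Iterable, List

# Encodings named in Alt. Flow 1.
SUPPORTED_ENCODINGS = ("HL7", "ICD9")


def process_files(paths: Iterable[Path], encoding: str,
                  run_nlp: Callable[[str, str], str],
                  out_dir: Path) -> List[Path]:
    """Send each free-text file through the NLP step and save the XML locally.

    run_nlp is a hypothetical stand-in for the remote service call; it takes
    (free_text, encoding) and returns an XML string.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    out_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for path in paths:                      # Alt. Flow 2: several files
        text = path.read_text(encoding="ascii")
        xml = run_nlp(text, encoding)       # steps 4 and 5
        target = out_dir / (path.stem + ".xml")
        target.write_text(xml, encoding="utf-8")  # step 6, simplified
        results.append(target)
    return results
```

Passing the NLP call in as a parameter keeps the sketch testable with a fake function until the real service wrapper exists.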
Friday, May 23, 2008
Daily Lab / POC Activities
- Solved software dependency issues and installed Subversion-1.3.0-20, Subversion-tools-1.3.0-20, and Subversion-Server-1.3.0-20 on lab 1002.
- Reviewed the Medicus installation process that was sent by Nigel Parsad.
Reading The Fabulous Manual (RTFM)
Thursday, May 22, 2008
Notes from University of Utah (Thanks Ron Price)
http://www-unix.mcs.anl.gov/~neillm/ravi/
The GT4 book is the best reference I know on grid service development:
http://www.amazon.com/Globus%C2%AE-Toolkit-Programming-Services-Computing/dp/0123694043/ref=sr_1_4?ie=UTF8&s=books&qid=1211488358&sr=8-4
Here's some tools and comments on GSI management tools:
Also here is an interesting link: http://www.gridpma.org/
Here's the GT4 best practices for grid service development:
http://www.globus.org/toolkit/docs/4.0/best_practices.html
Daily Lab / POC Activities
We had a great meeting with Ron Price from Utah. We discussed some of the reasons he chose to use caGrid ( http://www.cagrid.org/wiki/CaGrid:Software:Release:1.0 ) over Globus. The main reason for choosing caGrid over Globus was that it removed some of the complexity when it came to creating grid services. The underlying technology of caGrid is Globus, but it comes with tools that aid in grid service development.
Ron currently has three grid nodes running caGrid. We discussed methods for connecting the two grids in order to run a grid to grid federated query. One way to do this is to exchange our trusted CA information and copy the trusted CA hash files into the /etc/grid-security/certificates directory. This will be tested later.
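A minimal sketch of that copy step, assuming the other grid's CA files follow the usual hash-named layout (`<hash>.0` for the certificate and `<hash>.signing_policy` for its policy). The function name `install_trusted_ca` and the file patterns are assumptions for illustration, not something we have tested against Ron's nodes.

```python
import shutil
from pathlib import Path
from typing import List


def install_trusted_ca(src_dir: Path, cert_dir: Path) -> List[str]:
    """Copy hash-named CA files from src_dir into the trusted certificates dir.

    Assumes files are named <hash>.0 and <hash>.signing_policy; anything
    else in src_dir is ignored. Returns the names of the files copied.
    """
    cert_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pattern in ("*.0", "*.signing_policy"):
        for src in sorted(src_dir.glob(pattern)):
            shutil.copy2(src, cert_dir / src.name)
            copied.append(src.name)
    return copied
```

On a real node the destination would be /etc/grid-security/certificates, which normally requires root to write to.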
We also discussed using PURSE for our certificate management. Ron mentioned this was a popular certificate management application used by other grids. I will continue researching PURSE for this purpose.
I also met with Nigel from the University of Chicago. He is interested in getting Medicus to work over the grid. I sent him a copy of our grid node installation document and CA setup information. He currently has a working Medicus installation that he was able to test with internal grid nodes. He agreed to send me his Medicus installation notes. He's going to get an installation update from his Medicus engineer in Peru and we will attempt to test Medicus over the grid next week.
I am currently installing the software dependencies for Subversion 1.4.6-1 on lab 1002. Current software dependencies:
apr >= 1.2.7 is needed by subversion-1.4.6-1.i386
apr-util >= 1.2.7 is needed by subversion-1.4.6-1.i386
db4 >= 4.2.52 is needed by subversion-1.4.6-1.i386
libapr-1.so.0 is needed by subversion-1.4.6-1.i386
libaprutil-1.so.0 is needed by subversion-1.4.6-1.i386
libcrypto.so.6 is needed by subversion-1.4.6-1.i386
libexpat.so.0 is needed by subversion-1.4.6-1.i386
libneon.so.25 is needed by subversion-1.4.6-1.i386
libssl.so.6 is needed by subversion-1.4.6-1.i386
neon >= 0.25.5 is needed by subversion-1.4.6-1.i386
rtld(GNU_HASH) is needed by subversion-1.4.6-1.i386
Met with Anurag, talked with Ronald
We also talked with Ron Price from Utah, who has been working with the grid for Quite Some Time and has several caGRID nodes talking with each other and accessing different databases through the caGRID federated query processing engine. It seems that the FQP is similar to the DQP that OGSA-DAI is working on. Thus, I anticipate a lot of discussion contrasting and comparing the two and seeing if pertinent differences can be applied to future generations.
Cheers,
Peter W.
Chat with University Of Utah
Wednesday, May 21, 2008
First Day Greet and setup
New people and new things.
One of the things that is being heavily discussed is setting up all the archetypes that will make this a Real Project. This means setting up some sort of code repository and build scripts for making something that can be deployed somewhere.
In preparation for all this, I have been researching maven... and honestly it's really neat how it will just build out a full project with two inputs... complete with test scripts and apache style documentation pages.
Internal Grid Node
New person / NLP Service Wrapper
To work with the Centers of Excellence, we had a call with Albert Lai of Columbia University on how to provide an interface to the Columbia Natural Language Processor (I think it is called Medley).
We talked about some of the best ways to provide a wrapper to the existing NLP components and right now the thinking is to provide a web service wrapper through Globus that accepts medical free text (in either body text or files) that can invoke the NLP and respond with structured data in xml format. We'll post more as we start testing and designing more.
Also, Albert shared the URLs for two potential web service frameworks that we might be able to use in the test:
Tuesday, May 20, 2008
New developers and new codebases
Also, some new developers are going to be showing up in the coming days... and I look forward to sitting and chatting up all this nifty technology and getting some new perspectives and other ideas.
Monday, May 19, 2008
...About that secure access
Ferreting things out from a few logs... I get this error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.
For some reason, the handshake works just fine when run from the command line... but when invoked from within the tomcat container... the certification credentials cannot be found. And I am not familiar enough with how tomcat security prefers to work to know what to try next.
Some web searching suggested that the public key of the computer I am trying to connect to be added to the keystore... but I couldn't figure out the keystore password... and furthermore it bothers me that I can run a client making secure connections from the command line when logged in as the globus user, but that tomcat running as the globus user cannot.
I caught Alastair a bit and asked him about it, and he alluded to adding the server to tomcat in some way, and also posed the issue that if we had to add servers, it would make it very difficult to keep track of all the nodes in the grid from the JSP client perspective. Thus, some authentication discovery schemes will probably need to be thought through... they will probably be similar to something like Grid-Proxy that Dan is researching. Perhaps some sort of Kerberos or LDAP schemes can be adapted here.
Otherwise, I got SVN working a bit better in eclipse, but I still want to look at intelliJ and see how well it deals with JSP projects and the like.
Friday, May 16, 2008
Secure access
My next step is to build it into a suite of JSP code that will actually manage things like grid-proxy creation and server list configuration... but it's neat that I can query databases on other computers via secure means using grid certificates. Now to just make easily deployable archives using Maven.
Dr. Tom Saville also introduced us to another Grid developer who seemed to have bundled most of the Grid and OGSA-DAI code into a single deployable (InstallShield) archive. He also brought up a lot of the issues of potentially storing contented data when we were discussing distributed issues.
I tried to get things synced via SVN with RODS, and I don't think Eclipse likes the way SVN stores things... that or I have no idea the best way to pull in projects (very likely). I am going to look into IntelliJ IDEA because it seems to be much smoother about syncing with SVN Repositories. Otherwise, lots of ideas still spinning in my head.
Have a good weekend!
Daily Lab / POC Activities
MonALISA is unable to display node status when the client is launched from a CDC network connected desktop. The MonALISA client works fine outside the CDC's network. This is due to an outgoing port restriction on the CDC's firewall. This condition was verified on multiple CDC desktops.
Thursday, May 15, 2008
Daily Lab / POC Activities
Lab 1001 and 1003 are back online.
- Lab 1001 was unable to locate shared library files in the /opt/vdt/globus/lib/sasl directory. This error was experienced while trying to run the grid-proxy-init command. Globus was reinstalled to correct the issue.
- The Globus container on Lab 1003 was failing to start in secure mode due to Java authentication errors. The issue was corrected by requesting and reissuing host certificates to the grid node.
Wednesday, May 14, 2008
Dr. Jeremy Espino and the wonderful RODS project
Furthermore, he created an SVN repository for grid-related code (whether it be related to RODS, Globus, OGSA-DAI, &c), showed me the wonders of Maven, let me see the prettiness that is IntelliJ IDEA, and generally got me really familiar with how RODS works and what pieces he wants me to focus on.
The SVN repository is grand, because it gives me a place to put some of the code I have been working on... and more importantly will force me to clean it up to the point where I am not scared to have other people looking at it.
As for the POC and RODS... RODS has UI functionality with the ability to basically take counts associated with locational data and actually get it plotted to various geolocational displays (think Google Maps). Thus, it would be the perfect UI for any sort of grid monitoring system, and if we can make data plugins that pull from OGSA-DAI resources instead of the normal RODS database (which is populated over time by a listener that parses HL7 hospital data)... Needless to say I have had an explosion of ideas for how to orient grid nodes and how RODS code can be used. I look forward to discussing them.
Daily Lab / POC Activities
I am currently troubleshooting the Java connection errors that happen while starting the Globus Container on lab 1003. I've searched the Globus forums and found several references to these errors, but no solid leads on how to correct the problem.
The Globus Container starts fine if the -nosec (no security) parameter is used, but the Container fails to start in secure mode. Based on this symptom, I plan to reissue the host and user certificates in case something was corrupted during the OGSADAI database modifications.
Tuesday, May 13, 2008
To do: stop breaking globus' database
I also noticed that I tinkered with the MySQL databases running crucial parts of the globus software... I am not sure I managed to mess up the nodes, but it is quite likely I didn't help things.
Otherwise, it brought up a very important point: I should probably not be trying to use databases on the same server running globus, but databases running on separate VMs... as all the scenarios for phgrid deployment in my mind seem to focus on having the grid node running globus be separate from the database server(s).
Thus, using separate VMs will better simulate what an actual grid will look like, and help us realize whether remote subnet access is something easy to configure... and better portray a wide range of databases and authentication schemes... since there won't be an ability to go "connect to the database on localhost", which implies a large amount of trust.
Monday, May 12, 2008
Daily Lab / POC Activities
- Upgraded Globus to 4.0.7 on lab 1001
- Configured WebMDS on lab 1005
- Researching: edg-mkgridmap - Process will update the gridmap file automatically for local and remote users. This may solve the issue of simultaneously updating the gridmap file for multiple grid nodes.
- Researching: GUMS (Grid User Management System) A grid identity mapping service that could be used to automatically map user accounts to grid resources. This could reduce the burden for user management.
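To make the gridmap problem concrete, here is a minimal sketch of merging gridmap entries collected from several nodes. It assumes the common grid-mapfile line format of a quoted certificate DN followed by a local account name. The function names are hypothetical, and this is not how edg-mkgridmap or GUMS work internally.

```python
import shlex
from typing import Dict, Iterable


def parse_gridmap(text: str) -> Dict[str, str]:
    """Parse lines of the form: "<certificate DN>" <local account>."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = shlex.split(line)         # handles the quoted DN
        if len(parts) != 2:
            continue                      # ignore lines we do not understand
        dn, account = parts
        entries[dn] = account
    return entries


def merge_gridmaps(texts: Iterable[str]) -> str:
    """Merge several gridmap files; later files win on duplicate DNs."""
    merged: Dict[str, str] = {}
    for text in texts:
        merged.update(parse_gridmap(text))
    return "".join(f'"{dn}" {acct}\n' for dn, acct in sorted(merged.items()))
```

Letting later files win is just one possible conflict policy; a real deployment would probably want to flag conflicting mappings instead of silently overwriting them.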
OGSA-DAI location demonstrated, now for GLOBUS
Now, the next step is going to be discovering all of the grid resources that have OGSA-DAI resources... and for that I need to familiarize myself with Globus MDS and how to code for it. I might also look at MonALISA and see whether it will have better APIs for Globus resource location. Before that, I am going to have to orient most of the resources that I have recently deployed through tomcat to work instead through the Globus container and orient my JSPs to hit the Globus-related URLs instead.
Another thing that has been brought up is how to integrate OGSA-DAI and globus on a larger scale... and one of the things that OGSA-DAI is purported to be able to do is essentially merge heterogeneous databases.... and that's not entirely the case.
I feel that what OGSA-DAI does is expose some metadata on, and enable access to, views or tables in different databases. The views themselves will have to be homogeneous: if they don't have the same schema, then it will not be possible for databases to simply be plugged in, at least not at first. Later generations might have data mapping capability, but for the case of polling systems and scanning algorithms everything is still going to need to be relatable to some master data schema.
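As a rough illustration of what "relatable to some master data schema" could mean in practice, the sketch below compares a view's columns against a master schema and lists the differences. The schema representation (column name mapped to a type string) and the function name are assumptions made for this example; OGSA-DAI itself does not provide this check as far as I know.

```python
from typing import Dict, List

# A schema here is just column name -> type name (an assumed representation).
Schema = Dict[str, str]


def schema_mismatches(master: Schema, view: Schema) -> List[str]:
    """Return human-readable differences between a view and the master schema."""
    problems = []
    for column, col_type in master.items():
        if column not in view:
            problems.append(f"missing column: {column}")
        elif view[column] != col_type:
            problems.append(
                f"type mismatch on {column}: expected {col_type}, "
                f"found {view[column]}"
            )
    for column in view:
        if column not in master:
            problems.append(f"unexpected column: {column}")
    return problems
```

An empty list means the view could be plugged in as-is; anything else is the work a future data mapping layer would have to do.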
Sunday, May 11, 2008
Scientific American - Science 2.0 -- Is Open Access Science the Future?
http://www.sciam.com/article.cfm?id=science-2-point-0
Interesting article that I saw while killing time in a book store. The work being done on the PH Research Grid is a very interesting case study on the use of Web 2.0 technologies in conducting research, much less applying those technologies to practice. Very interesting parallels between open source development, and the scientific process.
Thursday, May 8, 2008
Progress
Otherwise, after reading through the OGSA-DAI documentation more, there are files you can make that simplify the resource creation process, and it would be easy to write scripts and UIs that create/modify such files... so it appears that frustration has created some ideas for more useful applications.
Dan also helped point out that it was very likely a MySQL JDBC issue.
Wednesday, May 7, 2008
A killer app for OGSA-DAI
Perhaps I am using the wrong JDBC connector... as the version of MySQL on this machine (4.1.22 installed by VDT for RFT) is older than the more common version 5. Perhaps I am using the wrong version of the JDBC connector or am invoking it improperly. I have checked the resource files and the logons.txt files and haven't found anything out of spec there... I will probably just avoid using it tomorrow and instead focus on enhancing the JSP I've built rather than continue hammering on trying to get it to connect to this new resource.
Either way... a significant portion of my time has been spent hunting down configuration mishaps. Needless to say, Ant build scripts run from the command line with lots of variables are very dodgy... and it is very easy to make mistakes which have to be hunted down later. I really think a killer app for OGSA-DAI is going to be an easy-configuration engine with a connection checker... just so that it becomes much easier to make sure the ducks are all in a row for pulling data from a given view in a given database at a given medical center... otherwise we'll probably be sending Centers of Excellence 15-page debugging guides and dedicating a 20-head help center to correcting the bugs I keep running into.
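The simplest piece of such a connection checker might look like the sketch below. It only verifies that a TCP connection can be opened to each database host and port; it does not attempt a JDBC login or query a view, so it catches firewall and address mistakes but not credential or driver problems. The function name and the endpoint format are assumptions for illustration.

```python
import socket
from typing import Iterable, List, Tuple


def check_endpoints(endpoints: Iterable[Tuple[str, int]],
                    timeout: float = 3.0) -> List[Tuple[str, int, bool]]:
    """Try a plain TCP connect to each (host, port) and report reachability."""
    results = []
    for host, port in endpoints:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure
            with socket.create_connection((host, port), timeout=timeout):
                results.append((host, port, True))
        except OSError:
            results.append((host, port, False))
    return results
```

For example, `check_endpoints([("localhost", 3306), ("localhost", 5432)])` would report whether the default MySQL and PostgreSQL ports answer on the local machine.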
Tuesday, May 6, 2008
New Demo using RODS Data
New version of the demo using RODS data -
http://test.ogsadai.org.uk:8080/dai/zipcode-queryfile.jsp
Note:
The data has been split into two databases and these are queried and
the results are displayed to the user. The query is a drop down list
of conditions based on some that are available in the rods dataset.
-- great to see this demo in action!
Daily Lab / POC Activities
Successful grid test with Washington and Columbia
Columbia MonALISA installation completed
T1 installed in lab and lab servers have been reconfigured
Configured PGSQL OGSADAI databases on LAB 1005
Sent grid install instructions to the Mayo Clinic
Additional grid IP's requested for LAB 1001
Variable JSP
Ubuntu does not gracefully handle an IP change... at least not one that is rigged to run VDT installed Globus. The main issue seems to be with MySQL and PostgreSQL, which are configured strangely on Ubuntu... and are causing all sorts of connection havoc.
But, I moved a bunch of the code I was working on to a Suse server, got the JSP running again, and now have the ability to pass in different queries. My next set of plans is to get the OpenMRS dataset onto the Suse server and then allow for different pulls from different databases... after that I want to implement a resource discovery subroutine that will work to populate the available resources of a given OGSA-DAI instance.
Thursday, May 1, 2008
Ian Foster and PHGrid
Our team was contacted by Ian today, and we are honored by his support and look forward to globus grid discussions with him in the future.
For more information, you can visit his personal blog.
That's one JSP down
After dynamic inputs, I will want to start the dynamic population of inputs... so that the client queries the grid to figure out the OD resources at a given node... and on a separate front, I will start looking at trying to dynamically select from multiple sources at once.
Another thing that was discussed today was how to implement distributed Natural Language Processing using Globus and its tools. There was lots of discussion of Apache AXIS and OPAL and which might be a better "wrap this interface and make a set of WS code ready to deploy to a container" tool... One of the notable quotes brought up during this was that "Web services are pretty much the Rube Goldberg devices of the Internet."
Daily Lab / POC Activities
- We are currently having a T1 installed in the NCPHI lab. This is causing port issues on the previous NCPHI grid node in the 50000 to 51000 port range. The NCPHI grid node's internal and external IP address will be changing once the connection goes live. Chris is working on remapping the Globus ports to the new IP address. For the time being, we are asking the new sites to test GridFTP with Dallas instead of the NCPHI node.
- Columbia was able to perform a successful GridFTP file transfer with the Dallas node. Columbia was also able to receive data from NCPHI, but was not able to send data to NCPHI due to firewall issues on the NCPHI side.
- Washington was issued new certificates and attempted a GridFTP transfer with NCPHI that resulted in failure. (NCPHI firewall issues) I sent new test instructions to Washington so they can run a GridFTP test on the Dallas server. This test should happen sometime today.