Friday, May 30, 2008

Integration Architecture

Jeremy Espino and I had a call this morning where we talked about some of the software structure for the Proof of Concept projects. The summary is that across the projects, we'll be developing some new grid services: some as wrappers for existing functionality and some as new functionality.

As we develop these services we have to keep in mind how to create metadata that describes each service's function, use and testing, in addition to the standard service definitions of syntax, vocabulary and grammar.

We don't have anything yet, but the end goal of working with the three Proof of Concept projects (charters linked to on the left) is that we will have an architecture for service interoperability. This will be documented here as well as in the service metadata. Suggestions are welcome and as the group develops more, results will be posted.

Syndrome Definition Discussion (Food for Thought & Discussion)

HARVARD ESP Program / U. of Pittsburgh / NCPHI Lab

Brian Lee wrote,

I mentioned the Harvard ESP CoE project's need for syndrome definition to Jeremy Espino during our call this morning. Jeremy told me about an effort that Wendy Chapman, also from Pitt, is working on to develop syndrome definitions with participation across the US, Japan and Canada.

Jeremy explained that there can't really be one set of definitions for everyone to use. There is really a need for a definition repository and then programs can pick what they are using to define syndromes.

So it looks like there will be multiple choices that Ross Lazarus (Harvard) can pick from to classify his signs and symptoms into syndromes. I don't know how he can pick the "best" one, but there are others working on the same problem.

Sourceforge Project

The Public Health Informatics Research Grid now has a sourceforge project. It is available at: https://sourceforge.net/projects/phgrid/.

If you go to this page, you will see very little as it is brand new. As the lab team develops software, it will be tracked and stored on this site until we can migrate over to dedicated infrastructure.

For developers out there, the (empty for now) SVN repository is at phgrid.svn.sourceforge.net.

To answer John Stinn's valid call for the "public health so what" of this post: this project gives us a location, available to all the participants in the research grid, to post, review and comment on software issues (bugs, change requests, enhancements), store related documents and directly collaborate over how best to build public health informatics (mission) related services on top of the grid connectivity in place. This project on sourceforge is also the first step along the "open source way" as all software on it is available via the Apache license.

Thursday, May 29, 2008

Future Medicus Node @ University of Chicago

Nigel Parsad wrote,

...here is a little update. Dan, VDT's Globus install and your rapid certificate authorization worked as advertised. Really a fantastic time saver!

...as I mentioned to Dan, I have the Medicus code and old install info...I pointed Dan to what I have. I've installed what I could, but now have to configure a gateway and test it.

...I am in the midst of setting up a PACS server on a machine in our lab. Once done, I want to be able to test Medicus's ability to do basic DIMSE-C services (e.g. query, store, and retrieve medical images on a basic level).

...If this succeeds (and I should know by tomorrow), I can test Medicus's SSP functionality between a PACS server and an SSP as well as a PACS client and an SSP utilizing DGIS. Actually, I'd just be happy with a DGIS Gateway but why stop now ....

...I need to go through these steps to better understand what Medicus can do. If I am reinventing the wheel, let me know - I need to reinvent it anyway for my own edification but still, any pointers are most appreciated!

Potential New Node & Preliminary Source Control

Dan and I had a productive call with Craig Haddix, the Data Coordinator with the Indiana Hemophilia & Thrombosis Center, about them connecting to the public health research grid. Craig was enthusiastic and Dan covered some of the requirements for starting the node install (Linux server, specific ports to be opened, etc.).

Dan's sending over the install guide to Craig so he can review it with his staff.

An interesting point is that since IHTC primarily uses Windows, they may want to look at running a grid instance through a Linux image running in VMware on top of a Windows machine.

Also, to save time I've applied for a sourceforge.net project for the software components that we will be developing shortly. This is a temporary measure until we can get a proper subversion, collaboration and issue tracking server in place that the public can hit. For now the license is Apache v2 (since that's what Globus is using). But as the CDC develops an official open source licensing policy, this project will migrate the source to licenses as appropriate.

I'll post the URL to the sourceforge project as soon as it is approved (assuming it is approved).

The Public Health Part of The Informatics Research Grid

Hello all -- As an observer and occasional contributor to the blog, I have been fascinated by the daily activities and impressed with the progress that has been made to date. When this was dreamed up several months ago, my initial feeling was that we were going to prove that we were overly ambitious and somewhat crazy in the initial PoCs. While we all still may be crazy, I can say I see the vast potential a Public Health Grid might have. To that end, I extend a "job well done" to the collective team.



And now I raise a challenge: with all the cool techie stuff that is going on, with all the innovation and collaboration we are fostering, with all the teaching and learning from each other, I still am not sure we have made the case that this is the right thing for Public Health (as an industry) or the Public's Health (as a population). "So what" is a question that we will soon have to answer - constantly.



I am optimistic that we can answer this challenge. The next PoC activities are much more programmatic in nature. We are going to investigate tools that could make public health information management more secure, more reliable, and easier than it is currently (via the PHIN MS PoC). We are going to examine how to connect instances of a widely used biosurveillance application to other biosurveillance tools so that state and local public health departments can maintain data for their purposes, but also share information efficiently across jurisdictions, borders, and political boundaries (the RODS PoC). Finally, we will investigate how services can serve other programmatic interests, from integration / interoperability (via natural language processing) to specific public health case reporting (via specialized services). It is through these projects that "so what" becomes "ah-ha"!



I encourage blog posters -- from developers to programmatic folks like myself -- to keep "so what" in mind, and to address that question specifically in each blog post, no matter how small or technically deep the post may be. What will the problem you are solving mean to public health? What does that tweak of the integration service or deployment script or presentation layer mean to a public health user or system administrator? Those are the people we need to help, and I know we can.

Getting Started

Finally, I got my hands on some documentation. Well... that would be wrong to say. I had gotten my hands on these docs earlier, but only after the morning meeting with Brian did I get a clearer perspective on what I am reading and why. Sort of like Scalar vs Vector (going back to old physics days). Anyhow:

  • Got to know the project I will be working on
  • Started to read documentation about PHIN Messaging System
  • Brian told me Dan has already done some groundwork on this, so looking forward to picking his brain

Wednesday, May 28, 2008

Federated Query

This link points to a discussion thread generated by Ron Price at the University of Utah. The discussion centers around a Federated query across OGSA-DAI and dcql_resources (caBIG).

Interestingly, the thread includes responses from both the OGSA-DAI and the caBIG teams.

More thoughts from the University of Utah (Ron Price)

Also, I meant to mention that for GSI management and implementation I plan on using GAARDS, Dorian and the caGrid portal to offload a lot of the overhead of GSI. I plan on having users set up an account (get their certs) by using the caGrid production Dorian so that I don't have to deal with CA issues and user cert management (basically I'm making use of someone else's CA, and when we join our grids I think it will just be a matter of putting the caGrid production Dorian public key in /etc/grid-security). I also plan on leveraging the caGrid portal (the one that can be installed with caGrid) to provide a secure web application for the phGrid project. Lastly, I think a person can use GAARDS separately from caGrid, but I'm not sure about Dorian. Anyway, just wanted to share that with you and your team, and I realize that the paragraph above may be hard to parse, so please feel free to give me a call.

Five Distributed Query Initiatives (that we know of)

As of today, we know of five distributed query initiatives:

1) ONC use case
2) Barry Rhodes' experimentation with PostgreSQL
3) University of Utah's (NCPHI COE) use of DCQL (caBIG / Globus) component
4) OGSA-DAI (specifically, NCPHI lab's work with RODS & general federated query use case)
5) electronic Primary Care Research Network (ePCRN) at the University of Minnesota (led by Dr. Peterson)

Daily Lab / POC Activities

Extramural:

  • Issued the University of Chicago host and user certificates
  • Worked with JHUAPL on troubleshooting GridFTP
  • Worked on Subversion configurations
  • Tarrant node offline due to T1 issues

I heard an old Song today...

As a wee programmer, I often found myself testing new programs created by others more initiated in the craft of Software. Most of these folks swore allegiance to one particular language or computer system, creating anthems to the objects of their devotion. A buddy at the time shared such a tribute with me after his code passed muster on the test bench.

Sung to the Beatles Tune "Let it Be"

"When I find my Code has got in trouble, my Supervisor comes to me.
Speaking words of wisdom Write in C..."

I decided I should cover my Pickle with a different flavor of Jam. It turns out the Globus Alliance foresaw a partial solution to my NLP grid project. It's called GT 4.0: C WS Core. So I will not be using gSOAP or Axis C++ to grid-enable the Medley NLP. I will also explore using libxml2, an XML parsing library for C that can help when creating web services from C.

Once I have wrapped the API and deployed it as a web service, the service described by the WSDL should be callable in a myriad of ways, including technologies yet to be created.

I can't wait to see the documentation on the Medley API. If my suspicions prove correct, we should be able to create a web service from the native API (I was told it was written in C). I now have a sweeter jam and a larger pickle to spread it upon.

I gotta go download some more tools and start setting them up on my Fedora workstation.

I just love old songs, don't you? They bring back memories.

"... and when my program is clobbered by sub-routines unknown to me,
I recite those words of wisdom, Write in C..."

Tuesday, May 27, 2008

Example-Land: More tasty than Jam on Pickles

Today, I finished reading the OGSA-DAI and Globus user manuals. I found them very informative.

To my delight, I discovered that web-services support is included in the overarching architecture for Globus. (For this example, call this a Pickle.) This suggests that we may be able to wrap the Medley NLP API, hosted at Columbia, with a WSDL using bindings created by an open source toolkit such as gSOAP or Axis C++. (Henceforth call this Jam.) I need to point out to the newbies that the C language is, for the most part, a subset of C++. Once we have a WSDL, we might be able to deploy Medley as a web service to the Globus container. Once deployed, we should be able to call the NLP in a variety of ways: Java/JSP, another web service, a Linux/UNIX command line, etc.

This week I intend to spread jam on pickles, and taste sweet success. Here is the baby "use case scenario"

The Goal: Submit a file of Free Text information, such as an admissions record residing on a remote hospital management system. Receive an HL7 or ICD9 encoded file in XML format on your local node/file-system.

Preconditions: A web-service enabled Globus container is installed on all nodes in the system, and security privileges for the user have been set up to allow the operation. A set of free text admissions records in ASCII is available.

Primary Flow

1. User opens a browser.
2. System prompts user to specify a file to be scanned by Medley-NLP.
3. User selects the desired file from the local file system and submits it.
4. Remote Medley-NLP parses the file
5. Medley-NLP emits an encoded file in XML format.
6. Globus transfers the file to the local file system.
7. The file is displayed in the Browser.

Alt. Flow 1.

A. In step 2, the user may select the encoding to receive, HL7 or ICD9

Alt. Flow 2.

A. The user can select multiple files for NLP processing
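
The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `fake_medley_nlp` is a hypothetical stand-in for the remote Medley-NLP service (we have not seen its real API yet), the XML element names are invented for the example, and the Globus file transfer is replaced by a plain function call.

```python
import xml.etree.ElementTree as ET


def fake_medley_nlp(free_text, encoding="ICD9"):
    """Hypothetical stand-in for the remote NLP step (steps 4 and 5).

    It does no real language processing; it only wraps each non-empty
    line of the input in an XML element so the flow can be exercised.
    """
    if encoding not in ("HL7", "ICD9"):
        # Alt. Flow 1: the user may only choose between these two encodings.
        raise ValueError("encoding must be 'HL7' or 'ICD9'")
    root = ET.Element("nlpResult", {"encoding": encoding})
    for number, line in enumerate(free_text.splitlines(), start=1):
        if line.strip():
            finding = ET.SubElement(root, "finding", {"line": str(number)})
            finding.text = line.strip()
    return ET.tostring(root, encoding="unicode")


def process_files(texts, encoding="ICD9"):
    """Alt. Flow 2: run several free-text records through the same step."""
    return [fake_medley_nlp(text, encoding) for text in texts]


if __name__ == "__main__":
    record = "Patient admitted with fever and cough.\n\nNo known allergies."
    print(fake_medley_nlp(record, encoding="HL7"))
```

In a real deployment the function body would be replaced by a call to the deployed web service, and the returned XML would arrive through the grid rather than as a return value.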

Friday, May 23, 2008

Daily Lab / POC Activities

Extramural:

  • Solved software dependency issues and installed Subversion-1.3.0-20, Subversion-tools-1.3.0-20, and Subversion-Server-1.3.0-20 on lab 1002.
NOTE: A Subversion repository will be created once the programmers meet and decide on a standard directory structure for version control. This meeting will take place on Tuesday.

  • Reviewed the Medicus installation process that was sent by Nigel Parsad.
NOTE: Based on the Medicus installation procedures, I decided the best location for the first install would be lab 1001. This is because Medicus requires OGSA-DAI and MySQL to be installed and modified. We currently have a working instance of OGSA-DAI and MySQL installed on lab 1001. Medicus will be moved to the NCPHI server once we have evaluated its behavior on the internal grid.

Reading The Fabulous Manual (RTFM)

Today I read the Administrator and User Guide for OGSA-DAI. I finished setting up Eclipse Europa. I also downloaded more Java tools, the ones recommended in the manuals. I am thinking we may want to snag a copy of XMLDB for lab use. It's open source and could be useful for the NLP project, as one of the outputs from the free text scan is an XML file.

Thursday, May 22, 2008

Notes from University of Utah (Thanks Ron Price)

RAVI (used to be named gravey) is the best tool I know of to create grid services:
http://www-unix.mcs.anl.gov/~neillm/ravi/

The GT4 book is the best reference I know on grid service development:
http://www.amazon.com/Globus%C2%AE-Toolkit-Programming-Services-Computing/dp/0123694043/ref=sr_1_4?ie=UTF8&s=books&qid=1211488358&sr=8-4

Here are some comments on GSI management tools:
I think many projects use just MyProxy and some use PURSE, but there are a couple of others, some of which escape my mind at the moment and some of which are listed here: http://dev.globus.org/wiki/Incubator

A note on CAs: Previously I wrote something called DigitalSherpa (a custom extension of WS-GRAM) and during this project we realized that it was very beneficial to offload certificate management onto a more formal CA. This lessened the amount of certificate management we had to do and allowed us to interact with more grid resources because our certs now came from http://www.doegrids.org/ . A huge problem with setting up your own CA is that no one else will trust it, and this may or may not be a problem.
Also here is an interesting link: http://www.gridpma.org/

Here's the GT4 best practices for grid service development:
http://www.globus.org/toolkit/docs/4.0/best_practices.html

Daily Lab / POC Activities

Extramural:

We had a great meeting with Ron Price from Utah. We discussed some of the reasons he chose to use caGrid ( http://www.cagrid.org/wiki/CaGrid:Software:Release:1.0 ) over Globus. The main reason for choosing caGrid over Globus was that it removed some of the complexity when it came to creating grid services. The underlying technology of caGrid is Globus, but it comes with tools that aid in grid service development.

Ron currently has three grid nodes running caGrid. We discussed methods for connecting the two grids in order to run a grid to grid federated query. One way to do this is to exchange our trusted CA information and copy the trusted CA hash files into the /etc/grid-security/certificates directory. This will be tested later.
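
As a rough aid for that test, here is a minimal Python sketch that checks whether a trusted-certificates directory looks complete. It assumes the usual Globus layout, where each trusted CA has a certificate named `<hash>.0` and a policy file named `<hash>.signing_policy`; the function name and the hash values in the demo are made up for illustration.

```python
import os
import tempfile


def check_trusted_cas(cert_dir):
    """Group files in a trusted-CA directory by their hash prefix.

    Assumes each trusted CA is represented by <hash>.0 (certificate)
    and <hash>.signing_policy (policy file).
    Returns (complete, incomplete) as sorted lists of hash prefixes.
    """
    certs, policies = set(), set()
    for name in os.listdir(cert_dir):
        stem, ext = os.path.splitext(name)
        if ext == ".0":
            certs.add(stem)
        elif ext == ".signing_policy":
            policies.add(stem)
    complete = sorted(certs & policies)    # both files present
    incomplete = sorted(certs ^ policies)  # exactly one of the two present
    return complete, incomplete


if __name__ == "__main__":
    # Demo against a throwaway directory; the hashes are hypothetical.
    demo_dir = tempfile.mkdtemp()
    for name in ("aaaa1111.0", "aaaa1111.signing_policy", "bbbb2222.0"):
        open(os.path.join(demo_dir, name), "w").close()
    print(check_trusted_cas(demo_dir))
```

Pointing the function at /etc/grid-security/certificates on each node after the exchange would show whether any CA was copied over without its policy file.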

We also discussed using PURSE for our certificate management. Ron mentioned this was a popular certificate management application used by other grids. I will continue researching PURSE for this purpose.

I also met with Nigel from the University of Chicago. He is interested in getting Medicus to work over the grid. I sent him a copy of our grid node installation document and CA setup information. He currently has a working Medicus installation that he was able to test with internal grid nodes. He agreed to send me his Medicus installation notes. He's going to get an installation update from his Medicus engineer in Peru and we will attempt to test Medicus over the next week.

I am currently installing the software dependencies for Subversion 1.4.6-1 on lab 1002. Current software dependencies:

apr >= 1.2.7 is needed by subversion-1.4.6-1.i386
apr-util >= 1.2.7 is needed by subversion-1.4.6-1.i386
db4 >= 4.2.52 is needed by subversion-1.4.6-1.i386
libapr-1.so.0 is needed by subversion-1.4.6-1.i386
libaprutil-1.so.0 is needed by subversion-1.4.6-1.i386
libcrypto.so.6 is needed by subversion-1.4.6-1.i386
libexpat.so.0 is needed by subversion-1.4.6-1.i386
libneon.so.25 is needed by subversion-1.4.6-1.i386
libssl.so.6 is needed by subversion-1.4.6-1.i386
neon >= 0.25.5 is needed by subversion-1.4.6-1.i386
rtld(GNU_HASH) is needed by subversion-1.4.6-1.i386

Met with Anurag, talked with Ronald

Today I met the new developer Anurag at lunch and discussed the tenets and goals of the various Grid projects. He also expressed a fascination with the blog (shout out to you Anurag! :) ). Unfortunately, I will be out on vacation next week, so I will not be able to hang out with him when he gets acquainted, but he seems very neat and it will be lovely to have him on the project.

We also talked with Ron Price from Utah, who has been working with the grid for Quite Some Time and has several caGrid nodes talking with each other and accessing different databases through the caGrid federated query processing engine. It seems that the FQP is similar to the DQP that OGSA-DAI is working on. Thus, I anticipate a lot of discussion contrasting and comparing the two and seeing if pertinent differences can be applied to future generations.

Cheers,
Peter W.

Chat with University Of Utah

The conversation went well. The most interesting part came when we agreed to explore interoperability between the two grid enabling technologies. If we can come up with a set of well defined test scenarios, around distributed queries, with security layered on top, we may learn much about what problems we might encounter in future deployments.

Wednesday, May 21, 2008

First Day Greet and setup

Today was my first day. I worked with Peter W and Dan to set up my Linux (Fedora) workstation. I reviewed the candidate architectures and began setting up my Java tools. I also attended a meeting about bringing the NLP applications hosted at Columbia University onto the grid as web services.

New people and new things.

Today I talked a lot with Peter Casey (the new developer), planned a meeting with Anurag Chawla (the second new developer), and sat in a meeting with Albert Lai (as detailed below) and Brian Lee.

One of the things that is being heavily discussed is setting up all the archetypes that will make this a Real Project. This means setting up some sort of code repository and build scripts for making something that can be deployed somewhere.

In preparation for all this, I have been researching Maven... and honestly it's really neat how it will just build out a full project with two inputs... complete with test scripts and Apache-style documentation pages.

Internal Grid Node

I set up a Red Hat server and a new Globus grid node on lab 1006 for our new software engineer.

New person / NLP Service Wrapper

Hello, I'm Brian Alexander Lee and I'm starting to work with the public health research grid team to help out on some of the proof of concept projects. I'm an enterprise architect and my background is in web services and software development, so I hope that I'll be of use to the projects.

To work with the Centers of Excellence, we had a call with Albert Lai of Columbia University on how to provide an interface to the Columbia Natural Language Processor (I think it is called Medley).

We talked about some of the best ways to provide a wrapper to the existing NLP components and right now the thinking is to provide a web service wrapper through Globus that accepts medical free text (in either body text or files) that can invoke the NLP and respond with structured data in xml format. We'll post more as we start testing and designing more.

Also, Albert shared the URLs for two potential web service frameworks that we might be able to use in the test:

Tuesday, May 20, 2008

New developers and new codebases

Today I spent a good portion of time trying to find the best way to download RODS... I think I have gotten most of the code into a local Eclipse project... and only half of them are breaking with strange missing code... so I imagine there is some code hiding in other places.

Also, some new developers are going to be showing up in the coming days... and I look forward to sitting and chatting up all this nifty technology and getting some new perspectives and other ideas.

Monday, May 19, 2008

...About that secure access

Unfortunately, I spoke too soon... after testing a bit more... I found that Tomcat was still getting errors when trying to connect to a secure globus container port...

Ferreting things out from a few logs... I get this error: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target.

For some reason, the handshake works just fine when run from the command line... but when invoked from within the tomcat container... the certification credentials cannot be found. And I am not familiar enough with how tomcat security prefers to work to know what to try next.

Some web searching suggested that the public key of the computer I am trying to connect to be added to the keystore... but I couldn't figure out the keystore password... and furthermore it bothers me that I can run a client making secure connections from the command line when logged in as the globus user, but that Tomcat running as the globus user cannot.

I caught Alastair a bit and asked him about it, and he alluded to adding the server to tomcat in some way, and also posed the issue that if we had to add servers, it would make it very difficult to keep track of all the nodes in the grid from the JSP client perspective. Thus, some authentication discovery schemes will probably need to be thought through... they will probably be similar to something like Grid-Proxy that Dan is researching. Perhaps some sort of Kerberos or LDAP schemes can be adapted here.

Otherwise, I got SVN working a bit better in eclipse, but I still want to look at intelliJ and see how well it deals with JSP projects and the like.

Friday, May 16, 2008

Secure access

I have now converted the Secure access client provided by OGSA-DAI to output in HTML and play through the JSP page.

My next step is to build it into a suite of JSP code that will actually manage things like grid-proxy creation and server list configuration... but it's neat that I can query databases on other computers via secure means using grid certificates. Now to just make easily deployable archives using Maven.

Dr. Tom Saville also introduced us to another Grid developer who seemed to have bundled most of the Grid and OGSA-DAI code into a single deployable (InstallShield) archive. He also brought up a lot of the issues of potentially storing contented data when we were discussing distributed issues.

I tried to get things synced via SVN with RODS, and I don't think Eclipse likes the way SVN stores things... that or I have no idea the best way to pull in projects (very likely). I am going to look into IntelliJ IDEA because it seems to be much smoother about syncing with SVN Repositories. Otherwise, lots of ideas still spinning in my head.

Have a good weekend!

Daily Lab / POC Activities

Extramural:

MonALISA is unable to display node status when the client is launched from a CDC network connected desktop. The MonALISA client works fine outside the CDC's network. This is due to an outgoing port restriction on the CDC's firewall. This condition was verified on multiple CDC desktops.

Thursday, May 15, 2008

Daily Lab / POC Activities

Extramural:

Lab 1001 and 1003 are back online.

  • Lab 1001 was unable to locate shared library files in the /opt/vdt/globus/lib/sasl directory. This error was experienced while trying to run the grid-proxy-init command. Globus was reinstalled to correct the issue.
  • The Globus container on Lab 1003 was failing to start in secure mode due to Java authentication errors. The issue was corrected by requesting and reissuing host certificates to the grid node.

Wednesday, May 14, 2008

Dr. Jeremy Espino and the wonderful RODS project

I had a rather long and very informative desktop-share and teleconference with Dr. Jeremy Espino going over RODS (Realtime Outbreak Detection System) and how it can be incorporated into POCs for the PHIN conference in September.

Furthermore, he created an SVN repository for grid-related code (whether it be related to RODS, Globus, OGSA-DAI, &c), showed me the wonders of Maven, let me see the prettiness that is IntelliJ IDEA, and generally got me really familiar with how RODS works and what pieces he wants me to focus on.

The SVN repository is grand, because it gives me a place to put some of the code I have been working on... and more importantly will force me to clean it up to the point where I am not scared to have other people looking at it.

As for the POC and RODS... RODS has UI functionality with the ability to basically take counts associated with locational data and actually get it plotted to various geolocational displays (think Google Maps). Thus, it would be the perfect UI for any sort of grid monitoring system, and if we can make data plugins that pull from OGSA-DAI resources instead of the normal RODS database (which is populated over time by a listener that parses HL7 hospital data)... Needless to say I have had an explosion of ideas for how to orient grid nodes and how RODS code can be used. I look forward to discussing them.

Daily Lab / POC Activities

Extramural:

I am currently troubleshooting the Java connection errors that happen while starting the Globus Container on lab 1003. I've searched the Globus forums and found several references to these errors, but no solid leads on how to correct the problem.

The Globus Container starts fine if the -nosec (no security) parameter is used, but the Container fails to start in secure mode. Based on this symptom, I plan to reissue the host and user certificates in case something was corrupted during the OGSA-DAI database modifications.


Tuesday, May 13, 2008

To do: stop breaking globus' database

So, while trying to do secure queries using globus... I found that globus wasn't working on several (well, any) of the test nodes we are using.

I also noticed that I tinkered with the MySQL databases running crucial parts of the globus software... I am not sure I managed to mess up the nodes, but it is quite likely I didn't help things.

Otherwise, it brought up a very important point: I should probably not be trying to use databases on the same server running globus, but databases running on separate VMs... as all the scenarios for phgrid deployment in my mind seem to focus on having the grid node running globus be separate from the database server(s).

Thus, using separate VMs will better simulate what an actual grid will look like, and help us realize whether remote subnet access is something easy to configure... and better portray a wide range of databases and authentication schemes... since there won't be an ability to go "connect to the database on localhost", which implies a large amount of trust.

Monday, May 12, 2008

Daily Lab / POC Activities

Extramural:

  • Upgraded Globus to 4.0.7 on lab 1001
  • Configured WebMDS on lab 1005
  • Researching: edg-mkgridmap - Process will update the gridmap file automatically for local and remote users. This may solve the issue of simultaneously updating the gridmap file for multiple grid nodes.
  • Researching: GUMS (Grid User Management System) A grid identity mapping service that could be used to automatically map user accounts to grid resources. This could reduce the burden for user management.
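
To make the gridmap problem concrete, here is a small Python sketch that parses and merges grid-mapfile entries so the same merged file could be pushed to every node. It assumes the common grid-mapfile line format of a quoted distinguished name followed by one or more comma-separated local accounts. The function names and the sample DN are invented for illustration, and this is not how edg-mkgridmap or GUMS work internally.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-mapfile text into {distinguished_name: [local_users]}.

    Blank lines and lines starting with '#' are skipped. Repeated
    entries for the same DN are merged without duplicating users.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # keeps the quoted DN as one token
        if len(parts) != 2:
            raise ValueError("unexpected grid-mapfile line: %r" % line)
        dn, users = parts
        known = mapping.setdefault(dn, [])
        for user in users.split(","):
            if user not in known:
                known.append(user)
    return mapping


def format_gridmap(mapping):
    """Render the mapping back into grid-mapfile lines, sorted by DN."""
    return "\n".join(
        '"%s" %s' % (dn, ",".join(users))
        for dn, users in sorted(mapping.items())
    )


if __name__ == "__main__":
    node_a = '"/O=Grid/OU=Lab/CN=Jane Doe" jdoe\n'
    node_b = '# entries from another node\n"/O=Grid/OU=Lab/CN=Jane Doe" globus\n'
    print(format_gridmap(parse_gridmap(node_a + node_b)))
```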

OGSA-DAI location demonstrated, now for GLOBUS

Earlier this afternoon I got the classes and JSP all aligned so that you could have a dropdown for selecting a deployed ogsa-dai resource.

Now, the next step is going to be discovering all of the grid resources that have OGSA-DAI resources... and for that I need to familiarize myself with Globus MDS and how to code for it. I might also look at MonALISA and see whether it will have better APIs for globus resource location. Before that, I am going to have to orient most of the resources that I have recently deployed through tomcat to work instead through the globus container, and orient my JSPs to hit the globus-related URLs instead.

Another thing that has been brought up is how to integrate OGSA-DAI and globus on a larger scale... and one of the things that OGSA-DAI is purported to be able to do is essentially merge heterogeneous databases.... and that's not entirely the case.

I feel that what OGSA-DAI does is expose some metadata on, and enable access to, views or tables in different databases. The views themselves will have to be homogeneous... if they don't have the same schema then it will not be possible for databases to simply be plugged in... at least not at first. Later generations might have data mapping capability, but for the case of polling systems and scanning algorithms everything is still going to need to be relatable to some master data schema.
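
The homogeneity requirement can be illustrated with a short Python sketch. It does not use OGSA-DAI at all: two in-memory SQLite databases stand in for databases at different sites, and the table name `visits`, its columns, and the function names are all assumptions made up for the example. The point is only that results are combined when the schemas match and refused when they do not.

```python
import sqlite3


def table_schema(conn, table):
    """Return [(column_name, declared_type), ...] for a table."""
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    rows = conn.execute("PRAGMA table_info(%s)" % table).fetchall()
    return [(row[1], row[2].upper()) for row in rows]


def federated_query(connections, table, sql):
    """Run the same query on every database, but only if schemas agree."""
    schemas = [table_schema(conn, table) for conn in connections]
    if any(schema != schemas[0] for schema in schemas[1:]):
        raise ValueError("schemas differ; map them to a master schema first")
    results = []
    for conn in connections:
        results.extend(conn.execute(sql).fetchall())
    return results


def make_site(rows):
    """Build a throwaway in-memory database standing in for one site."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (zip TEXT, cases INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    site_a = make_site([("15213", 4)])
    site_b = make_site([("84112", 7)])
    print(federated_query([site_a, site_b], "visits",
                          "SELECT zip, cases FROM visits"))
```

A site whose table used different column names or types would make `federated_query` raise instead of silently returning mismatched rows, which is where a data mapping layer would have to step in.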

Sunday, May 11, 2008

Scientific American - Science 2.0 -- Is Open Access Science the Future?

http://www.sciam.com/article.cfm?id=science-2-point-0

Interesting article that I saw while killing time in a book store. The work being done on the PH Research Grid is a very interesting case study on the use of Web 2.0 technologies in conducting research, not to mention applying those technologies to practice. There are very interesting parallels between open source development and the scientific process.

Thursday, May 8, 2008

Progress

I have finished adjusting the ServerClient example code so that it will be able to be used by the simple Query JSP I am building to populate a drop down of available resources. The next step after this would be to use Globus MDS to figure out which nodes have OGSA-DAI installed, and then you basically can pick a node, pick a resource, and run a query... which is very well along the lines of then picking multiple nodes/resources and running a query on them...

Otherwise, after reading through the OGSA-DAI documentation more, there are files you can make that simplify the resource creation process, and it would be easy to write scripts and UIs that create/modify such files... so it appears that frustration has created some ideas for more useful applications.

Dan also helped point out that it was very likely a MySQL JDBC issue.

Wednesday, May 7, 2008

A killer app for OGSA-DAI

I unfortunately spent most of the day today trying to get a simple client to pull a simple query from a newly created MySQL database. I keep checking the logs, seeing that the connection was refused, and I haven't figured out why... What makes it more puzzling is that I can connect to MySQL using the MySQL administrator and the generic MySQL client.

Perhaps I am using the wrong JDBC connector, as the version of MySQL on this machine (4.1.22, installed by VDT for RFT) is older than the more common version 5, or perhaps I am invoking the connector improperly. I have checked the resource files and the logons.txt files and haven't found anything out of spec there. I will probably set it aside tomorrow and focus on enhancing the JSP I've built rather than keep hammering on this new resource.

Either way, a significant portion of my time has gone to hunting down configuration mishaps. Ant build scripts run from the command line with lots of variables are very dodgy, and it is very easy to make mistakes that have to be hunted down later. I really think a killer app for OGSA-DAI is going to be an easy-configuration engine with a connection checker, so that it becomes much easier to make sure the ducks are all in a row for pulling data from a given view in a given database at a given medical center. Otherwise we'll probably be sending Centers of Excellence 15-page debugging guides and dedicating a 20-person help center to correcting the bugs I keep running into.
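As a rough idea of what the connection-checker half could look like, here is a minimal sketch. It only tests whether a TCP connection to a host and port can be opened, so it can rule network-level causes of a "connection refused" in or out; it says nothing about JDBC drivers, credentials, or OGSA-DAI resource files. The function names and the endpoint labels in the usage note are hypothetical.

```python
import socket
from typing import Dict, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out,
        # unresolvable host) when the endpoint cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints: Dict[str, Tuple[str, int]],
                    timeout: float = 3.0) -> Dict[str, bool]:
    """Check a set of named endpoints and report which are reachable."""
    return {
        name: can_connect(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

For example, `check_endpoints({"mysql": ("localhost", 3306), "tomcat": ("localhost", 8080)})` would report whether the default MySQL and Tomcat ports on the local machine accept connections; adjust the ports to whatever a given install actually uses.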

Tuesday, May 6, 2008

New Demo using RODS Data

Special thanks to Jeremy (RODS) and Alistair (OGSADAI):

New version of the demo using RODS data -

http://test.ogsadai.org.uk:8080/dai/zipcode-queryfile.jsp

Note:

The data has been split into two databases; both are queried and the
results are displayed to the user. The query is chosen from a drop-down
list of conditions based on some that are available in the RODS dataset.

-- great to see this demo in action!
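The split-and-merge pattern described in the note can be sketched in a few lines. This is not the demo's code and does not use OGSA-DAI: two in-memory SQLite databases stand in for the two real databases, and the `visits` table with its `zipcode` and `condition` columns is a hypothetical layout, not the RODS schema.

```python
import sqlite3
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def make_db(rows: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory stand-in database holding (zipcode, condition) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (zipcode TEXT, condition TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", rows)
    return conn


def counts_by_zipcode(connections: Iterable[sqlite3.Connection],
                      condition: str) -> Dict[str, int]:
    """Run the same query against every database and sum counts per zipcode."""
    totals: Counter = Counter()
    for conn in connections:
        cursor = conn.execute(
            "SELECT zipcode, COUNT(*) FROM visits "
            "WHERE condition = ? GROUP BY zipcode",
            (condition,),
        )
        # Merge this database's partial counts into the running totals.
        for zipcode, count in cursor:
            totals[zipcode] += count
    return dict(totals)
```

Note that the merge only works because both databases expose the same table layout, which is the homogeneity requirement again.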

Daily Lab / POC Activities

Extramural:

Successful grid test with Washington and Columbia
Columbia MonALISA installation completed
T1 installed in lab and lab servers have been reconfigured
Configured PGSQL OGSADAI databases on LAB 1005
Sent grid install instructions to the Mayo Clinic
Additional grid IPs requested for LAB 1001

Variable JSP

Friday was interesting: the internet connection was upgraded in the lab, and unfortunately that meant all the IP addresses needed to be changed.

Ubuntu does not gracefully handle an IP change, at least not on a machine that is rigged to run VDT-installed Globus. The main issue seems to be with MySQL and PostgreSQL, which are configured strangely on Ubuntu and are causing all sorts of connection havoc.

But I moved a bunch of the code I was working on to a SUSE server, got the JSP running again, and now have the ability to pass in different queries. My next plan is to get the OpenMRS dataset onto the SUSE server and then allow for different pulls from different databases. After that I want to implement a resource discovery subroutine that will populate the available resources of a given OGSA-DAI instance.

Thursday, May 1, 2008

Ian Foster and PHGrid

Ian Foster is the Arthur Holly Compton Distinguished Service Professor of Computer Science at the Argonne National Laboratory and University of Chicago. In 2006, he was appointed director of the Computation Institute, a joint project between the University of Chicago and Argonne. He is also involved with both the Open Grid Forum and the Globus Alliance as an open source strategist. An earlier project, Strand, received the British Computer Society Award for technical innovation. In addition, he is on the board of the HealthGrid.US alliance.

Our team was contacted by Ian today. We are honored by his support and look forward to Globus grid discussions with him in the future.

For more information, you can visit his personal blog.

That's one JSP down

This morning was wonderful. Alastair of the OGSA-DAI project helped me get log4j configured properly in Tomcat, and then I was able to view the logs and discover the nature of the errors. With that, I got the first JSP running (a simple extension of the existing SQLClient example class). The rest of today was spent modifying the JSP and the attached classes to be more HTML friendly, and now I am trying to make the inputs dynamic (so you can type in the query you want to try, select the resource you are polling, and select the URL of the box you wish to query).

After dynamic inputs, I will want to start on the dynamic population of inputs, so that the client queries the grid to figure out the OGSA-DAI resources at a given node. On a separate front, I will start looking at dynamically selecting from multiple sources at once.

Another thing that was discussed today was how to implement distributed Natural Language Processing using Globus and its tools. There was a lot of discussion of Apache Axis and OPAL and which might be the better "wrap this interface and make a set of WS code ready to deploy to a container" tool. One of the notable quotes brought up during this was that "Web services are pretty much the Rube Goldberg devices of the Internet."

Daily Lab / POC Activities

Extramural:

  • We are currently having a T1 installed in the NCPHI lab. This is causing port issues on the previous NCPHI grid node in the 50000 to 51000 port range. The NCPHI grid node's internal and external IP address will be changing once the connection goes live. Chris is working on remapping the Globus ports to the new IP address. For the time being, we are asking the new sites to test GridFTP with Dallas instead of the NCPHI node.
  • Columbia was able to perform a successful GridFTP file transfer with the Dallas node. Columbia was also able to receive data from NCPHI, but was not able to send data to NCPHI due to firewall issues on the NCPHI side.
  • Washington was issued new certificates and attempted a GridFTP transfer with NCPHI, which failed because of the NCPHI firewall issues. I sent new test instructions to Washington so they can run a GridFTP test on the Dallas server. This test should happen sometime today.