Wednesday, November 26, 2008

Poicondai-web has polygons, and you can install it.

Hello everyone.

Poicondai-web now has zipcode polygons. You can see them here

Also, I have updated the poicondai-web service registry page with more information about how to download and install poicondai-web, poicondai-util, poicondai-loader, and the NPDS-WS-Client. That is here

Next up is putting those polygons into Rodsadai... but that will be after the holidays.

Have a happy Thanksgiving everyone!

Thursday, November 20, 2008

I have polygons, but not all the zips that might be sought out.

I have polygons, and have shown that I can get all the polygons possible showing up in Colorado.... but there is a problem...

The list of zipcodes I have and the list of polygons for those zipcodes show some discrepancies, and it all stems from the fact that zipcodes can change. As a result, several areas in my "map of Colorado" are blank because they correspond to zipcodes that may have split recently or were otherwise not in the geolocation data I was given.

That doesn't mean that the NPDS doesn't have a few results for them.

Thus, I am in a bit of a quandary. I guess the best thing I can do at this point is put a little table at the bottom of the map that says "zipcode ##### was not in the polygon database." Even if we changed the polygons to fit the old zipcodes, the polygons would have to overlap, and that would get very confusing.
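The table described above boils down to a set difference between the zipcodes that come back with NPDS results and the zipcodes that have polygons. A minimal Python sketch, assuming both lists are available as plain zipcode strings; the function names and sample values are hypothetical, not poicondai-web code:

```python
def split_by_polygon_coverage(result_zips, polygon_zips):
    """Split result zipcodes into those we can draw and those we cannot.

    Both arguments are iterables of 5-digit zipcode strings (hypothetical inputs).
    Returns (mappable, missing), each sorted for stable display.
    """
    results = set(result_zips)
    polygons = set(polygon_zips)
    mappable = sorted(results & polygons)
    missing = sorted(results - polygons)
    return mappable, missing


def missing_zip_notes(missing):
    """Build the rows of the table shown under the map for zipcodes without polygons."""
    return [f"zipcode {zipcode} was not in the polygon database." for zipcode in missing]


mappable, missing = split_by_polygon_coverage(
    ["80202", "80238", "80027"],  # zipcodes with NPDS results (sample values)
    ["80202", "80027", "80301"],  # zipcodes with polygons (sample values)
)
for note in missing_zip_notes(missing):
    print(note)
```

Keeping the unmatched zipcodes in their own list means the polygons never have to be reshaped into overlapping regions for split or retired zipcodes.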

Otherwise, I imagine poicondai with zipcode capability (at least for the zipcodes we have polygons for) will be ready for testing sometime tomorrow.

Grid Enabling Existing/Legacy Applications With gRAVI

I recently wrapped SaTScan in a grid service using gRAVI, and gRAVI treated me well. gRAVI can be downloaded as an Introduce plugin, and it is designed to wrap a grid service around an executable. Your job can then be treated as a GRAM job, which is great because the status of the job is then represented via GRAM (staging, running, finished, ...). Also, by default gRAVI stages your files in for you and transfers your results back to you via byte array. I think gRAVI 1.4 will support the following transfer mechanisms: GridFTP, byte array, and caGrid Transfer.
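Because the wrapped executable runs as a GRAM job, a client can follow it from state to state. The sketch below is a generic Python polling loop, not gRAVI or Globus client code: `get_state` is a hypothetical stand-in for however your client reads job status, and the state names are modeled on GT4 WS-GRAM states (StageIn, Pending, Active, StageOut, Done, Failed).

```python
import time

# Terminal job states, named after GT4 WS-GRAM states; Done/Failed is a simplification.
TERMINAL_STATES = {"Done", "Failed"}


def wait_for_job(get_state, poll_seconds=5.0, timeout_seconds=3600.0, sleep=time.sleep):
    """Poll get_state() until a terminal state is reached.

    get_state is a hypothetical callable returning the job's current state string.
    Returns the sequence of distinct states observed, in order.
    """
    seen = []
    waited = 0.0
    while True:
        state = get_state()
        if not seen or seen[-1] != state:
            seen.append(state)
        if state in TERMINAL_STATES:
            return seen
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in state {state} after {waited:.0f} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated job: a canned sequence of states stands in for a real GRAM service.
states = iter(["StageIn", "StageIn", "Active", "Active", "StageOut", "Done"])
print(wait_for_job(lambda: next(states), sleep=lambda seconds: None))
```

Passing `sleep` in as a parameter keeps the loop testable without waiting; a real client might prefer state-change notifications over polling.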

If you need to grid enable an existing/legacy application I highly recommend gRAVI. It will save you time.

Anyway, I just started my second iteration of grid-enabling SaTScan, and I have some work to do on the client; plus, the Cloud is on the horizon for this service. I'd better get back to work :)

World Wide Grid

All of that dark fiber and computing power has to be used for more than just YouTube videos. The EU has invested 2.5m Euros into a project that will make worldwide Grid computing more accessible.

Wednesday, November 19, 2008

Zipcode Polygons are working in testing.

So I have zipcode polygons enabled.... I also have modified poicondai-web to use maven filters.

Note to self: test resources are configured separately from regular resources in the pom.xml (<testResources> rather than <resources>), so if you need to enable filtering on the test resources in a maven2 project, you will need to create a section for them...

Tomorrow, I will modify the main pages to encode zipcodes and pop up search lists... and then I think I will look at rearchitecting poicondai-web into a much simpler structure, with a class that returns all the polygon javascript instead of generating it in the JSP.
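Moving the polygon output out of the JSP could look like a single function that serializes every polygon at once. A minimal Python sketch of the idea (poicondai-web itself is a Java/JSP application; the function name, JavaScript variable name, and coordinate layout here are hypothetical):

```python
import json


def polygons_to_javascript(polygons, var_name="zipPolygons"):
    """Render {zipcode: [(lat, lng), ...]} as one JavaScript variable declaration.

    The variable name and the nested-array layout are assumptions; the page's own
    map script would decide how to turn each coordinate list into a drawn polygon.
    """
    data = {
        zipcode: [[lat, lng] for lat, lng in points]
        for zipcode, points in sorted(polygons.items())
    }
    return f"var {var_name} = {json.dumps(data)};"


print(polygons_to_javascript({"80202": [(39.75, -104.99), (39.76, -104.98), (39.74, -104.98)]}))
```

Using a JSON serializer instead of string concatenation keeps quoting and number formatting valid for JavaScript, which is the error-prone part of building scripts inside a JSP.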

Then, it'll be polygons for Rodsadai.

Tuesday, November 18, 2008

AMDS Sample Structure

So Jeremy and I have been tossing around some ideas for an initial draft of the AMDS data structures. Based on discussions led by Tom Savel on what fields should be included in the AMDS, we're going to start test development using a basic AMDS that includes:

  • Date

  • Patient Zip3

  • Syndrome

  • Syndrome Classifier (i.e. which classifier was used to assess the syndrome, e.g. BioSense, EARS, RODS, ESSENCE, etc.)

  • Count

  • Denominator (count of all syndromes for that zip3 on that date)
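To make the field list above concrete, here is one way a single AMDS row could be modeled and serialized. This is a Python sketch under assumptions: the class name, field names, XML element names, and validation rules are hypothetical, not the draft schemas headed for the wiki, and the sample values are illustrative only.

```python
import datetime
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AmdsRecord:
    """One row of the draft AMDS: a syndrome count for a zip3 on a date."""

    report_date: datetime.date
    patient_zip3: str
    syndrome: str
    syndrome_classifier: str  # e.g. "BioSense", "EARS", "RODS", "ESSENCE"
    count: int
    denominator: int  # count of all syndromes for this zip3 on this date

    def __post_init__(self):
        # Assumed sanity checks; the draft does not define validation rules.
        if len(self.patient_zip3) != 3 or not self.patient_zip3.isdigit():
            raise ValueError(f"zip3 must be 3 digits, got {self.patient_zip3!r}")
        if not 0 <= self.count <= self.denominator:
            raise ValueError("count must be between 0 and the denominator")

    def to_xml(self):
        """Serialize with hypothetical element names (not the draft schema)."""
        record = ET.Element("AmdsRecord")
        fields = [
            ("Date", self.report_date.isoformat()),
            ("PatientZip3", self.patient_zip3),
            ("Syndrome", self.syndrome),
            ("SyndromeClassifier", self.syndrome_classifier),
            ("Count", str(self.count)),
            ("Denominator", str(self.denominator)),
        ]
        for tag, text in fields:
            ET.SubElement(record, tag).text = text
        return ET.tostring(record, encoding="unicode")


row = AmdsRecord(datetime.date(2008, 11, 18), "303", "Respiratory", "BioSense", 12, 240)
print(row.to_xml())
```

Since the denominator is the count of all syndromes for the same zip3 and date, a single syndrome's count should never exceed it, which is why the sketch checks for that.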

Felicia is starting to think about converting the RODS-HDS service (developed to meet this feature request) to RODS-AMDS to meet the draft xml structures. These xml structures are very lightweight and we will modify them as the AMDS data structure undergoes changes based on scientific comment.

The next step is to add the sample xml and schemas to the wiki so we can start getting comments in. Based on comments, we will start to plan services that provide BioSense VA and DoD sample data using the AMDS structure so that it can be combined with RODS data (and eventually EARS, ESSENCE, other systems).

Updated DRN Design drafts

I updated the DRN Design Drafts wiki page with a data flow diagram to show the flow of detailed data --> aggregate data --> combined aggregate data based on some feedback from Roy Pardee, Ross Lazarus and Jeff Brown.
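The detailed data --> aggregate data --> combined aggregate data flow can be illustrated in a few lines. A Python sketch, assuming each site can reduce its detailed records to (date, zip3, syndrome) tuples; the record layout, function names, and sample values are hypothetical and are not taken from the DRN drafts:

```python
from collections import Counter


def aggregate_site(visits):
    """Detailed data -> aggregate data at one site.

    visits: iterable of (date, zip3, syndrome) tuples, one per visit (hypothetical layout).
    Only the resulting counts would leave the site.
    """
    return Counter(visits)


def combine_sites(site_aggregates):
    """Aggregate data from several sites -> combined aggregate data."""
    combined = Counter()
    for aggregate in site_aggregates:
        combined.update(aggregate)  # Counter.update adds counts rather than replacing them
    return combined


site_a = aggregate_site([
    ("2008-11-18", "802", "ILI"),
    ("2008-11-18", "802", "ILI"),
    ("2008-11-18", "803", "GI"),
])
site_b = aggregate_site([("2008-11-18", "802", "ILI")])
combined = combine_sites([site_a, site_b])
print(combined[("2008-11-18", "802", "ILI")])  # 3
```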

Zipcodes polygons in the database

So, I have updated the poicondai-loader project and the poicondai-util project and now we have zipcode polygons in the polygon database.

Also, the deploy of tomcat to the staging node went swimmingly. Dan switched the tomcat server over to port 8443, and rodsadai and poicondai worked like nothing had changed.

So tomorrow I will start testing with the zipcode polygons, and then update the map application to show zipcode polygons in addition to the county polygons.

Another Distributed Aggregated Query Project (SHRINE)

The Harvard Catalyst's Informatics Program has developed technology in lock step with regulatory and ethical requirements to allow authorized investigators to acquire robust sample sizes across all Harvard-affiliated healthcare institutions. We call this querying system SHRINE (Shared Health Research Information Network). As shown in the diagrams below, there is no central database but rather the SHRINE queries are distributed across each of the participating institutional databases. In this way, each institution maintains autonomy, control, and monitoring of all transactions on behalf of its patients.
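The pattern described here, one query fanned out to every institution with only results coming back, can be sketched as follows. This Python example illustrates federated querying in general, not SHRINE's implementation; the site names and handler functions are hypothetical stand-ins for each institution's local database query.

```python
from concurrent.futures import ThreadPoolExecutor


def distributed_count(query, site_handlers):
    """Send the same query to every site and total the counts that come back.

    site_handlers maps a site name to a hypothetical callable(query) -> int that
    runs the query against that site's own database. No records are pooled;
    only per-site counts are returned.
    """
    workers = max(1, len(site_handlers))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(handler, query) for name, handler in site_handlers.items()}
        per_site = {name: future.result() for name, future in futures.items()}
    return per_site, sum(per_site.values())


sites = {
    "hospital_a": lambda q: 14,  # stand-ins for each institution's local query
    "hospital_b": lambda q: 9,
}
print(distributed_count({"diagnosis": "asthma"}, sites))
```

Running the site queries concurrently means the slowest institution, not the sum of all of them, sets the response time.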

Consumer Health Informatics and the Grid?

I ran across this article this morning. It's about a new program using Google Health in association with Medicare programs in Arizona and Utah. Conceptually, given the Medicare bent, this may be rich information for chronic disease interventions and surveillance. In terms of services, I could see decision support / alerts. Are there others?
(edited 2008.11.18 by BAL to add link)

Monday, November 17, 2008

Prepping for new rodsadai

So, today I spent some time prepping poicondai-loader to load a lot of zipcode polygons into a database.

But the lion's share of the day was spent setting up the shiny new secure tomcat installation on the staging node, also known as "boy, I love being able to configure different ports!"

So, the window for taking down the node and updating RODSAdai to call secure tomcat is tomorrow... but that didn't stop us from making sure we could set up secure globus and make some ogsadai calls to it. That took a little bit of time, mainly because the tarball I set up apparently broke or was not happy in its new environment, so we had to build a new one from scratch.

Many thanks to Felicia and Dan for pretty much doing most of the footwork before me so all I had to do was go "get this, put that there, lemme check... yay!" Dan was awesome with the configuration and Felicia knew most of this stuff from having to deal with it before a couple of times.

Tomorrow is loading polygons, updating build styles, and starting the poicondai-web modifications.

DRN Design drafts

I've put together some design sketches to show the planned deployment and sequence of events for the DRN SAS automation scripts that Dan has begun writing.

Basically we'll have three components:

  1. - transfers files to Globus nodes and aggregates the output documents into a single TSV report (to be developed)

  2. Secure Simple Transfer Service - runs on the Globus nodes and allows for listing, getting and putting of sas programs and output files (already completed)

  3. - configures and runs sas programs against databases (to be developed)
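The aggregation step of the first component could be as simple as stacking each node's TSV output under one shared header. A Python sketch under assumptions: the node names, the added `node` column, and the rule that every node must return the same header are illustrative, not part of the design drafts.

```python
import csv
import io


def merge_tsv_reports(reports):
    """Merge {node_name: tsv_text} into one TSV with a leading 'node' column.

    Nodes are processed in name order; empty outputs are skipped, and a node
    whose header differs from the first one raises ValueError.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header = None
    for node in sorted(reports):
        rows = list(csv.reader(io.StringIO(reports[node]), delimiter="\t"))
        if not rows:
            continue
        node_header, body = rows[0], rows[1:]
        if header is None:
            header = node_header
            writer.writerow(["node"] + header)
        elif node_header != header:
            raise ValueError(f"{node}: header {node_header} does not match {header}")
        for row in body:
            writer.writerow([node] + row)
    return out.getvalue()


print(merge_tsv_reports({
    "siteB": "zip3\tcount\n802\t5\n",  # sample output from one node
    "siteA": "zip3\tcount\n803\t2\n",
}))
```

Failing loudly on a header mismatch catches a node running an outdated sas program before its rows end up in the wrong columns.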

Please let me know any comments you may have as Dan is beginning to develop the scripts. The source code will be stored in

(here's the visio file should anyone want to make direct comments and changes.)

Apache Tomcat Update

Tomcat has been updated on the NCPHI node. We are currently running version 5.5.27. Globus has been deployed to Tomcat on port 9443.

Friday, November 14, 2008

Demos are pretty

Sorry for the lack of updates on my part. I spent most of the week improving the already nifty Poicondai demo so that it has variable y-axes and groups results by week when a query spans more than 180 days (about 6 months).
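The switch from daily to weekly points can be expressed as a simple rule on the length of the query range. A Python sketch, assuming daily counts arrive as a date-to-count mapping; the function name and the Monday-start week are assumptions, not details of the Poicondai demo:

```python
from collections import defaultdict
from datetime import date, timedelta

WEEKLY_THRESHOLD_DAYS = 180  # ranges longer than this are grouped by week


def bucket_counts(daily_counts, start, end):
    """Group {date: count} into daily or weekly buckets for charting.

    Weekly buckets start on Monday (an assumption; the demo may use another
    week definition). Returns a sorted list of (bucket_start, total) pairs.
    """
    weekly = (end - start).days > WEEKLY_THRESHOLD_DAYS
    buckets = defaultdict(int)
    for day, count in daily_counts.items():
        if not start <= day <= end:
            continue
        key = day - timedelta(days=day.weekday()) if weekly else day
        buckets[key] += count
    return sorted(buckets.items())


counts = {date(2008, 11, 10): 4, date(2008, 11, 12): 6, date(2008, 11, 14): 1}
print(bucket_counts(counts, date(2008, 11, 1), date(2008, 11, 14)))  # three daily points
print(bucket_counts(counts, date(2008, 1, 1), date(2008, 11, 14)))   # one weekly point
```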

You can check it out by going to

The other thing I did was get ogsadai running on secure-tomcat and verified that rodsadai could connect to it. Many many many thanks to Felicia for helping me with that yesterday. She had already found out all the crazy things that had to be done to get tomcat working and was able to help me get a similar setup working in about a half hour.

Next up is zipcodes. I will need to convert the zipcodes into a geolocational database for poicondai, and then I will be updating poicondai-web to select zipcodes, updating rodsadai-web to use zipcode polygons (which I hope will turn out pretty cool), all while hopefully implementing some new filtering so I can pull all the different configuration options into one file.

Wednesday, November 12, 2008

Update: Globus on Windows

Corrected the following error by setting these variables:

set X509_USER_CERT=C:\Documents and Settings\bubba-gump\.globus\usercert.pem
set X509_USER_KEY=C:\Documents and Settings\bubba-gump\.globus\userkey.pem
set X509_CA_CERT=1234abcd.0
set X509_CERT_DIR=C:\etc\grid-security\certificates

The corrected error:

Your identity: O=Grid,CN=bubba-gump
Enter GRID pass phrase for this identity:
Creating proxy, please wait...
Proxy verify failed: Unable to load CA ceritificates

New Error Message:

Your identity: O=Grid,CN=bubba-gump
Enter GRID pass phrase for this identity:
Creating proxy, please wait...
Proxy verify failed: "/O=Grid/CN=bubba-gump" violates the signing policy defined for CA "/O=xxx/OU=zzz/OU=szzzz/CN=xxxx Simple CA" in file "C:\etc\grid-security\certificates\1234abcd.signing_policy"

Next step:
Create a new certificate request with the correct subject line. This should fix the security issue.
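The new error says the user certificate's subject falls outside the namespace the CA is allowed to sign for. The Python sketch below shows why `/O=Grid/CN=bubba-gump` is rejected: it reads the `cond_subjects` line of a `.signing_policy` file and matches the subject with shell-style wildcards. This only approximates what the Globus libraries do, and `POLICY` is a hypothetical file modeled on the masked CA name above, not the real `1234abcd.signing_policy`.

```python
import fnmatch
import re


def allowed_subject_patterns(signing_policy_text):
    """Collect the cond_subjects patterns from a Globus-style signing_policy file."""
    patterns = []
    for match in re.finditer(r"cond_subjects\s+globus\s+'([^']*)'", signing_policy_text):
        # Each double-quoted string inside the single-quoted value is one allowed pattern.
        patterns.extend(re.findall(r'"([^"]*)"', match.group(1)))
    return patterns


def subject_allowed(subject, signing_policy_text):
    """Approximate the CA namespace check with shell-style matching (`*` as wildcard)."""
    return any(fnmatch.fnmatchcase(subject, pattern)
               for pattern in allowed_subject_patterns(signing_policy_text))


# Hypothetical policy modeled on the masked CA name in the error above.
POLICY = """
access_id_CA   X509   '/O=xxx/OU=zzz/OU=szzzz/CN=xxxx Simple CA'
pos_rights     globus CA:sign
cond_subjects  globus '"/O=xxx/OU=zzz/OU=szzzz/*"'
"""

print(subject_allowed("/O=Grid/CN=bubba-gump", POLICY))                  # False
print(subject_allowed("/O=xxx/OU=zzz/OU=szzzz/CN=bubba-gump", POLICY))  # True
```

Under a policy like this, a certificate request whose subject starts with the CA's own `/O=.../OU=...` prefix would pass, which is what the new request with the correct subject line is meant to achieve.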

New Book about Scientific Collaboration on the Internet (Ian Foster Contributing)

Scientific Collaboration on the Internet

I'm looking forward to receiving my copy of Scientific Collaboration on the Internet. I have an article in it on lessons learned from the NEESgrid project (an earlier version is here, I think it's a good read, especially between the lines), but the other articles are probably far more interesting:

The Contemporary Collaboratory Vision

  • E-Science, Cyberinfrastructure, and Scholarly Communication -- Tony Hey and Anne Trefethen
  • Cyberscience: The Age of Digitized Collaboration? -- Michael Nentwich

Perspectives on Distributed, Collaborative Science

  • From Shared Databases to Communities of Practice: A Taxonomy of Collaboratories -- Nathan Bos, Ann Zimmerman, Judith S. Olson, Jude Yew, Jason Yerkie, Erik Dahl, Daniel Cooney and Gary M. Olson
  • A Theory of Remote Scientific Collaboration -- Judith S. Olson, Eric C. Hofer, Nathan Bos, Ann Zimmerman, Gary M. Olson, Daniel Cooney and Ixchel Faniel
  • Collaborative Research across Disciplinary and Organizational Boundaries -- Jonathon N. Cummings and Sara Kiesler

Physical Sciences

  • A National User Facility That Fits on Your Desk: The Evolution of Collaboratories at the Pacific Northwest National Laboratory -- James D. Myers
  • The National Virtual Observatory -- Mark S. Ackerman, Eric C. Hofer and Robert J. Hanisch
  • High-Energy Physics: The Large Hadron Collider Collaborations -- Eric C. Hofer, Shawn McKee, Jeremy P. Birnholtz and Paul Avery
  • The Upper Atmospheric Research Collaboratory and the Space Physics and Aeronomy Research Collaboratory -- Gary M. Olson and Timothy L. Killeen; Assisted by Thomas A. Finholt
  • Evaluation of a Scientific Collaboratory System: Investigating Utility before Deployment -- Diane H. Sonnenwald, Mary C. Whitton and Kelly L. Maglaughlin

Biological and Health Sciences

  • The National Institute of General Medical Sciences Glue Grant Program -- Michael E. Rogers and James Onken
  • The Biomedical Informatics Research Network -- Judith S. Olson, Mark Ellisman, Mark James, Jeffrey S. Grethe and Mary Puetz
  • Three Distributed Biomedical Research Centers -- Stephanie D. Teasley, Titus Schleyer, Libby Hemphill and Eric Cook
  • Motivation to Contribute to Collaboratories: A Public Goods Approach -- Nathan Bos

Earth and Environmental Sciences

  • Ecology Transformed: The National Center for Ecological Analysis and Synthesis and the Changing Patterns of Ecological Research -- Edward J. Hackett, John N. Parker, David Conz, Diana Rhoten and Andrew Parker
  • The Evolution of Collaboration in Ecology: Lessons from the U.S. Long-Term Ecological Research Program -- William K. Michener and Robert B. Waide
  • Organizing for Multidisciplinary Collaboration: The Case of the Geosciences Network -- David Ribes and Geoffrey C. Bowker
  • NEESgrid: Lessons Learned for Future Cyberinfrastructure Development -- B. F. Spencer, Jr., Randal Butler, Kathleen Ricker, Doru Marcusiu, Thomas A. Finholt, Ian Foster, Carl Kesselman and Jeremy P. Birnholtz

The Developing World

  • International AIDS Research Collaboratories: The HIV Pathogenesis Program -- Matthew Bietz, Marsha Naidoo and Gary M. Olson
  • How Collaboratories Affect Scientists from Developing Countries -- Airong Luo and Judith S. Olson


  • Final Thoughts: Is There a Science of Collaboratories? -- Nathan Bos, Gary M. Olson and Ann Zimmerman


Tom showed me the PopSciGrid by Science of Networks in Communities (SONIC) that he learned about at AMIA.

It's interesting as it is combining multiple data sets over a grid, but also has a rather useful user interface that we might be able to co-opt for the PHGrid UI.

Thoughts from AMIA

Dialing in from AMIA, I thought it'd be important to capture random thoughts and specific feedback from the conference.

1. The AMIA community seems to have embraced the public health community with great enthusiasm. All of the sessions were well attended by members from across all sectors -- clinical informatics, vendors, academia, international stakeholders, and consultants.

2. The PH Research Grid session was a very strong session, with lots of great interaction from the audience. The COEs shared their experiences and findings and gave their honest appraisal of the grid approach; while there are many, many kinks to work out, there seems to be strong agreement that the movement toward standard services is the way to go. Now if we get real crazy, stringing together a couple of services to show this would be a great next step.

3. Dr. Lenert was able to successfully demonstrate PH-DGINet, which helped the audience of his session appreciate some of the simple use cases we are aiming to satisfy (i.e. summary counts).

4. In the longer term, there is some potential to collaborate on the Clinical Decision Support work being led by Nedra Garrett: first, by using the NCPHI lab infrastructure, and second, by publishing some alerting services to interface with clinical EMR vendors, HIEs, and other agencies. Still just an idea at this point, but something to explore.

5. Two pragmatic questions continue to come up:

What value do states and locals derive from 'the grid'?
Is syndromic surveillance valuable enough to foster adoption?


Tuesday, November 11, 2008

Tracking Flu trends - Google

From today's official Google Blog!
From today's front page of the New York Times!
Or go right to their tool here!

Tracking flu trends

11/11/2008 12:51:00 PM
Like many Googlers, we're fascinated by trends in online search queries. Whether you're interested in U.S. elections, today's hot trends, or each year's Zeitgeist, patterns in Google search queries can be very informative. Last year, a small team of software engineers began to explore if we could go beyond simple trends and accurately model real-world phenomena using patterns in search queries. After meeting with the public health gurus on Google.org's Predict and Prevent team, we decided to focus on outbreaks of infectious disease, which are responsible for millions of deaths around the world each year. You've probably heard of one such disease: influenza, commonly known as "the flu," which is responsible for up to 500,000 deaths worldwide each year. If you or your kids have ever caught the flu, you know just how awful it can be.
(more on their site)

Monday, November 10, 2008

Globus on Windows

I'm able to run the Java WSCore container with no security, but I get an error when I try to start the container with a certificate. The next step will be to recreate the Linux based certificate directory structure on the Windows machine and troubleshoot from there. I will also try to load the ca-setup file from the internal grid machine.

C:\gt4\bin>grid-proxy-init -debug
Files used:
  proxy     : C:\DOCUME~1\bubba-gump\LOCALS~1\Temp\x509up_u_bubba-gump
  user key  : C:\Documents and Settings\bubba-gump\.globus\userkey.pem
  user cert : C:\Documents and Settings\bubba-gump\.globus\usercert.pem
Your identity: xxx xxx xxx
Enter GRID pass phrase for this identity:
Using 512 bits for private key
Creating proxy, please wait...
Proxy verify failed: Unable to load CA ceritificates
java.lang.Exception: Unable to load CA ceritificates
    at
    at
    at
    at
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.globus.bootstrap.BootstrapBase.launch(
    at org.globus.bootstrap.Bootstrap.main(

C:\gt4\bin>globus-start-container
[JWSCORE-114] Failed to start container: [JWSCORE-200] Container failed to initialize [Caused by: [JWSSEC-248] Secure container requires valid credentials. No container descriptor file configured and default proxy not found. Run grid-proxy-init to create default proxy credential.]

C:\gt4\bin>globus-start-container -nosec
Starting SOAP server at
With the following services:

caGrid / TeraGrid Security & Interoperability Concerns

>> From what I understand so far caGrid 1.2 uses Globus 4.0.3 Java WS-CORE and Globus in caGrid 1.2 is as up-to-date as possible, but not necessarily hardened. Is this right?

Yes. Frankly I’d be surprised to learn that TeraGrid is running modified Globus code that has not been contributed back given the significant overlap in personnel on those projects. However, if you’d like to follow up with them on specifics we’d be happy to work with you to assess any applicability to caGrid. I’m sure what they were referring to was something like the GSI OpenSSH libraries which TeraGrid uses to allow Globus credentials to be used via ssh. As I’m sure you are aware, the ubiquity and power of ssh makes it a prime candidate for potential attack and there is a large active community analyzing and addressing any such vulnerabilities. It is important for an infrastructure like TeraGrid to stay up to date with any such ssh patches, and those trickle down to the Globus libraries which use them. As stated before, we use no such libraries as we only use SSL for securing the communication channel of web service calls. While obviously this is still critically important, its scope and therefore potential for exploit is significantly less (e.g. you can’t run arbitrary commands on the remote machine). As Steve mentioned, we monitor the Globus releases and community security advisories to ensure our infrastructure is not vulnerable.

>> It seems that caGrid 1.2 is installed at NCI, so it has met the federal guidelines that are required to have it installed at a place like NCI, right?

Yes, that is correct. Before we deploy the grid we have to go through a series of vulnerability scans.

Some caGrid Considerations on Globus Hardening

Globus is a large toolkit; caGrid is a service-oriented architecture that leverages the ws-core component of Globus. The TeraGrid infrastructure is different in that it mostly leverages other features of Globus such as GRAM and GridFTP. Thus each project's opinion on whether or not Globus is hardened is going to be closely tied to its experience with the components of the toolkit it uses. Currently caGrid is using Globus 4.0.3; however, many of our services will operate with Globus 4.0.X. The distribution of Globus that we link to on contains additional features/enhancements which have been added by working closely with the Globus team. The caGrid team monitors the Globus project closely to make sure any critical bug fixes are addressed as appropriate. With each Globus release, we look at and evaluate new features, some of which have been incorporated into the 4.0.3 distribution we provide. The main difference between the latest release of Globus and the version we are using is specification changes. Adopting the specification changes in caGrid would cause services developed on the earlier specifications to NOT interoperate with services developed on the newer specification. The specification differences are minor and at this point not worth breaking interoperability between services. We do plan on adopting these specification changes with caGrid 2.0 but are waiting for the Globus folks to upgrade their web services environment, which should significantly improve performance. We would like to combine the specification upgrades and the web service environment upgrade into one release so that our users only need to upgrade their services once. Before answering your question I wanted to give you some insight on why caGrid uses the version of Globus that it does; to answer your question, caGrid does not use a hardened version of Globus.

Friday, November 7, 2008

Slides From U of Utah CoE

Today the GRID can do many exciting things that could not be done before. Also, the GRID is in the process of having its security evaluated for PH purposes. Lastly, virtualising your data on the GRID does NOT mean you lose control of your data, as the slides here illustrate via a demo of instantaneous authorization revocation. The slides also go over the benefits of the grid today and where security stands today.

Minimum Node Hardware Specs

We've been getting some chatter about what hardware is required to run Globus and PH-DGInet nodes (combined as a PHGrid node). The idea is that because the node holds very little important data and really just runs simple web services, it can run on commodity hardware.

So the initial spec we're working with is something like:
2GHz processor
1GB RAM (Linux) / 1.5GB RAM (Windows)
8GB HardDisk storage for Globus / 10GB HardDisk storage for PH-DGInet (mainly the geospatial databases)

So the spec looks like a top of the line server from the year 2000. Nowadays this should be something like a mac mini ($599) or something more respectable like a Dell PowerEdge 1U server (about $700). Note that both of these blow away the processor specs (because they are dual core and quad core) and hard disk specs.

Security Policy Document - New Draft

Raja and Joseph put out a new draft of the PHGrid Security Policy Document.

It's available on the wiki for your review.

Another Demo

February 24/25, 2009. AAPCC Mid-Year Director's Meeting in Albuquerque, NM.

edit: BAL- specific dates and name of meeting

Thursday, November 6, 2008

Upcoming Demos

There are a few demos scheduled for PHGrid:
Today, we'll be showing off PH-DGInet and the Poison Control WS demo application to Dr. Alvin Bronstein.
Next week (Tuesday, Nov 11) Drs Lenert and Savel will present PHGrid at the AMIA Conference.
Dr. Savel might present RODSA-DAI (as a potential NHIN Domain Service) at the NHIN Public Forum on Dec 15/16.

Wednesday, November 5, 2008

Project Mgmt Update

A lot of moving parts, so at the suggestion of the team, thought it'd be good to note the important happenings:

1. A PH Grid Charter draft has been produced and is currently in review by program staff.

2. A draft PH Grid Project schedule is being drawn up now, with the first draft nailed down by Friday COB.

3. We are preparing for a Poison Control visit tomorrow (nice work on the demo, Peter), where we will tour the lab and discuss two major topics: A) future enhancements to the Poi Con web service and B) Poi Con web service security requirements.

4. We are working with ESRI, the NCPHI lab, and South Carolina to get the PH-DGINet nodes, services, and demos working as they were post-PHIN. Some changes to all the environments have led to poor communication handoffs on my end.

5. We have received feedback on the proposed AMDS, and we will meet internally by the end of the week to discuss next steps.

6. Tom Savel and I have discussed how best to recruit and provide nodes (be they DGINet or Globus) to state and local health agencies. Right now, a multi-pronged approach including possible regional collaborative coordination, HIE recruiting, GIS community recruiting, and program priorities is recommended. If you have any ideas or any interest in having a node (especially if you are a public health department or some of your best friends are public health departments), please let me know.

7. A draft PH Grid security policies document and conceptual architecture has been produced. Please let us know if you're interested in seeing either.

8. Next week: AMIA demos

So, Poicondai is out there and pretty.

I have been making several modifications to the Poicondai-Web demonstration.

You could go to the service registry of the PH-Grid wiki to find the link, but I am going to go ahead and post it here:

Please go ahead and poke it, and especially the poicondaiMap.jsp page. Any searches you do will help build the cache and improve the responsiveness.
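Every search helps because its results are kept and reused the next time someone runs the same search. A minimal Python sketch of that kind of cache, assuming a search is identified by state and date range; the class, the key, and the `fetch` callable are hypothetical and are not poicondai-web's actual caching code:

```python
class QueryCache:
    """Memoize search results so repeated searches skip the slow backend call.

    fetch is a hypothetical callable(state, start, end) standing in for the
    real data-service query; the key fields are assumptions about what a search uses.
    """

    def __init__(self, fetch):
        self._fetch = fetch
        self._results = {}
        self.misses = 0

    def get(self, state, start, end):
        key = (state, start, end)
        if key not in self._results:
            self.misses += 1  # first time this search is seen: go to the backend
            self._results[key] = self._fetch(state, start, end)
        return self._results[key]


cache = QueryCache(lambda state, start, end: f"results for {state} {start}..{end}")
cache.get("GA", "2008-10-01", "2008-11-01")
cache.get("GA", "2008-10-01", "2008-11-01")
print(cache.misses)  # 1
```

An unbounded dictionary is fine for a demo; a long-running service would want an eviction policy or an expiry time so stale counts are eventually refreshed.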

Also, please email me any bugs you may find, so that I might start thwacking accordingly.


Windows Core Update

  • The system requirements document has been completed.
  • I am currently focusing my attention on the installation and configuration of Globus 4.2.1 Java WS core on the Windows Dev node. At first glance, many of the files used within the C core are present in the WS Core.
  • The internal grid certificate configurations have been moved to the Windows server.
  • I had to uninstall the Java 5 EE SDK because it did not seem to function with the WS core install. I will try version 1.4.2.
  • The hash file on NCPHI was rebuilt because the file was corrupted.