Public Health Grid (PHGrid) - Research and Development: 2008

Monday, December 29, 2008

Columbia’s Natural Language Processing Technology Licensed to NLP International

Congratulations to our friends at Columbia

New York — Columbia University reports that it has licensed MedLEE, a natural language processing technology, to Columbia spin-out, NLP International Corp.

Millions of dictated and typed medical reports require laborious and time-consuming processing by highly trained and expensive experts who manually review and extract the required information. MedLEE reduces the time and expense associated with these processes by automatically extracting and encoding all relevant clinical information. Once coded, the information is easily available and accessible for further clinical processes like billing, reimbursement, quality assurance analytics, data mining, accreditation and others.

“A significant proportion of the [electronic] health care record (EHR) resides in the form of unstructured, natural language text and both the amount of data and the number of records is growing dramatically,” said Bernie Keppler, founder and chief executive of NLP International. “Ready access to this information is critical to save costs and improve the quality and access to health care.”

MedLEE has been successfully tested by large hospital systems and government agencies, including the NewYork-Presbyterian Hospital , the National Cancer Institute and the U.S. Department of Defense. Several pharmaceutical companies and healthcare information system vendors are currently evaluating MedLEE for a variety of applications.

“I am excited that this technology will now be more broadly available to hospitals and other health care organizations, where it can continue to contribute to improving patient care,” said Carol Friedman Ph.D., Professor of Biomedical Informatics at Columbia University , who developed the technology. “MedLEE has been used in the academic community for many years to develop clinical applications that have been shown to improve the quality of health care.”

“MedLEE is considered by many in the field as the gold standard for unstructured medical text processing, but it has not been available as a commercial, enterprise-ready product,” said Donna See, Director of Strategic Initiatives at STV, which brokered the deal. “We are very pleased to be partnering with NLP International to introduce MedLEE to these markets.”

Robert Sideli, M.D., CIO at Columbia University Medical Center , said the already widespread deployment and use of MedLEE throughout the research and healthcare communities prove the system’s future commercial success. “It will contribute substantially to higher efficiency in the electronic medical record industry due to its superior functionality in medical data extraction, coding, analytics and data mining,” Sideli said.

David Lerner, who oversees new ventures for STV, said Columbia has found the right partner in NLP International. “This venture will add to the many successful technology spin-off ventures for which Columbia University is known,” Lerner said.

Tuesday, December 23, 2008

A build Point

I have reached a point where I was able to hand the code for gmap-polygon to Felicia so she could use it to draw polygons for data that her services were returning. Many thanks to Felicia and Brian. Felicia for her patience for all the "oh crap, that's not doing what I thought, sync and try again" moments. Brian for doing some textedit niftiness to turn the KML files into easier to parse flat-files.

Otherwise, I am going to be off on vacation for the next couple of days, and some of them Felicia will still be coding, so I am glad I got a chance to explain the inner workings of the code and that I spent a lot of time trying to make it as agnostic as possible.

Otherwise, that being said, I really need to step up my Javadoc documentation. I'm committing to spending the last hour of each day documenting all my new stuff (and old stuff I forgot to document) and checking it in. That way, people who AREN'T sitting right next to me will have a much easier time figuring out how to do stuff and why I did the things I did the way I did them.

Cheers, and Happy Holidays.

AMDS White Paper

Wil posted an initial draft of an Aggregate Minimum Data Set white paper over on the wiki. Please read and submit comments on how to improve the document. The purpose of the document is to show how aggregate data can be useful for biosurveillance. This isn't a new concept, but we wanted to put a clarification document of how our AMDS-related activities can benefit biosurveillance.

I'll be offline due to Christmas and New Year's and won't be back online (I hope) until January 5. So Merry Christmas, Happy Hanukkah and Happy New Year.

Monday, December 22, 2008

A plague of ticks (and by ticks, I mean little truncating errors)

So, I am sort of excited because I think that I am ready with Gmap Polygon, but on Friday I noticed that some of the polygons weren't getting drawn well at all, namely because the polygon strings (long lists of coordinates separated by spaces) were way too short and were getting cut off.

So today I spent a lot of time monkeying around with the little SAX engine I was using, and now I am sure my issue is that I need to just not use the sax engine anymore, because while it was a quick class and rather fast, it is really screwing up some of the stuff I am trying to do and some of the cases I am trying to handle and I am no further along figuring out how to make it stop doing that.

So tonight, I am going to read up on JAXB and maybe work a tutorial, and tomorrow, I hope to get this fixed.

Otherwise, there are some other fun questions about what everything will end up looking-like and how crazy some borders are. It seems that zip3s can be inside other zip3s and it also might seem there are more than one area covered by a zip3. I guess that's what happens when you base your centers on mailing routes. Zip3s have a lot of 'outer boundaries', that make me wonder about the best way to store the polygons, we might have to go to a many to one relationship.

I'll keep ya'll posted, and hopefully there will be a cool little demo app soon next year, or even cooler, my mapper on top of Felicia's RODS service.

Cheers!

DRN Test

We ran into some network connectivity problems and configuration issues during the the Geisinger test. We were unable to connect to port 8443 from the NCPHI node. Joel and I worked with Geisinger network support to resolve the issue. Once this was resolved, we began configuring the Simple Transfer Service. Joel was able to download the service, but we ran into some Java exceptions while trying to test the client from the command prompt. Felicia assisted us in resolving the Java exceptions. I sent Joel instructions on setting up the container to run the Transfer service.

We need to schedule another test of the DRN service. I talked to Beth and I left a voice-mail message for Jeff Brown informing him of the testing status.

SaTScan on the Cloud Update

Lately I've been working on the first draft of paper regarding SaTScan on the cloud

and familiarizing myself the the client-side objects of Nimbus that I will instantiate

to programmaticlly create SaTScan resources on the cloud on-demand.

So, things are going pretty well.

Happy Holidays!

Wednesday, December 17, 2008

NHIN Presentation, AMDS

The presentation that would have been presented at the NHIN forum is now on the wiki. Unfortunately, the agenda gods conspired against us. Regardless, it is a good intro to the ideas of the AMDS, and how this is the type of service that enables population health situational awareness.

Monday, December 15, 2008

RODSA-DAI Script for NHIN Demo

In order to support the demonstration of the RODSA-DAI services tomorrow morning at the 5th NHIN Public Forum, John and I wrote a script to explain our PH scenario.

It goes something like this:
Setting:
1. This is all sample data
2. The counts and colors are not statistically meaningful. This is for demonstration purposes only.
Scenario:
Two public health data providers with overlapping catchment areas assist a local epidemiologist with biosurveillance. In this scenario, the systems do not share detailed data electronically due to technical or policy incompatibility.
As a part of his routine work, a county epidemiologist must check to see if there is a cumulative spike in flu activity in the region. He can do this by sending a query for ‘fever’ to a service on each node. This query is based on a draft AMDS that is mapped to each system.
• Step 1 – Launch page (http://ncphi.phgrid.net:8080/rodsadai-web/jsmulticache.jsp). You will see that NCPHI Server is ‘on’. This shows data from the NCPHI research lab node, with the pin points representing counts of cases of fever per ZIP Code. When the user mouses over the pin for ZIP 15227, it is Green, indicating 10 cases.

• Step 2 – Click NCPHI Server ‘off’ and DallasServer ‘on’, and press submit. This map shows data from the other node, again with counts of fever per ZIP code. ZIP 15227 is Yellow, with 14 cases.

• Step 3 – Check both boxes, press submit and you will see the cumulative count. As a result of this query, you will see that ZIP code 15227 is now Red, with 24 cases.

Friday, December 12, 2008

OSU/Emory Meeting

We had some interesting meetings with the Emory and Ohio State grid teams this week. Specifically we were able to sit in on the Emory/OSU collaboration session and to meet with Steve Langella in person and Shannon Hastings over the phone about the activities ongoing and planned for PHGrid.

Steve was especially useful in meeting with Dan, Raja and Joseph about how cagrid's security components work and how they can help with our planning for the AMDS Pilot that is ramping up.

We're going to be updating our architecture diagrams to incorporate Steve's ideas and cagrid's infrastructure components.

Gmap Polygon objectier.

I basically have the test I was running before working. Ironically, there are still about 5 lines of java code in the JSP when there should be more like 2 or 3.

But, this is because I need to go ahead and implement the framework that will provide the polygons rather than coding some into the test JSP. This means the two courses of action are to reformat poicondai to use the new system, or to make a set of forms and controllers that are more generic and allow for the automation of a lot of the map controls.

After some discussions with Brian and Felicia, it appears I will be working on the latter: A set of generic map controls... thus making it easier for anyone else to use the map controls for their own display needs.

I think it will also make it even easier to revamp poicondai and integrate with Felicias RodsHDS.

SaTScan on the Cloud Progress

I just successfully deployed the SatScan grid service to the Nimbus cloud and invoked the SaTscan service on the cloud from my Windows XP notebook. Essentially, I stood up a Grid node on demand on cloud and then ran a SaTSCan job which uploaded files to the cloud and obtained results from the cloud. For now the Grid Security Infrastructure is not being used. Now to move on to programmtically standing up several SaTScan services on demand on the Nimbus cloud.

Have a great weekend.

-Ron

Distributed computing with Linux and Hadoop

Every day people rely on search engines to find specific content in the many terabytes of data that exist on the Internet, but have you ever wondered how this search is actually performed? One approach is Apache's Hadoop, which is a software framework that enables distributed manipulation of vast amounts of data. One application of Hadoop is parallel indexing of Internet Web pages. Hadoop is an Apache project with support from Yahoo!, Google, IBM, and others. This article introduces the Hadoop framework and shows you why it's one of the most important Linux®-based distributed computing frameworks.

Capturing this project here for potential future research. Read full article here.

Wednesday, December 10, 2008

Zicam Cold & Flu Companion Mobile app

Zicam has an app for the G1 (soon to be for the iPhone) that shows syndromes occurring in your zip code. It is available at http://www.zicam.com/tools/mobile.

Pretty interesting and seems similar to the AMDS concept of the National Retail Data Monitor service put out by RODS.

ColorHandler updated, and boy is it huge.

Well, it seems that having a range-conscious coloring object is a bit more difficult than even I was expecting, especially considering that I am allowing the use of nulls to mean "there is no limit, therefore it is infinite" for things like "all cases equal to or less than 0 need to have white shading" and "all cases above 20 need to have red shading".

But, I got it to build, which means I wasn't blowing up something obvious. Tomorrow I will saddle up the test harness for the color processor and run it through it's paces.

Then, it is building a very simple popup handler with a simple template. Here's to hoping it is simple.

Alas, this means it might be until friday before I have another good tester of the GMap Polygon jsp (but hopefully, it will be very small and just sort of go 'create grid map object, draw grid map object'.

Google Flu Trends spreads privacy concern

December 9, 2008 (Computerworld) Google's new Flu Trends tool, which collects and analyzes search queries to predict flu outbreaks around the country, is raising concern with privacy groups.

The Electronic Privacy Information Center filed a Freedom of Information Act request asking federal officials to disclose how much user search data the company has recently transmitted to the Centers for Disease Control and Prevention, or CDC, as part of its Google Flu Trends effort.
Concern stems from what privacy groups claim is a disturbing lack of transparency surrounding the method Google is using to predict flu outbreaks. Google has publicly stated that all the data used is made anonymous and is aggregated, but there has been no independent verification of how search queries are used and transformed into data for Google Flu Trends, said the privacy groups.

"What we are basically saying is that if Google has found a way to ensure that aggregate search data cannot be used to re-identify the people who provided the search information, they should be transparent about that technique," said Marc Rotenberg, Electronic Privacy Information Center's president.

Rotenberg said the issue is important because the same techniques Google is using to predict flu outbreaks could be applied to tracking other serious diseases, such as SARS. "Let's say we have a spike in Detroit of SARS and the police say we want to know who in Detroit submitted those searches. How can Google ensure that this can't be done? The burden is on Google," Rotenberg said.

Have Your Say

Is Google's Flu Trends a privacy threat?
Publicly disclosed in November, Google Flu Trends has been described by the company as a Web tool to help individuals and health care professionals obtain influenza-related activity estimates for all U.S. states, up to two weeks faster than traditional government disease surveillance systems.

Google said in a blog post introducing Flu Trends last month that search queries such as "flu symptoms" tend to be very common during flu season each year. A comparison of the number of such queries with the actual number of people reporting flu-like symptoms shows a very close relationship, it said. As a result, tallying each day's flu-related searches in a particular geography allows the company to estimate how many people have a flu-like illness in that region.
Google also noted that it had shared results from Flu Trends with the epidemiology and prevention branch of the influenza division at the CDC during the last flu season and noticed a strong correlation between its own estimates and the CDC's surveillance data based on actual reported cases. Google said that by making flu estimates available each day, Google Flu Trends could provide epidemiologists with an early-warning system for flu outbreaks.
Rotenberg said the service was potentially useful, but much depended on the kind of search data that Google is collecting and analyzing to make its predictions. Google has said that the database it uses for Flu Trends retains no identity information, IP addresses or any physical user locations. However, what is not clear is whether the company is completely deleting IP addresses, and if so, when it is doing it. Also, he said another issue was whether all Google is doing is anonymizing IP addresses by redacting some of the numbers in an IP string.
Google also claims that as part of its overall privacy policy it anonymizes all IP addresses associated with searches after nine months. Yet in an apparent contradiction, when introducing Flu Trends, Google noted that it uses both current and historic search data -- dating back to 2003 -- to make its predictions, Rotenberg said.

Jeffery Chester, executive director of the Center for Digital Democracy, said Google's growing presence in the health care space also makes it important for the company to disclose what kind of data it is collecting and using for Flu Trends.

"Google sees a potential profit center from targeting its vast user base with advertising that is related to health issues," Chester said. The company's announcement of Flu Trends in fact shows to pharmaceutical and medical markets precisely the kind of sophisticated analysis the company can do with search data to enable highly targeted medical marketing, he said. "This is about taking the tracking data that Google has at its disposal and focusing it on generating a new profit center for the company," Chester said.

Pam Dixon, executive director of the World Privacy Forum, echoed similar concerns and questioned whether the anonymization techniques used by Google provided enough of a guarantee that a search term could not be traced back to specific individuals. She pointed to an incident two years ago where AOL inadvertently posted search information on a public Web site. The search queries had supposedly been anonymized by AOL, but it was still relatively easy to track specific search terms back to IP addresses and even individuals in many cases, Dixon said.
Mike Yang, senior product counsel at Google, downplayed privacy concerns related to Flu Trends and insisted that the tool uses no personally identifiable data.

"Flu Trends uses aggregated data from hundreds of millions of searches over time," Yang said today in an e-mail. "Flu Trends uses aggregations of search query data which contain no information that can identify users personally. We also never reveal how many users are searching for particular queries."

Yang noted that the data used in Flu Trends comes from Google's standard search logs. He also referenced an article in the journal Nature, authored by the Google Flu Trends team, which he said explains the methodology behind the tool.

Amazon Public Data Sets on AWS

Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. An initial list of data sets is already available, and more will be added soon.

Previously, large data sets such as the mapping of the Human Genome and the US Census data required hours or days to locate, download, customize, and analyze. Now, anyone can access these data sets from their Amazon Elastic Compute Cloud (Amazon EC2) instances and start computing on the data within minutes. Users can also leverage the entire AWS ecosystem and easily collaborate with other AWS users. For example, users can produce or use prebuilt server images with tools and applications to analyze the data sets. By hosting this important and useful data with cost-efficient services such as Amazon EC2, AWS hopes to provide researchers across a variety of disciplines and industries with tools to enable more innovation, more quickly.

Click here for further information.

AMDS-RODS service development

I added a draft set of data structures for AMDS requests to the wiki. This led to further defining the initial set of operations for an AMDS compliant web service.

These operations include:
QueryAMDSMetadata
QueryFullAMDS
QuerySpatialAMDS
QueryTimeAMDS
QueryTimeSpatialAMDS

Based on these two pages, I created a sourceforge tracker item that Felicia is working on.

We're working on the RODS version first for a few reasons, primarily Peter and Felicia have familiarity with the RODS database structure and we don't have data structures for any other biosurveillance databases. We'd like to start working on ESSENCE and BioSense data structures to create AMDS-ESSENCE and AMDS-BioSense soon. John is working with partners to plan out how which services we build next.

Tuesday, December 9, 2008

Thank goodness for other coders

So, I have this nasty tendency to get into ruts... and sort of formulate something in my mind and consider it the only really good way to do something because it's formed.

Luckily, I also have this tendency to occasionally ask other people which parts seem like good ideas, and which parts would annoy the snot out of them if they ever came across my code or had to use it.

After getting a prototype of the polygon generator and drawer setup working, I asked Felicia to help me with a bit of a 'does the way I'm doing this seem painful to you' type review, and in discussion both she and Brian helped me piece everything together... including a few extra concepts:

1. Forcing people to write a class all the time rather than just be able to set a few fields with the values they want really sucks. It's kind of a lazy way of implementing something flexible. One can always just make an interface to keep things flexible. If someone wants to do something so radical and complicated they just have to write their own class, then it makes it easy, but it sucks to not just build a default with an easy way to change the most obvious option.

2. JSPs shouldn't have much java code in them. At all. They should invoke the class up top, and then call the method in the right spot where the variable comes in. JSPs should be more about the layout and less about the code. The code should go *ding* up in that class you substantiated up top.

Thus, for tomorrow, I am hoping to complete the color handler (with easily settable shading colors), write and create the popup handler (with easily handle-able sort of string placement options) and then write an example JSP that makes it easy to show how to invoke the class and use a minimum of java in a simple init function.

I'm sure this isn't the first iteration, and a lot of stuff might change later.

But, I think it will be better than it was.

Saturday, December 6, 2008

OGSA-DAI 3.1 Released

Dec. 5 -- The OGSA-DAI project, a partner in OMII-UK, have released version 3.1 of their database access and integration software.

OGSA-DAI is an extensible framework for the access and management of distributed heterogeneous data resources -- whether these be databases, files or other types of data -- via Web services. OGSA-DAI provides a workflow engine for the execution of workflows implementing data access, update, transformation, federation and delivery scenarios.

The main features of OGSA-DAI 3.1 include:

A number of new OGSA-DAI activities for:

Advanced SQL joins and and merging of relational data.
Dynamic OGSA-DAI data resource creation.
Running queries on remote OGSA-DAI servers.
Interacting with XML databases, including adding, listing, removing and getting XML documents, creating, listing and removing collections and running XPath, XQuery and XUpdate statements.
Splitting and combining lists (contributed by the ADMIRE project).
Retrieving physical schema for relational databases (contributed by the NextGrid project).

A document-based workflow client.
A data source servlet and data source client.
Prototype support for pluggable workflow transformation components.
Prototype support for configurable inter-activity pipes.
Resources can now be marked as "private" meaning they are hidden from clients and can only be used within sub-workflows.
An example workflow monitoring plug-in which records events which can be browsed via a JSP page.
Support for MDS registration in Globus-compliant versions.
A number of bugs have been fixed, components made more efficient, usable or robust.
The user doc has been extensively refactored and extended.

OGSA-DAI 3.1 is designed to be backwards compatible with OGSA-DAI 3.0 without the need for recompilation -- data resource, activity and presentation layer APIs and service WSDLs remain the same.

OGSA-DAI is a free, open source, 100% Java product and is now released under the Apache 2.0 licence. Downloads compatible with Apache Axis 1.2.1, Apache Axis 1.4, Globus 4.0.5, Globus 4.0.8, OMII 3.4.2 and, now, Globus 4.2, are available

Friday, December 5, 2008

Track Flu Trends on Google Phone

Owners of a T-Mobile G1, also known as the "Google phone," can now download a program that tracks flu outbreaks by zip code. The makers of the flu remedy Zicam created the program and got their information from polling health care providers and pharmacies. A version for the iPhone is expected to be available later this month.

Click here to listen to the NPR article

Thursday, December 4, 2008

Design Patterns, I choose you.

Still migrating and refactoring the code that I already wrote for poicondai into the new Gmaps-Polygon code, I know, I know, I wanted Daigon too, but really, there isn't any DAI to this peice, just maps and polygons.

Which brings us to our next piece: trying to make sure that gmap-polygon is both elegant and relatively simple. Poicondai had a lot of duplicated code (not ideal) a lot of java inside the JSP pages (also not ideal) and a lot of commented out code and stuff that isn't necessarily called (once again, not ideal).

Thus, I am trying to make things pretty (relatively speaking) with lots of interfaces (so that if the way something is handled needs to be made radically different, you don't have to modify the original working code and can just use a different interface.) I am also catching myself making classes that are nearly identical and going "waitaminute, this can be done with one class and a smaller class that handles the differences". The end result is I am moving from things like state polygons and county polygons and zip polygons (with lots of duplicated code) into one "region polygon" that happens to have different handlers (thus, if you need a polygon from a zipcode, you create a regionpolygon with a zipcodehandler.. that way the polygon code isn't duplicated... and it becomes pretty clear and simplistic how to put everything together (to get a zip3 polygon, just use a zip3handler...)

I'm kind of excited about the code this will generate. I think what I turn out will be easily adaptable into a lot of these "I want a colored polygon for google maps javascript control" applications we (and maybe even other people/groups) will be creating.

Tuesday, December 2, 2008

That popping is my paradigms shifting without a clutch.

So, for the past couple of days I have been cleaning up the rodsadai projects in anticipation of giving rodsadai polygons.

Then, we all got together and decided that the next step should actually be a generalized set of code that can return polygons given some sort collection of spatial and time series.

This lead to more discussion about how such a suite of code would behave, what pieces should go where, and a lot of "oh dear, I don't know how google maps would be able to tell the page outside of google maps what google maps is doing." The standard batch of logistics, heuristics, and metaphysics that goes into any good refactor.

Thus, all my paradigms have shifted... and I am very appreciative that Felicia will be here to help me with some of the AJAX/JavaScript and Struts/Spring stuff that I am not too familiar with.

Thus, look forward to lots of little posts like "I got the page to reload without reloading when you zoom to a certain level in google maps!" and other such stuff.

I think I'll call this latest suite of code daigon.

Another Demo

There's another demo scheduled on Wednesday, Dec 17th for Dr. Lenert to show the NCPHI Grid Lab to some of the COTPER leadership.

Wednesday, November 26, 2008

Poicondai-web has polygons, and you can install it.

Hello everyone.

Poicondai-web now has zipcode polygons. You can see them here

Also, I have updated the poicondai-web service registry page with some more information about how to download and install the poicondai-web, poicondai-util, poicondai-loader, and NPDS-WS-Client. That is here

Next up is putting those polygons into Rodsadai... but that will be after the holidays.

Have a happy Thanksgiving everyone!

Thursday, November 20, 2008

I have polygons, but not all the zips that might be sought out.

I have polygons, and have shown that I can get all the polygons possible showing up in Colorado.... but there is a problem...

The list of zipcodes I have and the list of polygons for the zipcodes I have show some discrepancies... And it all stems from the fact that zipcodes can change. Thus, there are several areas that are blank in my "map of Colorado" because they have to do with zipcodes that might have split recently or were otherwise not in the Geolocation data I have been given.

That doesn't mean that the NPDS doesn't have a few results for them.

Thus, I am in a bit of a quandary. I guess the best thing I can do at this point is have a little table at the bottom of the map that says "zipcode ##### was not in the polygon database." Because even if we changed the polygons to fit with old zipcodes it means that there would have to be polygon overlaps and it would get very confusing.

Otherwise, I imagine poicondai with zipcode capability (at least the ones we have polygons-for) will be ready for testing sometime tomorrow.

Grid Enabling Existing/Legacy Applications With gRAVI

I recently wrapped SatSCan in a grid service using gRAVI and gRAVI treated me

well. gRAVI can be downloaded as an Introduce plugin and it is designed to wrap

a grid service around an executable. Your job can then be treated as GRAM job

which is great because the status of the job is then represented via GRAM (staging, running, finished, ...). Also, by default gRAVI stages your files in for you and transfers your results back to you via byte array. I think in gRAVI 1.4 will support the following transfer mechanisms: gridFTP, byte array, caGrid Transfer.

If you need to grid enable an existing/legacy application I highly recommend gRAVI. It will save you time.

Anyway, I just started my second iteration on grid enabling SatSCan and I have some work to do on the client plus the Cloud is on the horizon for this service. I better get back work:)

Here is a link to gRAVI: http://dev.globus.org/wiki/Incubator/gRAVI

World Wide Grid

All of that dark fiber and computing power has to be used for more than just YouTube videos. The EU has invested 2.5m Euros into a project that will make worldwide Grid computing more accessible. http://cordis.europa.eu/ictresults/index.cfm/section/news/tpl/article/BrowsingType/Features/ID/90191

Wednesday, November 19, 2008

Zipcode Polygons are working in testing.

So I have zipcode polygons enabled.... I also have modified poicondai-web to use maven filters.

Note to self: the test resources are different objects in the pom.xml than regular resources.. thus, if you need to enable filtering on the test resources in a maven2 project, you will need to create a section for it...

Tomorrow, I will modify the main pages to get zipcodes encoding and popping up search lists... and then I think I will be examining rearchitecting poicondai-web to have a much simpler structure with a class returning all the polygon javascript instead of doing it in the JSP.

Then, it'll be polygons for Rodsadai.

Tuesday, November 18, 2008

AMDS Sample Structure

So Jeremy and I have been tossing some ideas back about an initial draft for the AMDS data structures. Based on discussions led by Tom Savel on what fields should be included in the AMDS, we're going to start testing development using a basic AMDS that includes:

Date

Patient Zip3

Syndrome

Syndrome Classifier (i.e. which classifier was used to assess the syndrome)(e.g. BioSense, EARS, RODS, ESSENCE, etc.)

Count

Denominator (i.e.total count of all syndromes for that zip3 on that date)

Felicia is starting to think about converting the RODS-HDS service (developed to meet this feature request) to RODS-AMDS to meet the draft xml structures. These xml structures are very lightweight and we will modify them as the AMDS data structure undergoes changes based on scientific comment.

The next step is to add the sample xml and schemas to the wiki so we can start getting comments in. Based on comments, we will start to plan services that provide BioSense VA and DoD sample data using the AMDS structure so that it can be combined with RODS data (and eventually EARS, ESSENCE, other systems).

Updated DRN Design drafts

I updated the DRN Design Drafts wiki page with a data flow diagram to show the flow of detailed data --> aggregate data --> combined aggregate data based on some feedback from Roy Pardee, Ross Lazarus and Jeff Brown.

Zipcodes polygons in the database

So, I have updated the poicondai-loader project and the poicondai-util project and now we have zipcode polygons in the polygon database.

Also, the deploy of tomcat to the staging node went swimmingly. Dan switched the tomcat server over to port 8443, and rodsadai and poicondai worked like nothing had changed.

So, tomorrow is starting to do some testing with the zipcode polygons, and then updating the map application to have zipcode polygons in addition to the county polygons.

Another Distributed Aggregated Query Project (SHRINE)

The Harvard Catalyst's Informatics Program has developed technology in lock step with regulatory and ethical requirements to allow authorized investigators to acquire robust sample sizes across all Harvard-affiliated healthcare institutions. We call this querying system SHRINE (Shared Health Research Information Network). As shown in the diagrams below, there is no central database but rather the SHRINE queries are distributed across each of the participating institutional databases. In this way, each institution maintains autonomy, control, and monitoring of all transactions on behalf of its patients.

Consumer Health Informatics and the Grid?

I ran across this article this morning. It's about a new program using Google Health in association with Medicare programs in Arizona and Utah. Conceptually, given the Medicare bent, this may be rich information for chronic disease interventions and surveillance. In terms of services, I could see decision support / alerts. Are there others?

http://www.azcentral.com/business/articles/2008/11/13/20081113biz-medicare1113.html
(edited 2008.11.18 by BAL to add link)

Monday, November 17, 2008

Prepping for new rodsadai

So, today I spent some time prepping for the poicondai loader load of a lot of zip code polygons into a database.

But a lions share of the day was spent setting up the shiny new secure tomcat installation on the staging node. Also known as "boy, I love being able to configure different ports!"

So, the window for taking down the node and updating RODSAdai to call secure tomcat is tomorrow... but that didn't stop us from making sure we couldn't set up secure globus and make some ogsadai calls to it. That took a little bit of time mainly because the tarball I set up apparently broke or was not happy in it's new environment, so we had to build a new one from scratch.

Many thanks to Felicia and Dan for pretty much doing most of the footwork before me so all I had to do was go "get this, put that there, lemme check... yay!" Dan was awesome with the configuration and Felicia knew most of this stuff from having to deal with it before a couple of times.

Tomorrow is loading polygons, updating build styles, and starting the poicondai-web modifications.

DRN Design drafts

I've put together some design sketches to show the planned deployment and sequence of events for the DRN SAS automation scripts that Dan has begun writing.

http://sites.google.com/site/phgrid/Distributed-Research-Network/drn-sas-grid-design

Basically we'll have three components:

ClientScript.pl - transfers files to Globus nodes and aggregates the output documents into a single TSV report (to be developed)

Secure Simple Transfer Service - runs on the Globus nodes and allows for listing, getting and putting of sas programs and output files (already completed)

SASProcessorScript.pl - configures and runs sas programs against databases (to be developed)

Please let me know any comments you may have as Dan is beginning to develop the scripts. The source code will be stored in http://phgrid.svn.sourceforge.net/viewvc/phgrid/drn/

(here's the visio file should anyone want to make direct comments and changes.)

Apache Tomcat Update

Tomcat has been updated on the NCPHI node. We are currently running version 5.5.27. Globus has been deployed to Tomcat on port 9443.

Friday, November 14, 2008

Demos are pretty

Sorry for the lack of updates on my part. I was spending most of the week improving the already nifty Poicondai demo to have variable y-axises and start grouping by week if queries over 180 days (about 6 months) are chosen.

You can check it out by going to http://ncphi.phgrid.net:8080/poicondai-web/poicondaiMap.jsp

The other thing I did was get ogsadai running on secure-tomcat and verified that rodsadai could connect to it. Many many many thanks to Felicia for helping me with that yesterday. She had already found out all the crazy things that had to be done to get tomcat working and was able to help me get a similar setup working in about a half hour.

Next up is zipcodes. I will need to convert the zipcodes into a geolocational database for poicondai, and then I will be updating poicondai-web to select zipcodes, updating rodsadai-web to use zipcode polygons (which I hope will turn out pretty cool), all while hopefully implementing some new filtering so I can pull all the different configuration options into one file.

Wednesday, November 12, 2008

Update: Globus on Windows

Corrected the following error by setting these variables:

set X509_USER_CERT=C:\Documents and Settings\bubba-gump\.globus\usercert.pem
set X509_USER_KEY=C:\Documents and Settings\bubba-gump\.globus\userkey.pem
set X509_CA_CERT=1234abcd.0
set X509_CERT_DIR=C:\etc\grid-security\certificates

The corrected error:

C:\gt4\bin>grid-proxy-init
Your identity: O=Grid,CN=bubba-gump
Enter GRID pass phrase for this identity:
Creating proxy, please wait...
Proxy verify failed: Unable to load CA ceritificates

New Error Message:

C:\gt4\bin>grid-proxy-init
Your identity: O=Grid,CN=bubba-gump
Enter GRID pass phrase for this identity:
Creating proxy, please wait...
Proxy verify failed: "/O=Grid/CN=bubba-gump" violates the signing policy defined for CA "/O=xxx/OU=zzz/OU=szzzz/CN=xxxx
Simple CA" in file "C:\etc\grid-security\certificates\1234abcd.signing_policy"
C:\gt4\bin>

Next step:
Create a new certificate request with the correct subject line. This should fix the security issue.

New Book about Scientific Collaboration on the Internet (Ian Foster Contributing)

Scientific Collaboration on the Internet

I'm looking forward to receiving my copy of Scientific Collaboration on the Internet. I have an article in it on lessons learned from the NEESgrid project (an earlier version is here, I think it's a good read, especially between the lines), but the other articles are probably far more interesting:

The Contemporary Collaboratory Vision

E-Science, Cyberinfrastructure, and Scholarly Communication -- Tony Hey and Anne Trefethen
Cyberscience: The Age of Digitized Collaboration? -- Michael Nentwich

Perspectives on Distributed, Collaborative Science

From Shared Databases to Communities of Practice: A Taxonomy of Collaboratories -- Nathan Bos, Ann Zimmerman, Judith S. Olson, Jude Yew, Jason Yerkie, Erik Dahl, Daniel Cooney and Gary M. Olson
A Theory of Remote Scientific Collaboration -- Judith S. Olson, Eric C. Hofer, Nathan Bos, Ann Zimmerman, Gary M. Olson, Daniel Cooney and Ixchel Faniel
Collaborative Research across Disciplinary and Organizational Boundaries -- Jonathon N. Cummings and Sara Kiesler

Physical Sciences

A National User Facility That Fits on Your Desk: The Evolution of Collaboratories at the Pacific Northwest National Laboratory -- James D. Myers
The National Virtual Observatory -- Mark S. Ackerman, Eric C. Hofer and Robert J. Hanisch
High-Energy Physics: The Large Hadron Collider Collaborations -- Eric C. Hofer, Shawn McKee, Jeremy P. Birnholtz and Paul Avery
The Upper Atmospheric Research Collaboratory and the Space Physics and Aeronomy Research Collaboratory -- Gary M. Olson and Timothy L. Killeen; Assisted by Thomas A. Finholt
Evaluation of a Scientific Collaboratory System: Investigating Utility before Deployment -- Diane H. Sonnenwald, Mary C. Whitton and Kelly L. Maglaughlin

Biological and Health Sciences

The National Institute of General Medical Sciences Glue Grant Program -- Michael E. Rogers and James Onken
The Biomedical Informatics Research Network -- Judith S. Olson, Mark Ellisman, Mark James, Jeffrey S. Grethe and Mary Puetz
Three Distributed Biomedical Research Centers -- Stephanie D. Teasley, Titus Schleyer, Libby Hemphill and Eric Cook
Motivation to Contribute to Collaboratories: A Public Goods Approach -- Nathan Bos

Earth and Environmental Sciences

Ecology Transformed: The National Center for Ecological Analysis and Synthesis and the Changing Patterns of Ecological Research -- Edward J. Hackett, John N. Parker, David Conz, Diana Rhoten and Andrew Parker
The Evolution of Collaboration in Ecology: Lessons from the U.S. Long-Term Ecological Research Program -- William K. Michener and Robert B. Waide
Organizing for Multidisciplinary Collaboration: The Case of the Geosciences Network -- David Ribes and Geoffrey C. Bowker
NEESgrid: Lessons Learned for Future Cyberinfrastructure Development -- B. F. Spencer, Jr., Randal Butler, Kathleen Ricker, Doru Marcusiu, Thomas A. Finholt, Ian Foster, Carl Kesselman and Jeremy P. Birnholtz

The Developing World

International AIDS Research Collaboratories: The HIV Pathogenesis Program -- Matthew Bietz, Marsha Naidoo and Gary M. Olson
How Collaboratories Affect Scientists from Developing Countries -- Airong Luo and Judith S. Olson

Conclusion

Final Thoughts: Is There a Science of Collaboratories? -- Nathan Bos, Gary M. Olson and Ann Zimmerman

PopSciGrid

Tom showed me the PopSciGrid by Science of Networks in Communities (SONIC) that he learned about at AMIA.

It's interesting as it is combining multiple data sets over a grid, but also has a rather useful user interface that we might be able to co-opt for the PHGrid UI.

Thoughts from AMIA

Dialing in from AMIA, I thought it'd be important to capture random thoughts and specific feedback from the conference.

1. The AMIA community seems to have embraced the public health community with great enthusiasm. All of the sessions were well attended by members from across all sectors -- clinical informatics, vendors, academia, international stakeholders, and consultants.

2. The PH Research Grid session was a very strong session, with lots of great interaction from the audience. With the COE's sharing their experiences and findings, they were able to give their honest appraisal of the grid approach, and while there are many, many kinks to work out, it seems there is strong agreement of the movement toward standard services is the way to go. Now if we get real crazy, stringing together a couple of services to show this would be a great next step.

3. Dr. Lenert was able to successfully demonstrate PH-DGINet, which helped the audience of his session appreciate some of the simple use cases we are aiming to satisfy (i.e. summary counts).

4. In the longer term, there is some potential to collaborate on the Clinical Decision Support work being led by Nedra Garrett. First, in using the NCPHI lab infrastructure, second, in publishing some alerting services to interface with clinical EMR vendors, HIE's, and other agencies. Still just an idea at this point, but something to explore.

5. Two pragmatic questions continue to raise themselves. Specifically:

What value to state and locals derive from 'the grid'?
Is syndromic surveillance valuable enough to foster adoption?

Thoughts?

Tuesday, November 11, 2008

Tracking Flu trends - Google

From today's official Google Blog!
From today's front page of the New York Times!
Or go right to their tool here!

Tracking flu trends

11/11/2008 12:51:00 PM

Like many Googlers, we're fascinated by trends in online search queries. Whether you're interested in U.S. elections, today's hot trends, or each year's Zeitgeist, patterns in Google search queries can be very informative. Last year, a small team of software engineers began to explore if we could go beyond simple trends and accurately model real-world phenomena using patterns in search queries. After meeting with the public health gurus on Google.org's Predict and Prevent team, we decided to focus on outbreaks of infectious disease, which are responsible for millions of deaths around the world each year. You've probably heard of one such disease: influenza, commonly known as "the flu," which is responsible for up to 500,000 deaths worldwide each year. If you or your kids have ever caught the flu, you know just how awful it can be.
(more on their site)

Monday, November 10, 2008

Globus on Windows

I'm able to run the Java WSCore container with no security, but I get an error when I try to start the container with a certificate. The next step will be to recreate the Linux based certificate directory structure on the Windows machine and troubleshoot from there. I will also try to load the ca-setup file from the internal grid machine.

FAILURE:
C:\gt4\bin>grid-proxy-init -debugFiles used: proxy : C:\DOCUME~1\bubba-gump\LOCALS~1\Temp\x509up_u_bubba-gump user key : C:\Documents and Settings\bubba-gump\.globus\userkey.pem user cert : C:\Documents and Settings\bubba-gump\.globus\usercert.pemYour identity: xxx xxx xxx Enter GRID pass phrase for this identity:Using 512 bits for private keyCreating proxy, please wait...Proxy verify failed: Unable to load CA ceritificatesjava.lang.Exception: Unable to load CA ceritificates at org.globus.tools.ProxyInit.verify(ProxyInit.java:131) at org.globus.tools.DefaultProxyInit.verify(ProxyInit.java:595) at org.globus.tools.ProxyInit.createProxy(ProxyInit.java:225) at org.globus.tools.ProxyInit.main(ProxyInit.java:530) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.globus.bootstrap.BootstrapBase.launch(BootstrapBase.java:114) at org.globus.bootstrap.Bootstrap.main(Bootstrap.java:41)C:\gt4\bin>globus-start-container[JWSCORE-114] Failed to start container: [JWSCORE-200] Container failed to initialize [Caused by: [JWSSEC-248] Secure container requires valid credentials. No container descriptor file configured and default proxy not found. Run grid-proxy-init to create default proxy credential.]

SUCCESS:
C:\gt4\bin>globus-start-container -nosecStarting SOAP server at http://127.0.0.1:8080/wsrf/services/
With the following services:
[1]: http://127.0.0.1:8080/wsrf/services/AdminService[2]: http://127.0.0.1:8080/wsrf/services/AttachmentTestService[3]: http://127.0.0.1:8080/wsrf/services/AuthzCalloutTestService[4]: http://127.0.0.1:8080/wsrf/services/ContainerRegistryEntryService[5]: http://127.0.0.1:8080/wsrf/services/ContainerRegistryService[6]: http://127.0.0.1:8080/wsrf/services/CounterService[7]: http://127.0.0.1:8080/wsrf/services/DeployService[8]: http://127.0.0.1:8080/wsrf/services/JWSCoreVersion[9]: http://127.0.0.1:8080/wsrf/services/LoggingService[10]: http://127.0.0.1:8080/wsrf/services/ManagementService[11]: http://127.0.0.1:8080/wsrf/services/NotificationConsumerFactoryService[12]: http://127.0.0.1:8080/wsrf/services/NotificationConsumerService[13]: http://127.0.0.1:8080/wsrf/services/NotificationTestService[14]: http://127.0.0.1:8080/wsrf/services/PersistenceTestSubscriptionManager[15]: http://127.0.0.1:8080/wsrf/services/QueryTestService[16]: http://127.0.0.1:8080/wsrf/services/RPParamTestService[17]: http://127.0.0.1:8080/wsrf/services/SAMLAuthzTestService[18]: http://127.0.0.1:8080/wsrf/services/SampleAuthzService[19]: http://127.0.0.1:8080/wsrf/services/SecureCounterService[20]: http://127.0.0.1:8080/wsrf/services/SecurityTestService[21]: http://127.0.0.1:8080/wsrf/services/ShutdownService[22]: http://127.0.0.1:8080/wsrf/services/SubscriptionManagerService[23]: http://127.0.0.1:8080/wsrf/services/TestAuthzService[24]: http://127.0.0.1:8080/wsrf/services/TestEnumService[25]: http://127.0.0.1:8080/wsrf/services/TestLoginModuleService[26]: http://127.0.0.1:8080/wsrf/services/TestParamAuthzService[27]: http://127.0.0.1:8080/wsrf/services/TestRPCService[28]: http://127.0.0.1:8080/wsrf/services/TestService[29]: http://127.0.0.1:8080/wsrf/services/TestServiceRequest[30]: http://127.0.0.1:8080/wsrf/services/TestServiceWrongWSDL[31]: http://127.0.0.1:8080/wsrf/services/Version[32]: http://127.0.0.1:8080/wsrf/services/WidgetNotificationService[33]: http://127.0.0.1:8080/wsrf/services/WidgetService[34]: http://127.0.0.1:8080/wsrf/services/gsi/AuthenticationService

caGrid / TeraGrid Security & Interoperability Concerns

>> From what I understand so far caGrid 1.2 uses Globus 4.0.3 Java WS-CORE and Globus in caGrid 1.2 is as up-to-date as possible, but not necessarily hardened. Is this right?

Yes. Frankly Iâ€™d be surprised to learn that TeraGrid is running modified Globus code that has not been contributed back given the significant overlap in personnel on those projects. However, if youâ€™d like to follow up with them on specifics weâ€™d be happy to work with you to assess any applicability to caGrid. Iâ€™m sure what they were referring to was something like the GSI OpenSSH libraries which TeraGrid uses to allow Globus credentials to be used via ssh. As Iâ€™m sure you are aware, the ubiquity and power of ssh makes it a prime candidate for potential attack and there is a large active community analyzing and addressing any such vulnerabilities. It is important for an infrastructure like TeraGrid to stay up to date with any such ssh patches, and those trickle down to the Globus libraries which use them. As stated before, we use no such libraries as we only use SSL for securing the communication channel of web service calls. While obviously this is still critically important, its scope and therefore potential for exploit is significantly less (e.g. you canâ€™t run arbitrary commands on the remote machine). As Steve mentioned, we monitor the Globus releases and community security advisories to ensure our infrastructure is not vulnerable.

>> It seems that caGrid 1.2 is installed at NCI, so it has meet the federal guidlines that are required to have it installed at a place like NCI, right?

Yes, that is correct. Before we deploy the grid we have to go through a series of vulnerability scans.

Some caGrid Considerations on Globus Hardening

Globus is a large toolkit, caGrid is a service oriented architecture and leverages the ws-core component of Globus. The TeraGrid infrastructure is different in that it mostly leverages other features of Globus such as GRAM and GridFTP. Thus each projects opinion on whether or not Globus is hardened is going to be closely tied to their experience with the components of the toolkit they use. Currently caGrid is using Globus 4.0.3, however many of our services will operate with Globus 4.0.X. The distribution of Globus that we link to on cagrid.org contains additional features/enhancements which have been added by working closely with the Globus team. The caGrid team monitors the Globus project closely to make sure any critical bug fixes are addressed as appropriate. With each Globus release, we look at and evaluate new features, some of which have been incorporated into the 4.0.3 distribution we provide. The main difference between the latest release of Globus and the version we are using is specification changes. Adopting the specification changes in caGrid would cause services developed on the earlier specifications to NOT interoperate with services developed on the newer specification. The specification differences are minor and this point not worth breaking interoperability between services. We do plan on adopting these specification changes with caGrid 2.0 but are waiting for the Globus folks to upgrade there web services environment, which should significantly improve performance. We would like to combine the specification upgrades and the web service environment upgrade into one release so that our users only need to upgrade their services once. Before answering your question I wanted to give you some insight on why caGrid uses the version of Globus that it does, however to answer your question, caGrid does not use a hardened version of Globus.

Friday, November 7, 2008

Slides From U of Utah CoE

Today GRID can do many exciting things that could not be done before. Also, the GRID is in the process of having security evaluated for PH purposes. Lastly, virtualising your data on the GRID does NOT mean you loose control of your data as the the slides
here illustrate via a demo of instantaneous authorization revocation. The slides also go over the benefits of grid today and where security is today.

Minimum Node Hardware Specs

We've been getting some chatter about what hardware is required to run Globus and PH-DGInet nodes (combined as a PHGrid node). The idea is that the node has very little important data and really just runs simple web services, it can run on commodity hardware.

So the initial spec we're working with is something like:
2GHz processor
1GB RAM (Linux) / 1.5GB RAM (Windows)
8GB HardDisk storage for Globus / 10GB HardDisk storage for PH-DGInet (mainly the geospatial databases)

So the spec looks like a top of the line server from the year 2000. Nowadays this should be something like a mac mini ($599) or something more respectable like a Dell PowerEdge 1U server (about $700). Note that both of these blow away the processor specs (because they are dual core and quad core) and hard disk specs.

Security Policy Document - New Draft

Raja and Joseph put out a new draft of the PHGrid Security Policy Document.

It's available on the wiki for your review.

Another Demo

February 24/25, 2009. AAPCC Mid-Year Director's Meeting in Albuquerque, NM.

edit: BAL- specific dates and name of meeting

Thursday, November 6, 2008

Upcoming Demos

There are a few demos scheduled for PHGrid:
Today, we'll be showing off PH-DGInet and the Poison Control WS demo application to Dr. Alvin Bronstein.
Next week (Tuesday, Nov 11) Drs Lenert and Savel will present PHGrid at the AMIA Conference.
Dr. Savel might present RODSA-DAI (as a potential NHIN Domain Service) at the NHIN Public Forum on Dec 15/16.

Wednesday, November 5, 2008

Project Mgmt Update

A lot of moving parts, so at the suggestion of the team, thought it'd be good to note the important happenings:

1. A PH Grid Charter draft has been produced and is currently in review by program staff.

2. A draft PH Grid Project schedule is being drawn up now, with the first draft nailed down by Friday COB.

3. We are preparing for a Poison Control visit tomorrow (nice work on the demo, Peter), where we will have a tour of the lab, and discuss two major topics. A) future enhancements to the Poi Con web service and B) Poi Con web service security requirements

4. We are working with ESRI, the NCPHI lab, and South Carolina to get the PH-DGINet nodes, services, and demos working as they were post-PHIN. Some changes to all the environments have led to poor communication handoffs on my end.

5. We have received feedback on the proposed AMDS, and we will be working internally by the end of the week to discuss next steps.

6. Tom Savel and I have discussed how best to recruit and provide nodes (be them DGINet or Globus) to state and local health agencies. Right now, a multi-prong approach including possible regional collaborative coordination, HIE recruiting, GIS community recruiting, and program priorities is recommended. If you have any ideas or any interest in having a node (especially if you are a public health department or some of your best friends are public health departments), please let me know.

7. A draft PH Grid security policies document and conceptual architecture has been produced. Please let us know if you're interested in seeing either.

8. Next week: AMIA demos

So, Poicondai is out there and pretty.

I have been making several modifications to the Poicondai-Web demonstration.

You could go to the service registry of the PH-Grid wiki to find the link, but I am going to go ahead and post it here:

http://ncphi.phgrid.net:8080/poicondai-web/

Please go ahead and poke it, and especially the poicondaiMap.jsp page. Any searches you do will help build the cache and improve the responsiveness.

Also, please email me any bugs you may find, so that I might start thwacking accordingly.

Cheers!

Windows Core Update

The system requirements document has been completed.

I am currently focusing my attention on the installation and configuration of Globus 4.2.1 Java WS core on the Windows Dev node. At first glance, many of the files used within the C core are present in the WS Core.

The internal grid certificate configurations have been moved to the Windows server.

I had to uninstall the Java 5 EE SDK because it did not seem to function with the WS core install. I will try version 1.4.2.
Hash file on NCPHI was rebuilt due to corrupted file.

Friday, October 31, 2008

Architecture Diagrams for AMDSS

As part of the project planning that John Stinn is doing, I'm starting to draft up some architecture diagrams for how the AMDS services (AMDSS) will be built, accessed and deployed. The first in this series is a deployment diagram for CDC facing and Partner facing components. This is all fairly vanilla UML 2.x diagram notation-spec, wikipedia in case anyone is interested.

Please let me know your comments.

Thursday, October 30, 2008

We have polygons

We have polygons, and they are colored depending on the amount of calls within.

Tomorrow, I work on making popups depending on which polygon was clicked.

Wednesday, October 29, 2008

Oh those wacky geolocations.

So, yesterday I spent some time working up a nifty little display to make sure all the counties were showing up the way I needed them to.

I found out that most (about 45) of the counties in Colorado, actually showed up in Colorado... and some of them were showing up in other states.

Because I made a very naive assumption that County Names did not repeat in the US. And, like that assumption that birthdays don't cluster, it came as a nasty surprise. Thankfully, because I made sure to maintain the poicondai-loader project the way I did, it only took me the better part of an evening to change the loading scheme so that it supported county, state, and geolocation data.

The rest of today has been spent setting up some utility classes for pulling what I am calling PoiConCounts. I spent a lot of thought-chewing time figuring out the granularity of the objects for optimal caching and pulling (and many thanks to Brian for busting me out of the loop I was finding myself in) and just settled on "the thing that is going to be displayed". I have made this flexible, in case the way things need to be displayed or stored ever changes.

Tomorrow I will be trying to get that first county yank into a map with appropriate shading. If that goes well I will probably be ambivalent about whether to try and get the histogram-popup working or move on to zipcodes. But let's not fillet those fish yet...

Monday, October 27, 2008

Maps and Polygons and JDBC, oh my.

Today has been an odd day of sorts, but I got a lot done.

First, the big thing... I got the connectivity to the database working and drawing polygons. Now the next big thing is going to be having a poicondai-web page that sorts everything out by counties and tags the resulting polygons with pop-up counts.

Then, the next big thing is going to be implementing modular pop-ups that instead of just showing counts, show a Google-chart for the selected area with a histogram and what-not. But that might get over-ridden with the encoding of zip-codes.

Otherwise, I am in danger of hitting that ambivalence trap. I spent some time sprucing up the old poicondai chart-based application, and then found out it would have been ostensibly better to get the new map application working... but at the same time the sprucing-up I did is going to probably be pulled into the new poicondai-maps application... and there are lots of better ways I could have set up the database pulling, and I might just replace the whole JDBC framework with Hibernate... and there are lots of different things I could do and I seem to be focusing on the most visually nifty and code-based annoying.

Also, I have lots of little doubts. Despite many days of concept proofing I keep worrying about some functionality not being there and throwing me back to the drawing board (what if I cannot assign click handlers to polygons? Stuff like that).

But like most things... I find that the harder I work, the more luck I seem to have... and it's not like the loops and caches I would build aren't going to be useful if I have to move from a polygon-based map to a tile-based map.

So wish me luck... now I am just trying to get things on map attached to the numbers coming back from NPDS.

Friday, October 24, 2008

VMWare Appliance Now available online

Dan made a VMWare Appliance of the PHGrid Globus node and it's available at http://phgrid.net/documents/node/v1/.

If you download both of the files here, you can run a SUSE installation of the Globus Toolkit. This will make configuration easier as all you have to do is make a few configuration changes(set the IP, hostname, user accounts, request a user and host cert, etc).

This will reduce the install time down even lower (5 minutes) and allow PHGrid nodes to run on Windows (through the free VMWare player).

This first release is still pretty large (4.5GB download, 8GB necessary to run) so it may take you a while to download (took me over 30 minutes to upload).

The data, it is in the database.

So today, I managed to get the polygons for the ~3500 counties into a database.

I used maven (and found some places where maven was being very odd). I used SAX (and found some places where SAX was being very odd). I used eclipse (and.. well, you get the idea).

It was one of those weird coding days... where I was pressed on only by the faith I had that I could get it working and that my brain had set it all up in my mind. And thankfully that seems to be the case. I was going to try and just brute force REGEX/parse the XML string into the objects I needed without using any sort of XML parser (because I wasn't that familiar with SAX/DOM/Etc) but I found out that I couldn't really think of a way to do it... and that is a good thing because I'm sure whatever I would have turned out would have been a candidate for some of those "LOL at this horrible code" type websites.

Instead, I have a little SAX parser that is elegant, fast, and coupled to a little prepared-statement inserter that is also elegant and fast. And I should be able to use it relatively quickly for when I need to load zipcodes.

Now, the next big step ahead of me is to finalize the "pull these counties and then draw these counties" map for poicondai. Then I also have to do the same thing for zip codes. And then make everything look better.

Here's to demonstrations.

Aggregate MDS Service planning

Tom and John presented the FY09 plan to the NCPHI governance council last week. A rather decent size block of work that the NCPHI R&D lab is planning is the development of a set of summary data services and associated coordination services.

These services will be an extension of the RODSA-DAI work and will provide access to various biosurveillance systems hosted by partners out in the states. Each service will return a set of aggregate syndrome counts that will map to a new common data structure that for now we are calling the "Aggregate Minimum Data Set". The AMDS will use the AHIC/HITSP MDS and select the minimum useful fields for aggregate reporting. The work on developing a scientifically vetted AMDS is in progress and so far is involving the Centers of Excellence and CDC BioSense personnel.

So the idea is that there will be multiple implementations of the AMDS services for each participating biosurveillance system and then a set of coordination services that know how to run a federated query across the participating nodes and combine together the results from the query. This is basically what the PHDGInet and RODSA-DAIWeb demos shows as a proof of principle, but 2009 will bring this to a proper pilot by developing services for actual installed biosurveillance systems running at partner sites.

Here's a list of the initial set of services that will support the pilot:

AMDSX-DAI (Where X is one service for each participating system, in 2009 probably 5 different implementations planned for RODSA, ESSENCE, BioSense and specific state systems)

AMDSCoordinator.RunFederatedQuery

AMDSCoordinator.QueryAvailableCoverageArea

Thursday, October 23, 2008

poicondai, and ESRI GIS presentation.

Poicondai is coming along nicely. It's not as far along as I would like, but then again there were a lot of things I discovered that needed to be done that I did not anticipate (building a mini-custom GIS database for the sake of drawing polygons with simple Google maps). Today I am building a little builder that will build the database I need, and I have proven that I can build the overlays I need in Google maps and they look sorta cool.

Otherwise, I have also been attending security meetings and I attended an ESRI-hosted GIS presentation where I learned the differences between Web Mapping Services (WMS) with support for Style Layer Descriptors (SLD), Web Coverages Services (WCS), and WFS (Web Feature Service) and the uber nifty WFS transaction abilities.

In short, all three sets of acronyms are something approved by the Open Geospatial Consortium (IE, they are accepted standards) for a client seeking geospatial data to better define what they need.

WMS pretty much allows for creating overlays and polygons over maps, and the added SLD's allow a client asking for data to say "I want these lines to be blue" or "I want these measurements made in metric and English and I want both returned".

WFS allows for more succinct geospatial objects like "This is a trail" or "this is a path in the woods" or "these are all the rest areas in the park". They are vector based and when transactional, allow the client to actually change the data in the hosting database (hence, if a trail is off, it can be corrected... after being locked, adjusted, and committed.).

WCS is a way of asking for area coverages using raster projections. Hence shaded areas and change diagrams (IE, how did Mount St. Helens look before and after the eruption... with pretty colors!)

ESRI makes server and client products that deal with all the different flavors and visualizations, and even support lots of open source and/or free visualization and server products.

The other cool thing is that KML pretty much supports data from all the methods and encompasses all the cool ways of serializing the GIS data. KML sort of sits between server strength and client strength values and serves as a major transport language.

So yeah, tomorrow I am hoping for polygon databases and displays.

Wednesday, October 22, 2008

Globus service testing using SoapUI

One of the tools I like to use to test out web services is SoapUI (because it is free, open source, 100% Java and other superlatives). So far this is useful for calling public (anonymous access) services, but for secure services it didn't work out.

From the gt-users mailing list, Joel Scheider emailed me to let me know:

Using soapUI, it is possible to pass a client-side SSL credential to a web Service, e.g., for GSI Transport (TLS) authentication, but it's necessary to first convert the public/private key into Java keystore format, as described in Appendix A of this document:

http://agnis.googlegroups.com/web/UsingSoapUIWithAgnis.pdf

Instead of creating a proxy certificate, this method uses the client certificate directly, so delegation is not supported, but TLS authentication still works.

soapUI also claims to support WS-Security, but I haven't personally tried using that feature yet.

Thought this may be helpful for anyone else who needs a quick way to call Globus services.

Tuesday, October 21, 2008

new techs on old demo

Today I managed to finish sprucing up the poicondai demo with better logging, and better handling of the new zipcode lists. I also started my research of the JQuery show/hide toggle so I could show/hide the raw zip-code list unless data was to be specified.

Tomorrow I hope to complete the show/hide, and get a prelim set of geographical outlays in poicondai with count bubbles. It will look a lot like rodsadai.

Otherwise, today we had a cool meeting on security, and we are going to doing a lot of data and application classifications and start laying out all of the different standards and policies... and just looking upon our data with more of a "security" eye. Thus, I am meeting a lot of new security focused folks who ask the important, retrospectively obvious questions about what services are present and what sorts of fields are used and whether they might be dangerous when boxed up and shipped somewhere to someone who hasn't seen them before.

DiSTRIBuTE's Aggregate Data Model

John Stinn reminded me of the DiSTRIBuTE's Aggregate Data Model. This summer, the PHGrid team met with Ross Lazarus and the BioSense epis about the potential structure of an Aggregate Minimum Data Set (written about earlier on this blog).

The International Society for Disease Surveillance has a project called DiSTRIBuTE (aside to future project namers- name your project something that is easily googlable) that seeks to collect aggregate data on ILI. DiSTRIBuTE uses a minimal aggregate data structure of:
| date | zip3 | age group | fever count | denominator |

This is similar to that proposed by Dr. Lazarus for use at Harvard ESP.

Tracking Changes

The phgrid sourceforge project is now configured to use sourceforge tracking to track changes to the PHGrid projects. So far, there are two active trackers: Feature Requests and Bugs.

The group has been very active in developing out services and demo apps, now that activity will be even more transparent by tracking the changes as they are submitted, prioritized, assigned, developed and tested.

The sourceforge tracker has rather obvious limitations (fixed workflow, no assigned tester field, on and on) but it's free. Any suggestions are appreciated.

This can also be used by the community to suggest new features, services or bugfixes.

To support this tracking, I'm recapping our development workflow below (which is available as a pretty graphic on the wiki):

User Enter Feature/Bug with description (with or without use case)

Admin Prioritize/Assign change

Developer creates (or updates if use case exists) and posts use case, use case is reviewed

Developer codes change in new branch with automated unit testing (JUnit, etc)

Developer assigns change to tester (other than developer)

Tester reviews code and either approves (go to #7) or notes required changes (go to #3)

Developer committs change to trunk/release

Admin closes Feature/Bug

Friday, October 17, 2008

Change, Test, Repeat

Today I have a wee bit more functionality in the advanced search, and I am getting some good leads on how to get and modify GIS polygonal data. It's neat stuff!

But, something else was brought to my attention today: The plans to get a tracker system behind our changes. This is a very good and cool thing because it means that the people interested in our projects will have a well documented and leading practice way to suggest features and report bugs. Also, it means we'll start being a bit more formal about what we are working on, and invite other developers and users to test our stuff, making it that much more robust and usable.

This means it's becoming more real.

Thursday, October 16, 2008

Troubleshooting Northrop Grumman Node

Troubleshooting the Northrop Grumman Node. I will be on site at 10am working with Marcelo on solving this issue.

530-globus_xio_gssapi_ftp.c:globus_l_xio_gssapi_ftp_decode_adat:846:
530-Authentication Error
530-GSS Major Status: Authentication Failed
530-accept_sec_context.c:gss_accept_sec_context:183:
530-SSLv3 handshake problems
530-globus_i_gsi_gss_utils.c:globus_i_gsi_gss_handshake:890:
530-Unable to verify remote side's credentials
530-globus_i_gsi_gss_utils.c:globus_i_gsi_gss_handshake:863:
530-SSLv3 handshake problems: Couldn't do ssl handshake
530-OpenSSL Error: s3_srvr.c:2010: in library: SSL routines, function SSL3_GET_CLIENT_CERTIFICATE: no certificate returned
530-globus_gsi_callback.c:globus_gsi_callback_handshake_callback:531:
530-Could not verify credential
530-globus_gsi_callback.c:globus_i_gsi_callback_cred_verify:681:
530-Can't get the local trusted CA certificate: Cannot find issuer certificate for local credential with subject: /O=HealthGrid/OU=Globus Toolkit/OUxxxx/OU=xxxxx/CN=xxxx
530 End.

Taking shapes.

Things are starting to precipitate here in and around the NCPHI Grid Lab. People are meeting and starting to talk about moving from a research stance to a production stance and actually making things to be used and seen by the world. I think it is awesome, and I also think it will be a lot of work and will be a bit tricky.

When I think of what I want when I think of public facing production GRID products... I think of what I would like to turn out, and I keep thinking of my dream/killer application: Something with the availability of OpenOffice and the simple functionality of Google Apps.

Open Office is pretty much available for everything. It is also deliciously packaged for everything. You can get them through any given *NIX package manager as RPM or a debian package files, it can be had for solaris and windows and mac, and it just seems to be there after the double-click.

Google Apps are simple and ridiculously functional. They also get to every computing platform out there. You can get Google Earth for *nix, Windows, mac, and it has just the same sort of "drop and unbox" functionality. The only drawback is they are closed source and don't like redistribution... but you wouldn't know it from all the APIs they post. There is little question to how to use their apps, and if you want to get into some of the really complex functionality you can easily Google (tee hee) how to do it.

I want our apps to be like that.

I want a NCPHI node to be one of those very simple and concise packages that only asks for what it needs (maybe the appropriate X509.11 certificate files) and then just installs. Whether it be a package (with a lot of attached packages) in the package manager of your given *NIX, or the installer on windows or mac. Right now it is a multi-step download-n-build-n-massage process that takes a new person multiple days and a seasoned veteran the better part of 3 hours depending on how the downloads go.

In the meantime, I want the apps that can come with a node to be much like Google apps. Easy to download and include, easy to find, perhaps even a few checkboxes from an admin portal that just lets you include and configure the pieces from the get-go, but otherwise just something you can nab and drop-in and have it Just Work.

Doing this is going to be a bit difficult, a lot of the limitations we are facing have to do with Other People's Code. But, most of that code is open sourced, and can be modified, and we are having an okay time talking with other people and they have been gracious and enthusiastic about working with us, and it might not get to that point due to a bunch of constraints that are unforeseen or just unknown at the moment... but that is my target. It's what I want to work-for.

Wednesday, October 15, 2008

Updates

Updated the MonaLisa configuration on Dallas, Tarrant, and NCPHI

Contacted the MonaLisa support about adding PHgrid to the Global configuration. This will show connectivity between the grid nodes.

Creating an expect script that will support OGSADAI proxy.

Removed all unessential software from the sandbox node in preparation for the VM appliance.

RODS, poicondai-web, and GIS.

Yesterday I got the filtering completed... today I started answering the question "so how do you get google maps to show polygons for zip codes" and I learned quite a few things.

Lots of other people have done this before me, as there are all sorts of cool websites (with code access for pay) that have the overlays for zip codes and phone area codes on top of a google map.
Lots of these sites seem to work by building a KML that google can read.
RODS is one of these sites, and at least that code is free.

What I am trying to do will use KML as a last resort. KML generation means you have to stick a KML document at some public URL and tell google maps to find it, and the CDC end of the grid is all sorts of locked down... and as far as I can tell, Google Maps regrettably does not have a "send me a string of KML" function.

But, there are ways to generate polygons and overlays with just the javascript commands, just like there are ways to drop points on a map with javascript commands, it's just a matter of teasing out coordinates for the borders of the polygons, and Jeremy helped me discover the appropriate PostGIS and postgres tools that are supposed to let me get a zip-code/border database... and he has shown me the way to the RODS code that makes the appropriate queries to get the coordinates that are usually sent to KML. I'll just have to make it so they are sent to javascript arrays instead.

It'll be a bit kludgey, and will probably need some caching, but it should work, and then you know the only thing you need to see to deploy the POIConDai web visualization is access to the service and the appropriate GIS database.

Also, tomorrow will probably be spent focusing on just getting the zip code centroids working and teasing the data out of the NPDS service appropriately. But it's nice to have some part of my mind working on how to turn those dots into polygons for the time being.

Otherwise, we had a meeting with the ESRI/DGI-net guys. Their browser is mega-posh.

International Science Grid This Week

Issue 96: iSGTW 15 October 2008
Opportunistic storage increases grid job success rate

The DZero high-energy physics experiment at Fermilab, an Open Science Grid user, typically submits 60,000-100,000 jobs per week at 23 sites. The experiment’s application executables make many requests for input data in quick succession. Due to the lack of storage local to the processing sites, up until recently much of DZero’s data had to transfer in real-time over the wide area network, leading to high latencies, job timeouts and job failures.OSG worked with member institutions to allow DZero to use opportunistic storage, that is, idle storage on shared machines, at several sites. This represents the first successful deployment of opportunistic storage on OSG, and opens the door for other OSG Virtual Organizations. With allocations of up to 1 TB at sites where it processes jobs, DZero has increased its job success rate from roughly 30% to upwards of 85%.
Read more

Tuesday, October 14, 2008

poicondai moves forward

Yesterday and today, I managed to do a few updates to poicondai, and basically have it to the point where it has a drop-down for ClinicalEffect to filter as needed.

Tomorrow, I will be building up the GIS databases to support making polygons, and hopefully I will be able to get polygonal data on a google map sometime tomorrow or Thursday.

Otherwise, we had a meeting with someone who is a bit better at web design than me, so hopefully the demos will be a lot prettier.

And finally, we got the Dallas problem solved, looked like there was a box in need of a restart and some bottlenecks that needed to be sorted out.