Public Health Grid (PHGrid) - Research and Development: 2009

Thursday, December 24, 2009

NSF TeraGrid GIS Workshop on Cyber-GIS

National Science Foundation TeraGrid Workshop on Cyber-GIS:

http://www.cigi.uiuc.edu/cybergis/index.php

February 2-3, 2010 - Washington, DC

The NSF Cyber-GIS workshop will take place in conjunction with the 2010 UCGIS Winter Meeting at the Doubletree Hotel, Washington, DC. The workshop will focus on the following themes and topics:

Complex geospatial systems and simulation of geographic dynamics
Computational intensity of spatial analysis and modeling
Data-intensive geospatial computation and visualization
High-performance, distributed, and/or collaborative GIS
Geospatial ontology and semantic web
Geospatial middleware, Clouds, and Grids
Open source GIS
Participatory spatial decision support systems
Science drivers for, and applications of, Cyber-GIS
Spatial data infrastructure

For more information, please contact workshop co-chairs:

Shaowen Wang
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
shaowen@illinois.edu

Nancy Wilkins-Diehr
San Diego Supercomputer Center
University of California at San Diego
wilkinsn@sdsc.edu

Monday, December 21, 2009

HealthGrid 2010 conference: call for papers, posters and workshops - deadline Feb 15, 2010

HealthGrid 2010 conference: call for papers, posters and workshops

conference web site: http://paris2010.healthgrid.org/

KEY DATES
Call for papers, posters and workshops closes:
February 15th 2010

The eighth HealthGrid conference will take place June 28-30, 2010 at University Paris XI in Orsay (France). Every year, this conference is an opportunity to discuss the state of the art for the integration of grid practices into the fields of biology, medicine and health. This year, it will take place just as the European Grid Initiative starts federating the national grid initiatives and offering its resources to the Research Infrastructures. The conference program will include a number of high profile keynote presentations complemented by a set of refereed papers, which will be selected through the present call. Out of the selected papers, the best will be invited for oral presentations and the others for poster presentations. All the selected papers will be published in the book series "Studies in Health Technology and Informatics", published by IOS Press and referenced in the Medline, Scopus, EMCare and Cinahl databases.

Call for papers and posters
Contributions should be made as full research papers (up to 5,000 words in length and maximum 10 pages). Selection for oral or poster presentation will be based on the content of the submitted papers, the originality of their contribution, technical quality, style and clarity of presentation, and importance to the field. Oral presentations could include demonstrations.
All papers must be submitted electronically. Please refer to the conference website for upload instructions. The guidelines for authors and support tools are those of the series "Studies in Health Technology and Informatics" (see URLs below). Papers are invited in, but not limited to, the following areas and topics:
A: ACCESSIBILITY
Challenges to making grids more accessible to bio-medical users
Scientific gateways
Workflow engines
Grid portals
Grid platforms
B: CORE TECHNOLOGIES AND KNOWLEDGE INTEGRATION
Grid technology versus web applications
Data privacy: confidentiality in distributed medical information systems – and the security challenges
Knowledge integration – knowledge management
Semantic techniques and the challenge of integrating heterogeneous biomedical data
Visualization in Grids
C: APPLICATIONS
Bioinformatics
Biomedical informatics
Medical imaging
Public health informatics
Genetics and epidemiological studies
Pharmaceutical R&D: drug discovery, clinical tests
Grid computing and the Virtual Physiological Human (VPH)
D: SOCIO ECONOMIC ASPECTS
Grid business aspects: sustainability and go-to-market strategies
Experiences on production
Grid used in real business
Grid sociology: how to win society for Grids?
E: THE FUTURE OF GRIDS
Experiences with GP-GPU
Cloud computing, on demand computing
Nanomedicine

Call for workshops
This year, the conference Program Committee calls for workshops. On Monday afternoon, June 28th 2010, parallel workshops are scheduled from 2:30 pm to 6 pm. Workshop proposals should include the following information:
Topic: the workshop topic should be directly related to the conference topics listed in this call.
Duration: all workshops selected through this call will take place Monday afternoon. Duration can be 90 minutes or 210 minutes (including coffee break).
Targeted audience: what is the expected attendance (for room allocation)?
Format: do you wish to invite the speakers or to call for contributions? The contributions to the workshops will not be included in the conference proceedings.
Name, affiliation and email address of the workshop submitter.

Tuesday, November 24, 2009

Code-A-Thon

I attended the NHIN CONNECT Code-A-Thon in Portland, Oregon last week. It was two days of developers planning out and working on the next release of the NHIN CONNECT software (v2.3). There's a wiki up with notes from the two days' worth of sessions.

But the interesting, relevant piece was when they started talking about future architecture. Two of the future topics (out of probably 20 or so) were grid computing and cloud computing. Because of the distributed nature of NHIN (CONNECT nodes everywhere), we had a short discussion about how data and computing power can be spread around in a distributed manner. Specifically, Hadoop and MapReduce came up as ways to spread jobs out over NHIN and NHIN-compatible systems.

On the whole, a pretty remarkable mini-con. This is a new approach to Federal open source projects, and it is refreshing to see that OSS has come so far.

Wednesday, November 4, 2009

Vietnam welcomes three new grid sites; hospitals get new ‘HOPE’

... HOPE (HOspital Platform for E-health), developed jointly at CNRS and HealthGrid in France, allows hospital sites to exchange medical information. HOPE is now installed at the Institute of Information Technology in Ho Chi Minh City, formerly Saigon, for testing. All going well, it will be installed in the primary Ho Chi Minh hospital ... Read more:
http://www.isgtw.org/?pid=1002120

Tuesday, October 27, 2009

Updated SDMX-HD Resource page

I updated the PHGrid wiki section on SDMX-HD.... with this link.

Friday, October 23, 2009

PHGrid Community Update

To the PHGrid Community:

Related to significant organizational change currently underway, the internal team supporting the PHGrid activities within NCPHI has transitioned to other projects. To assist in this transition, we have provided many updates to PHGrid documentation (technical and project) posted to the PHGrid wiki (http://wiki.phgrid.net), and to PHGrid-related software in the Google code repository. If anyone has any questions relating to this change, or PHGrid software / services, please don't hesitate to contact me. We look forward to continued PHGrid research activities upon completion of the reorganization. It has been my sincere pleasure to work with the NCPHI PHGrid team (Brian, John, Peter, Dan, Chris, Moses, Joseph).

-- Tom

Monday, October 19, 2009

So long and thanks for all the fish...

Thank you to everyone I've worked with over the years as part of the public health informatics research grid project. I've met some extremely bright individuals and had a chance to collaborate with some extremely rare organizations and groups.

Although I am moving off the project formally (i.e. I won't get paid for contributing), I'll still be participating through the loose system of collaboration that the project uses to create the blog, wiki and software elements.

Saturday, October 17, 2009

Grid Computing Technologies for Geospatial Apps

Grid Computing Technologies for Geospatial Applications:

http://ifgi.uni-muenster.de/0/agile/

The gallery is here: http://ifgi.uni-muenster.de/0/agile/gallery.html

Standing room only !! or maybe just grab an open seat....

Friday, October 16, 2009

Project statistics

While creating the transition documentation, I ran some stats on the active code base (not including everything in old-projects) using cloc.

I'm not really a fan of measuring quality by number of lines of code (since good programmers produce fewer lines of code than bad but busy programmers), and a lot of this is boilerplate, etc. But I think it's worth noting that with just a limited team, we made 282 classes with 25k lines of Java, 2k lines of JSP, 6k lines of comments and documentation. Nothing massive, but it's a decent body of work.

Language       files   blank  comment     code
HTML             579   19160     9724   161640
Java             282    5420     5594    25656
Javascript        35    3098     2779    14490
XML              136     956      998     9448
XSD               20     146       99     4228
SQL              126     425      769     3840
JSP               20     357       95     2368
CSS               12     158      140     1460
Bourne Shell       1       2        0        7
SUM:            1211   29722    20198   223137

Wednesday, October 14, 2009

Enhanced PHGrid portal wireframe

So, as a step better than a jpg, the PHGrid portal demo is now web-based. Kudos to Chris.

Click HERE to launch...

Code transfer

I finished moving all of the active source projects from sourceforge to google code to support our transitioning off the project.

The following subprojects have been successfully moved (GIPSEPoisonService, GIPSEService, GIPSEServiceInstaller, gmap-polygon, gridviewer were all moved previously):

GridMedlee: from sf to gc.

PHGridLanding: from sf to gc.

SecureSimpleTransfer: from sf to gc.

gipse-dbimporter: from sf to gc.

gipse-store: from sf to gc.

gipse-poly-web: from sf to gc.

loader-gmaps-poly: from sf to gc. (this includes the CSVs for setting up the gridviewer GIS tables)

npds-gmaps: from sf to gc.

npds-gmaps-web: from sf to gc.

poicondai: from sf to gc.

schemas: from sf to gc. (this includes the schemas and example xml for the GIPSE services)

The sf projects will be left intact so as not to break any links, but all activity will be made on the google side from today onward.

Successful GIPSEService test

Forgot to post that last week Ron Price and I successfully tested a deployment of the GIPSEService at the Denver DOH.

Ron, working with Art Davidson, set up a synthetic aggregate data set and then he deployed an instance of the GIPSEService (8/31 gipse spec from the SVN repository).

I was then able to submit a test query from the NCPHI Lab using lab credentials that was successfully processed by the GIPSEService and sent back a response document containing the relevant observation set. Also, I tested with inappropriate credentials from unauthorized locations and I was not able to access the service (as expected since Ron's security controls prevent access by unauthorized users or locations).

Friday, October 9, 2009

Sir Tim Berners-Lee: The Semantic Web Has Arrived and the Obama Administration is "Onboard"

http://www.beet.tv/2009/10/sir-tim-bernerslee-the-semantic-web-has-arrived-and-the-obama-administration-is-onboard.html

Thursday, October 8, 2009

GAARDS Security implementation.

So, my next task for the coming months is to learn, tinker with, and hopefully implement some cool bits of the GAARDS service as made by the folks up at Ohio State and their work with CaBIG and CaGrid. After a few preliminary readings of white papers and discussions with other people who have investigated various security models, I'm going to try and summarize things as I understand them, and invite people to correct my summarizations...

Globus works with X.509 certificates. To save a lot of complicated two-stepping, I'd say the easiest way to think of a certificate is as a licence with a special key embedded in it. Two nodes wanting to talk to each other have to present their certificates in order to access services and establish secure communication, and the nodes have to "trust" each other's certificates.

The way to get "automatic" trust without having to add keys into individual trust stores would be to have all the certificates issued by a trusted third party like Verisign or Thawte. This is like getting a passport or a driver's license as ID instead of having a business card with your name on it. It is also expensive, and to have to do it for every node on the grid beyond 5-node grids is pretty much unscalable.

Enter Dorian. Dorian is a GAARDS component and is essentially a Grid Service that allows other authentication methods to be used to access the grid. On one hand, it allows for someone to say "people authenticated by [method] at [node] are allowed to access these grid services". Thus, instead of having to have a certificate, one might just need to enter a username and password, or use their operating system credentials, or use a certificate issued by the Node itself instead of a larger third party.

The other critical component is Grid Trust Services (GTS) which allows for grids with different certificate sets to talk to each other and delegate which services on each grid are available to others. It also performs important syncing functions so that updates to access and authentication chains are propagated through the different grids.

There are other bits too, like GridGrouper, which allows for simpler group paradigms (members of the group 'Gridviewer' would be able to access various gridviewers on different nodes...), and Web Service Single Sign On, which would allow an easy port for web applications to gain access to grid services... and you can read about it at the GAARDS website.

Either way, I am at the periphery of understanding right now. I hope within a couple of days to have a really good grip on how security works now (and its limitations), use cases for what we need, and a stronger correlation to how GAARDS will answer those use cases and which components are needed to do it.

Then, over the next couple of months, I'll need to implement those pieces and see what service modifications are needed to use them.

Wednesday, October 7, 2009

GIPSEPoison service move, and GIPSE Service Installers

The GIPSEPoison Service has now moved over to the google code repository (you can check out a read-only copy from http://phgrid.googlecode.com/svn/GIPSEPoisonService/trunk).

But, one of the things that I have been doing has been making an ant-fueled bundle that will install both the GIPSEPoisonService and the GIPSEService with a single ant command. You can read about that here: http://sites.google.com/site/phgrid/Home/service-registry/gipseserviceinstaller.

It essentially uses a properties file to download code from a repository, deploys other service-specific properties files to the downloaded code, and then calls the downloaded code's build and deploy scripts.

The idea is that for future service deploys, I can just email Dan a zip with the appropriate properties files and say "unzip this as [user] and then run 'ant all'". My next hope is to try and see if I can make a mvn based download script so that I can make similar installers for things like GIPSEService. The other cool thing is that both ant and mvn should be easily callable by the NSIS installer that Dan found.
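The installer flow described above (read a properties file, fetch the code, drop in the service-specific properties, then call the build) can be sketched in a few lines. This is only an illustration in Python, not the actual ant bundle: the property keys (`repo.url`, `checkout.dir`, `service.properties`, `ant.target`) and both function names are made up for the example, and the sketch only plans the commands rather than running them.

```python
from typing import Dict, List


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def plan_install(props: Dict[str, str]) -> List[List[str]]:
    """Return the commands an installer would run, in order.

    The property keys are hypothetical; the real installer's keys may differ.
    """
    checkout_dir = props["checkout.dir"]
    return [
        # 1. download the code from the repository
        ["svn", "checkout", props["repo.url"], checkout_dir],
        # 2. deploy the service-specific properties file into the checkout
        ["cp", props["service.properties"], checkout_dir],
        # 3. call the downloaded code's own build/deploy script
        ["ant", "-f", checkout_dir + "/build.xml", props.get("ant.target", "all")],
    ]


example = """
# hypothetical installer properties
repo.url = http://phgrid.googlecode.com/svn/GIPSEPoisonService/trunk
checkout.dir = GIPSEPoisonService
service.properties = poison-service.properties
"""

for command in plan_install(parse_properties(example)):
    print(" ".join(command))
```

Keeping the plan separate from execution makes it easy to review what an installer would do on a node before anyone hits "go".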

Monday, October 5, 2009

Open Source Installshield Equivalent

In an effort to reduce the complexity of installing grid nodes, I have been evaluating Open Source products that will allow us to create an automated installer. The goal is to have a user download grid software from PHGrid and execute the install program that would install a node with minimal user interaction.

Nullsoft Scriptable Install System (NSIS) is one noteworthy product that stands up to this task. NSIS requires low system overhead, it's Windows compatible, scriptable, and it supports multiple compression methods.

Screenshots and additional information can be found at: http://nsis.sourceforge.net/Screenshots

Friday, October 2, 2009

More code migration

I just finished moving over two additional subprojects from our sourceforge repository to the newer google code repository.

From now on, please access gridviewer and gmap-polygon through google code. The sf version will stick around for a while to minimize inconvenience of repository switching, but eventually it will be replaced with just a pointer to the google code repository.

We're switching to google's site for a few minor reasons: 1) the issue tracking / wiki software is better. It lets you create pretty clean workflows through their tagging system; 2) the code review feature is useful; 3) their Subversion server is quite a bit faster than sourceforge's; and 4) it's easy to switch, so we can switch back to sourceforge if it becomes better.

Note, we're not moving everything over to google code (yet) so there will be some time until all the subprojects come over.

Thursday, October 1, 2009

Gridviewer Updates

Gridviewer was updated with the following changes (enhancements):

Default state of region query boxes is hidden. Seems to load cleaner.

GET Requests are now supported, meaning you can use a URL to generate a specific query.

Email link: Added an email link which generates a shortened URL (using bit.ly) and opens a pre-formatted email.

Light modifications to javascript for performance and general file size reductions.

http://ncphi.phgrid.net:8080/gridviewer/

Tuesday, September 29, 2009

GIPSEPoison supporting zipcodes, now onto easier installers

So, after a lot of planning, I finally managed to get general zip5 and zip3 support working in the GIPSEPoison service. I got it so that the service would only ever have to make two calls to the NPDS-poison service (one for states, one for zips) and that it could return zip3 and zip5 and state results simultaneously, and checked in the code after testing it.

The next thing I am planning to work on is some ant scripting for building and deploying the service in one simple command. The hope is that it will turn an install process of about ten steps into about two (one if you are repeating the process or just installing a patch).

Otherwise, it seems the farther future involves the generation of more services, more intricate installers, portals and other service/viewer aggregators. My hope is to just start creating things that make fetching and installing these items as easy as possible.

Cheers

Monday, September 28, 2009

Unsupported Class Version Error

The error:
Exception in thread "main" java.lang.UnsupportedClassVersionError: Bad version number in .class file

Explanation:
This happens when you have compiled a jar file with a newer version of Java and are executing it with an older version of Java.

The Solution:
Either upgrade the Java version on the executing machine or recompile the code for the correct version.
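To find out which Java version a class was compiled for, you can read the header of the .class file directly. Here is a minimal Python sketch, assuming the standard class-file layout (the 0xCAFEBABE magic number followed by two-byte minor and major versions); the function names are just for illustration.

```python
import struct
from typing import Tuple


def class_file_version(header: bytes) -> Tuple[int, int]:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(header) < 8:
        raise ValueError("too short to be a class file")
    # Big-endian: 4-byte magic, 2-byte minor version, 2-byte major version.
    magic, minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file (bad magic number)")
    return major, minor


def java_release(major: int) -> str:
    """Translate a class-file major version into a Java release label."""
    if major >= 49:
        # 49 -> Java 5, 50 -> Java 6, 51 -> Java 7, and so on.
        return "Java %d" % (major - 44)
    older = {45: "JDK 1.1 or earlier", 46: "JDK 1.2", 47: "JDK 1.3", 48: "JDK 1.4"}
    return older.get(major, "unknown")
```

To use it, read the first 8 bytes of a class extracted from the jar (opened in binary mode) and pass them to `class_file_version`. If the release it reports is newer than the JVM on the executing machine, that mismatch is the cause of the error.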

Friday, September 25, 2009

Gridviewer deployment

The latest version of Gridviewer has been deployed this morning on the training node. It has all the latest security additions.

http://ncphi.phgrid.net:8080/gridviewer/

Thursday, September 24, 2009

Invalid Encoding Name in Tomcat 6

Here's an error message you may come across when deploying Globus to Tomcat 6. This error message happens when Tomcat is started.

SEVERE: Parse Fatal Error at line 1 column 40: Invalid encoding name "cp1252".org.xml.sax.SAXParseException: Invalid encoding name "cp1252".

Use the following procedure to correct the error:

1. Change directory to %CATALINA_HOME%\conf
2. Open tomcat-users.xml in a text editor
3. Look at the first line of the file and change encoding='cp1252' to encoding='utf-8'
4. Save the file and restart Tomcat
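If you need to repeat this on several nodes, the edit can be scripted. A minimal Python sketch, assuming the file really was saved as cp1252 and that the encoding appears in the XML declaration; the function names are illustrative. Re-saving as UTF-8 (rather than only changing the label) keeps any non-ASCII characters intact.

```python
import re
from pathlib import Path

# Matches the encoding attribute inside the XML declaration, with either quote style.
DECL_ENCODING = re.compile(
    r"""(<\?xml[^>]*\bencoding\s*=\s*)(['"])cp1252\2""", re.IGNORECASE
)


def fix_declaration(text: str) -> str:
    """Rewrite encoding='cp1252' to encoding='utf-8' in the XML declaration."""
    return DECL_ENCODING.sub(r"\g<1>\g<2>utf-8\g<2>", text, count=1)


def convert_file(path: Path) -> None:
    """Re-save a cp1252 XML file as UTF-8 with a matching declaration."""
    text = path.read_text(encoding="cp1252")
    path.write_text(fix_declaration(text), encoding="utf-8")
```

Calling `convert_file` on the tomcat-users.xml under the Tomcat conf directory covers steps 2 through 4 apart from the restart.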

A web-based portal for PHGrid - Initial screenshots for discussion

I keep thinking about the public health workforce... and their most likely perspective on our PHGrid activity. My thought is that what matters to them is being given a new, robust and intuitive resource that makes their work easier, not that it's a cool technology based on the Globus Toolkit that leverages grid computing.

Thus, to better demonstrate to the public health community the potential capability of the PHGrid architecture and ecosystem, I created some wireframe mockups articulating my thoughts around a web-based PHGrid portal. The goal of this is to demonstrate to users that PHGrid is not just about 1 GIPSE (aggregate data) service and 1 web-based geographic mapping "viewer" (i.e., a PHGrid Gadget), but about a dynamic ecosystem potentially consisting of hundreds of different PHGrid resources (services, applications, Gadgets, etc) created by many and shared among many. This portal would provide a user-friendly, single, secure, access point to PHGrid resources (services, applications, Gadgets, Data, Computational power, etc.).

These wireframe screenshots are very crude... they have many errors, and are in no way exhaustive. They are at least a starting point for discussion.

Of course, there may end up being many more features of the portal - but I really do see it requiring 3 fundamental components:

1. General, secure, customizable PHGrid dashboard. I see this as a combination of MyYahoo, iGoogle, and iTunes for PHGrid. It combines, for example, social networking, eLearning, news, alerting, and statistics.

2. Real-time directory of available PHGrid resources. This combines features of an automated standards-based (UDDI) registry with integrated social aspects of eBay and Amazon.com. Users can quickly look for specific resources (services, applications, etc), examine their strengths and weaknesses, and ultimately request access to the resource. In other words, this part of the portal provides user-friendly access to a dynamic PHGrid resource ecosystem.

3. MyGrid - A user-defined PHGrid "workbench." This part of the portal allows users to customize a palette of resources which are important / relevant to them - and which simplifies the process of organizing PHGrid resources.

Depending on the access control requirements, the user may be able to obtain immediate access to the resource, or may have to wait for the resource provider to grant access. Some resources may even require a document to be filled out and submitted. As can be seen in the Access Status feature of the site, this feature will allow the requesting user to monitor the status of his/her request, regardless of the specific process. Once the user has been granted authorization, he/she can use the Service/Gadget Automation tool to create ad-hoc and recurrent workflows / macros, tying together multiple resources. This would have aspects of both the Taverna workbench and Yahoo Pipes. Clearly this area needs a lot of work, but I feel it may grow into a much larger aspect of the site.

Here is an example of a workflow (i.e., a macro) that could be created and saved, to be run at any time or automated to run on a recurring basis:

After logging into the system with their secure credentials, a state epidemiologist goes to the MyGrid part of the portal. They then complete the following steps:

1. Requests specific data elements from data source X [service a]
2. Combines this data with data from source y [service b]
3. Runs a Natural Language Processing (NLP) engine on one data field from source y to convert a large chunk of text from a family history field into discrete coded data elements [service c]
4. Performs geospatial analytics on the newly generated data set z [service d]
5. Visualizes the analytic output using specific criteria [gadget e]
6. Creates images from the visualization tool and exports them to a web-based tool to be accessed by his/her colleagues at the local and county health departments within that state [service f and gadget g]
7. The user saves this workflow, and configures it to run every night at midnight.
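To make the idea of a saved workflow concrete, the steps above can be thought of as a chain of functions, each receiving the previous step's output. The Python below is a toy sketch only: the step functions are stand-ins with made-up data, not real PHGrid services, and the scheduling in step 7 is left to whatever scheduler the portal would use.

```python
from functools import reduce
from typing import Any, Callable, List

Step = Callable[[Any], Any]


def make_workflow(steps: List[Step]) -> Callable[..., Any]:
    """Compose steps so each one receives the previous step's output."""
    def run(initial: Any = None) -> Any:
        return reduce(lambda data, step: step(data), steps, initial)
    return run


# Hypothetical stand-ins for [service a], [service b] and [service c].
def request_source_x(_: Any) -> List[dict]:
    return [{"id": 1, "zip": "30301"}, {"id": 2, "zip": "30601"}]


def combine_with_source_y(rows: List[dict]) -> List[dict]:
    family_history = {1: "mother had diabetes", 2: ""}
    return [dict(row, family_history=family_history.get(row["id"], "")) for row in rows]


def code_family_history(rows: List[dict]) -> List[dict]:
    # A keyword match standing in for a real NLP engine.
    return [dict(row, diabetes_flag="diabetes" in row["family_history"]) for row in rows]


# The saved "macro": build it once, run it whenever needed.
nightly_report = make_workflow(
    [request_source_x, combine_with_source_y, code_family_history]
)
result = nightly_report()
```

Because the workflow is just an ordered list of steps, the portal could store it by name and let a user swap one service for another without touching the rest.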

I look forward to others' thoughts on this. My hope is that we can create a very rough mock-up of this in the near future. It's my belief that it is only through the creation of a resource such as this, that we can clearly articulate the real value of PHGrid's robust, secure, SOA-based architecture / ecosystem to the overall public health community, and not just to the IT and informatics savvy public health workforce.

Thanks! Tom

Security

I have started a wiki page addressing security practices we are implementing in gridviewer. My hope is we can continue to expand on these. Wiki page is here.

Wednesday, September 23, 2009

Many SDN deploys and GIPSEPoison tweaks

So, I finally got the "All Clinical Effects" element working in GIPSEPoison... which effectively, if selected, tells the poison service to return all the human call volume for a given region instead of a particular clinical effect.

The next step is to get zip5 and zip3 working. I plan to do it the same way I did it in quicksilver: ask for all the counts of a state and then filter out the zipcodes I don't need. If zip5 is selected, just copy into the observations, but if zip3 is selected, bucket aggregate the zip5s. Ironically, having the whole state return grouped by zipcode is a lot easier than trying to list a long set of zip5s for an entire state (as would be the case with most gridviewer zip3 searches).
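The filter-then-bucket approach is easy to sketch. The Python below is only an illustration of the idea, not the service code (which is Java); the function names and the counts are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Zip5Count = Tuple[str, int]


def select_zip5(state_counts: Iterable[Zip5Count], wanted: Iterable[str]) -> List[Zip5Count]:
    """Keep only the requested zip5 rows from a state-wide result."""
    wanted_set = set(wanted)
    return [(zip5, count) for zip5, count in state_counts if zip5 in wanted_set]


def bucket_zip3(state_counts: Iterable[Zip5Count]) -> Dict[str, int]:
    """Sum zip5-level counts into zip3 buckets (the first three digits)."""
    totals: Dict[str, int] = defaultdict(int)
    for zip5, count in state_counts:
        totals[zip5[:3]] += count
    return dict(totals)


# Made-up counts standing in for one state-wide response grouped by zipcode.
state_counts = [("30301", 4), ("30305", 2), ("30601", 1), ("31401", 3)]

zip5_result = select_zip5(state_counts, ["30305", "31401"])
zip3_result = bucket_zip3(state_counts)
```

With one state-wide response in hand, both the zip5 and the zip3 answers come from local filtering, so no extra calls to the upstream service are needed.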

I also made a little zip3 to state properties file... in the hopes of not needing the geolocation database (and saving installation steps).

Otherwise, I have also been helping a lot with deploys and scans for various targets... which has caused lots of discussions about ways of automating things so that less time is taken up by these things which are going to be more frequent. The hope is that we can just tell someone about a build label and have someone hit "go" with that variable set and whatever needs to be deployed will get deployed automagically.

Either way, that is not the only improvement being discussed. I get the feeling that next week a whole lot of cool new services and visualizations are going to start getting hammered onto and into various design tables.

Cheers,
Peter

Tuesday, September 22, 2009

H1N1 Dashboard

While not technically PHGrid, the Novel H1N1 Collaboration Project's Enrollment Dashboard is an interesting approach to data visualization that may filter into PHGrid prototypes for visualization.

You won't be able to view the dashboard since it's not public, but the multi-layer google map is pretty interesting.

Saturday, September 19, 2009

neuGRID Project Video Demonstration

A video demonstration of the neuGRID project and platform is available
at: http://www.youtube.com/watch?v=fpfD6GZ90tQ&v=30

for more information, contact:
David Manset

CEO MAAT France (maat Gknowledge Group)
Immeuble Alliance Entrée A,
74160 Archamps (France)

Mob. 0034 687 802 661
Tel. 0033 450 439 602
Fax. 0033 450 439 601
dmanset@maat-g.com www.maat-g.com

Friday, September 18, 2009

Gridviewer improvements

We were able to complete a few substantial improvements to the Gridviewer application. The changes listed below were deployed on the training node (http://ncphi.phgrid.net:8080/gridviewer/):

Added download csv extract functionality per query request. (Click on Download Data to get an extract)
Added HTML data table display with sortable columns per individual query request as well as a combined data set. This feature currently needs speed improvements. Further improvements could include subtotaling by regions or dates. (Click on the Display data link to see the control.)
Corrected bug in the console log with the time displays.
Appended build version on the title and header.

Wednesday, September 16, 2009

GSA launches portal where agencies can buy cloud computing services

Full article here:

Kundra's great experiment: Government apps 'store front' opens for business
http://fcw.com/articles/2009/09/15/gov-apps-store.aspx?s=fcwdaily_160909

Direct link to the new service:

https://www.apps.gov/cloud/advantage/main/start_page.do

Tuesday, September 15, 2009

GIPSE Request Examples

I've received another request for better documentation on the GIPSEService requests (what stratifiers are used for what, better examples, etc.). So I will create something soon and post to this blog.

In the meantime, the JUnit tests in google code run through a series of example requests against the NCPHI node, which may help a little.

Friday, September 11, 2009

Got GIPSEPoison working on the training node.

So, after many months of on-again, off-again trying... I finally managed to figure out the library conflict that was preventing GIPSEPoison from loading up on the training node.

The library ended up being WSDL4J. The CXF libraries being used by Jeremy Espino's awesome NPDS Client needed a newer version of wsdl4j than the one that was provided with Globus. Luckily, the newer version seems backwards compatible.

Thus, GIPSEPoison can now be seen as an option in http://ncphi.phgrid.net:8080/gridviewer/

If you read some of my earlier posts, you'll know that the first step was to import the libraries separately (rather than as a part of a Jar-with-dependencies). Otherwise, the error seemed to indicate a WSDL or some other parsing error, and WSDL4J was the first duplicate jar I tried.

The next couple of days are going to be spent fixing up GIPSEPoison: giving it an "all clinical effects" option, plus zip3 and zip5 support.

Cheers,
Peter

Tuesday, September 8, 2009

PHIN Conference Thoughts.

Greetings all,

It has been a week since the PHIN conference started with a long weekend in-between, and today I wanted to list out the impressions I got from the crowd and the panels and the discussions I had there... Then I hope to expand a bit with my notes and more thoughts in future blog entries.

- Impression the first(e): Where are the users?

At least two of the panels I attended could be summarized as "this/those informatician(s) used that/those cool grid technolog(y/ies) and find it helpful in doing their work, now if we could only get more of those technologies and work on making them talk to each other". The panels showed me several things. The most important is that informaticians, doctors, and users in general knew that there was stuff out there and they were using it to make their lives easier (i.e., "Peter feels validated"). The next coolest thing we learned is that the users had very good opinions and suggestions about how the products could be made to better suit their needs... and the final point I realized is that while many of the fine-grained steps in the process were different (each health department has a different end data format and a different way of describing what they consider to be 'flu'), a lot of the general needs were the same (speed, ease of use, customizability, ease of interaction with other services). I feel that this shows the need to find as many of the users as possible and to try and create a very good space for them to have their opinions and needs voiced, and hopefully help each other install and adjust GRID-like products to suit those needs.

The other user impression I got is that PHGrid, as users of things like Globus and CAGrid (Introduce), are getting a lot of things needed from those programmers and communities. Globus is coming out with cool new things that solve a lot of the old problems, CAGrid is improving Introduce to match the new version(s) of Globus and answering the "how do we create a grid without having to buy lots of expensive third party certificates and how do we simplify registration/addition of new nodes" problems. CAGrid is also looking into ways to remote-deploy services to user boxes so they can run analysis or other functions on data that cannot leave their organization.

- Impression the second(e): We should do the things to make it easy for users even if it makes our lives more difficult.

Several of the panels included the phrase "at our local health department... we tend to see flu like this... but next door, they see flu like this...". In short: "Two health departments, three classifications". In addition, local health departments are wary of attempts to take large chunks of their data so that someone else can re-classify the data. But, they are fine with setting up a service that only gives summary data (no patient info, just counts). Furthermore, they don't have much trouble going "when the service asks for an aggregate count of flu, we'll give them an aggregate count of what we think flu is". Furthermore, they like the idea of having services that allow them to re-organize or re-classify data so it matches a standard, so long as they don't have to email large files or mail DVDs to some place outside of their control.

This means the biggest impediment to using these types of services is probably going to be the steep learning curve required to install all the various toolkits (globus, CABig, tomcat, certificates) and the configuration complexity of all the applications. The more time we spend making an installer, the less time we have to spend on the phone with interested health departments telling them how to set it up. That also goes for how easy we make it for health departments to write the queries into their own database(s) or datasets. Finally, we need to focus on allowing for customizable outputs from the service viewers we create. If someone can go to GridViewer and get a sample of data to make sure that the CSV we spit out can be read by their analytics... that will help save them some time in a word-processor.

Thus, installers, view panes, things that centralize and simplify configuration and installation with the obvious stuff up front and the complicated stuff defaulted but in an obvious place for modification. Think firefox. Think google.

- Impression the Third(e): Everyone is really happy that all the stuff is relatively open, free (as in speech), and everyone is acquainted with each other and thinking of ways to collaborate and suggest improvements.

If we make a service builder, we will probably be extending CAGrid's introduce. Dr. Jeremy Espino is tweaking globus service projects with Ivy so one doesn't have to upload various libraries into a repository. Globus is new and probably going to be using CXF instead of Axis which will probably improve everything and make us impromptu beta-testers, and a lot of people seem to be marvelling at Quicksilver and Gridviewer and looking forward to their improvements.

So generally, the PHIN conference helped me focus on what I think we need to be doing for the next year, namely getting ready and making our stuff flexible enough for a lot of users to do a lot of different things to make their statistical and analysis efforts easier.

Friday, September 4, 2009

Sparklines

This week's build includes the following:

1. Moved flot chart out of Marker info window and into greybox for improved UI.
2. Corrected loading image bug
3. Corrected hidden markers reappearing when new query submitted bug
4. Corrected date range selection bug
5. Added sparkline graphs per region with ability to change chart type.
6. Improved show/hide on region selections and filter selections
7. Modified ajax request flot calls per changes on server-side for facilities/datasources

Not happy with the way the site is rendering in IE7, but it is an inferior product anyway. When possible, use Mozilla/Webkit.

Tuesday, September 1, 2009

2nd iteration..always room for improvement

I am happy with the number of improvements made to the UI of gridviewer in the few weeks we had. We eliminated post-back calls to the server for every query, we developed custom pin overlays, added service areas and ages into the query parameters, redesigned the layout and made a considerable number of general aesthetic changes. This 2nd release represents what is possible utilizing agile and of course, javascript.

On that note, I would like to point out my current thoughts on future improvements.

1. Charts: The first gridviewer used the pin info windows to display region-specific chart data. We maintained that in the 2nd release. The issue with this design is that the info window is too large for the scale of the map. It is my opinion that we should move the chart out of the map completely into a dynamic layer around the form. This would prevent all the dragging of the map to see the whole chart.

2. Primary navigation: I waver on how important the map actually is to the user. When I first began working on gridviewer, my impression was that the map should serve as the primary navigation, meaning users should use the map for drilling into all the query results. Now I actually believe the map is really subordinate to the query result boxes (see image below). My opinion is we should expand the current boxes to include more informative data results such as simple bar charts showing zip totals, or possibly show a sparkline for each respective zip. A user should be able to click on AZ, get a chart for that collection, and have the map zoom into that region. As the user selects items, either pins on the map or check-boxes in the query boxes, each control responds to that event accordingly, meaning the map zooms in and out and the chart data changes based on the selected region. As others have mentioned, a data download should be available per query box as well.

3. Form
I wonder if we could reorganize the form selection box to have an "advanced options" selection which would reveal (show/hide) the age, dispositions and service areas. It seems that age and dispositions could default to all and service areas should mostly be driven by the classifier selection. It would be less intimidating to the user and would also provide more space.

4. Comments
One suggestion was for commenting which I am really in favor of. I believe before we implement this we must have the ability for people to generate links to a saved query. It would not be a large hurdle to have one URL per query (box), but to have a single URL for multiple queries is going to represent a challenge. We could look to using something like ShareThis once we have the URL issue worked out.
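As a rough sketch of the saved-query link idea, in Python rather than the viewer's own JavaScript: one query maps naturally onto query-string fields, and several queries can be packed into a single field as JSON. The base URL, the `q` field name, and the parameter names (`indicator`, `state`) are all placeholders, not the real gridviewer interface.

```python
import json
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "http://example.org/gridviewer/query"  # placeholder, not a real endpoint

def url_for_query(query):
    """One URL per query box: each parameter becomes a query-string field."""
    return BASE_URL + "?" + urlencode(sorted(query.items()))

def url_for_queries(queries):
    """Several queries in one URL: pack the whole list as JSON in one field."""
    return BASE_URL + "?" + urlencode({"q": json.dumps(queries, sort_keys=True)})

def queries_from_url(url):
    """Recover the list of queries from a URL built by url_for_queries."""
    fields = parse_qs(urlparse(url).query)
    return json.loads(fields["q"][0])
```

The JSON variant gets long quickly, which is one reason a multi-query link might eventually want a server-side saved-query id instead.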

Monday, August 24, 2009

More updates for H1N1

I'm just putting out a general updates post since Peter and Chris haven't said anything yet.

Marcelo updated the GIPSEService google code project with some performance tuning and configuration enhancement changes. It won't affect the input/output but makes the code easier to maintain.

Chris and Peter put out an update to the gridViewer that looks pretty cool, but more importantly, allows you to overlay different indicators, data sources and services on top of or next to each other.

We also now have an EDVisits indicator so now calculations can be performed that require a denominator.
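For example, with a visit count as the denominator, a flu indicator can be expressed as a percentage of all ED visits per day. This is only a sketch in Python; the per-day dictionaries and the numbers in them are made up, and the real service returns its own structures.

```python
def percent_of_visits(numerator_counts, denominator_counts):
    """Return {date: percent} for every date with a nonzero denominator."""
    result = {}
    for day, visits in denominator_counts.items():
        if visits > 0:  # skip days where the ratio would be undefined
            result[day] = 100.0 * numerator_counts.get(day, 0) / visits
    return result

# Made-up counts, for illustration only.
ili = {"2009-08-20": 12, "2009-08-21": 9}
ed_visits = {"2009-08-20": 240, "2009-08-21": 180, "2009-08-22": 0}
print(percent_of_visits(ili, ed_visits))
```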

Tuesday, August 18, 2009

Correcting Globus Handshake Errors

The Problem:
When Globus-WS is deployed to Tomcat 5.5.x within Windows, a Handshake error is thrown when secure Globus commands are issued. The secure commands run fine in a standalone container but fail when Globus is deployed to Tomcat.

Example:
C:\>counter-client -m conv -z none -s https://192.168.20.120:8443/wsrf/services/SecureCounterService

Error: ; nested exception is:
javax.xml.rpc.soap.SOAPFaultException: ; nested exception is:
org.globus.common.ChainedIOException: Authentication failed [Caused by:
Failure unspecified at GSS-API level [Caused by: Handshake failure]]

The Solution:
(Commands are based on JDK 1.5.x)

This is caused when the SSL client does not trust the CA that signed the certificate. The solution is to add the CA certificate as a trustedCA.

1. Create a Java Key Store:

keytool -genkey -alias servercert -keyalg RSA -dname "CN=Your_host_name, OU=yoursite.net, O=your_organization, L=city, ST=state, C=country" -keypass changeit -keystore server.jks -storepass changeit

2. Create a PKCS12 Keystore:

keytool -genkey -alias globus -keystore globus.p12 -storetype pkcs12 -keyalg RSA -dname "CN=Your_host_name, OU=yoursite.net, O=your_organization, L=city, ST=state, C=country" -keypass changeit -storepass changeit

3. Export your PKCS12 Keystore:
keytool -export -alias globus -file globus.cer -keystore globus.p12 -storetype pkcs12 -storepass changeit

4. Import your PKCS12 Keystore file into your Java Keystore:

keytool -import -keystore server.jks -alias globus -file globus.cer -v -trustcacerts -noprompt -storepass changeit

5. Import the 3rd Party CA into your Java Keystore as a Trusted CA

keytool -import -keystore server.jks -alias globusCA -file c:\etc\grid-security\certificates\31f15ec4.0 -v -trustcacerts -noprompt -storepass changeit

6. Import the host certificate issued by the 3rd Party CA into your Java Keystore.

keytool -import -keystore server.jks -alias containercert -file c:\etc\grid-security\importcontainercert.pem -v -trustcacerts -noprompt -storepass changeit

Based on the procedure above, your server.xml file should look like this:

<Connector className="org.globus.tomcat.coyote.net.HTTPSConnector"
port="8443" maxThreads="150"
minSpareThreads="25" maxSpareThreads="75"
autoFlush="true" disableUploadTimeout="true"
scheme="https" enableLookups="true"
acceptCount="10" debug="0"
protocolHandlerClassName="org.apache.coyote.http11.Http11Protocol"
socketFactory="org.globus.tomcat.catalina.net.BaseHTTPSServerSocketFactory"
keystoreFile="C:\apache-tomcat-5.5.27\conf\server.jks"
keystorePass="changeit"
cacertdir="c:\etc\grid-security\certificates"
encryption="true"/>

---------------------------------------------------------
The commands change slightly when using JDK 1.6.x.

1. Create a Java Key Store:

keytool -genkeypair -alias servercert -keyalg RSA -dname "CN=Your_host_name, OU=yoursite.net, O=your_organization, L=city, ST=state, C=country" -keypass changeit -keystore server.jks -storepass changeit

2. Create a PKCS12 Keystore:

keytool -genkeypair -alias globus -keystore globus.p12 -storetype pkcs12 -keyalg RSA -dname "CN=Your_host_name, OU=yoursite.net, O=your_organization, L=city, ST=state, C=country" -keypass changeit -storepass changeit

3. Export your PKCS12 Keystore.

keytool -exportcert -alias globus -file globus.cer -keystore globus.p12 -storetype pkcs12 -storepass changeit

4. Import your PKCS12 Keystore file into your Java Keystore.

keytool -importcert -keystore server.jks -alias globus -file globus.cer -v -trustcacerts -noprompt -storepass changeit

5. Import the 3rd Party CA into your Java Keystore as a Trusted CA.

keytool -importcert -keystore server.jks -alias globusCA -file c:\etc\grid-security\certificates\31f15ec4.0 -v -trustcacerts -noprompt -storepass changeit

6. Import the host certificate issued by the 3rd Party CA into your Java Keystore.

keytool -importcert -keystore server.jks -alias containercert -file c:\etc\grid-security\importcontainercert.pem -v -trustcacerts -noprompt -storepass changeit
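Since the two command lists above differ only in how three keytool subcommands are spelled, the sequence can be generated from one template. The Python helper below is a hypothetical convenience, not part of Globus or the JDK; it only assembles argument lists and does not run anything.

```python
# Subcommand spellings per JDK line, as listed in the two procedures above.
SUBCOMMANDS = {
    "1.5": {"genkey": "-genkey", "export": "-export", "import": "-import"},
    "1.6": {"genkey": "-genkeypair", "export": "-exportcert", "import": "-importcert"},
}

def keytool_steps(jdk, dname, ca_file, host_cert, password="changeit"):
    """Return the six keytool invocations as argument lists, in order."""
    sub = SUBCOMMANDS[jdk]

    def import_cert(alias, path):
        return ["keytool", sub["import"], "-keystore", "server.jks", "-alias", alias,
                "-file", path, "-v", "-trustcacerts", "-noprompt", "-storepass", password]

    return [
        # 1. Java keystore with the server key pair
        ["keytool", sub["genkey"], "-alias", "servercert", "-keyalg", "RSA", "-dname", dname,
         "-keypass", password, "-keystore", "server.jks", "-storepass", password],
        # 2. PKCS12 keystore
        ["keytool", sub["genkey"], "-alias", "globus", "-keystore", "globus.p12",
         "-storetype", "pkcs12", "-keyalg", "RSA", "-dname", dname,
         "-keypass", password, "-storepass", password],
        # 3. Export the certificate from the PKCS12 keystore
        ["keytool", sub["export"], "-alias", "globus", "-file", "globus.cer",
         "-keystore", "globus.p12", "-storetype", "pkcs12", "-storepass", password],
        # 4-6. Import the exported cert, the CA cert, and the host cert
        import_cert("globus", "globus.cer"),
        import_cert("globusCA", ca_file),
        import_cert("containercert", host_cert),
    ]

steps = keytool_steps(
    "1.6",
    "CN=Your_host_name, OU=yoursite.net, O=your_organization, L=city, ST=state, C=country",
    r"c:\etc\grid-security\certificates\31f15ec4.0",
    r"c:\etc\grid-security\importcontainercert.pem",
)
```

To actually execute the steps, each list could be passed to `subprocess.run(step, check=True)` on a machine where the matching JDK's keytool is on the PATH.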

Monday, August 17, 2009

H1N1 Response

I just realized that I haven't made any posts about our H1N1 response. The grid team has been working with the rest of NCPHI preparing for the flu season and the H1N1 surveillance that will be required. H1N1 response has been consuming a lot of everyone's time.

I've been working on a data format modeled after Appendix A of the DiSTRIBuTE data use agreement. It is a form of "GIPSE Lite" that defines a CSV format for immediate H1N1 surveillance. The data format is available on the wiki.

This format can be used in the "Producer-Collector" style of GIPSE architecture where data sources wish to provide daily data feeds that produce a report for transfer to a collector node (such as DiSTRIBuTE or CDC's GIPSE store repository). This is showing to be a viable option for sites that do not wish to host a query service, which requires a dedicated web service host with inbound access on port 443.

This is also helpful as this is the first DUA from CDC I'm aware of that specifically addresses summary and aggregate data collection and transfer.

Sunday, August 16, 2009

Prepping for Monday.

Greetings!

Over the past couple of weeks I have been dealing with two fronts: On one front, GridViewer is being modified heavily with lots of cool new javascript and AJAX type functionality, so I have been making simple data-fetching interfaces as needed to support said functionality. On the other front, I have found out that GIPSEPoison works everywhere but the training node and have been investigating why.

There are now two new methods in the new gridviewer class PHMapper and a bunch of new classes, code, and even some new bits in gmap-polygon which will help data be pulled relatively easily. I'm basically mapping out better ways to associate and pull data (things like keeping the regional data separate from the time-series data so that one can be adjusted without messing with the other) but maintaining all the old structures to keep the old grid-viewer working as an example. Thus there has been some function splitting, but luckily I have managed to avoid duplicating lots of code.

Meanwhile, one of the foibles of GIPSEPoison is that it relied on the maven-built NPDS-Client-jar-with-dependencies... The "Jar with dependencies" is something maven does in case you want to run a client from the command line so that you don't have to mess with classpaths or worry about conflicts (shove the jar with dependencies in a directory and hit go). The dangerous temptation is to include it in something that is not maven-related that still needs the code (like an introduce project, more specifically, GIPSEPoison) because it's one jar rather than 20 (NPDS-client uses a lot of CXF and xml/jaxb jars).

Thus, one of the things I hope to complete early next week is to wrangle together all the jars needed separately and include them in the lib of the GIPSEPoison project. Right now GIPSEPoison does not work on the training node... and while it seems like a library conflict, it could be something else. If it's a library conflict, having separate jars will be necessary to fix the issue. Otherwise, having separate jars is good form (it allows other potential conflicts to be discovered more easily) and we might just run GIPSEPoison off a different globus container.
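One quick way to look for a library conflict once the jars are separate is to list class files that appear in more than one jar. This is a generic Python sketch, not something the project ships; it relies only on the fact that jars are zip archives.

```python
import zipfile
from collections import defaultdict

def duplicate_classes(jar_paths):
    """Map each .class entry found in more than one jar to the jars containing it."""
    owners = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as archive:
            for name in archive.namelist():
                if name.endswith(".class"):
                    owners[name].append(jar)
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Running it over the container's lib directory plus the service's own lib would show which classes are supplied twice, although a duplicate is only a hint and not proof of the failure.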

So, next week should be some adventures in jar hunting and library tweaking. If GIPSEPoison gets sorted out relatively early and the various installs I need to support go well, I hope to clean up some of the packaging of gridviewer (pull out the java code into a 'ServiceUtils class' leaving grid viewer a pure web project) and gmap-polygon (try and see if more of the data can be cleaned up and made simpler to download/sync) and brush up the documentation.

Cheers,
Peter

Friday, August 7, 2009

Updated sample data on NCPHI node

One of our partners asked for better distribution of Denominator and flu data on our sample node's GIPSEService (https://ncphi.phgrid.net:8443/wsrf/services/gipse/GIPSEService).

So I generated some synthetic data for the 12 BioSense Denominator and flu indicators (EDVisits,EDVisits-LowTemp,EDVisits-HighTemp,EDVisits-UnknownTemp,ILI-b,ILI-b-LowTemp,ILI-b-HighTemp,ILI-b-UnknownTemp,ILI-n,ILI-n-Lowtemp,ILI-n-HighTemp,ILI-n-UnknownTemp) for Jan,Feb,March,April and June.

The values are all random but at least you can generate some ratios and percentages and such (previously we only had about 35 days spread out over the year).
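A sketch of how such synthetic data could be produced, in Python. This is not the script that was actually used: the value ranges are invented, only three of the twelve indicators listed above are included, and capping flu counts at the day's visit count is my own assumption so that percentages stay at or below 100.

```python
import random
from datetime import date, timedelta

INDICATORS = ["EDVisits", "ILI-b", "ILI-n"]  # a subset of the twelve listed above

def synthetic_counts(start, days, indicators=INDICATORS, seed=0):
    """Return (date, indicator, value) rows with random but reproducible values."""
    rng = random.Random(seed)
    rows = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        visits = rng.randint(100, 500)  # invented range for the denominator
        for name in indicators:
            # Assumption: keep flu counts at or below the denominator.
            value = visits if name == "EDVisits" else rng.randint(0, visits)
            rows.append((day.isoformat(), name, value))
    return rows

rows = synthetic_counts(date(2009, 1, 1), 31)
```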

So please check out gridviewer and the service API and let me know if it is helpful at all.

Thursday, August 6, 2009

GIPSEService and GIPSEPoison.

GIPSEService was updated, so I updated the client and other libraries in the PHGrid repository (the old client/common/stubs was version 0.1, the current version is (like the schema) 20090831) and tested the integration. If you are interested in updating to the new versions of GIPSEService and GIPSEPoison, just update your gridviewer and it should be able to connect.

Otherwise GIPSEPoison is now returning state data. I had to adjust how the CXF service was being called, but thanks to Jeremy and a lot of legwork by Chris, we now have service re-sending enabled within a globus service... which is neat!

Tomorrow, we are going to try and deploy a bunch of this new stuff, so hopefully you will be able to pull from GIPSEPoison and the new GIPSEService and see some of the cool changes Chris has been making too.

Otherwise, this marks the first step in The Big Shift of a large amount of the UI code to Chris. He is (at least from my vantage point) excruciatingly good with several web technologies like JSON, REST, AJAX, JQuery (a lot of which fit under an umbrella of JavaScript), and frameworks. Thus, a lot of my job over the next couple of weeks will be itemizing processes and adding to the infrastructures of geolocation code and the service-wrangling code. The end goal is for me to make a bunch of simple and obvious functions to be called by the architecture Chris is setting up. Thus, many of the persistence and state needs will move into a fancy web controller which will allow for much niftier User Interface stuff (like sliding date ranges and AJAX-populated drop down lists).

I'm looking forward to a lot of the new stuff!

Cheers,
Peter

Easy upgrading GIPSEService to 20090831 gipse.xsd schema

In case you have customized the GIPSEService, you can easily update it to use the http://ncphi.phgrid.net:8080/schemas/gipse/20090831 namespace gipse.xsd by following the few steps below: (thanks to Peter for collecting the details)

1. Update the gipse.xsd within the ./schema directory with the latest version (available in the sourceforge project)

2. Replace all occurrences of "20090630" with "20090831" within the following files:
./schema/GIPSEPoison/gipse.xsd (replaced entirely)
./schema/GIPSEPoison/GIPSEPoison.wsdl
./GIPSEQueryRequest-ForClient.xml
./introduce.xml
./introduce.xml.prev
./namespace2package.mappings

3. Run "ant clean"

4. Run "ant all"

5. Run "ant createDeploymentGar"

6. Undeploy your old service

7. Deploy your updated service (created in step 5)

That's it, you shouldn't need to modify any Java code.
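The search-and-replace step can be scripted. The Python sketch below is an illustration only: it assumes the files are UTF-8 text, it is meant to be run from the service project's root, and it leaves out gipse.xsd because that file is replaced entirely.

```python
from pathlib import Path

# Files named in the replace step above, minus gipse.xsd (replaced entirely).
FILES = [
    "schema/GIPSEPoison/GIPSEPoison.wsdl",
    "GIPSEQueryRequest-ForClient.xml",
    "introduce.xml",
    "introduce.xml.prev",
    "namespace2package.mappings",
]

def bump_schema_version(project_dir, old="20090630", new="20090831", files=FILES):
    """Replace `old` with `new` in each listed file; return {file: replacement count}."""
    changed = {}
    for rel in files:
        path = Path(project_dir) / rel
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        hits = text.count(old)
        if hits:
            path.write_text(text.replace(old, new), encoding="utf-8")
        changed[rel] = hits
    return changed
```

A file that reports zero replacements is worth a second look, since it may already be updated or may not be the file you expected.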

GIPSEService update

I just checked in updates to the GIPSEService project to include support for the Age and Service Area (Disposition) stratifiers as well as the Data Source filter.

This has pretty major ramifications as the version of gipse.xsd changed from 6/30 to 8/31. This means that anyone using the source code for GIPSEService should redownload and rebuild their local gar so that the latest version of gipse.xsd is supported.

You can see the latest copy of the gipse.xsd in its sourceforge project.

Taha wanted us to try out google code instead of the usual sourceforge project, so you can grab the latest version of the GIPSEService project through google code's svn. (btw, so far google code is a lot cleaner, but I miss the ability to download a tarball through the browser)

I'll update the NCPHI lab node during Friday's usual deployment, so you have until then to update your clients to use the latest updates.

If we were actually in production (planned for after Sep 1), we would actually version the service and provide a support window where both versions were supported. We're only breaking interfaces because there are no production users yet.

If you don't update, you will see an exception like this:

[java] org.xml.sax.SAXException: Invalid element in gov.cdc.ncphi.phgrid.services.gipse.stubs.QueryMetadataRequestQuery - MetadataQuery
[java] at org.apache.axis.message.SOAPFaultBuilder.createFault(SOAPFaultBuilder.java:221)
[java] at org.apache.axis.message.SOAPFaultBuilder.endElement(SOAPFaultBuilder.java:128)
...

This is caused because the client expects a specific xml namespace to be used by gipse. Older clients will expect http://ncphi.phgrid.net:8080/schemas/gipse/20090630, while this latest version uses http://ncphi.phgrid.net:8080/schemas/gipse/20090831.
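A quick way to see which schema version a request or response document uses is to read the namespace of its root element. The Python check below is a sketch; treating `MetadataQuery` as a root element in the gipse namespace is an assumption based only on the exception text above.

```python
import xml.etree.ElementTree as ET

EXPECTED_NS = "http://ncphi.phgrid.net:8080/schemas/gipse/20090831"

def root_namespace(xml_text):
    """Return the namespace URI of the document's root element ('' if none)."""
    tag = ET.fromstring(xml_text).tag  # ElementTree spells it "{uri}localname"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""

def uses_expected_schema(xml_text, expected=EXPECTED_NS):
    return root_namespace(xml_text) == expected

# Hypothetical one-element documents, for illustration only.
old_doc = '<MetadataQuery xmlns="http://ncphi.phgrid.net:8080/schemas/gipse/20090630"/>'
new_doc = '<MetadataQuery xmlns="http://ncphi.phgrid.net:8080/schemas/gipse/20090831"/>'
```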

This should be a rather painless upgrade for any existing users. If you customized the service code then you'll struggle (but what did you expect, next time contribute your changes back so we can make sure your changes make it into future versions).

For anyone still using the amds* database structure, you will need to update your db. Use the gipse-store project to generate ddl for SQLServer or PostgreSQL. Then you can copy over all your data from amds_extract using a query like:

INSERT INTO gipse_store (date, indicator_id, value, data_source_id, zip5, state)
SELECT date, condition, count, source_oid, zip5, state FROM amds_extract
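The copy can be rehearsed on a scratch database before touching the real one. The Python sketch below uses an in-memory SQLite database with made-up table definitions and rows; the real DDL comes from the gipse-store project and will differ, and SQLServer or PostgreSQL may need some of these column names quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical column types; the real DDL is generated by gipse-store.
CREATE TABLE amds_extract (date TEXT, condition TEXT, count INTEGER,
                           source_oid TEXT, zip5 TEXT, state TEXT);
CREATE TABLE gipse_store (date TEXT, indicator_id TEXT, value INTEGER,
                          data_source_id TEXT, zip5 TEXT, state TEXT);
INSERT INTO amds_extract VALUES ('2009-08-01', 'ILI-b', 7, 'oid-1', '30333', 'GA');
INSERT INTO amds_extract VALUES ('2009-08-02', 'ILI-n', 3, 'oid-1', '30329', 'GA');
""")

# Same statement as in the post: old columns map positionally onto the new ones.
conn.execute("""
INSERT INTO gipse_store (date, indicator_id, value, data_source_id, zip5, state)
SELECT date, condition, count, source_oid, zip5, state FROM amds_extract
""")
conn.commit()

copied = conn.execute("SELECT COUNT(*), SUM(value) FROM gipse_store").fetchone()
```

Comparing the row count and the sum of counts before and after is a cheap sanity check that nothing was dropped.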

Tuesday, August 4, 2009

PHMap and GIPSEPoison

So, I went away for vacation last week and when I got back, a lot of stuff had changed, but I feel it will make for better progress.

First up, I am supporting Chris in making PHMap... which may combine some of my service-wrangling code and geo-fetching code with Chris' web code. This is fortuitous because Chris has a much better hang of actual web technologies like CSS, Spring, Javascript, and AJAX... and the concepts that make those all work smoothly together, like REST and JSON. Thus, the code should be relatively clean, followable, and adjustable, which is much harder when you have things like Java objects spitting out Javascript/HTML strings (like in my code).

This also means some of my service-wrangling and geo-fetching logic will probably be broken out or otherwise added to, to better enable the paradigm Chris wants to support. So essentially, refactors are upon us, and it will pretty much be the good old process of splitting function A into functions B and C since Chris will only need function B... and updating libraries and setting up property-less jar files for easy downloads instead of complicated builds.

Part of me feels a bit sheepish about how all the code sort of ended up mashed together and a lot of MVC lines were blurred... but then I remember that grid-viewer is essentially an evolution of RODSA-dai, which was a teeny little mashup that would have been overkill to load up with separate controllers and lots of different views. Then, suddenly, we needed polygons, and we needed to interact with the NPDS service, then we needed lots more UI tweaks and the ability to hit different services with different metadata... and I must have been doing something right because I was able to make those modifications without too many large refactors. It's just that now I get to deal with someone else actually needing to use it and not just install it, and there is always the chance that the two paradigms for how something should work will not quite line up, and larger refactors will be needed to accommodate both of them.

The other thing I will be doing mainly is getting GIPSEPoison working. I implemented a client and several tests and found that it is indeed getting data, but the aggregation needs to be tweaked, and some of the zip3 translations need to be taken into account. I think for that I will just split out some of the code I made for npdsgmaps (so it doesn't have any axis components) and make a new library that can be used by both npdsgmaps and GIPSEPoison.

Otherwise, lots of changes, lots of work, lots of progress, one hopes.

Cheers,
Peter

Monday, August 3, 2009

Vector-based maps

Ran into this interesting project out of MIT. Instead of taking image tiles like Google maps, it renders real-time maps in HTML5 and a custom styling language (like CSS and JSON combined) called GSS (Geographic Style Sheets). You will have to use Firefox 3.5 or Safari 4+.

Check out the project page:
http://wiki.cartagen.org/wiki/show/HomePage

Sample:
http://cartagen.org/

Refreshing to see a new approach for browser-based maps. We should definitely look at the Cloudmade API as well.

Tuesday, July 28, 2009

White House Mulls NASA / Center for Cloud Computing

White House Mulls NASA / Center for Cloud Computing:

http://www.nextgov.com/nextgov/ng_20090724_6498.php?oref=topstory

Friday, July 24, 2009

8th Australasian Symposium on Parallel and Distributed Computing

-------------------------------------------------------------------------
8th Australasian Symposium on Parallel and Distributed Computing
(AusPDC 2010)
Brisbane, Australia, January 18-22, 2010
http://www.cse.unsw.edu.au/~rajivr/auspdc2010/
in conjunction with
Australasian Computer Science Week, 18-22 January 2010
-------------------------------------------------------------------------
Overview/Scope:
The 8th Australasian Symposium on Parallel and Distributed Computing (AusPDC 2010) will be held in January in Brisbane, Australia, in conjunction with Australasian Computer Science Week, 18-22 January 2010.

Scope of the Symposium
The AusGrid event has been broadened to include all aspects of parallel and distributed computing, and hence it will be called the Australasian Symposium on Parallel and Distributed Computing (AusPDC) from 2010. In both New Zealand and Australia, parallel and distributed computing has been recognised as a strategic technology for driving their moves towards knowledge economies. A number of projects and initiatives are underway in both countries in these areas. There is a natural interest in tools which support collaboration and access to remote resources given the challenges of the countries' location and sparse populations.

Topics of interest for the symposium include, but are not limited to:
* Multicore
* GPUs and other forms of special purpose processors
* Cluster computing
* Grid computing
* Green computing
* Cloud computing
* Peer-to-peer computing
* Service computing and workflow management
* Managing large distributed data sets
* Middleware and tools
* Performance evaluation and modeling
* Problem-solving environments
* Parallel programming models, languages and compilers
* Runtime systems
* Operating systems
* Resource scheduling and load balancing
* Data mining
* Reliability, security, privacy and dependability
* Applications and e-Science

The symposium is primarily targeted at researchers from Australia and New Zealand, however in the spirit of parallel and distributed computing, which aims to enable collaboration of distributed virtual organizations, we encourage papers and participation from international researchers.

Best Paper Award:
A best paper award sponsored by Manjrasoft Pty. Ltd, Australia will be presented to a paper receiving the highest quality rating. In addition, a special issue in a high quality international journal will be organized for selected best papers.

Program Committee Chairs:
- Jinjun Chen, Swinburne University of Technology
- Rajiv Ranjan, University of Melbourne

Program Committee:
David Abramson, Monash University
Mark Baker, University of Reading, UK
David Bannon, Victoria Partnership for Advanced Computing
Rajkumar Buyya, University of Melbourne
Paul Coddington, University of Adelaide
Neil Gemmell, University of Otago, NZ
Andrzej Goscinski, Deakin University
Kenneth Hawick, Massey University, NZ
John Hine, Victoria University of Wellington, NZ
Jane Hunter, University of Queensland
Martin Johnson, Massey University, NZ
Nick Jones, University of Auckland, NZ
Laurent Lefevre, University of Lyon, France
Andrew Lewis, Griffith University
Piyush Maheshwari, Perot Systems
Teo Yong Meng, National University of Singapore
Manish Parashar, Rutgers University, USA
Srikumar Venugopal, University of New South Wales
Yun Yang, Swinburne University of Technology
-------------------------------------------------------------------
--
Dr. Jinjun Chen
Lecturer in Information Technology
CS3 - Centre for Complex Software Systems and Services
Faculty of Information and Communication Technologies
Swinburne University of Technology,
1, Alfred Street, Hawthorn,
Melbourne, Victoria 3122, Australia.
Tel: +61 3 9214 8739
Fax: +61 3 9819 0823
Office: EN508a, Engineering Building, Hawthorn Campus
Email: jinjun.chen@gmail.com
URL: http://www.swinflow.org/~jchen/

Thursday, July 23, 2009

GIPSEPoison getting closer

Today I spent some time playing with GIPSEPoison trying to see if I could invoke the service... it seems I could but I kept getting caught up in classpath issues... first needing to import Dr. Jeremy Espino's CXF client for accessing the NPDS service... and then talking a bit with Brian about how he set up the build scripts to include data (properties files) from a "resources" directory in the classpath. So tomorrow I have a good list of things to tinker-with that will hopefully successfully punt data from npds through GIPSE and to the grid viewer.

Then I need to clean up a bit of the fun javascript logic around the magical autopopulating drop-downs (there are some strange initial load issues).

Otherwise, I have gone ahead and made a service registry entry for gridviewer to help people who want to install and set up their own gridviewers. I have also added some database definitions to the database loader project to aid in geodata creation, and described their use in that entry. Right now one needs to build and install the gmap-polygon project before they are able to build and use gridviewer, and it has been pointed out to me that it's an annoying step that could probably be replaced with a no-props version of the gridviewer jar... so I am going to look into doing that soon as well.

Cheers,
Peter

Tuesday, July 21, 2009

Introduce Metadata working.

So, with some help from Dan and Brian, I got GIPSEPoison deployed to my local dev box... and then I pointed grid-viewer to it... and got metadata back in the standard metadata format (which means that I properly imported some of the metadata code Chris already wrote).

Thus, tomorrow will be sorting out the client side of the code and then, hopefully, GIPSEPoison.

Cheers,
Peter

Authentication failed [Caused by: Unknown CA]

This post is in reference to the following Globus Error:

Authentication failed [Caused by: Failure unspecified at GSS-API level [Caused by: Unknown CA]]

This error is generated when the client does not trust the server CA. This error can be corrected by adding the server CA file (.0) to the Globus client's .globus\certificates directory.

This error is also generated when the .0 file is corrupted. You must verify that the MD5 checksum of the file is the same on both the remote server and the local Globus node. Please see the following link for an example of this procedure.

http://www.globus.org/mail_archive/gt-user/2007/09/msg00103.html
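If an md5sum utility is not handy (on Windows, say), the checksum comparison can be done with a few lines of Python. This is a generic sketch and not the procedure from the linked thread; the two paths are whatever copies of the .0 file you want to compare.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_ca_file(local_path, remote_copy_path):
    """True when both copies of the CA file have the same MD5 digest."""
    return md5_of_file(local_path) == md5_of_file(remote_copy_path)
```

Run `md5_of_file` on each machine and compare the two hex strings, or copy the remote file over and call `same_ca_file` on the pair.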

Introduce Quirks

I am noticing a few strange differences between my GIPSEPoison and the GIPSEServices written by Brian and Chris....

First, when I created my project, the prefix that showed up for the service ([service url]/[prefix]/[ServiceName]) was "cagrid" instead of something like "gipse". Brian and I found that the property "service.deployment.prefix" in the deploy.properties file defines the prefix. Might be handy for anyone else who wants to change their service prefix from cagrid.
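To illustrate how the prefix ends up in the address, here is a small Python sketch of the [service url]/[prefix]/[ServiceName] pattern. Treating everything up to `/wsrf/services` as the "service url" part is my assumption, based on the sample node's GIPSEService address.

```python
def service_url(base_url, prefix, service_name):
    """Join [service url]/[prefix]/[ServiceName] without doubling slashes."""
    return "/".join(part.strip("/") for part in (base_url, prefix, service_name))

# With service.deployment.prefix set to "gipse" instead of the default "cagrid":
url = service_url("https://ncphi.phgrid.net:8443/wsrf/services/", "gipse", "GIPSEService")
print(url)
```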

Another strange thing I have noticed is that my autogenerated code throws "RemoteExceptions" while Brian and Chris' autogenerated code throws "Exceptions"... I am not sure what quirk caused that or why they are different... Perhaps I got a slightly different version of introduce. But it all compiles together, and I hope that it ends up working.

Monday, July 20, 2009

GIPSE and Introduce.

Greetings all,

As you can see from a few posts below, Brian changed a few of the elements of what used to be the AMDS service into the GIPSE service. Thus, I spent a lot of time modifying grid viewer to attach to GIPSEServices in several of our local installations (and debugging said modifications). But that didn't stop me from playing with Introduce and GIPSEPoison.

I am hoping I did the setup correctly (many thanks to Brian for helping me, and to Chris for doing a lot of legwork with the code before me). This time around we are trying to use the same namespaces and packages with the hope that the client generated for the Biosense GIPSEService will work with the NPDS GIPSEService. The hope is that if you want to create a service, and you import the schema from the wiki/sourceforge/ncphi-site and stick to using the same namespace... it should be readable by the gridviewer.

Right now, I have gotten it to the "building but not deployed/tested" phase for the relatively simple metadata query, and I am going to poke it tomorrow to see if grid-viewer can pull the metadata without "this isn't the operation I expected" errors.

Cheers,
Peter

Thursday, July 16, 2009

Institute of Medicine meeting to discuss application of grid to HIEs

The Institute of Medicine arranged a meeting of 30 experts from computer science, informatics, and the health information exchange community to discuss the applications of grid computing technologies to health information exchange. Read comments from Dr. John Halamka:
http://geekdoctor.blogspot.com/2009/07/dispatch-from-washington.html

Updated GIPSEService (finally)

I updated the GIPSE BioSense service so that it uses the 6/30 GIPSE data structure draft.

The GIPSEService can read from any jdbc data source and is configurable by editing the db.gipse.table property to point to the view/table name that contains the GIPSE data.

Also note that if you are upgrading your service, but not your data store (still using amds_extract) you should set the following properties in your gipse.properties file:
db.gipse.table=amds_view
db.gipse.table.column.date=date
db.gipse.table.column.state=state
db.gipse.table.column.zip5=zip5
db.gipse.table.column.zip3=zip3
db.gipse.table.column.condition=condition
db.gipse.table.column.classifier=classifier
db.gipse.table.column.count=count
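
As a rough illustration of how such a mapping could be used, here is a minimal Python sketch (the service itself is Java, and this is not its code). The property keys are the ones listed above; the parse_properties and build_select helpers and the shape of the generated SQL are assumptions made up for the example.

```python
# Sketch only: property keys come from the post; the helpers and SQL shape are assumptions.
GIPSE_PROPERTIES = """\
db.gipse.table=amds_view
db.gipse.table.column.date=date
db.gipse.table.column.state=state
db.gipse.table.column.zip5=zip5
db.gipse.table.column.zip3=zip3
db.gipse.table.column.condition=condition
db.gipse.table.column.classifier=classifier
db.gipse.table.column.count=count
"""

# Logical GIPSE column names the (hypothetical) query always exposes.
LOGICAL_COLUMNS = ["date", "state", "zip5", "zip3", "condition", "classifier", "count"]


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def build_select(props):
    """Map each logical column to the physical column named in the properties."""
    table = props["db.gipse.table"]
    columns = []
    for logical in LOGICAL_COLUMNS:
        # Fall back to the logical name when no override is configured.
        physical = props.get("db.gipse.table.column." + logical, logical)
        columns.append('"%s" AS "%s"' % (physical, logical))
    return 'SELECT %s FROM "%s"' % (", ".join(columns), table)
```

Pointing db.gipse.table at a different view, or renaming one column property, changes the generated query without touching any code, which is the point of keeping the mapping in the properties file.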

This service is built to share the BioSense aggregate counts using the required stratifiers (date, state/zip5/zip3, indicator, count) so it will not use age, service area / disposition nor facility even though the schema supports these optional stratifiers.

There is a lot of buzz around adding age, so the next version may end up supporting the age optional stratifier with a couple of age groups.

Wednesday, July 15, 2009

Re-introduce introduce

I managed to fix a few bugs and do a few deployments with grid viewer over the past few days, and it looks like the next thing I get to play with is the GIPSEPoison service... but GIPSEPoison has been built in introduce, and I need to play more with introduce to figure out how those services work and are built and can be modified.

But it will be about the third time I have started to learn introduce. The other two times priorities shifted and I started working on other things... So, perhaps third time is the charm.

I am also going to play a bit with graphics and see if I can plan out concurrent load items.

Cheers!

Monday, July 13, 2009

Client flexibility.

So, one of the little issues that cropped up over the last couple of days is that if you do not really watch how you set up your service in Introduce... there is a very good chance that it will use new packages and namespaces and use an entirely different service in the generated service code.

Right now I am using the client half of the client/server code that was generated by introduce for GIPSE-Biosense. Meanwhile, the client half of the client/server code that was generated by introduce for GIPSE-Poison was different.

Really different. Different packages different.

This means I would have to import both the clients of GIPSE-Poison and GIPSE-Biosense and have logic in my code to differentiate between the servers being called to get data from both... and the idea of having to do that for every service someone wants to generate makes the solution not scalable.

Thus, a big focus over this upcoming week is going to be making a very flexible client. I was chatting with Jeremy about things like Yahoo Pipes, and he pointed out that the data is XML and I could probably just run things through an XSLT to get data from one point to another... and it was a really good idea. Hopefully, it means that all one would have to do is create an XSLT in case their schema didn't match ours exactly, drop it somewhere in the classpath of gridviewer, and set a few properties to say "this might need a bit of extra processing", and then gridviewer will be able to display the data.
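
To make the idea concrete, here is a small Python sketch of the "normalize someone else's XML into our shape" step. Note the swap: Python's standard library has no XSLT processor, so this uses a plain tag-renaming pass with ElementTree instead of a real stylesheet. The element names, the namespace, and the remap function are all hypothetical.

```python
import xml.etree.ElementTree as ET


def remap(xml_text, tag_map):
    """Drop namespaces and rename elements so a foreign payload matches our names.

    tag_map maps a foreign local element name to the name the viewer expects;
    names not in the map are kept as-is (minus their namespace).
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{namespace}local".
        local = elem.tag.rsplit("}", 1)[-1]
        elem.tag = tag_map.get(local, local)
    return root
```

A per-service mapping like this (or a real XSLT doing the same job) would live next to the service's entry in the configuration, so the viewer code only ever sees one set of element names.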

Also, I am going to be thinking of pinwheels and set-the-URL pushpin styles... but I am not sure if I am going to be able to get to them this week.

Cheers,
- Peter

Thursday, July 9, 2009

Got multiple region searches.

Part of the improvements that will hopefully deploy successfully tomorrow is the ability to have multi-region searches.

So, there will be a new button that says "add to current" which, when pressed, will add data without clearing off the old data. This means you can have all the zip3s in Florida post next to all the zip3s in Georgia in case you want to compare between the two.

Currently, there is no way to differentiate between two searches on the same region (so if you select for all the zip3s in Georgia for one condition, and then add all the zip3s in Georgia for another condition... it will just overwrite the first set)... this is going to be part of a suite of improvements that allow for different sorts of pinpoints (pinwheel) and probably for selecting regions and legends.
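
The behavior described above, including the overwrite limitation, can be sketched in a few lines of Python. This is an illustration only, not the gmap-polygon code; the MapSearch class, the region keys, and the result tuples are made up for the example.

```python
class MapSearch:
    """Holds the results currently drawn on the map, keyed by region name."""

    def __init__(self):
        self.by_region = {}

    def search(self, results):
        """Plain search: clear the old data, then show the new results."""
        self.by_region = dict(results)

    def add_to_current(self, results):
        """'Add to current': keep the old data and add the new results.

        Because the key is the region alone, a second search on the same
        region replaces the first one instead of sitting beside it.
        """
        self.by_region.update(results)
```

Keying on the region alone is what causes the overwrite; adding the condition (or a search index) to the key is one way the later improvements could tell two searches on the same region apart.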

But, I think it's neat and I hope you will too (and I actually got it working yesterday but spent a lot of yesterday and today debugging it, as I essentially had to rip apart the gmap-polygon project to get it working (and I branched before I did, thank goodness))

Cheers!

Wednesday, July 8, 2009

GIPSE Store update

I just renamed the amds-db project to gipse-store and updated the database structure to support new stratifiers. The database structure now supports new optional stratifiers of:
age
gender
service area (e.g. ED, Outpatient, Inpatient, etc.)
facility

in addition to the required stratifiers of:
indicator
region (zip5 and/or state)

Note: although the GIPSE store supports these new stratifiers, the BioSense GIPSE service will still only use the indicator and region stratifiers.

I'm making this change to support storage from the BioSense HIE feeds as well as adding in additional stratifiers that may be used by the CoEs and Regional Collaboratives.

Next up is the change to the AMDSService to rename to GIPSE.

This project has ddl and seed data for PostgreSQL (we use this for dev) and SQLServer (we use this for production). Should anyone require another RDBMS please let me know (or add them yourself).

NHIN Connect 2.1 is in the lab

NHIN 2.1 is fully functional in the lab on a single VM. I like it. It is much nicer than the 2.0 version.

Testing is underway!

A cool little service registry example

I came across the Membrane SOA Registry yesterday and thought it was a good example of a really lean service registry that is highly functional. It has basic publish/search capability, but I like how it integrates clients for testing services (you can call them from the web site) and also the Digg-like behavior of letting users rate services.

This matches a lot of the rhetoric PHGrid has been using.

Unfortunately, the source code isn't released yet (they plan to later this year) but I like how this is a thin response to the usually pretty weighty chunk of software for registries.

It also has a really nice little generic SOAP client for testing public services that I find useful for when I don't have soapui installed on a machine. Comes in handy with the NHIN CONNECT test cases.

Hot off the press: Google Plans to Introduce a PC Operating System

Check out this NY Times article here

Tuesday, July 7, 2009

multisets

Gmap Polygon and Grid Viewer are now essentially doing cumulative searches... it's just not displaying them in any particular order... Thus tomorrow will be spent creating some updated pinpoint libraries (pinwheel), modifying the flot-plot screens, adding a "clear map" function... and filtering to make sure that empty data sets aren't added.

Also, there is a distinction for "empty set". If servers are queried and no data comes back, I'll show the pinpoints/polygons as empty. If, however, a search is run without any servers selected.... it will be considered an empty set and not added to the search tree.
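
Here is a minimal Python sketch of that rule, under the assumption that the search tree is just a list and that each server is queried through some callable. The names run_search, search_tree, and query_server are placeholders, not gridviewer identifiers.

```python
def run_search(search_tree, selected_servers, query_server):
    """Run one search and record it, following the 'empty set' rule.

    - No servers selected: an empty set; nothing is added to the tree.
    - Servers selected but no data returned: still recorded, so the
      pinpoints/polygons can be drawn as empty.
    Returns True when the search was added to the tree.
    """
    if not selected_servers:
        return False
    results = {server: query_server(server) for server in selected_servers}
    search_tree.append(results)
    return True
```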

Thus, when you first load the page, you'll get a map of the US, but there will be no pinpoints until you actually search for data. This behavior can change, of course, but for now it will help debug the behavior of how data is added.

Then, it's testing to see how it all behaves, and seeing whether a collection of collections will suffice or whether I will have to upgrade it to a map of some sort (a bit more complex to navigate, but probably much better "make sure these are in order" behavior)

HPCCloud FYI

Got this off the gt_users mailing list and thought it may be of interest:

http://groups.google.com/group/HPCcloud

A group dedicated to high performance cloud and grid computing.

Enterprise...

So, I was chatting with a friend yesterday and indicated that I was a bit sad because the next set of changes (modifying GmapPolygon to hold multiple sets of polygons/indicators and writing a new polygon color handler that allows for URL's to be set) will sort of make things ugly and a bit more confusing and threatens to mottle the design patterns even more...

The friend spake (and I paraphrase): "Don't worry about it, it's enterprise code. You go in expecting A... and you end up making A, which eventually needs B after the user sees it and decides it wants B-esque features... but you don't have the time to completely remake it into B, so you make it into a mashup of A and C which works just as good but is not as pretty. So just be happy that A-C is doing good enough and make it into B if you have time before the next deadline." We then went on to discuss how the whole idea of code as a start-do-end thing is more myth than reality... as specifications change and user wishes change and sometimes you change from Linux to Windows or the tool you need only works with a certain version &c &c.

So yes, the next step is going to be changing the GmapPolygon and the Grid Viewer to work with indexed collections of polygons instead of just one collection of polygons. This will allow for sequential searches (search for all the fever in the US... now search for all the nausea in the US... now search for all the poison-related cellulitis in the US... now do those again for all of Texas...) to be maintained and shown simultaneously on the map until they are explicitly cleared. This means the pinpoint visualizations will need to be changed so that later searches don't show up on top of the older searches (right now I am thinking of using rotating pins... so sequential searches start filling in a pin-wheel). The flot-plotting mechanics are also going to need to change a bit to search on region type, name, and index (thus, it knows to look for the 3rd search or the 2nd search). Eventually the selection mechanisms might also be tweaked to become more intuitive or allow for more specific searches... but right now it's just about getting multiple searches on a map.
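
As a sketch of what "indexed collections" and "rotating pins" might look like, here is a small Python illustration. It is not the GmapPolygon design; the SearchHistory class, the (region type, name) key, and the eight-slot pinwheel are assumptions chosen for the example.

```python
class SearchHistory:
    """Keeps every search on the map until it is explicitly cleared."""

    def __init__(self):
        self.searches = []  # position in this list is the search index

    def add(self, polygons):
        """Store one search (a dict keyed by (region_type, name)); return its index."""
        self.searches.append(dict(polygons))
        return len(self.searches) - 1

    def lookup(self, index, region_type, name):
        """What the plot needs: data for one region from the Nth search."""
        return self.searches[index][(region_type, name)]

    def clear(self):
        self.searches = []


def pin_angle(index, slots=8):
    """Rotate each search's pin so later pins do not cover earlier ones."""
    return (index % slots) * (360 // slots)
```

With eight slots, the first search gets a pin at 0 degrees, the second at 45, and so on around the wheel; a ninth search would wrap back to 0 and overlap the first.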

I'm sure it will actually not be as ugly or unintuitive as I fear when I'm done, but it is a lot of changes in the name of niftiness.

Cheers!

Monday, July 6, 2009

New NHIN Connect 2.1 released - initial feedback

Officially, the new release of the NHIN Connect software comes out tomorrow. Unofficially, it was located on the NHIN Connect web site on July 2nd. The release home is located at http://www.connectopensource.org/display/NHINR21/Release+2.1+Home even though this folder isn't in the main menu system at the time of this blog entry.

Here are some of the highlights:

1 - NHIN Adapter and Connector are now distributed as ONE pre-configured set-up. No more TWO-box solutions. ONE-box only!

2 - Documentation has been greatly improved. It isn't flawless, but it is definitely better.

For my next trick, I will be installing a NEW adapter+connector in the lab from scratch on a new VM to see a) how the install goes, but also b) how well the new adapter+connector system functions.

More to come...

Thursday, July 2, 2009

3. Authentication (To Chris' Point)

Chris wrote:

The application currently does session based authentication. We could potentially look to Spring for handling authentication. We would gain persistence (remember me), an adaptor for authenticating with OpenID, LDAP, and an easier path to cross-domain authentication if that were to become a requirement in the future. Spring-Security (formerly ACEGI) also supports X.509 certificates.

Our thinking behind grid/globus is to use their authentication framework rather than creating our own, especially leveraging GAARDS (caGRID). Perhaps we need to reconsider this assumption based upon what's happening at ONC and upon our experience to date.

Team, please comment.

Wednesday, July 1, 2009

Enabling and disabling servers.

I spent most of last week ironing out some of the ways that the server disabling and enabling will work.

Thus, as servers get added, if they only have some regions, or (more likely) completely different classifiers and indicators... the drop-downs should change and servers should become enabled or disabled.
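
A rough Python sketch of that logic, assuming each server publishes the regions and indicators it covers. The servers dictionary layout and both function names are hypothetical.

```python
def enabled_servers(servers, region, indicator):
    """Servers that can answer the current selection; the rest get disabled."""
    return sorted(
        name
        for name, meta in servers.items()
        if region in meta["regions"] and indicator in meta["indicators"]
    )


def dropdown_options(servers, field):
    """Union of one metadata field across all servers, for a drop-down."""
    options = set()
    for meta in servers.values():
        options.update(meta[field])
    return sorted(options)
```

As servers are added to the dictionary, the drop-down options grow to the union of what they offer, and each selection narrows the enabled list to the servers that actually cover it.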

Otherwise, tomorrow and next week are going to be a paradigm shift... namely the ability to cumulatively add searches with different indicators, thus being able to see differences in data.

Draft GIPSE Element Description now available

A first draft version of the GIPSE (Geocoded Interoperable Population Summary Exchange) "data dictionary" is now available.

Link here

Comments and suggestions are always appreciated.

-Tom

HealthGrid Berlin Update

Hello.

This is Ken Hall from HealthGrid in Berlin.

You can read and virtually experience the conference at http://gridtalk-project.blogspot.com/.

There is good US participation...

Rich Tsui, University of Pittsburgh
Julio Facelli, University of Utah
Muzna Mirza, CDC (NCPHI)

Mary Kratz, HealthGrid.US
Jonathan Silverstein, HealthGrid.US
Howard Bilofsky, HealthGrid.US
Carl Kesselman, USC
Edwina Barnett, Jackson State University
Raphael Isokpehi, Jackson State University

Friday, June 26, 2009

Training updated

The training node was updated with the new caching gridviewer and the listed schemas were also updated.

In gridviewer, Dallas has been disabled for the time being because it seems to be acting rather flaky (it will load fine for a half hour, and then time out). If anything, it means I need to work in a good timer function to abandon a load and notify the user after a set amount of time.
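
One possible shape for such a timer, sketched in Python rather than the Java gridviewer uses: run the load in a worker thread and stop waiting after a deadline. The load_with_timeout helper and its message text are assumptions for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def load_with_timeout(fetch, timeout_seconds):
    """Call fetch() but give up waiting after timeout_seconds.

    Returns (value, None) on success or (None, message) on timeout.
    Only the wait is abandoned: the worker thread keeps running until
    fetch() returns on its own.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fetch)
    try:
        return future.result(timeout=timeout_seconds), None
    except FutureTimeout:
        return None, "Server did not respond within %s seconds" % timeout_seconds
    finally:
        # Do not block here waiting for a slow server to finish.
        pool.shutdown(wait=False)


def slow_server():
    """Stand-in for a flaky server that takes too long to answer."""
    time.sleep(0.5)
    return "late data"
```

The message in the second slot is what would be shown to the user; a server that keeps timing out could then be disabled automatically instead of by hand.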

Otherwise, gridviewer next week will be given some UI tweaks. I am hoping to get better server allow/disallow logic, multiple loads on one map (with different pushpins depending on the load), and autopopulated classifier/indicator drop-downs that don't require server hits.

Cheers everyone, have a good weekend,
Peter

Thursday, June 25, 2009

Data is caching

Now the data for gridviewer is caching using OSCache. I may have to tweak it a bit to behave better as a singleton, but otherwise it is efficient and simple and seems to make everything quite snappy, especially now that gridviewer is not referencing servers for metadata in each load.

Otherwise, the server URLs are now pulled from the wiki page (or any page you wish to configure yourself). Thus, less database configuration is needed just for gridviewer.

Wednesday, June 24, 2009

Chronicles of NHIN CONNECT, volume 1

For the past few weeks, I have been in the process of becoming an expert in the NHIN CONNECT project. This project contains both the NHIN connect GATEWAY and the NHIN Connect ADAPTER.

The NHIN Connect project is located at http://www.connectopensource.org and version 2.0 is the current release of the code in question (although 2.1 is supposed to be forthcoming in early July).

While I can now configure a gateway and an adapter in under 1 hour each, there are several 'gotchas' that are not addressed in the documentation. Using the "pre-configured binaries" option (versus install-from-scratch), here are the 'gotchas' so far:

==>Adapter and Connect need different machines. I've tested them on the same one and functionality is mediocre at best. The lab has these set up nicely on 2 separate but equal machines.

==>Java is set to allocate 1.2 GB of memory from the start. Don't try this on a machine with less than 1 GB of memory... you want at least 2.

==>OID Registration doesn't work as advertised. Fortunately I haven't needed my OID for internal connectivity, but when I connect outside of the lab, I will need this.

==>c:\java is hard-coded as the java location. Make sure you install to this location. If you don't, some of the NHIN services break with odd errors. The documentation has inaccurate pointers to the java locations. And with _some_ of the application hard-coded with this java location, better safe than sorry.

==>The NHIN documentation says in multiple places that port 9080 is the non-secure port and that 9081 is the secure port. DON'T BELIEVE IT! Port 8181 is the secure port.

Thus concludes volume 1 of the Chronicles of NHIN CONNECT. Stay tuned for updates as the PHGRID <--> NHIN CONNECT interoperability testing continues...

Data is combining, now to get it caching.

The data is now combining in the data model... and it seems to be doing it correctly.

Now, I am hoping to get a majority of it caching... and instead of trying to write my own caching, I am going to use a caching mechanism suggested by Chris: OSCache from the OpenSymphony project.

It seems to have generic back-end caching with configurable levels of intelligence and persistence (which is all I really needed)... but it also has the most potential for helping on the front end (like request caching: if a request looks like the exact same request that was sent a minute ago, it will return the exact same HTML that was sent back rather than hitting the server). It also allows for better and more fine-grained error handling.
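
The request-caching idea can be shown with a tiny time-based cache in Python. To be clear, this is not OSCache's API (OSCache is a Java library); the RequestCache class, its ttl_seconds parameter, and the injectable clock are made up to illustrate the concept.

```python
import time


class RequestCache:
    """Return the stored response when the same request repeats within ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behavior can be checked without sleeping
        self.entries = {}   # request key -> (time stored, response)

    def get(self, key, compute):
        """Look up key; call compute() only on a miss or an expired entry."""
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = compute()
        self.entries[key] = (now, value)
        return value
```

With a sixty-second window, a second identical request inside the minute gets the stored HTML back and the server is never hit; after the window passes, the next request recomputes and refreshes the entry.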

After that, it's some UI clean up, some javascript niftiness, and testing to see if GridViewer will run in the same area as an AMDS service.

Tuesday, June 23, 2009

A case for using grid architecture in state public health informatics: the Utah perspective

"This paper presents the rationale for designing and implementing the next-generation of public health information systems using grid computing concepts and tools. Our attempt is to evaluate all grid types including data grids for sharing information and computational grids for accessing computation"

http://www.biomedcentral.com/content/pdf/1472-6947-9-32.pdf

New time series handling.

I have adjusted the time series handling of gmap-polygon and grid-viewer so that regional collections hold multiple time series lists (differentiated by server).
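
For readers who want a picture of that structure, here is a Python sketch of a region holding one time series list per server. The RegionSeries class is hypothetical, and the combined view that sums counts by day is an assumption about how the lists might later be merged, not a description of the gmap-polygon code.

```python
from collections import defaultdict


class RegionSeries:
    """One region's data: a separate time series list for each server."""

    def __init__(self, name):
        self.name = name
        self.by_server = defaultdict(list)  # server name -> [(day, count), ...]

    def add(self, server, day, count):
        self.by_server[server].append((day, count))

    def combined(self):
        """Assumed merge: total count per day across all servers, in date order."""
        totals = defaultdict(int)
        for series in self.by_server.values():
            for day, count in series:
                totals[day] += count
        return sorted(totals.items())
```

Keeping the per-server lists separate means a chart can draw one line per server, or fall back to the combined totals when only a single line is wanted.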

Right now, it seems that it is building, but when you open the flot plot, all the data is somehow being tagged with one date, and it is not immediately apparent where or how that data is being set like that. I think I might have run into this problem before, and it might have to do with how Java Calendars increment... I hope to figure it out later today or tomorrow.

Otherwise, after that is finished tomorrow, I will implement the new caching structure we discussed with Brian, and start publishing arrays so that the drop-downs for indicators and classifiers can be dynamically populated (as discussed with Chris).

After that, I am going to start allowing for multiple polygon loading... thus you can do one search, then another, and click and compare graphs. I might also look into making histograms in addition to line charts (that way, one can see the different data brought back by the different servers)

maphiv.org

NPR did a piece this morning on HIV statistics in the US and mentioned this website as a source. Maphiv.org appears to be operated by Z-Atlas, which describes itself as "America's health index and online maps providing information about health and health care in the US." Z-Atlas uses ArcGIS, with which I am sure most of you are familiar. Though the site requires registration, I think it is worth it to get a look at the UI.

Saturday, June 20, 2009

GIPSEPoisonService

GIPSEPoisonService has been committed to the repository this week. The service has been created using Introduce and then modeled around Brian's AMDSService. I will begin writing test cases next week and hopefully not run into anything major.

Peter and Brian asked me to assist on a few things for the gridviewer application. I took a look at the code and the UI and have a few ideas about where I think we can add functionality to it in the distant future. Peter has worked through some tough requirements and developed some strong code to handle all the polygon/map manipulations.

1. MVC
I think it would be a good step to introduce a controller to the gridviewer. The controller would not only handle web user interface requests but also process remoting protocols or generate specific outputs on demand. To handle MVC, I believe we should look to frameworks such as Spring, JSF, Struts, etc. My preference is Spring-MVC, especially now that the 3.0M2 release will be REST compliant, which is another consideration for the gridviewer.

2. JSON/RSS/XML output
The controller mentioned above would facilitate the development of outputting additional response types per client requests. Reusing all the business logic Peter has developed we could "re-format" the output to a resemblance of the AMDS schema in JSON. The only requirement to consume the JSON request would be a simple html file with a few lines of javascript. We could even output RSS for notification purposes (Spring 3.0M2 has controllers designed especially for this purpose).

3. Authentication
The application currently does session based authentication. We could potentially look to Spring for handling authentication. We would gain persistence (remember me), an adaptor for authenticating with OpenID, LDAP, and an easier path to cross-domain authentication if that were to become a requirement in the future. Spring-Security (formerly ACEGI) also supports X.509 certificates.

4. UI
I showed Brian a mash-up that is a few years old called housingmaps.com. It basically takes craigslist and google maps and creates city-based maps of the current RSS listings. I think it is a good example of making the map the focal point of the UI. By making the map larger and moving the selection components to the right, I think we can really improve the UI.

5. REST
Also, I think we should look to using GET requests, which could facilitate additional functionality in the future such as remembering past searches or allowing users to easily find URLs in their address bar. We should always work towards a REST architecture.
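
To illustrate points 2 and 5 together, here is a small Python sketch (not Spring, and not gridviewer code) of a search carried entirely in a GET query string and answered as JSON. The parameter names, the example URL, and the JSON field names are all assumptions.

```python
import json
from urllib.parse import parse_qs, urlencode


def search_url(base, region_type, region, indicator):
    """Build a bookmarkable GET URL that fully describes one search."""
    query = urlencode(
        {"regionType": region_type, "region": region, "indicator": indicator}
    )
    return base + "?" + query


def parse_search(query_string):
    """Recover the search from the query string (first value of each parameter)."""
    params = parse_qs(query_string)
    return {key: values[0] for key, values in params.items()}


def to_json(search, counts):
    """Render the same result data as JSON instead of HTML."""
    body = {
        "search": search,
        "observations": [{"date": day, "count": count} for day, count in counts],
    }
    return json.dumps(body, sort_keys=True)
```

Because the whole search lives in the URL, a past search is just a saved link, and a page with a few lines of javascript could request the JSON form of the same URL.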

These are just a couple initial thoughts for future development (post-August).