Friday, May 29, 2009

Google Unveils Google Squared and other cool features

"A new effort to cull and table numerical data from Web pages was among several features announced today. Google Squared that creates tables of numerical data culled from searches of websites. In the example given, a search for "small dogs" created a table on different breeds, including data on such things as the breeds' heights and weights, placed inside boxes. Once an initial table is created, users can click on individual entries to check the source and--if the number is erroneous--correct the numbers through new searches. Finally, they can save their customized table for future reference."

I wonder how features like Google Squared could be leveraged to take the vast amount of unstructured health information that is available (or semi-structured information, for example the H1N1 line-list Excel sheets sent from health departments all over the country and world, which are supposedly structured) and present it in a structured way amenable to statistical analysis.

read more: http://www.technologyreview.com/blog/editors/23522/?nlid=2025

Tutorial from Ohio Supercomputing Center

Tutorial from Ohio Supercomputing Center:

https://www.osc.edu/cms/sip/

New section added to AMDS project on Wiki

Added a new section to the wiki in the AMDS project with information on IXF and related resources. IXF is the Indicator eXchange Format, which closely adheres to the concepts of the Statistical Data and Metadata eXchange (SDMX) protocol.

--Tom

Thursday, May 28, 2009

Launch of Open Mobile Consortium


Open Mobile Consortium: http://www.open-mobile.org/
Just been launched (May 26, 2009)
"The OMC is an unprecedented collaboration amongst nine high-profile organizations to develop an interopable set of platforms of high-quality open source mobile tools for humanitarian and civil society work." learn more here http://www.open-mobile.org/news/open-mobile-consortium-launches-open-source-mobile-tools-health-and-humanitarian-work

Tuesday, May 26, 2009

This may or may not be of interest to the Grid community.

I received an email regarding Intel's Parallel Studio (see below).
There is more information here:

http://software.intel.com/en-us/intel-parallel-studio-home/

Learn more: www.intel.com/software/parallelstudio


Jim Tobias




Download eval versions of
Intel® Parallel Studio.

Try the new software tools for yourself and build robust Windows* applications for multicore. See why Intel Parallel Studio got high marks during beta from industry leaders and C/C++ Microsoft Visual Studio* developers worldwide.

• Intel® Parallel Composer: Speeds up software development by incorporating parallelism with a C/C++ compiler and comprehensive threaded libraries. By supporting a broad array of parallel programming models, a developer can find a match to the coding methods most appropriate for their application.

• Intel® Parallel Inspector: This proactive bug finder is a flexible tool that adds reliability, regardless of the choice of parallelism programming models. Unlike traditional debuggers, Intel Parallel Inspector detects hard-to-find threading errors in multithreaded C/C++ Windows* applications and does root-cause analysis for defects such as data races and deadlocks.

• Intel® Parallel Amplifier: Assists in fine-tuning parallel applications for optimal performance on multicore processors by helping find unexpected serialization that prevents scaling.

Helping Dan debug Windows Container bugs

I've been helping Dan with some of the Windows bugs, since I'm using a Windows container to redevelop the AMDS services for the beta release.

I came across this bug when I tried globus-start-container.bat:

Failed to start container: Container failed to initialize [Caused by: Secure container requires valid credentials]

I thought I could fix this by running grid-proxy-init.bat prior to starting the container. This let Globus start up without error, but whenever I tried running a secure service I got an exception like this:

Error: ; nested exception is:
org.globus.common.ChainedIOException: Authentication failed [Caused by: Operation unauthorized (Mechanism level: Authorization failed. Expected "/CN=host/somehostname.cdc.gov" target but received "/O=PHGRID/OU=phgrid.net/OU=Globus Public Health NCPHI/OU=phgrid.net/CN=someuser/CN=someuser")]


As the error shows, a container started on a user proxy presents the user's identity rather than the host identity that clients expect. To fix this, you need to start Globus with the "-containerDesc" parameter pointing to a valid security_descriptor.xml (like the one found in $GLOBUS_LOCATION/etc/globus_wsrf_core) that specifies the locations of the container cert and the container key. Your command would look something like "globus-start-container.bat -containerDesc c:\foo\security_descriptor.xml".
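Before starting the container, it can help to confirm that the descriptor actually points at files that exist. Below is a minimal sketch of such a check. It assumes a GT 4.0-style descriptor in which a credential element holds key-file and cert-file entries with a value attribute; the sample paths and the helper names credential_paths and missing_files are made up for illustration, so compare against your own security_descriptor.xml before relying on it.

```python
import os
import xml.etree.ElementTree as ET

# Assumed descriptor layout (GT 4.0 style); the paths are placeholders.
SAMPLE_DESCRIPTOR = """
<securityConfig xmlns="http://www.globus.org">
  <credential>
    <key-file value="c:/foo/containerkey.pem"/>
    <cert-file value="c:/foo/containercert.pem"/>
  </credential>
</securityConfig>
"""


def credential_paths(descriptor_xml):
    """Return (cert_path, key_path); an entry is None if it is not declared."""
    root = ET.fromstring(descriptor_xml.strip())
    found = {"cert-file": None, "key-file": None}
    for element in root.iter():
        local_name = element.tag.split("}")[-1]  # strip the XML namespace
        if local_name in found:
            found[local_name] = element.get("value")
    return found["cert-file"], found["key-file"]


def missing_files(descriptor_xml):
    """List the credential entries that are undeclared or point at no file."""
    cert, key = credential_paths(descriptor_xml)
    problems = []
    for label, path in (("cert-file", cert), ("key-file", key)):
        if path is None or not os.path.isfile(path):
            problems.append(label)
    return problems
```

An empty list from missing_files means both files were found; anything else names the entry to fix before running globus-start-container.bat.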

PS- For anyone trying to use the AMDS alpha, for the sake of debugging please wait for the Beta release on 5/31. The 5/31 release will re-write and fix a lot of what may be annoying anyone trying to use the current gar.

Troubleshooting Globus Windows Authorization Errors

We were able to get CSC to the point where they are able to start the container and access the CounterService locally. This means that they have a functional grid node.

The problem we are facing now is the ChainedIOException error that is thrown while accessing the CounterService on the NCPHI node. I provided CSC with a copy of our Server.xml in order to synchronize our configurations. I also had them update the global_security_descriptor.xml with the correct information that allowed them to start the container.

We are still researching the root cause of the problem.

Friday, May 22, 2009

AMDS Draft Schema Work

After some productive conversations with Bill and Ian from the University of Washington CoE, I've made some changes to the forthcoming May 31 draft of the AMDS schema. The changes are checked into SVN and can be viewed here (schema), here (metadata response example), here (request example), here (response example).

Please take a look and let me know your comments. This does a few things:
1) It flattens the xml structure as much as possible. Simplicity is good.
2) It allows for more fine-grained geographic regions to be selected. We now have wildcard (*) and parent qualifiers so you can stratify your result however is appropriate.
3) It adds additional, optional stratifiers of Age, Facility and Service Area (called "bucket" by BioSense). These optional stratifiers won't be supported by all AMDS services/providers/publishers but they are now in the schema to allow for it. DiSTRIBuTE supports these stratifiers.
4) It renames "condition" to "indicator", since AMDS can really be used to query and report any population indicator as defined by the AMDS provider.
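To make item 2 a little more concrete, here is a rough sketch of how a wildcard plus a parent qualifier could pick out regions. This is not the AMDS schema itself: the Region and RegionSelector classes, their field names and the sample codes are all hypothetical and only illustrate the matching idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    kind: str                      # e.g. "state", "zip3", "zip5"
    code: str
    parent: Optional[str] = None   # e.g. the state a zip3 belongs to


@dataclass(frozen=True)
class RegionSelector:
    kind: str
    code: str = "*"                # "*" matches every code of this kind
    parent: Optional[str] = None   # optional parent qualifier

    def matches(self, region):
        if region.kind != self.kind:
            return False
        if self.parent is not None and region.parent != self.parent:
            return False
        return self.code == "*" or self.code == region.code


def select(regions, selector):
    """Return the regions matched by the selector, in their original order."""
    return [region for region in regions if selector.matches(region)]


# Hypothetical sample data.
REGIONS = [
    Region("state", "GA"),
    Region("zip3", "300", "GA"),
    Region("zip3", "303", "GA"),
    Region("zip3", "275", "NC"),
]
```

With this data, select(REGIONS, RegionSelector("zip3", "*", "GA")) returns the two Georgia zip3 regions, which is the "all zip3s within a state" stratification described above.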

I plan on using this schema for the service operations on the AMDS-BioSense beta service (due May 31).

One thing that is becoming apparent is that we will need a good registry / registry services to capture all the metadata for the indicator classifiers, age classifiers, service area classifiers, facility classifiers, etc. that are used by all of the AMDS services. I'm noting this now, but it won't be built for a while (for the typical reasons).

Finally, I'll add that this is a very DRAFT schema and will likely be updated again between now and PHIN (especially based on your comments). We will also change a lot once the science side of the house determines what standards we need to meet (Ken is working on this with the epidemiologists).

Thursday, May 21, 2009

Metadata niftiness

Grid viewer now only lets you select a server if that server covers the regions you have chosen. Drop-downs are now driven more by metadata than by the old Quicksilver facets.

The next steps will be expanding the conditional handling and furthering the migration from a single pane to two panes, with the top focusing on metadata and selections and the bottom focusing on the map and repeated queries... and dealing with new servers and services and making sure all the metadata performs as expected (and probably a few paradigm shifts to allow changes in what 'expected' is).

Thinking about Laboratories and PHGrid




After giving a presentation this afternoon to the PHIN Laboratory Messaging Community of Practice, I worked with Brian to diagram a Proof of Concept that would examine introducing a suite of laboratory services on the PHGrid infrastructure - leveraging the significant work that has taken place in the development of STARRS (Specimen Tracking and Results Reporting System) and LUNA.




Wednesday, May 20, 2009

AMDS Technical updates

I added a project called db-importer that can be used to import CSV (and eventually XML) AMDS extracts into an AMDS Store database based on the schemas we're using at CDC. This isn't the fastest or best importer, but it works on MS SQL Server and PostgreSQL and will be used to bring in daily report extracts from systems like BioSense.
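For anyone curious what a CSV import of this kind boils down to, here is a small sketch. It is not the db-importer code: it uses Python's built-in sqlite3 as a stand-in for SQL Server or PostgreSQL, and the table name amds_counts and the column names are assumptions invented for the example.

```python
import csv
import io
import sqlite3


def import_extract(connection, csv_text):
    """Load one CSV extract of aggregate counts; returns the number of rows loaded."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS amds_counts ("
        "report_date TEXT, region TEXT, indicator TEXT, case_count INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (row["report_date"], row["region"], row["indicator"], int(row["case_count"]))
        for row in reader
    ]
    # The connection context manager commits on success and rolls back on error,
    # so a bad row does not leave a half-loaded extract behind.
    with connection:
        connection.executemany("INSERT INTO amds_counts VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

A daily job would call import_extract once per extract file, passing the file contents as csv_text.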

I also talked a bit with Vaughn to transition over his projects. I'll be fixing up / re-writing AMDSCore and AMDSPoison over the next few weeks in time for the PHIN Conference until we find another Java developer.

Finally, we had a productive meeting with the PHIN Messaging steward and team about ways to directly collaborate in the solution development of PHGrid, PHINMS and PHIN-SRM. Nothing major was covered, but lots of good ideas that we'll follow up on in the near future.

Tuesday, May 19, 2009

A few things...

I have been busy with a few things over the past couple of days:

I upgraded GridViewer and Quicksilver so that they use timers (so that one can see what proportion of the waiting lies in service calls versus data rendering) and have an option for plotting pinpoints instead of polygons (pinpoints being that much faster and more clear-cut, polygons looking neater and showing better spatial orientation). Quicksilver also has a link to metadata for servers so you can see what data is available from a particular server.
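The timer idea is simple enough to sketch. The following is not the GridViewer code (which is Java); it is a minimal Python illustration in which the PhaseTimer class and the phase names are hypothetical. It accumulates time per phase and reports each phase's share of the total.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable so the timer can be tested
        self.elapsed = {}

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            spent = self._clock() - start
            self.elapsed[name] = self.elapsed.get(name, 0.0) + spent

    def proportions(self):
        """Return each phase's share of the total time measured so far."""
        total = sum(self.elapsed.values())
        if total == 0:
            return {name: 0.0 for name in self.elapsed}
        return {name: spent / total for name, spent in self.elapsed.items()}
```

Wrapping the service call in timer.phase("service") and the drawing code in timer.phase("render") is enough to show where the waiting goes.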

Right now I am working on making grid viewer behave based on what the servers can provide. This means tailoring the lists (states, zip3s, zip5s, conditions) so that one can only select regions that a server can provide data for, and not everything in the region-relations table or conditions properties. I am hoping to have all the regions and the conditions selecting appropriately tomorrow, so that it will become much less confusing which regions have been asked for.
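In rough terms, the filtering looks like the sketch below. The shape of the metadata (a dictionary of server name to a "regions" list) and the function name selectable_regions are assumptions made for the example, not the real grid viewer structures.

```python
def selectable_regions(all_regions, server_metadata, chosen_servers):
    """Keep only the regions that at least one chosen server can provide.

    server_metadata maps a server name to a dict with a "regions" list;
    a server whose metadata could not be fetched maps to None and
    contributes nothing.
    """
    available = set()
    for server in chosen_servers:
        info = server_metadata.get(server) or {}
        available.update(info.get("regions", []))
    # Preserve the display order of the full region list.
    return [region for region in all_regions if region in available]
```

The same approach would apply to the conditions list, with a "conditions" entry in the metadata instead of "regions".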

After that, more user interface niftiness is planned, including hover-over for metadata stats and a server list that changes based on the selected regions and conditions. But first I want to get the service pulling back data more reliably.

Cheers!

Friday, May 15, 2009

PAOH-BIG Notifiable Disease Sharing Grid Application

We created a video of the early stages of our notifiable disease sharing application.


The application uses Globus for service deployment, Introduce for service stub generation, Dorian for authentication, Grid Grouper for service- and method-level authorization, and the Credential Delegation Service for authorizing a proxy to access data on the client's behalf. We also implemented an additional layer of authorization in the application so that users can only see cases that are relevant to their jurisdiction.



http://www.youtube.com/watch?v=GjoU82i5Vac

Thursday, May 14, 2009

Thanks everyone!!!

I may not say it enough, but I am extremely proud and privileged to be a part of this effort.  

I would just like to thank everyone (the NCPHI grid team and all of our national and international colleagues) for all your support and efforts.    Thank you for your posts on the blog, your submissions to the wiki, your code development (apps, viewers, services, etc), your entries in sourceforge, your artistic diagram creations, your use case documents, your abstract and conference submissions, your presentations, and your white papers. (..and the list goes on).

We've made truly incredible progress since our first post on Wednesday, November 14th, 2007 at 10:37 AM.  

What a ride.  Thanks again everyone!

Google Maps and privacy

I get asked a lot about how Google Maps works with our data (in the grid viewer, quicksilver, rodsa-dai, etc.). Specifically, people are concerned about the privacy and security of sending their data to Google.

There is a popular misconception about Google Maps and how it handles data. Way back in the olden days (~2006) the easiest way to use Google Maps was to hand Google a KML file, and Google would draw it on their maps site. This was convenient but involved passing your data points to Google within a KML file.

PHGrid does NOT do this. We use the Google Maps API, which sends no data to Google Maps other than requests for map images. No counts, conditions, calls or any other data are passed to the Google servers. All of that stays on PHGrid web servers and in users' web browsers. All of the polygons and pins are drawn using JavaScript without passing data points back to maps.google.com.

So from a privacy standpoint, Google can tell that a random user is requesting a map of the state of Georgia (and scrolling and zooming) but will not know the polygons drawn, the pins placed, the counts that affect the colors or any of the actual AMDS data.

Wednesday, May 13, 2009

NHIN and PHGrid Interoperability Testing

The NHIN 2.0 'connect' and 'adapter' packages have been obtained. The lab now has two machines ready for testing (thanks TS!), so now we get to digest the NHIN source code and install it to prepare for testing PHGrid <--> NHIN interoperability.

Pinpoints and Polygons

It has been requested to have the ability to display both pinpoints (located centrally) and polygons... namely because some zipcodes are a bit confusing and have multiple polygons.

I have spent a good part of yesterday and most of today enabling that functionality. It involved modifying several handlers and creating new classes to execute interfaces based on what was selected... Luckily, I did not have too many instances of "oops, not OO enough", so it went more smoothly than I thought it would.

Otherwise, this is part of a host of speed-increasing tweaks. One of the other adjustments I am planning is to place timers throughout the application structure that both Quicksilver and Grid-Viewer use so that execution times can be displayed and give everyone a better idea of where time is spent between data loads.

After that, it's enabling the metadata in grid-viewer, and beginning the transition to an "available versus selected" framework which will better show that only certain regions will be selectable.

Tuesday, May 12, 2009

Deploying Grid Trust Service (GTS)

The CaGrid Grid Trust Service has been deployed on the 1001 internal grid node. At this point I still need to deploy the GAARDS UI in order to test the installation. The install was done using the cagrid-installer jar, which takes care of downloading dependencies and deploying Globus to Tomcat.

For future reference, GTS can be installed from source much faster than going through the entire CaGrid install process.

Monday, May 11, 2009

KNIME

KNIME is the Konstanz Information Miner: http://www.knime.org/

It looks like a workflow tool similar to Taverna, and it seems to integrate R and Python in an Eclipse-like environment.

KNIME FEATURES: http://www.knime.org/introduction/features

KNIME SCREENSHOTS: http://www.knime.org/introduction/screenshots

KNIME SCREENCASTS: http://www.knime.org/introduction/screencasts

Getting Started: http://www.knime.org/documentation/getting_started

Friday, May 8, 2009

Planned updates.

Getting data into gridviewer has been a bit cantankerous... but now that it's there I know what I want to happen over next week.

- Update the structure of the grid-viewer to deal with items on an available-versus-selected basis. Having all the states show up when the service only allows for the Southeastern ones is going to be confusing... Thus, limit the possible selections to what is available. The same goes for services (if metadata could not be fetched for a service, don't let it show up as an option). The other added bonus is allowing for multi-selection of regions... thus, you can pull data back for zip3s in two different states.

- Update the graph popups to populate zero values in a time series, as opposed to having them populated upon a service return. It's a bit complicated and behind the scenes, but it essentially moves some workload around and should speed up the client by having it do less work upon return.

- Start working with diversified clients. Things on different computers. Things behind different servers. Things in different states.

- Add bullets, and server return versus view render times.

- Have a histogram view, and update the time series to allow for differentiation of data source.
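The zero-value item in the list above is easier to see in code. This is a minimal sketch of the idea, assuming daily data; the function name zero_filled_series and the dictionary-of-dates shape are illustrative choices, not the grid viewer's actual structures. The series is built with zeros before the service answers, and the returned counts are then simply dropped into place.

```python
from datetime import date, timedelta


def zero_filled_series(start, end, counts):
    """Build a daily series from start to end (inclusive), defaulting to zero.

    counts maps a date to the count returned by the service; dates outside
    the requested window are ignored.
    """
    series = {}
    day = start
    while day <= end:
        series[day] = 0
        day += timedelta(days=1)
    for day, count in counts.items():
        if day in series:
            series[day] = count
    return series
```

The first loop can run while the client is still waiting on the service, which is where the speed-up on return would come from.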

Either way, I am excited.

AMDS Data Store

I checked in the first batch of database definitions for the AMDS Data Store to SVN. This is basically the DDL for all the tables, views, triggers, etc. that we're using for the AMDS Store that will be populated with, initially, synthetic BioSense data. This will help out anyone else who is looking to build their own aggregate store.

I'm also starting a list of conditions and classifiers that we'll use internally in the AMDS Store but that will be useful to any others developing AMDS services.

Eventually this will include DDL for PostgreSQL and, by request, any other databases we can create scripts for.

Thursday, May 7, 2009

North Carolina is changing colors.

So, I finally got data pulling into the map.

It's just North Carolina, and it doesn't show a graph, but it pulls data.

I had to do a lot of debugging and found that the service is very particular about regions within the boundaries being passed in, and I had to build the list of available conditions... and I know there is a lot of tweaking and changing to be made in order for things to start behaving... This helps map out the ways the application will need to act as well (no selecting regions that aren't available, etc)

But hey, AMDS data in polygon(s). Yay!

Wednesday, May 6, 2009

AMDS Architecture clarification

After talking with Tom, Barry, John and Ken I thought it would be useful to clarify some of the ideas we have around the AMDS architecture. You can view the full page on the wiki, but I'll describe the concepts a little here.

Option #1 for AMDS is what we've typically been dealing with. A grid node is installed as a publisher and the publisher runs an AMDS service to query the biosurveillance database and return the counts based on the query. This requires hosting a service and getting your IT organization to open up port 443 for internal connections.
AMDS Publisher Architecture

Option #2 for AMDS is what we're calling the producer-collector scenario. Some partners cannot/will not host a service. To accommodate this, they generate AMDS reports containing only aggregate data and transfer them to a collector node. CDC doesn't want to be the collector node, but they may need to be to demonstrate the capability. These periodic reports are received and loaded into an AMDS store.
AMDS Producer-Collector Architecture

And if we put these both together, we see that producers send data to collectors, collectors then store the data in a database where a publisher can access it. Of course, there will be publishers that connect directly to biosurveillance data sources.
AMDS Full Architecture
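Here is a toy sketch of how the three roles fit together. It only illustrates the data flow described above, with in-memory counters standing in for the real reports, transfers and database; the function and class names and the (date, region, indicator) record shape are assumptions for the example.

```python
from collections import Counter


def produce_report(line_level_records):
    """Producer: reduce (date, region, indicator) records to aggregate counts.

    Only the counts leave the producer; the line-level records stay behind.
    """
    return Counter(line_level_records)


class Collector:
    """Collector: receives periodic reports and loads them into a store."""

    def __init__(self):
        self.store = Counter()

    def receive(self, report):
        self.store.update(report)  # adds the report's counts to the store


class Publisher:
    """Publisher: answers count queries from a store it can access."""

    def __init__(self, store):
        self._store = store

    def query(self, region, indicator):
        return {
            day: count
            for (day, reg, ind), count in self._store.items()
            if reg == region and ind == indicator
        }
```

A publisher that connects directly to a biosurveillance data source would expose the same query but read from that source instead of a collector's store.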

Tuesday, May 5, 2009

Client data in grid viewer

I managed to get AMDS client data in grid viewer after importing several libraries and manufacturing a local temporary client of my own. I am feeling much better now.

The next steps are to get that data on a map... and to get the client out of the "localtempclient" into a cool multi-client that adaptively uses different connection methods depending on the URL provided.
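As a rough idea of what "adaptive" could mean here, the sketch below picks a connection method from the service URL. The rule it uses (local host means in-process components, anything else means the full grid client) and the labels it returns are assumptions for illustration, not a description of the eventual multi-client.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def pick_connection_method(service_url):
    """Choose how to reach the service based on the URL provided."""
    parsed = urlparse(service_url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("unsupported service URL: " + service_url)
    if parsed.hostname in LOCAL_HOSTS:
        return "in-process"    # same machine: skip the full client overhead
    return "grid-client"       # remote node: go through the full client
```

The caller would then build the matching client object, so the rest of grid viewer never needs to know which method was chosen.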

Also, I anticipate a lot of mavening and a lot of jar hunting over the next couple of days.

OK, it's been a month now since I joined the team, and I have completed my first service. Following my philosophy of service development, I have now checked in the code under AMDSCore, and it can be seen here. I have created the complete service implementation, which is based on the common or core components. Additionally, I have provided a client implementation that is compatible with the provided service. Detailed instructions for building, configuring and running the service are here. Currently, the service is not using security. I will add the security this week and be ready to deploy to the training node this week. For those who just want to try out the service without compiling, you can go here to download the GAR.
This service was developed with the following scenarios in mind:

· Connecting to the service while on the same machine
· Connecting to the service over the internet

Since the Grid Viewer, the component that Peter is developing, will be running on the same machine, it makes more sense for him to use the components geared for connecting to the core service implementation, so that it does not have to incur the overhead of the full client implementation. Yesterday we were trying out the client, but not having all the environment variables set correctly on his machine caused us to run into issues. I am hoping Dan will work with me to get this straightened out. Overall, this stresses the need to get an internal staging grid node set up. I am hoping to work with Dan to get this going this week. The result will make our efforts more efficient.
For others who wish to connect to the service directly via the Grid infrastructure, the client implementation is the best fit. Again, this component is included in the distribution. All of these descriptions are captured in the AMDS Biosense Wiki. More to come....

National Biosurveillance Model (Summary / Grid Perspective)


Please review and comment on this 1st draft. Add your comments directly to this post. We are attempting to build a conceptual framework around national biosurveillance from a summary and grid perspective to help us and others understand how and why we think distributed models scale and work for public health. (Version .01)

CAGrid Transfer Service has been redeployed

The CAgrid Transfer service has been redeployed and tested on the training node. I had to rebuild the Globus 4.0.5 container from source in order to restore the missing library files. A backup copy of this new build is stored in the /usr/local/clean-4.0.5 directory. Currently all Globus services are functioning within normal operating parameters.

Monday, May 4, 2009

Upgrading Globus on Training

As you look at the training node, you will notice two new Globus directories in /usr/local. These directories are globus-4.0.8 and globus-4.2.1. Both are upgraded builds of Globus that are currently being tested on the training node.

The Globus 4.0.5 installation will remain intact until testing has been completed. When testing is completed, the system wide profiles will be updated to run the upgraded Globus installation by default.

In other news, the CA Grid Transfer service has been undeployed from the current installation of Globus for troubleshooting purposes. This service will be redeployed to the Tomcat container after the issue is resolved.

Current Error:
javax.servlet.ServletException: Failed to initialize 'cagrid/TransferServiceContext' service [Caused by: gov.nih.nci.cagrid.introduce.security.service.globus.ServiceSecurityProviderImpl]

Gar issues...

Today Vaughn and I tried to get the client for the amds service on my machine so we could develop against it. It didn't work.

First we tried to build his code on my machine, which failed because my machine is not his machine and therefore had a slightly different version of globus... or ant... or user setup... or some other variable that is yet to be discovered that usually pops up during the first couple of attempts like this.

Then we tried to bring over the gar he had built on his machine... and deploy it... only to have the client blow up whenever I tried to run it, citing missing class exceptions... despite the fact that all the classes should have been in the gar.

So, I am still waiting for an AMDS client, and the resulting checklist of code, build and configuration points is long and mildly upsetting. I'm wondering if making a quick little AXIS or CXF service would be worth it just so I could develop against the resulting classes and client examples while the Globus connections are ironed out.

I'm sure it will get sorted out eventually, but when I was working with Quicksilver, I had the benefit of having the service from day one. This time, I have built out a lot of the grid viewer on assumptions of how the client and service will behave, and I worry about increasing the chance of needing to refactor things once it gets booting and connected because my assumptions were off.

Oh well, growing pains. We'll get through it... and I'm sure what we make will be rather cool. It's just "that phase" in the project where things are going wacky all over the place when one was hoping they would just fall into place and it's extra difficult to take the calming breaths you need to collectedly analyze the situation.

International Grid Accreditation

An international consortium called The Americas Grid Policy Management Authority (TAGPMA) has released an updated set of standards for international trust within grid communities in North, Central, and South America.

TAGPMA has established several grid 'profiles' that set accreditation criteria for participating institutions. As recently as this past weekend, they released their Member Integrated X.509 Credential Services (MICS) profile, which defines the backbone of which CAs can be trusted. Think of it as an accreditation for CAs.

Information on the authentication profiles can be found at http://www.tagpma.org/authn_profiles .

Also of interest are links to the EUGridPMA and APGridPMA which are located on the main page at http://www.tagpma.org/. All of these participate in the International Grid Trust Federation located at http://www.igtf.net/ .

So many links, so little time. Start here if you only click one: http://www.igtf.net/ .