Tuesday, March 31, 2009

Yay for the Douglas-Peucker algorithm

So, encoding the polygons worked. The map now displays much faster and without any of the nasty "this script is taking a long time" errors that were plaguing it before, even in IE.

Firefox (and especially Chrome) is still faster than IE, usually with little things like scroll-zoom, but IE is no longer the annoying, almost-unusable experience it was before, which is good considering a lot more people use IE.

Otherwise, we are about to code-wrap version 1 for a deploy to the SDN, so the most recent dev build should be ported to training tomorrow morning. In addition to the polygon improvements, the passwords will be updated (you'll have to contact Brian Lee at fya1@cdc.gov to get a new one) and outliers will be much harder to find now that I've fixed the logic in the evaluation part of the C2 algorithm. There are some other small UI tweaks, and I can't wait to show them to people.

This project has come a long way, and all the things I thought were near impossible turned out to be rather easy (conversely, all the stuff I thought would be easy turned out to be much more annoying than anticipated). Either way, I am excited.

Cheers!

Updated AMDS Draft Schema

I went ahead and formalized a lot of the talks around the AMDS schema we've been having this winter and spring and updated the schemas on the wiki. Although this namespace is 20090330, it really has been around since January and isn't a major change from the v1 that was posted back on December 15.

The AMDS schema boils down to just two messages for the service:

  • MetadataQuery/Response - Returns the appropriate metadata for each service. This exists so that clients and registries can determine metadata through a runtime API rather than rely on an administrator to enter it manually.

  • AMDSQueryRequest/Response - This takes in a query of conditions, regions and a date range and returns an array of counts by condition, region and day.



This is much smaller than the earlier version since we may as well start with a single operation.

The MetadataQuery is built into the service spec so that a user or registry can programmatically check what capabilities a service provides. Thanks to Jeremy for this idea, as it's a lot easier than trying to keep the service registry up to date manually.
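
To make the shape of the service concrete, here is a rough Java sketch of the two operations. The type and method names are illustrative only, not the actual stubs generated from the schema:

    import java.util.Date;

    // Illustrative sketch only: a hypothetical Java binding for the two AMDS
    // operations, not the code generated from the 20090330 schema.
    interface AmdsService {

        // MetadataQuery/Response - lets clients and registries discover at
        // runtime which conditions, regions and date ranges a node supports.
        AmdsMetadata metadataQuery();

        // AMDSQueryRequest/Response - takes conditions, regions and a date
        // range; returns one count per (condition, region, day) combination.
        AmdsCount[] amdsQuery(String[] conditions, String[] regions,
                              Date startDate, Date endDate);
    }

    class AmdsMetadata {
        String[] supportedConditions;
        String[] supportedRegions;
        Date earliestDate;
        Date latestDate;
    }

    class AmdsCount {
        String condition;
        String region;
        Date day;
        long count;
    }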

Monday, March 30, 2009

BioSense AMDS Extract

John, Tom and I met with the BioSense data team to discuss the best way to provide aggregate BioSense data for AMDS. Until we can prepare an automation routine to determine aggregate data, the data team will generate a trailing 30-day report for the 11 syndromes (using the Chief Complaint bucket for Real-Time sources and the Outpatient Final Diagnosis bucket for VA and DoD).

This report will be parsed and loaded into the BioSense extract database for querying using the AMDS service for BioSense.

All this is near future (before June) kind of stuff. So we are getting rather close to being able to pilot AMDS-BioSense securely in a proper production/staging environment.

The release of AMDS-BioSense would be the second step in releasing the BioSense Grid Publisher (the first being the planned release of the Grid Publisher node).

Encoded Polygons draw and popup

Most of today was spent making a series of UI tweaks that were requested. Now the legends are adjusted a bit, some extra labels were added, and a link to the help page is on the development version.

I wanted to play with the encoded GMapPolygons before I went home, so I made sure I could replicate the functionality that I already had. Thus, with the help of Alabama, I now know I can place an encoded GMap polygon with the same click-for-popup and color/shading properties.

Tomorrow will be spent enacting this. I am probably going to try setting up on-the-fly translation, and then work on storing encoded polygons in the database (namely because I can already picture how to do an on-the-fly translation, and there are some tricky bits to database storage of the dual strings that make me not want to start there).

Either way, here's to hoping for a vast improvement in performance by tomorrow afternoon.

BioGrid Australia - Health through information - New Site

BioGrid is a unique platform for life science research teams to access and share genetic and clinical research data across multiple organisations in an ethically approved and secure way, using the World Wide Web.

http://www.biogrid.org.au/wps/portal

Overview (Video)
http://au.youtube.com/watch?v=9US57ZhGxeo


Sunday, March 29, 2009

Service, method and input parameter authorization using GAARDS

At the Real-time Outbreak and Disease Surveillance Laboratory we've completed our first sprint to demonstrate input parameter authorization using the GAARDS infrastructure. In light of the security requirements of our Pennsylvania Ohio Biosurveillance Grid (PA-OH BiG) project for sharing notifiable disease data between health departments, we implemented input parameter authorization in our notifiable disease data grid application using Dorian, Grid Grouper and Introduce. This was motivated by our belief that it would be infeasible to create an additional service or service method (i.e., programming instead of configuring) every time a different set of valid input parameters for a different person/group were to be authorized.

GAARDS (i.e., Dorian and Grid Grouper) already provides authentication, service authorization and method-level authorization, but what we have done allows health departments to maintain extremely fine-grained authorization using the same infrastructure. For example, we can now define a security group that is only allowed to make queries for data generated by bordering counties of a state. This group is defined in Grid Grouper and authentication is handled by Dorian. Local mappings of user common names or groups to valid input parameters are maintained in the application using the RODS 6 data model.
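
Conceptually, the input parameter check reduces to something like the sketch below (hypothetical names, not the actual RODS 6 code): once Dorian has authenticated the caller and Grid Grouper has resolved the group membership, the service verifies the requested parameter values against that group's allowed set.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of input parameter authorization: each Grid Grouper
    // group is mapped to the parameter values (here, counties) it may query.
    public class InputParameterAuthorizer {

        private final Map<String, Set<String>> allowedCounties =
                new HashMap<String, Set<String>>();

        public void allow(String gridGroup, Set<String> counties) {
            allowedCounties.put(gridGroup, counties);
        }

        // true only if every requested county is in the caller's allowed set
        public boolean authorized(String gridGroup, Set<String> requested) {
            Set<String> allowed = allowedCounties.get(gridGroup);
            return allowed != null && allowed.containsAll(requested);
        }
    }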

We are really liking Dorian because local organizations need not maintain the credentials of foreign users if they have a trust relationship. Local nodes always know who (by common name) is accessing their services and the local nodes maintain local access control to their own data.

BTW: In the process of architecting this we spoke to Justin Permar and the other caGrid folks over at Ohio State University who built Introduce. We thought we needed to modify the Introduce code, but they were able to clearly explain why things are set up the way they are. Thanks to OSU.

Friday, March 27, 2009

PHGrid Architecture

Moses, Charlie, Vaughn and I met again to revise the PHGrid Architecture models. We're now up to 0.5 and the good news is we've reached consensus on these four models.

I'll schedule some time with NCPHI leadership next week to present our models. But we're still looking for any feedback on the models.

Polygon encoding; or how to make your map of the US load faster

So, the problem we have right now that is causing a lot of slowdown (and annoying "this script is taking forever... continue?" errors) on a lot of browsers is that the polygons we have are too complex.

Things like Colorado or some city zipcode are simple enough... but states with long coastlines (California, Florida) or bordering rivers (Mississippi, Illinois) tend to have polygons with hundreds or thousands of vertices because of all the little crenelations that nature happens to draw on the country. Furthermore, even things like Colorado have zips that border rivers, and the end result is a browser having to download and process a LOT of javascript. At least 95% of the massive page draw is arrays of longitude/latitudes.

Google, however, provides these lovely little things called encoded polygons, which take those thousands of points and turn them into two simple lines of text. I think they are the key to removing errors and speeding everything up. There are very in-depth summaries of encoded polygons and how to make them from lists of points here. I will try and explain my perspective anyway.

First off, the encoding stores two dimensions of data (hence the two strings). The first string is a compressed representation of the points that make up the polygon, which saves lots of space because numbers are very easy for computers to compress and decompress. The second string is an indicator of which points should be displayed at which zoom levels. Thus, if you are zoomed way out (viewing all of the US) you don't need all the individual points on a river because, well, there could be 15 of them in one pixel of your monitor.
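
For the curious, the point-string half is straightforward to sketch in Java, following the published algorithm: scale each coordinate to 1e5, delta-encode against the previous point, then pack the value five bits at a time into printable characters. (This sketch covers only the points string; the levels string is built separately.)

    // Minimal sketch of Google's encoded-polyline point encoding.
    public class PolyEncoder {

        public static String encodePoints(double[] lats, double[] lngs) {
            StringBuilder sb = new StringBuilder();
            long prevLat = 0, prevLng = 0;
            for (int i = 0; i < lats.length; i++) {
                long lat = Math.round(lats[i] * 1e5);
                long lng = Math.round(lngs[i] * 1e5);
                encodeSigned(lat - prevLat, sb);  // store only the delta...
                encodeSigned(lng - prevLng, sb);  // ...which keeps the string short
                prevLat = lat;
                prevLng = lng;
            }
            return sb.toString();
        }

        private static void encodeSigned(long v, StringBuilder sb) {
            v <<= 1;             // make room for the sign bit
            if (v < 0) v = ~v;   // fold negatives in, preserving the sign
            while (v >= 0x20) {  // emit 5 bits at a time, low bits first
                sb.append((char) ((0x20 | (v & 0x1f)) + 63));
                v >>= 5;
            }
            sb.append((char) (v + 63));
        }
    }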

There is also a very neat algorithm that automatically determines the levels, and it is demonstrated here.
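
That algorithm is Douglas-Peucker (the one the March 31 post above cheers for): recursively find the vertex farthest from the segment joining the endpoints, keep it if it deviates by more than a tolerance, and recurse on both halves. Here is a minimal Java sketch of the core recursion, assuming simple planar distances; as I understand it, the encoder uses each kept point's distance to assign the zoom level at which it first appears.

    // Minimal Douglas-Peucker sketch: marks the points that survive
    // simplification at a given tolerance.
    public class DouglasPeucker {

        public static void simplify(double[] xs, double[] ys, int first, int last,
                                    double tolerance, boolean[] keep) {
            keep[first] = true;
            keep[last] = true;
            if (last <= first + 1) return;

            int farthest = -1;
            double maxDist = 0;
            for (int i = first + 1; i < last; i++) {
                double d = distToSegment(xs[i], ys[i],
                                         xs[first], ys[first], xs[last], ys[last]);
                if (d > maxDist) { maxDist = d; farthest = i; }
            }
            // Keep the farthest point only if it deviates enough, then recurse
            // on the two halves it splits off.
            if (maxDist > tolerance) {
                simplify(xs, ys, first, farthest, tolerance, keep);
                simplify(xs, ys, farthest, last, tolerance, keep);
            }
        }

        private static double distToSegment(double px, double py, double x1,
                                            double y1, double x2, double y2) {
            double dx = x2 - x1, dy = y2 - y1;
            double lenSq = dx * dx + dy * dy;
            if (lenSq == 0) return Math.hypot(px - x1, py - y1);
            double t = ((px - x1) * dx + (py - y1) * dy) / lenSq;
            t = Math.max(0, Math.min(1, t));
            return Math.hypot(px - (x1 + t * dx), py - (y1 + t * dy));
        }
    }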

The best part: Google understands encoded polygons. Thus, less javascript processing for the browser, and since it's simple compression and algorithms, the server shouldn't have to spend many milliseconds converting the data on the fly (and even if it does, the conversion only needs to happen once and can then be cached).

I am hoping I can get this implemented Monday. They even have Java ports of the encoding algorithm. I just hope it's rather straightforward; there are a lot of little extra usability tweaks to be made too, and the deadline for this version of Quicksilver is relatively soon. And who knows, it might not help speed or errors that much.

But I have great hopes, and I think it will.

Cheers

Quicksilver updates

This afternoon we had a very productive session with the BioSense BIC/Epi team to review the latest build of Quicksilver.

They gave a lot of good feedback that fell into two categories: easy (cosmetic) changes and more complicated ones. The complicated changes include modifying the map to color the choropleth based on the number of outliers per period (rather than the arbitrary count ranges currently used). We're going to work on this after Dr. Tokars' team is able to analyze the data to find the correct break points for what is significant and what is not.

The easy changes have been made into a tracker item. Here's the list as submitted by Peter Hicks and Steve Benoit:

  • Script error issues for every query requested

  • For clinical effects, listing in alphabetical order would be helpful

  • In legend scale, first option should be 0 (not less than 0), 1 to 10, 11 to 20, above 21. Currently, the categories overlap

  • Label what these numbers represent in the legend (visits, calls, etc. ?)

  • Adjustable legend option should be called “customize breakpoints or classification”. How to enter the customizable option is not clear or self-evident.

  • Queries are extremely slow

  • Calendar allows you to pick future dates – should have date limits

  • Colors in legend don’t align with colors on map.

  • A help section that describes the module and its components would be useful

  • Call out box when mousing over state shows a time series – the average line is not an average for the time period selected. The outlier box is not clear. How do you define an outlier?

  • Unable to understand this visualization and chart. We selected a region and got the call out box below. Was there 1 call or 797?

Deploy is complete. Come revel in new features

So, I have completed another deploy (well, two deploys actually) of Quicksilver.

You can reach it here, and you can read more about how it was built or how to view the code here.

One of the new features is the "remembered zoom"... where the zoom for a selected region will be maintained if only the legend, timespan, or conditions change. Changing the region (from MD to IN, from MD to the 208 zip3, or from the 208 zip3 to a view of all states) will revert to the default zoom level for that region type. But now, if you had to zoom in on Rhode Island... you won't have to zoom in again after selecting a different condition or widening your search.
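
The rule itself boils down to something like this little sketch (names are illustrative, not the actual Quicksilver code):

    // Hypothetical sketch of the remembered-zoom rule: reuse the saved zoom
    // only when the region part of the query is unchanged.
    public class ZoomMemory {

        public static int chooseZoom(String prevRegion, String newRegion,
                                     Integer savedZoom, int defaultZoom) {
            if (savedZoom != null && newRegion.equals(prevRegion)) {
                return savedZoom;  // only legend/timespan/conditions changed
            }
            return defaultZoom;    // region changed: use that region type's default
        }
    }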

Another new feature is the adjustable legend... meaning that you can change the difference in the count numbers that determine the colors for polygons.

Finally, the search dates default to the current week. This is not going to be that useful on the training node because we are using test data that only goes up to about October 2008, but when Quicksilver gets deployed to a production setting and starts getting access to more recent data, it will be much more helpful than always starting on a random week in February 2008.

Meanwhile, lots of people had a good long look at the app today and came in with lots of feedback. It was both wonderful (because lots of people liked the application and thought it was neat, and I got some ooohs and aaahs from things like flot) and terrifying (because I was worried it would break, people found ways to make the app do strange things, and people tried things and thought of features that would be insanely cool to install that I never would have dreamed of). Having a bunch of users that are not that familiar with the application generates a LOT of very good feedback and questions. People were getting confused over things that, in retrospect, are not very clear at all. Today brought a large explosion of possibilities and potential, and it's as paralyzing as it is motivating, if only because it's difficult to triage what should be done first.

So, next week will be a lot of implementations of little and big fixes. There are some very salient UI tweaks to be made (like labeling more clearly: having "zip3: 208 // Total Count: 350" is a lot more handy than just "208 // 350"), lots of little help pages and legend explanations (namely how, under the C2 algorithm, the blue average line and outlier status are based only on the average of the preceding 30 days, minus the closest two), and finally, an attempt to make the polygon drawing much more streamlined to get rid of the really-quite-irritating "This script is taking a long time, do you want it to continue" error thrown by IE, which is exacerbated by having a not-bleeding-edge computer.

I think I have found a way to do that, and it's called polygon encoding, and I'll be detailing that in the next post.

Either way, I am elated and looking forward to seeing how nifty we can make this application.

Cheers,
Peter

New Report: Envisioning the Cloud: The Next Computing Paradigm

Cloud Computing Meets Washington: Lots of Data Security and Privacy Questions
Last week Bernard Golden was invited to participate in a cloud computing panel at the Newseum, located between the U.S. Capitol Building and the White House. The Washington D.C. event marked the release of a new report, Envisioning the Cloud: The Next Computing Paradigm.  The two report authors are Jeffrey Rayport and Andrew Heyward. Rayport is a former Harvard Business School Professor who currently chairs Marketspace LLC, which provides digital strategy consulting services and is part of the high-end strategy consulting firm, The Monitor Group. (Rayport also coined the term "viral marketing.") Heyward, the former head of CBS News, serves as a senior advisor to Marketspace.

http://www.marketspaceadvisory.com/cloud/Envisioning-the-Cloud.pdf



Thursday, March 26, 2009

Don't have memory zooms yet, but we do have caching and default start week.

I spent much of today finding various ways that trying to selectively save a user's zoom wouldn't work.

The basics are there: I managed to get the user's zoom saved to the page, so the app knows what the user chose before clicking submit. It's just a matter of getting everything lined up so that the app only defaults to the previously selected zip when the location stays the same... Throw in the fact that I never really designed the application for that much state awareness from the get-go (because I never anticipated someone wanting to save their zoom level), and it's being more difficult than it should be, and taking much more time than I feel people thought it would.

But I took a break from that and got some caching of the NPDS service values working. Now the service doesn't have to be hit if the same call was already made, which should help speed things up.
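
The caching itself is nothing fancy; conceptually it is just memoizing the service responses by their query parameters, something like this hypothetical sketch (the class and method names are mine, not the actual Quicksilver code):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch: cache NPDS responses by query so repeats skip the service.
    public class NpdsCache {

        private final Map<String, String> cache = new ConcurrentHashMap<String, String>();

        public String getCounts(String condition, String region, String dateRange) {
            String key = condition + "|" + region + "|" + dateRange;
            String cached = cache.get(key);
            if (cached != null) return cached;  // already fetched: no service hit
            String fresh = callNpdsService(condition, region, dateRange);
            cache.put(key, fresh);
            return fresh;
        }

        private String callNpdsService(String condition, String region, String dateRange) {
            return "...";  // placeholder for the real NPDS web service call
        }
    }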

The other thing I implemented was a "default to current week" feature that was requested.

Tomorrow is the deploy. I might brainstorm some nifty way to enable saved zooms before the deploy tomorrow morning, but I can't make any promises. After the deploy I have blocked out some time with other developers to start finding ways to speed up the Quicksilver experience in IE. We have some ideas, we just need to go ahead and try them.

Cheers!

Integrating caGrid and TeraGrid

For those who want to learn about Grid-Grid integration efforts - you may find this of interest.

http://magnet.c2b2.columbia.edu/AnnualReport/Y3/Papers/teragrid_2008_submitted_0604_final.pdf



Valve Issues

I tested deploying the Globus Java Core (Windows version) with the latest version of Tomcat (6.0.18). The verdict is that it does not seem to play well with the new software. I received the following error when I tried to start the container:

SEVERE: Begin event threw exception
java.lang.ClassNotFoundException: org.globus.tomcat.coyote.valves.HTTPSValve55

This error is the result of using the HTTPSValve55 in the server.xml configuration file. I'm currently tracking down which valve replaced HTTPSValve55 in Tomcat 6.0.18.

new Open Geospatial Consortium standards

Carl Kinkade posted this on the GIS listserve today:


Wayland, MA - The Open Geospatial Consortium, Inc. (OGC®) announces adoption and availability of the OpenGIS(R) Web Coverage Processing Service (WCPS) Interface Standard. The WCPS specification is available at http://www.opengeospatial.org/standards/wcps.

The OpenGIS® Web Coverage Processing Service (WCPS) Interface Standard defines a protocol-independent language for the extraction, processing, and analysis of multi-dimensional gridded coverages (see the OGC glossary entry on coverages: http://www.opengeospatial.org/ogc/glossary/c) representing sensor, image, or statistics data. Services implementing this language provide access to original or derived sets of geospatial coverage information, in forms that are useful for client-side rendering, input into scientific models, and other client applications. Further information about WCPS can be found at the WCPS Service page of the OGC Network (http://www.ogcnetwork.net/wcps).

An online demonstration of WCPS with 1-D to 4-D use cases encompassing environmental sensor data, remote sensing, geophysics, and climate data is available at www.earthlook.org.

The OGC® is an international consortium of more than 370 companies, government agencies, research organizations, and universities participating in a consensus process to develop publicly available geospatial standards. OpenGIS® Standards support interoperable solutions that "geo-enable" the Web, wireless and location-based services, and mainstream IT. OGC Standards empower technology developers to make geospatial information and services accessible and useful with any application that needs to be geospatially enabled. Visit the OGC website at http://www.opengeospatial.org/.





http://spatialnews.geocomm.com/dailynews/2009/mar/26/news4.html

Wednesday, March 25, 2009

Little Quicksilver Usability updates.

So, now that there is an adjustable legend, I have added a real legend to better explain what the colors mean, and hidden the adjustable bit.

Otherwise, I spent the rest of the day figuring out a way for the app to save the user's selected zoom level. That way, if they just changed the dates or the search condition but were zoomed in on a particularly tiny zip3 or zip5, the app would "remember" the last zoom level rather than forcing the user to zoom in from the default again.

Otherwise, I have been talking with Brian about ways to make the application faster. One is to cache averages and standard deviations... and the other is to have a servlet that returns polygon details so the polygon additions can be eval'ed in a loop. Hopefully this will give IE users relief from the "This script is taking a long time" error.

Open Source tool to Record Desktop: CAMSTUDIO

CAMSTUDIO: http://camstudio.org/

CAMSTUDIO is a free and open-source tool to create video recordings
of activity on your desktop.

This can be used to create training demos, tutorials, online help, documentation and more.

Jim

Tuesday, March 24, 2009

adjusting legend

I moved most of the logic of the legend adjustment and added much more error checking today.

Tomorrow I am going to try and get a more colorful legend, play with the placement, get click-through from the info panels (so you can click an IN link in the popup window and get it to reload focusing on IN) and then brainstorm with Brian for new search types and speedy loading.

OpenMRS Concept Cooperative.... Future service for the grid?

Just learned about this - very interesting.
http://openmrs.org/wiki/OpenMRS_Concept_Cooperative#What_is_the_OCC.3F

WHAT IT DOES:

    • OCC is a collection of the cumulative concept development work of the OpenMRS community, shared and viewable in such a way to allow commonly used conventions to "rise to the top". Perhaps with enough participation, common modeling conventions, and commonly used concepts will themselves become "de-facto standards".
    • OCC's foundation is the OpenMRS concept model, which represents to the best of our knowledge, the relevant metadata needed to actually drive system behavior. (Unclear how 'metadata [...] drives system behavior' -- Shaun) (Agree that it would be valuable to elaborate on this a bit more to describe how additional attributes about the concept besides the term name are needed to create flowsheets, data entry screens, and essentially anything else you want to do - Dan).
    • OCC concepts can be linked to 1 to n standardized reference vocabularies (such as SNOMED, LOINC, ICD, etc)

    (Does OCC have concepts...or simply mappings? "OCC Concept" could be misleading. Would it be better to say "OCC concept mappings can be linked to..." or "OCC-linked concepts can be mapped to..."? -Burke)

    • OCC's key ingredient is tight linkage to the vocabulary development mechanisms inherent in the OpenMRS Base install. (May be overstating the obvious, but if the reason that 'tight linkage' is a 'key ingredient' is because it eases the oft-cumbersome process of accessing and browsing terminologies in the familiar and friendly OpenMRS interface, then it may be worth stating that very point -- Shaun) Using network connectivity, users can browse the OCC resource within the OpenMRS dictionary editor, and import concepts into an implementation.
    • Implementations which import a given concept create an automatic mapping between their site and all other sites which have used the concept. They also import all of the collective work for that concept. (I think this is something that should be elaborated on. I would think that the more information about what has been mapped to a given concept and all things related to its current usage would be a huge help in mapping. - Dan) So, if any site maps the concept to a standardized vocabulary, all of the sites benefit from that new mapping. (Would be interested in hearing more about your thoughts on this particular point. As you know, we've taken a centralized approach to mapping in the INPC in part because the resources and expertise needed to do it are more than many local sites can expend. OpenMRS has taken a different approach. Either way, you want to take advantage of the work wherever it occurs. - Dan)

DATA MODEL:
http://openmrs.org/wiki/OCC_Data_Model
--- in XML  http://openmrs.org/wiki/Image:OCC_0.1.xml


WORKING GROUP
http://openmrs.org/wiki/WorkingGroup/OCC


UMLS / OCC Relationship:

Question:
If implementations utilize licensed vocabularies mapped within the UMLS (or directly from the licensee), when they share their concepts, does the OCC intend to leave the details of using these concepts with their licenses to each implementation?

Answer:

The OCC will, by the product of its ability to aggregate concepts, create necessary mappings between OpenMRS implementations. This will serve (at some point) as a possible foundation to allow OpenMRS implementations to share information between systems using messaging protocols such as HL7. However, OCC's primary intention is to serve as a pragmatic starting point for those interested in populating their own OpenMRS implementation with a dictionary that meets their local needs. OpenMRS installs will not come with a starter vocabulary over the long run.

OpenMRS very soon will come with an ability to link up to this service and browse the OCC much like they would their local vocabulary. This functionality is beyond the scope of the UMLS. Additionally, the atoms of the UMLS metathesaurus by their very nature have disparate metadata models associated with their source origins. Not a good starting point for a practical OpenMRS implementation. So, while there are similarities in what they might look like on their surface, they are fundamentally different tools for different purposes.

Look forward to thoughts on this…. a future service for the grid?   -tom

Monday, March 23, 2009

Lots of little usability improvements to Quicksilver

Today I managed to get the scroll-wheel zooming and small map overview working on google maps. I also externalized the legend API so one can set new boundaries for what turns a particular region red or yellow or green. Tomorrow will be some debugging of that API, then some discussions of how to implement cool new features and reduce annoyingly long load times and "this script is taking forever" type errors.

Friday, March 20, 2009

C2 Flot deployed to training

Quicksilver with Flot showing counts, with C2 based averages and outliers has been deployed to the training node.

Quicksilver can be reached at: http://ncphi.phgrid.net:8080/npdsgmaps-web/
If you need a user/pass please let someone at PHGrid know.

Next week will be trying to add some small tweaks for the Google maps bit (like zooming and a small overview window) and a few larger features like multiple condition and region selects, and different options for data aggregation (merge all the counts for all the zip3s in Atlanta, for example).

Thursday, March 19, 2009

Tomorrow: Flot with C2

Tomorrow I will be deploying Quicksilver with Flot running the C2 algorithm (if all goes as planned).

Then, it is a battle with the IE "Operation canceled" bug (although I will be trying something with this deploy) and a bunch of UI tweaks based on suggestions from multiple users. Things like click-through to new regions, and enabling better map functionality (scroll wheel zoom, mini-map). Also, some larger features like multiple zone select, combination of regions into counts, and multi-selected conditions will be considered.

Either way, I am just excited Flot is working. It's really pretty!

VMWare Grid node Distro

The next version of the VMWare grid node appliance will not be a VDT installation of Globus. It has been converted to a native Globus installation.

This has been done for two reasons. The first reason is to reduce the overall disk space requirements for the VM appliance. The second reason is to distribute a model that closely resembles the NCPHI installation of Globus.

Please keep in mind that this change does not reduce grid functionality. The original VDT installation was intended to reduce the complexity of installing a grid node. If we are distributing a DVD with a fully installed grid node, it serves no purpose for it to be a VDT installation.

This change will allow the users to research issues based on a standard Globus installation. I believe this move will reduce configuration issues as new grid services come online.

Wednesday, March 18, 2009

C2 in popup, attached to Quicksilver

It looks like I have flot powered by the C2 algorithm popping up for states, zip3s and zip5s.

Tomorrow I anticipate some debugging, perhaps changing polygon shading/zooming, and perhaps trying some fixes to see if we can get Internet Explorer loading more reliably.

Either way, I am happy that the new graphs are working. They are really neat.

Tuesday, March 17, 2009

c2 in a popup

So, I got all the mechanics working for the C2 algorithm (I think; I will need Will Duck to go over the mechanics with me to make sure I am doing it right).

Furthermore, I have tested them and put together the mechanics to deploy javascript arrays for flot... including making sure I have GMT-centered timestamps with my counts (much more annoying than it seems it should be).
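
For the record, the annoying bit is that flot wants its x-axis values as milliseconds since the epoch, interpreted as UTC. A small sketch of the safe way to build the per-day timestamps on the Java side (rather than trusting the JVM's default time zone):

    import java.util.Calendar;
    import java.util.TimeZone;

    // Sketch: build GMT-centered midnight timestamps for flot's x-axis.
    public class FlotTime {

        public static long utcMidnightMillis(int year, int month, int day) {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
            cal.clear();                    // zero out hours/minutes/seconds/millis
            cal.set(year, month - 1, day);  // Calendar months are 0-based
            return cal.getTimeInMillis();   // milliseconds since the epoch, UTC
        }
    }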

I pasted the arrays generated into my flot tester, and the graphs worked :D

Tomorrow, I will work on getting flotplot.jsp to fetch the appropriate polygon and forward it to the flot JSP to build the arrays.

Then it's plug it all together, import it into NPDS, modify how the dates are pulled, and hopefully... new charts in Quicksilver.

Yeah, this has been a sentence-paragraph kind of day. It means I have been thinking like a machine for most of the day.

Cheers

Wal-Mart, eClinicalWorks Deal Exposes Need For EMR Price Transparency

Very interesting article….

http://www.healthleadersmedia.com/content/229907/topic/WS_HLM2_TEC/WalMart-eClinicalWorks-Deal-Exposes-Need-For-EMR-Price-Transparency.html



GIS Video from Penn State

This was posted on the CDC GIS Listserve by Brian Kaplan....


GIS Video from Penn State: http://geospatialrevolution.psu.edu/


The geospatial revolution....

Monday, March 16, 2009

Janet Marchibroda to leave eHealth Initiative for IBM

http://wistechnology.com/articles/5633/

flot in a popup

I have successfully loaded flot charts in a little jsp in a popup (and using an iframe). I have also laid out all the mechanics and now just need to write the logic for getting 30-day rolling averages and then determining the standard deviation and applying that to make three arrays for flot to chart.

It should be easier than it sounds; I just need to get the algorithms and the looping right... but my mind is working on it now... and it's sort of tingling...
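
To make the looping concrete, here is a rough sketch of the kind of pass I have in mind. Two loud assumptions: the baseline here is simply the preceding 30 days (the real C2 logic also drops the couple of days closest to the point being evaluated), and the outlier threshold is the mean plus k standard deviations, with k left configurable.

    import java.util.ArrayList;
    import java.util.List;

    public class RollingStats {

        // Returns three series - counts, rolling average, outliers - each as a
        // list of {timestampMillis, value} pairs, ready to become flot arrays.
        public static List<List<double[]>> buildSeries(long[] times, double[] counts,
                                                       int window, double k) {
            List<double[]> countSeries = new ArrayList<double[]>();
            List<double[]> avgSeries = new ArrayList<double[]>();
            List<double[]> outlierSeries = new ArrayList<double[]>();

            for (int i = 0; i < counts.length; i++) {
                countSeries.add(new double[] { times[i], counts[i] });
                if (i < window) continue;  // not enough history yet

                double sum = 0, sumSq = 0;
                for (int j = i - window; j < i; j++) {  // the preceding `window` days
                    sum += counts[j];
                    sumSq += counts[j] * counts[j];
                }
                double mean = sum / window;
                double stddev = Math.sqrt(Math.max(0, sumSq / window - mean * mean));
                avgSeries.add(new double[] { times[i], mean });
                if (counts[i] > mean + k * stddev) {  // flag this point as an outlier
                    outlierSeries.add(new double[] { times[i], counts[i] });
                }
            }

            List<List<double[]>> all = new ArrayList<List<double[]>>();
            all.add(countSeries);
            all.add(avgSeries);
            all.add(outlierSeries);
            return all;
        }
    }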

Cheers!

Automating Grid Service Development at the PH Entities Site (a thought experiment)

In certain circumstances it may prove advantageous to have a grid service wizard as part of the PHGrid software stack. For instance, a PH entity may have an analytical tool that they want to convert into an analytical grid service, or a database that they want to convert into a data grid service.

Currently the easiest way to wrap an existing analytical tool in a grid service is to use the gRAVI extension to Introduce. In the case of PHGrid, many of the fields for gRAVI/Introduce could be automatically assigned; taken a step further, one may be able to make use of the gRAVI/Introduce API to make wrapping an existing analytical tool on PHGrid an automated task.

Wrapping an existing DB into a grid service in a fairly automated way is more challenging than wrapping an existing analytical tool. In the case of a DB, some sort of automated Object Relational Mapping (ORM) needs to take place, and the two ORM tools I'm familiar with are caCORE and the Java Persistence API (JPA). caCORE works well with Introduce and is a reasonable way to create a DCQL-based grid service. While JPA can be used outside of a Java EE container, it may require some effort to connect JPA to a data grid service. An experienced developer could probably make use of the caCORE and Introduce APIs to create a partially automated way for a health entity to wrap an existing database into a data grid service.

The success of the ideas above depends on the maturity of the caCORE/Introduce/gRAVI APIs and their relevance.


Any thoughts?

Friday, March 13, 2009

PHGrid Architecture Paper Submitted to Fall AMIA 2009

Thanks to everyone's hard work, we were able to submit a paper for Fall AMIA 2009. If you would like to read it, please let a member of the NCPHI team know. Hope the reviewers take kindly to it. I must admit that the 5-page limit made it quite a challenge, to say the least. Have a great weekend everyone.

Security and Software Engineering

SANS has created a great list of the top 25 security issues in software engineering:
http://www.sans.org/top25errors/

Flot gets its own page.

This morning, I deployed a bug-fixed Quicksilver and made sure some of the older apps were still running.

The next big improvement is going to be flot charts... and I have been thinking of the best way to incorporate them. The old Google Charts were literally single strings of broken-out data that were sent to Google, which returned a static image of a chart.

Flot is much more involved and has more javascript. It literally quadruples the amount of code that the old charts had, and it would probably not be a good candidate for simply plugging into the usual system (which would make the main pane of Quicksilver expand rapidly).

Thus, next week, I am going to try and encapsulate the flot data into its own JSP page and just have the Google map call the page and pass in the polygon id. The JSP would then call the code to calculate everything, fill the arrays for the flot chart, and then plot them out in a main chart and an overview.

The first step will be just making sure that an HTML popup calling an external page will display properly. Then it is a matter of automating the load setups, adjusting the queries to actually pull 30 days of leading data, and running the C2 algorithm to get the data for the arrays... the arrays leading to the chart lines.

Cheers, looks like next week will be fun.

Thursday, March 12, 2009

Bugfix Thursday.

Yesterday, after playing with the local development versions of Quicksilver, I noticed that when limiting results by condition, some of the results were not being "scrubbed" if the data set came back null.

After a bit, I figured out a good way to clean the time series, and there was a new spate of null-checking and re-testing that had to occur, but I think it is pretty solid.

Otherwise, I got some feedback on my flot example, and people seem to think it's rather cool. Next step is to get it integrated with Java (I am anticipating 4 hours dedicated to "no, JavaScript wants its date in THIS format" tinkering alone...). I am wondering whether I should try and post it to the main page or make a little JSP that can be opened in the google popup window.

myExperiment.org

myExperiment is brought to you by a joint team from the universities of Southampton and Manchester in the UK, led by David De Roure and Carole Goble, and is funded by JISC under the Virtual Research Environments programme and by Microsoft's Technical Computing Initiative.

myExperiment is part of the myGrid consortium, which develops the Taverna Workflow Workbench for creating and executing scientific workflows, and also builds on CombeChem - two of the original UK e-Science Pilot Projects. The related WHIP (Triana enactment) activity in Cardiff is supported by the OMII-UK Commissioned Software Programme.

http://www.myexperiment.org/



Wednesday, March 11, 2009

Flot more flot!

Graph two lines (check)
Have a zoom-able overview (check)
Have the x-axis with dates (check)
Have float-over tool-tips that reveal data (check)
Have a legend to distinguish the data points from the average points (check)
highlight points that are outliers in some way. (check!)

So, got them all working! Now the next step is to get the charts building inside gmap-polygon popup windows, and of course tackling what I am sure will be a fun exercise in getting Java dates converted to JavaScript dates... Then comes the codifying of the C2 algorithm.

DRN Demonstration

Someone just reminded me that I never summarized the results of our Harvard Distributed Research Network research project.

Ken will ultimately write up the findings, but until then, I thought I'd make a few comments.

We presented to the DRN/DEcIDE group on February 19th at 2:30pm. The group included the DEcIDE centers at the HMO Research Network Center for Education and Research on Therapeutics and the University of Pennsylvania, and the participating health plans: Geisinger Health System, Group Health Cooperative, Harvard Pilgrim Health Care, HealthPartners, Kaiser Permanente Colorado, and Kaiser Permanente Northern California.

I've uploaded the presentation to the wiki.

The short story is that we demonstrated secure transfer of SAS programs over the PHGrid nodes using the PHGrid SSTS service, remote execution of the program at clinical partners (on synthetic data), and then combining the aggregate results in the NCPHI lab using open source spreadsheet software (StarOffice).

Tuesday, March 10, 2009

Flot is very flot.

So, my goals for flot are rather simple.

Graph two lines (check)
Have a zoom-able overview (check)
Have the x-axis with dates (check)
Have float-over tool-tips that reveal data
Have a legend to distinguish the data points from the average points
highlight points that are outliers in some way.

So, three down, three to go. The interface for flot is very nice, and more importantly, the examples are excellent. They've been helping me tremendously.

Monday, March 9, 2009

Bug Fix Monday.

Over the weekend, I started just googling ways to deal with the Operation Aborted error in Google Maps for Internet Explorer... I found this and have implemented it in the gmap-polygon classes, and it seems to be working (at least I wasn't able to get IE on the semi-local box to throw the error... it might be a completely different case on boxes not in the next room over).

Otherwise, I have added a lot more parameter checking to the Gmap-pane and the login.jsp.

Tomorrow, Flot. I hope to have a Flot-ted chart made (perhaps with fake data) and maybe even integrated by tomorrow evening.

JBoss Tattletale

Saw this by way of theserverside.com and thought it would be useful for those times we've needed to track down dependencies and which jars are used and which jars are not used.

JBoss Tattletale is a tool that can help you get an overview of the project you are working on or a product that you depend on. The tool will recursively scan a directory for JAR files and generate linked and formatted HTML reports.

JBoss Tattletale features the following reports to help you with the quality of your project:

Dependencies
* Dependants
* Depends On
* Transitive Dependants
* Transitive Depends On

Reports
* Class Location
* Eliminate Jar files with different versions
* Multiple Jar files
* Multiple Locations
* No version

Furthermore the tool includes an initial implementation of its ClassLoaderStructure interface for JBoss Application Server 4.x and JBoss Application Server 5.x which will scope the archives based on classloader constraints.

JBoss Tattletale is licensed under GNU Lesser General Public License (LGPL) version 2.1 or later.

Download
Issue Tracking (JIRA)
Forum

Dueling for Certificates.

So, it appears that the problem between Rodsadai-web and AMDS-web is that they are playing musical chairs for the client certificate.

Felecia (who is developing AMDS-web) has been helping me debug. We were able to replicate the issues outside of staging on my dev node (handy, since debugging these involves a lot of JBoss restarts). It appears that it is just a mutual exclusion issue: if you start AMDS-web first, it will be able to marshal the needed security credentials, and when Rodsadai-web tries to marshal the credentials, it will not be able to, and the connection to a remote Globus server will fail. The behavior is the same in reverse: if the Rodsadai-web client is called first, it works fine and then AMDS-web will throw a connection error.

Now, we are not sure if the failure to marshal is because of slightly different needed credentials that are being cached (apparently Rodsadai-web is using transport layer security (TLS) with only privacy and AMDS-web is using TLS with privacy and integrity) or whether there is some sort of explicit locking behavior that prevents different applications from using the same set of proxies.

Either way, it reveals that we need to find a better way for clients to make secure connections to Globus services. Whether it involves some sort of server-level pooling or having the apps load application-specific certificates, the current method would not allow client applications with slightly different needs to coexist on the same server (or worse, might not even allow client applications with the exact same needs to coexist).

Thus, we plan to delve into the innards of Globus security to see whether the certs need to change or how they are being marshalled (probably both).

A Hybrid Mesh Network: Path to a Distributed Secure Social Grid Network

After a rigorous weekend of coding and searching for code, I am quickly coming up with a model that can express most of the ideas of a distributable secure social grid network. On Saturday morning I was reviewing some of my detailed thought experiments relating to “What would it take to create an infinitely scalable distributed social network?” Reading through the thoughts here, it is easy to see how that consideration is not only relevant but paramount from the perspective of supporting the expanding boundary conditions that a health care network will demand. So to that end, I began to review the types of physical networks that can support that type of environment, and came up with the idea of basing my ideas on a Hybrid Mesh Network that leverages two networking sub-models: Fully Connected Topology (Point-to-Point) and Star Topology (Hub and Spoke). These topologies take into account the three most important expressions of a grid node’s ability to connect and participate in the network, which are:

  • A Consumer Grid Node – this node is consuming services/applications, vocabularies and autonomously exchanging data with “N” number of other nodes
  • A Producer Grid Node – this node is providing services/applications, vocabularies and autonomously exchanging data with “N” number of other nodes
  • A Producer/Consumer Grid Node – this node is providing services/applications and vocabularies to some nodes while consuming services/applications from other nodes and autonomously exchanging data with all.

In all three cases, the incorporated PKI for credential management and access control will need to encompass direct-control and delegated security models. Considering deployment requirements now will make a solution that much more adoptable by a broad range of organizations, because it conforms to some of their deployment best practices. For leisure reading I took a look at this document, which I think provides good guidance regarding Producer or Producer/Consumer DMZ deployment best practices. Thoughts? Until next time…..

Saturday, March 7, 2009

A Periodic Table of Visualization Methods

Jim Tobias discovered in the great ocean of the WWW a site that categorizes and gives examples of different methods of visualization.

Check it out here

Friday, March 6, 2009

More PHGrid Architecture

On Monday, February 9th, GB held a session to review the progress of the PHIN-SRM proof of concept project and potential GAARDS-related PoC next steps. During this meeting he requested an enterprise architecture review of how PHGrid will look and what components are necessary for deploying services throughout a grid to benefit public health informatics.

Over the past three weeks, Moses (PHIN contractor), Vaughn (PHINMS contractor), Charlie (PHIN-SRM contractor) and I (PHGrid contractor) have met to draft an initial release of four diagrams that seek to address this need for an architecture. Our goal was to create an architecture that is meaningful to business stewards, project managers, architects and developers. Statistician George Box once said "All models are wrong; some models are useful". This set of diagrams is meant to be useful.

All four models are available on the wiki at: http://sites.google.com/site/phgrid/Home/phgridarchitecture.

PHGrid Architecture - This model shows the basic components of the PHGrid and where they fit with CDC's partner structure. Some components must be hosted at the CDC (such as CDC application data), some components must be hosted by our partners (such as State application data) and some components lie in between (collaboration portals, registries, etc.).




PHGrid Service Stack - This model shows the services available within a specific node. It is separated into a small list of program services that different partners may wish to install and infrastructure services that are required for minimal functionality.


PHGrid Node Functional Deployment - This model shows the software components within a specific node.


PHGrid Roadmap - This model is perhaps the most arbitrary as it requires more input from the relevant program and system stewards. It attempts to show a timeline for how the services, components and infrastructure can be built out. Each timeline is marked with what has been completed or is currently in progress and shows the required dependencies. To the right of this marker are the items that require prioritization based on your decisions. For example, along the Infrastructure Timeline,



Please post your feedback on these models and how we can change and improve them. Based on the feedback gathered in response to this email I may schedule a time for us all to meet again and review potential next steps.


Here is the Visio file in case anyone would like to submit their comments within the models.

Deploy Log

Deployed new Quicksilver to staging this morning. The Zip3 and Zip5 counts are now geocentric from the Allied Center, and not just the Colorado Poison Center. Thus, you no longer pull up Tennessee, which was supposed to have 400 cases, and see only about 30. It also has security, so you will have to email someone on the PHGrid team to get a userid and password. The installation notes are here and the direct link is here.

The AMDS-UI was also deployed, but it was discovered that the AMDS-UI and Rodsadai-web are not playing nicely with the Globus Security Interface: when one is run first, it prevents the other from running, because the other cannot create a secure connection. So AMDS-UI and Rodsadai-web seem to be mutually exclusive for the time being, and since Rodsadai-web is still being used for demonstrations, we are going to try and debug and hopefully alleviate the conflict. Failing that, we will probably create a "switchover" where AMDS-web replaces Rodsadai. After all, AMDS is capable of doing the same thing Rodsadai did, but better, with more functionality, and outside of Pittsburgh.

Enhancing the PHGrid Platform to realize the vision

Last Friday I had an opportunity to sit in on a presentation given by Tom Savel and Ken Hall, Grid Value Proposition. I believe I can contribute to the effort by utilizing some of my past and current related work. Having just published the final drafts of the PHGrid Architecture documents, I began to go through several blogs, documents, and the wiki to get a full pulse check of how PHGrid could be enhanced to implement the remaining infrastructure pieces described in the Architecture documents. In my thought experiment, I believe that placing a wrapper around the Globus container would make it that much more manageable by someone non-technical, a significant advantage. I believe this is an important hurdle because, having come from the world of PHINMS and seen its growth over the seven years of my involvement, lowering the support burden and enhancing the configuration via a user interface (UI) was one of the main factors in growing its user base to over 700 active nodes (and growing). What was found through that experience is that the following should be well defined:

  • UI and Automation for certificate exchange, renewal, distribution and revocation mechanism – this should be based on an invitation model for added security.
  • UI for Node-to-Node access control which will assist in fostering secure and reliable data exchange and service/application distributions
  • UI and Automation for auto-discovery and monitoring
  • Queue based data transactions (Similar to PHINMS)
  • A Graphical Installer (Similar to installshield type installers)
  • Mechanisms that provide auto updating and patches
  • A well defined support plan from deployment to long term management.

I believe the above describes the basis for realizing the full vision expressed by Tom and Ken while incorporating all of the existing PHGrid efforts. Since my background cuts across all of the above from a technical perspective, I will try and put some things together. I will post some initial results soon. Please share your thoughts. Until next time….

Zementis

More Analytics in the Cloud
Here is another interesting announcement regarding cloud computing. Mathematica and MATLAB are joining the crowd with offers on Amazon EC2.

More analytics in the cloud
http://smartenoughsystems.com/wp/2008/11/25/more-analytics-in-the-cloud/


In a similar effort, we at Zementis http://www.zementis.com launched our ADAPA predictive analytics decision engine on EC2 which allows users to deploy, integrate, and execute statistical scoring models, e.g., using algorithms like neural networks, support vector machine (SVM), decision tree, and various regression models.

Buying predictive analytics like books - Zementis ADAPA
http://smartenoughsystems.com/wp/2008/06/20/buying-predictive-analytics-like-books-zementis-adapa/

Thursday, March 5, 2009

The Nation’s New Chief Information Officer Speaks - NYT

March 5, 2009, 2:57 pm

(oh...and he talks about cloud computing)

The Nation’s New Chief Information Officer Speaks

Reforming the entire health care system may be easier than doing everything Vivek Kundra says he wants to do when it comes to reforming the government’s computer systems.

Mr. Kundra, the 34-year-old former chief technology officer of the District of Columbia, was named by President Obama this morning to the new position of chief information officer of the United States. That’s a different job than the chief technology officer, a White House position that Mr. Obama said he would create but has yet to define....

Full article here:

http://bits.blogs.nytimes.com/2009/03/05/the-nations-new-chief-information-officer-speaks/

Secure Quicksilver, and Flot is neat.

Today I had a good discussion with Brian about how the C2 algorithm is going to work and how it will affect the future graphs. Also, we played with Flot and it looks like it will do what we need it to do.

Also, secure Quicksilver (with better information now) should be deployed tomorrow, so most of Friday and next week will be spent getting the new flot charts working and getting the algorithms set up to make the flot charts that much more interesting.

Cheers,
Peter

Wednesday, March 4, 2009

security and better variable handling in Quicksilver.

Today, Brian and I worked through the intricacies of SHA1 hashing, popular open source libraries for SQL Server JDBC, and all the various ways to get relative URLs in a servlet container...

But we have secure Quicksilver. If you try to access the main Quicksilver pane in the app when you are not authenticated, you will be redirected to a login page, and you can only get to the pane after logging in successfully.

Also, I moved a bunch of static variables in a JSP into the servlet space, thus hopefully alleviating the "caching" of previous variables and settings between different sessions.

Tomorrow, there will be some cleanup, a test to make sure Gmap-polygon can connect to the SQL Server instance being used for authentication, and then some research and perhaps implementation of flot charts. I may also have time to look into being able to click on a link in the popups and essentially drill into the preferred state/zip3/zip5.

Tuesday, March 3, 2009

Security thoughts continued.

Had a lot of people in and out of the lab today.

But I still managed to make some headway on the security stuff. Brian and I switched from a servlet listener to a servlet filter, moved things we wanted authenticated into the "auth" directory, and then we got the SHA1 algorithm hashing the entered passwords.

After that, I got the hash output into readable text by using the Apache Commons Codec library to Base64-encode the digest. Then I set up some methods for the login.jsp page to allow for checking and setting the proper variable for actually marking authentication in the session.
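
For reference, a minimal sketch of that hashing path, assuming a plain SHA-1 digest of the UTF-8 password bytes (the real code may differ in details like salting):

    import java.io.UnsupportedEncodingException;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import org.apache.commons.codec.binary.Base64;

    public class PasswordHasher {

        public static String sha1Base64(String password)
                throws NoSuchAlgorithmException, UnsupportedEncodingException {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(password.getBytes("UTF-8"));
            // Base64 turns the raw digest bytes into readable, storable text,
            // which is what gets compared against the stored password hash.
            return new String(Base64.encodeBase64(digest), "UTF-8");
        }
    }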

Tomorrow, I will set up the database connectivity to a user/passwordHash table. I will probably set up the appropriate table in Postgres and then start wiring it to the SQL Server. I will also be doing some updating of connection code inside Gmap-polygon and moving some code out of the static load for a servlet inside gmap-poly-web and npdsgmaps-web.

All these epiphanies and suggestions came from sitting with another coder and having them look at what I've written for about an hour. We need to do this more often.

Otherwise, I met with Will Duck again today, and I think we are settling on Flot as the next thing to draw the NPDS charts with. As you can see from the examples, there are a lot of really neat things that Flot can do; it's all JavaScript, and it allows for things like zooming and hover-over tips.

It's pretty too.

NHIN Interface Specifications and v2.0 Architecture Documents

Here are links to recent versions of the NHIN specifications and the v2.0 CONNECT architecture.


Service Interface Specifications

Gateway and Adapter SDK Reference Architecture

Gateway and Adapter SDK Reference Architecture Schemas

bringing the service to the data (cont.)

On Feb 17th a great discussion took place between NCPHI, JH and Utah regarding architecture to bring the service to the data. I've updated the doc on the PH Grid wiki to reflect our discussion.
Here's a link to the updated doc:
http://sites.google.com/site/phgrid/Home/resource-documents/bringing_the_service_to_the_data.doc?attredirects=0

Windows AMDSRODS Ready to Test

I am pleased to announce the AMDSRODS service has been fully installed and configured on the Windows node. The Ogsadai database tables were exported from staging and imported on the Windows node. We will test the service in the morning.

Monday, March 2, 2009

Filtering, the security method of the discerning secure app.

So, Brian and I talked and emailed about security, and the method we singled out was "use the fancy filtering architecture of tomcat/jboss/any-servlet-container to basically screen all requests and bump any that are not authenticated to a login page". Thus, I have written a little class to test that a filter is being used and with the help of Brian, set up the web.xml of the Quicksilver web app to use the filter. Tomorrow I plan to deal with redirects and writing a little login page and probably create a new little simple authentication project so that anyone wanting to add (very) basic authentication to their app can.
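
For anyone who wants to copy the pattern, here is a bare-bones sketch of what such a filter can look like (the "authenticated" session attribute and login.jsp path are just the convention described above, not necessarily the exact code):

    import java.io.IOException;
    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Sketch: anything mapped to this filter in web.xml gets bumped to the
    // login page unless the session carries an authenticated flag.
    public class AuthenticationFilter implements Filter {

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            if (request.getSession().getAttribute("authenticated") == null) {
                // not logged in: bump the request over to the login page
                response.sendRedirect(request.getContextPath() + "/login.jsp");
                return;
            }
            chain.doFilter(req, res);  // authenticated: let the request through
        }

        public void init(FilterConfig config) throws ServletException { }

        public void destroy() { }
    }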

Otherwise, the NPDS service underwent a small change over the weekend, so while staging wasn't throwing any errors and some data was being returned, the data was all cached and I had to do a quick update of my code to handle the change.

Cheers, I hope everyone had a good weekend!

VirtualBox

There are many virtualization tools out there for creating VMs. Some are expensive and some are not, while some do what they say and some do not. In an attempt to find a good balance between price and functionality I've been using VirtualBox. So far so good, and it has all the functionality I need.

virtualbox.org