Friday, June 26, 2009
In gridviewer, Dallas has been disabled for the time being because it seems to be acting rather flaky (it will load fine for half an hour, then time out). If anything, this means I need to work in a good timer function that abandons a load and notifies the user after a set amount of time.
Otherwise, gridviewer next week will be given some UI tweaks. I am hoping to get better server allow/disallow logic, multiple loads on one map (with different pushpins depending on the load), and autopopulated classifier/indicator drop-downs that don't require server hits.
Cheers everyone, have a good weekend,
Thursday, June 25, 2009
Otherwise, the server URLs are now pulled from the wiki page (or any page you wish to configure yourself). Thus, less database configuration is needed just for gridviewer.
Wednesday, June 24, 2009
The NHIN Connect project is located at http://www.connectopensource.org and version 2.0 is the current release of the code in question (although 2.1 is supposed to be forthcoming in early July).
While I can now configure a gateway and an adapter in under 1 hour each, there are several 'gotchas' that are not addressed in the documentation. Using the 'pre-configured binaries' option (versus install-from-scratch), here are the gotchas so far:
==>Adapter and Connect need different machines. I've tested them on the same one and functionality is mediocre at best. The lab has these set up nicely on 2 separate but equal machines.
==>Java is set to allocate 1.2 GB of memory from the start. Don't try this on a machine with less than 1 GB of memory... you want at least 2 GB.
==>OID registration doesn't work as advertised. Fortunately I haven't needed my OID for internal connectivity, but when I connect outside of the lab, I will need this.
==>c:\java is hard-coded as the Java location. Make sure you install to this location; if you don't, some of the NHIN services break with odd errors. The documentation points to inaccurate Java locations, and with _some_ of the application hard-coded to this Java location, better safe than sorry.
==>The NHIN documentation says in multiple places that port 9080 is the non-secure port and that 9081 is the secure port. DON'T BELIEVE IT! Port 8181 is the secure port.
Thus concludes volume 1 of the Chronicles of NHIN CONNECT. Stay tuned for updates as the PHGRID <--> NHIN CONNECT interoperability testing continues...
Now, I am hoping to get a majority of it caching... and instead of trying to write my own caching, I am going to use a caching mechanism suggested by Chris: OSCache from the OpenSymphony project.
It seems to have generic back-end caching with configurable levels of intelligence and persistence (which is all I really needed), but it also has the most potential for helping on the front end. For example, request caching: if a request looks exactly like one that was sent a minute ago, it returns the same HTML that was sent back rather than hitting the server. It also allows for better and more fine-grained error handling.
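To make the request-caching idea concrete, here is a minimal sketch of it in plain Java. This is NOT OSCache's actual API (OSCache provides its own administrators, filters, and TTL handling); it just illustrates the behavior described above: return the previously rendered HTML when an identical request arrives within a time window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal illustration of request caching: identical requests within the
// TTL get the cached HTML back instead of hitting the server again.
public class RequestCache {
    private static class Entry {
        final String html;
        final long createdAt;
        Entry(String html, long createdAt) {
            this.html = html;
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<String, Entry>();
    private final long ttlMillis;

    public RequestCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached HTML if the same request was seen within the TTL, else null. */
    public String get(String requestKey, long now) {
        Entry e = cache.get(requestKey);
        if (e != null && now - e.createdAt < ttlMillis) {
            return e.html;
        }
        return null; // caller must hit the server, then call put()
    }

    public void put(String requestKey, String html, long now) {
        cache.put(requestKey, new Entry(html, now));
    }
}
```

In OSCache this shape comes for free (plus persistence and a servlet caching filter), which is why using the library beats hand-rolling it.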
Tuesday, June 23, 2009
Right now, it seems that it is building, but when you open the flot plot, all the data is somehow being tagged with one date, and it is not immediately apparent where or how that data is being set like that. I think I might have run into this problem before, and it might have to do with how Java Calendar increments... I hope to figure it out later today or tomorrow.
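This may or may not be the actual cause here, but one classic Calendar gotcha produces exactly this symptom: Calendar is mutable, so if you hold on to the Calendar itself while incrementing it, every bucket ends up pointing at the final date. A sketch of the bug and the fix (method names are mine, not gridviewer's):

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.Date;
import java.util.List;

// Illustrates the "every data point has the same date" pitfall:
// storing the shared mutable Calendar vs. snapshotting with getTime().
public class CalendarPitfall {
    public static List<Date> datesWrong(Calendar start, int days) {
        List<Calendar> shared = new ArrayList<Calendar>();
        for (int i = 0; i < days; i++) {
            shared.add(start);                 // BUG: same mutable object each time
            start.add(Calendar.DATE, 1);
        }
        List<Date> out = new ArrayList<Date>();
        for (Calendar c : shared) {
            out.add(c.getTime());              // all identical: the final date
        }
        return out;
    }

    public static List<Date> datesRight(Calendar start, int days) {
        List<Date> out = new ArrayList<Date>();
        for (int i = 0; i < days; i++) {
            out.add(start.getTime());          // getTime() returns a fresh Date snapshot
            start.add(Calendar.DATE, 1);
        }
        return out;
    }
}
```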
Otherwise, after that is finished tomorrow, I will implement the new caching structure we discussed with Brian, and start publishing arrays so that the drop-downs for indicators and classifiers can be dynamically populated (as discussed with Chris).
After that, I am going to start allowing for multiple polygon loading... thus you can do one search, then another, and click and compare graphs. I might also look into making histograms in addition to line charts (that way, one can see the different data brought back by the different servers)
Saturday, June 20, 2009
Peter and Brian asked me to assist on a few things for the gridviewer application. I took a look at the code and the UI, and have a few ideas around where I think we can add functionality to it in the distant future. Peter has worked through some tough requirements and developed some strong code to handle all the polygon/map manipulations.
I think it would be a good step to introduce a controller to the gridviewer. The controller would not only handle web user interface requests but also process remoting protocols or generate specific outputs on demand. To handle MVC, I believe we should look to frameworks such as Spring, JSF, Struts, etc. My preference is Spring-MVC, especially now that the 3.0M2 release will be REST compliant, which is another consideration for the gridviewer.
Another idea is JSON/RSS/XML output.
The application currently does session-based authentication. We could potentially look to Spring for handling authentication. We would gain persistence (remember me), an adapter for authenticating with OpenID and LDAP, and an easier path to cross-domain authentication if that were to become a requirement in the future. Spring Security (formerly ACEGI) also supports X.509 certificates.
I showed Brian a mash-up that is a few years old called housingmaps.com. It basically takes craigslist and google maps and creates city-based maps of the current RSS listings. I think it is a good example of making the map the focal point of the UI. By making the map larger and moving the selection components to the right, I think we can really improve the UI.
Also, I think we should look to using GET requests, which could facilitate additional functionality in the future, such as remembering past searches or allowing users to easily find URLs in their address bar. We should always work towards a REST architecture.
These are just a couple initial thoughts for future development (post-August).
Friday, June 19, 2009
Unfortunately, the data from multiple servers is not combining because of some back-end restrictions that I am going to work on next week. Some additional refactors include separating polygons from pinpoints (so coastal states won't have really dark shadows, the result of multiple pinpoints being drawn in the same spot) and storing data from multiple states in the markers (so flot plots will be able to distinguish which data came from which service in future versions).
Cheers everyone, have a good weekend!
Ken created a new mailing list for those interested in GIPSE schema development. It is: firstname.lastname@example.org should anyone want to join the technical discussion around the GIPSE schema.
Software professionals are, on average, over-optimistic about the required effort usage and the success probability of software development projects. This paper hypothesizes that common risk analysis processes may contribute to this over-optimism and over-confidence. Evidence from four experiments with software professionals, together with research in other domains, supports this hypothesis. The results of the experiments imply that in some situations more risk analysis leads to over-optimism and over-confidence, instead of the intended improvement of realism. Possible explanations of this counter-intuitive finding relate to results from cognitive science on “illusion-of-control,” “cognitive accessibility,” “the peak-end rule,” and “risk as feeling.” The results suggest that it matters how risk analysis and effort estimation processes are combined. An approach is presented that is designed to avoid an increase in optimism and confidence with more risk analysis.
Thursday, June 18, 2009
So now, instead of having to track down the various files from Globus, the AMDS service, and Introduce... and then install them into the local repository... one can just type "mvn package" and maven will download the files.
I think this will help me a lot, and now that several people are using my code I am hoping it will help them a lot.
The one thing that will still need to be written up is that some of the files (all the ones marked "provided" in the POM file) will need to be copied into the web container's shared lib directory. I am debating whether it would be more useful to change them from provided to the default scope (meaning they would be included in the war); so long as whoever is setting up grid-viewer isn't trying to run multiple grid-viewers (or other things that use the secure globus libraries), it should work.
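For reference, the difference between the two options is just the scope element on the dependency. A sketch (the coordinates here are made up, not the actual gridviewer dependencies):

```xml
<!-- "provided" scope: needed to compile/test, but expected to already be
     on the container's shared classpath, so it is NOT packaged in the war -->
<dependency>
  <groupId>org.example.globus</groupId>   <!-- hypothetical coordinates -->
  <artifactId>some-globus-lib</artifactId>
  <version>4.0</version>
  <scope>provided</scope>
</dependency>

<!-- default ("compile") scope: the jar gets bundled inside WEB-INF/lib
     of the war, so each webapp carries its own copy -->
<dependency>
  <groupId>org.example.globus</groupId>
  <artifactId>some-globus-lib</artifactId>
  <version>4.0</version>
</dependency>
```

The trade-off is exactly the one described above: compile scope makes the war self-contained, while provided scope avoids duplicate copies when several webapps share the same container libraries.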
Otherwise, I have made some minor changes to grid viewer, but I'm planning to make some progress on gridviewer behavior and performance tomorrow and next week.
Wednesday, June 17, 2009
Simpler… if Project A needs Jar B… but Jar B also relies on jars C, D, and E… maven will read the pom associated with Jar B in the repository and also go fetch C, D, and E, even though C, D, and E aren't listed as dependencies in Project A's pom file. At least it seems that way from experimentation.
So, if you want to use this jar, you will need to create a properties file with the url, systemuser, system name, and password (look into the poicondai project to see what the filter is filling), but it should make it much easier for people who need to access poison to include this.
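Something like the following is the general shape of that properties file. The key names here are hypothetical; check the poicondai project for what the filter actually expects:

```properties
# Hypothetical key names -- the poicondai filter defines the real ones.
poison.url=https://example.org/poison-service
poison.systemuser=myuser
poison.systemname=mynode
poison.password=changeme
```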
More importantly, it got me comfortable with moving things into a non-local repository (also many thanks to Felicia's posts), because I anticipate I will have to move more things into the remote repository for the sake of fixing a RODSA-DAI bug and for people who want to use grid-viewer (it will keep them from having to do a jar hunt like I have so many times).
Now the next step would be finding a cool way to get all the provided jars into the lib container of a webserver. Maybe there is a deploy plugin for that sort of thing.
Tuesday, June 16, 2009
I'm happy because it is connecting over a secure connection to get both metadata and data, and it's relatively speedy, and allows me to move on with future plans for grid viewer and how it will behave.
First, I will be working with Chris to get a second GIPSE service up for testing, which will allow me to finish the server-selection logic (i.e., you cannot select servers if the data you are looking for is out of their range; you won't be able to scan the North Carolina AMDS source for results in California or try to scan a BioSense node for poison-based indicators).
Then, I am going to finish separating the selection bits from the mapping bits, which will require some generification. Right now, because it's based off of quicksilver, the map expects a series of states, a series of zip3s in a state, or a series of zip5s in a zip3. While it is already flexible enough to allow only a handful of states as opposed to all of them, I hope to expand it to allow for collections of zip3s (that may cross state borders) or zip5s (that may cross zip3 borders) that don't necessarily have to be related. It should also be possible to select states and zip3s at the same time (but I am not sure whether that would be terribly helpful or just confusing).
Finally, I am hoping to allow for cumulative data loads: you can run one query, then run another with different pinpoints, and click and compare the data between the two queries.
In addition to all this and in the process, I am hoping to clean up some of how gmap-polygon behaves so that less data has to be stored in databases and things aren't repeated as often.
It's going to be a lot of work, but I think the end results will be rather nifty, and I'm just happy to have this version all wired together and working so I can move on.
"SET AUTOCOMMIT TO OFF is no longer supported" being thrown by the AMDSService operations.
This happened because the training node uses PostgreSQL 8.1, yet the globus_database_common package deploys the JDBC driver for PostgreSQL 7.3 (pg73jdbc2.jar). This causes a problem with the way iBATIS handles Postgres connections (specifically, iBATIS turns autocommit off to maximize performance).
The AMDSService includes the PostgreSQL 8 driver (postgresql-8.3-604.jdbc3.jar). To get around this error, we removed the 7.3 driver from the training node's Globus and the training node's Tomcat. Globus is happy using the updated driver, AMDSService is happy using the updated driver, and RODSA-DAI is still happy using the updated driver.
This is not a problem for WS-Core Globus installs (like our Windows nodes) since WS-Core doesn't include the 7.3 driver.
Something to keep in mind for any Linux PHGrid nodes that use AMDSService: they will need to remove the 7.3 postgres driver after installing Globus and the AMDSService. (This is accomplished by renaming $GLOBUS_LOCATION/lib/pg73jdbc2.jar to pg73jdbc2.jar.old, then redeploying Globus to Tomcat.)
Friday, June 12, 2009
It was pulling data and displaying it Quicksilver style when installed in tomcat and connecting to the GIPSE Globus Service running on a non-secure globus container. In Windows.
Today, I tried to build it for a training deploy when connecting to secure GIPSE Globus service running in a secure tomcat container. I found some more dependencies I needed to add. And then I found out that the GIPSE globus service seemed to be having configuration issues.
Thus, I figure the next best step is to get that GIPSE Globus Service running on a secure tomcat container in Windows... or move development over to my old Ubuntu development box (where I would still have to get the GIPSE service running in Linux). Both would be beneficial and work towards the goal of getting an environment more like the training node.
Otherwise, I hope to get that done relatively soon, and then start focusing on the refactors I have been planning for Grid Viewer but haven't been able to really work on in the attempts to just get something Grid-Viewer-ey completed and working (hopefully correctly).
The other thing that needs to happen is more services. One that is starting development will provide NPDS data in GIPSE form, and we should probably also deploy a smaller GIPSE service someplace like Dallas so that we can pretend to have a smaller service that only has data for a few states and/or a few conditions. Multiple services with varying metadata is where Grid Viewer should get most of its niftiness.
If you go to page 36....there is an article on Neogeography and Google;
They interview Jack Dangermond (ESRI) as well as Intergraph and other GIS vendors;
Jack Dangermond makes an interesting comment about the future of Desktop GIS and the increase in demand for cloud computing....
Thursday, June 11, 2009
It seems to me we could link to the installation documents already written by Apache on configuring SSL with Tomcat in a pre-requisite section. There are so many variables like the user account running tomcat, where to store the keystore file, whether they can use a 3rd party certificate or self-signed, is there load-balancing, etc. Either way, once a user has Tomcat running with a certificate then they could begin stepping through our Globus installation.
Basically I think we should not duplicate existing documentation, and it will allow us to focus on Globus.
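As a pre-requisite pointer, the Tomcat side is typically just one connector element in server.xml. A sketch of the Tomcat 6-era syntax (the keystore path, password, and port are placeholders that will vary per deployment, which is exactly why linking to the Apache docs beats duplicating them):

```xml
<!-- server.xml: HTTPS connector; values below are placeholders -->
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           keystoreFile="/path/to/keystore.jks" keystorePass="changeit"
           clientAuth="false" sslProtocol="TLS" />
```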
To further articulate the differences and similarities between the EGEE Grid, Amazon Cloud and the Public Health Grid...
The EGEE Grid service focuses on "short-lived batch-style processing (job execution)". The Public Health Grid has these services available and plans to research and deploy more in the future, but our current focus is on public health and health-specific services.
The Amazon Cloud (and other clouds) service is "long-lived services based on hardware virtualization". The Public Health Grid does have a virtualization appliance, which we ship on DVD, and we are currently researching Grid in the cloud methodologies (University of Utah and Argonne specifically are working in this area).
The Public Health Grid services are long-lived services built using service-oriented architecture (SOA) methodologies and technologies. The team has built the first services: GIPSE, Grid Publisher and Grid Viewer. Each of these services may be accessed through this web site and downloaded through our open source repository.
I was asked yesterday what the primary differences were between grid and cloud computing. Struggling for a good answer, I did some quick Google searches and came across this interesting paper called EGEE Comparative Study: Grids and Clouds Evolution or Revolution?, done in Nov 2008. I imagine most people have read it, but thought I would post it just in case.
Wednesday, June 10, 2009
axis.jar (this has to be the version that is shipped with globus)
jce-jdk13.jar (and this one seems to have those pesky bouncycastle.org security libs)
puretls (*for secure access*)
cryptix32 (*for secure access*)
cryptix-asn1 (*for secure access*)
These are stored in my private repository and downloaded by maven because they are classified as "provided" in the POM (which means they are needed for compilation and testing, but will be provided in the classpath when the war is installed). Thus, I copied them all over to the tomcat/commons/lib directory.
Initial attempts seem to be resulting in NoClassDefFound errors. So it seems there may be a few jars still needed in tomcat alone... or that tomcat is not properly loading the jars in the common space (and that might be the case; I remember RODSA-DAI having issues running in tomcat because of library/classpath issues).
Either way, I am hoping to get it sorted out early tomorrow and have it returning data to the map. Then I want to get it tested in a secure-globus environment (in case some more security libraries are needed, and to make sure that it plays nice with other secure clients like RODSA-DAI), and then I'll be ready to continue with the refactors.
Right now, the main refactorings have to do with shifting from a "state, zip3, zip5" paradigm to a "region" paradigm. This should allow for showing zip3s and zip5s and states all on the same map. A complementary paradigm shift will be allowing multiple loads on one map (load one query with one set of pushpins, load another query with another set of pushpins... so you can click both sets of pushpins and compare data from two queries on the same map). All with more services and more real-time options.
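To make the "region" paradigm concrete, here is a hypothetical sketch of what the abstraction might look like. These class and method names are mine for illustration, not the actual gridviewer code; the point is that one interface replaces the separate state/zip3/zip5 code paths, so a single map load can mix all three.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "region" abstraction: states, zip3s, and zip5s all satisfy
// the same interface, so the map can draw a mixed collection of them.
public class RegionSketch {
    interface Region {
        String getId();          // e.g. "GA", "303", "30333"
        String getPolygonKey();  // key used to look up the polygon to draw
    }

    static class StateRegion implements Region {
        private final String abbrev;
        StateRegion(String abbrev) { this.abbrev = abbrev; }
        public String getId() { return abbrev; }
        public String getPolygonKey() { return "state:" + abbrev; }
    }

    static class Zip3Region implements Region {
        private final String zip3;
        Zip3Region(String zip3) { this.zip3 = zip3; }
        public String getId() { return zip3; }
        public String getPolygonKey() { return "zip3:" + zip3; }
    }

    // A single load is just a list of regions, whatever their granularity.
    public static List<String> polygonKeys(List<Region> regions) {
        List<String> keys = new ArrayList<String>();
        for (Region r : regions) {
            keys.add(r.getPolygonKey());
        }
        return keys;
    }
}
```

Cumulative loads then fall out naturally: each query produces its own list of regions (with its own pushpin set), and the map just renders every list it has been handed.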
Google Groups for SDMX-HD
SDMX-HD is a Statistical Data and Metadata Exchange (SDMX)-based data exchange format intended to serve the needs of the Monitoring and Evaluation community. It has been developed by WHO and partners to facilitate exchange of indicator definitions and data in aggregate data systems.
Experience with the UNAIDS IXF version 2.0 provided a basis for developing the SDMX-HD. New features in SDMX-HD include better support for domain-specific content for various stakeholders, cross-domain 'Metadata Common Vocabulary', hierarchical codelists, and the ability to generate generic or compact XML from a common data model. It will be based on the ISO SDMX standard and be a 'Content-Oriented Guideline' for the exchange of public health indicator data typically for Monitoring and Evaluation (M&E) activities and international reporting, e.g. PEPFAR.
Tuesday, June 9, 2009
The thing is, I had to hand-install these items into the maven repository... which means that anyone wanting to test or build the gridviewer project at this point would have to do the same... and after setting up a grid node and the GIPSE service, that is arduous and annoying.
Thus, I am hoping to get most of the needed jars into sourceforge. This was done once before for RODSA-DAI, and it should allow people to get all the jars needed for testing after a simple property-file change (which should be easily scriptable).
Furthermore, I am going to test running grid-viewer from the tomcat-enabled globus container... which should have all the libraries co-located in the lib directory, and which will hopefully make it much easier to install grid-viewer after setting up a tomcat instance.
I'm sure there will need to be some finessing either way... and I will still need to check all this in a secure globus environment and get the data returning in the grid view.
But, it's exciting. Nothing feels more gratifying than seeing your test code come back with something other than "NoClassDefFoundError: "
Monday, June 8, 2009
Tomorrow will be re-engineering the tests to match the new test cases, and then, hopefully, a war that can be installed and run (even if the initial phase is just having the service loader pull back the metadata options).
Then, it's implementing some new features in line with the current refactor, which will ultimately make things simpler.
I'm excited, I think grid viewer is turning into something that will work rather well, and still be flexible enough to work in ways not yet anticipated.
Sunday, June 7, 2009
I am thrilled to be working on this open source project that has enormous potential for public health informatics. There are really great individuals on the team and I am excited to be working with them.
Friday, June 5, 2009
On the service side, all the names of the objects being returned have changed, and that means all the classes have to be changed and the ways they are loaded too. That also means the metadata has changed and needs to be set up.
Also, the grid service will now allow for cumulative loads... in that one can load more than one set of results onto the map.. which is a completely new paradigm that will have to be handled, in addition to the already shaky paradigms of multiple, variable regions. And the new idea of having services come from a central repository instead of a database (like a Wiki page or UDDI).
But, I think I have finally gotten it all mapped out in my mind, and have built the task lists, and have started the massive refactoring that will be needed.
The thing is, I know this will happen again (new service bits), so I need to keep in mind where all these changes occur and try to make them obvious and as isolated as possible. Then it is more likely that a change in the GIPSE structure can be propagated without a change in the grid view dynamics; generally, the fewer changes the better.
Finally, I think I am going to introduce a simple "this is the data we got back from the service" page for the sake of debugging and sanity. It will help immensely to see what is coming back in order to figure out how the grid viewer is interpreting it, and to illuminate other options or assumptions.
Wednesday, June 3, 2009
PS- In case you haven't noticed from Tom's massive renaming of all the AMDS-related wiki pages, AMDS has been renamed to Geocoded Interoperable Population Summary Exchange (GIPSE) by the NCPHI Director. So whenever you see GIPSE think AMDS.
Issues knocked down:
- Some of the m2eclipse issues: jar projects will find and eat other jar projects just fine; war projects still go "I can't copy this" and give errors that force you to go to the command line, where it works just fine.
- Quicksilver, gmap-poly-web showing on my new box: Meaning I got all of the geodata and user data transferred to the new database and connecting okay. I found some rows that got omitted in the transfer, added them, and replaced the CSVs.
- gmap-polygon and gmap-polyweb version 1.0 have been updated to reflect their proper version in their pom files (before they were considered 1.1, not 1.0).
- For some reason M2Eclipse will look up maven artifacts in Brian's version of eclipse, but not in the one I installed.
- The GIPSE spec, and the resulting client, have changed a lot from prior loads. This means lots and lots of code needs to be updated in gridviewer, not just to use all the pieces, but to reflect all the metadata options and the like.
Tuesday, June 2, 2009
And while it has not been particularly difficult, it has been rather tedious.
Most of the morning was spent getting SQL Server Management Studio installed and figuring out that the export of the CSVs of locational data from Postgres was done in some weird format (i.e., dumped from the terminal client into a text file), which caused all sorts of padding issues that needed to be repaired, and then updating the data back to the repository.
A lot of the afternoon was spent figuring out that there is a strange M2Eclipse behavior with our projects that keeps installed jars from being found (so, in the eclipse instance I would build gmap-polygon, install it, and then try to build gmap-poly-web, only to have it blow up because it couldn't find gmap-polygon.jar, even though it was installed... and when I did it from the command line, it worked just fine). So that is something I am going to have to debug, because really, it is much easier to just right-click and have the pretty GUI do it for you.
Another portion of the afternoon was spent fixing a vice-versa error where I thought that a branch was the spot for new code, and not the trunk... thus, I was moving recent code from branch locations back into the trunks (luckily, the eclipse/subclipse SVN browser works rather well for that sort of thing... but sourceforge's SVN is rather slow regardless).
By the end of today, I had a few small issues that keep me from being able to run any cool web code on this new box:
Gmap-Poly-Web is not set to work with gmap-polygon version 1.1. I either need to upgrade gmap-poly-web to deal with the 1.1 version of gmap-polygon, or I need to download the 1.0 version of gmap-polygon and install it long enough to build gmap-poly-web (I will probably do the latter as gmap-poly-web functionality can be seen in both quicksilver and grid-viewer).
I don't have the username/password tables for poicondai on this box. Thus, I will need to find them and import them before I can log into poicondai.
Finally, AMDSCore has been completely replaced with a new service which will need to be integrated; the main deploy target is going to be having gridviewer working with this new service and thus providing more reliable data pulls.
This, if anything, has illuminated several points where we needed to update our "this is how you get/build/install this software" entries and shown us a few spots where we need to work to make things more automatic and user friendly if we want non-experienced programmers to have an easy time with it.
NPCR-Advancing E-cancer Reporting and Registry Operations (AERRO) (previously MERP):
www.cdc.gov/cancer/npcr/informatics/merp. This site has updated Registry diagrams and use case documents that describe the cancer registration business.
North American Association of Central Cancer Registries (NAACCR) (umbrella organization to coordinate standards across the cancer registry community) http://www.naaccr.org/
Electronic Pathology Reporting standard developed by NAACCR that uses the HL7 2.3.1 standard (working on update to HL7 2.5.1) and HL7 Messaging WorkBench to validate the message content: http://www.naaccr.org/index.asp?Col_SectionKey=7&Col_ContentID=501
NAACCR transmission format that is used to transmit data from the hospital cancer registry and from the central cancer registry (state) to the national agencies can be found at: http://www.naaccr.org/index.asp?Col_SectionKey=7&Col_ContentID=133. On page 58 of this document is a table that lists all of the data elements with a cross mapping of the agency that requires collection.
You can find more information about all of the Registry Plus software tools at http://www.cdc.gov/cancer/npcr/tools/registryplus/. A link to additional information on Link Plus can be found at the same URL.
Monday, June 1, 2009
Some of the things were easier to set up. The biggest help was being able to share other windows development boxes and being able to nab their already-downloaded copies of Java, Eclipse, and Globus WS-Core files. The setup for globus also seems to be a lot snappier in Windows (but at the same time, I am doing a much less involved install, and this isn't the first time I've done it).
Otherwise, I was able to build and run the client from the AMDS Service that was pulled over from another computer. Now I need to build and run the client from a fresh client downloaded from SVN and configured myself... so I anticipate a few "I am not sure what this property is" issues, but then it will be configuring the client, building the jars that grid viewer needs on my box (gmap-polygon) and figuring out what is needed between those steps.
You can also download the raw gar from sourceforge. But I recommend getting the source and building with your own configuration.
This service uses the updated 5/31 AMDS draft.
This service is specifically developed to share BioSense aggregate data over PHGrid, but can actually be used for any JDBC data source that wants to be shared using the AMDS spec.
This release is significantly different from the 4/30 alpha release. Specifically, we're using Introduce 1.3 (a big improvement over 1.2) for service development and configuration management, and iBATIS for easy db access/ORM. This release is smaller than the alpha release in size and lines of code, so theoretically it will be easier to use. Please let me know any comments. We'll be following the weekly build schedule with a target of July 8 for code freeze.
Update: Tom asked me to explain that we're using iBATIS rather than Hibernate. Both are decent JDBC/ORM frameworks. I chose iBATIS because it has a lighter footprint and I got it working in about 15 minutes. This doesn't mean we won't use Hibernate in the future, but just that we're using iBATIS for now.
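For a sense of why iBATIS went in quickly, a sketch of what an iBATIS 2 sqlMap looks like. The statement id, table, and columns here are hypothetical, not the actual AMDSService mappings; the point is that the SQL stays visible and the framework just handles parameter binding and result mapping:

```xml
<!-- Hypothetical sqlMap fragment; AMDSService's real mappings differ -->
<sqlMap namespace="Amds">
  <select id="getCounts" parameterClass="map" resultClass="java.util.HashMap">
    SELECT region, obs_date, count
    FROM aggregate_counts
    WHERE condition = #condition#
  </select>
</sqlMap>
```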