Public Health Grid (PHGrid) - Research and Development: April 2008

Wednesday, April 30, 2008

A thousand little libraries.

I am, I think, very close to getting the first of several little JSP applications working, but two things keep holding me back...

The first is that I keep getting cryptic OD errors and for some reason my setup is immune to logging... and the second is that I cannot try and debug using my favorite IDE (Eclipse) because it just seems to not want to run any OD queries from the workspace.

In both cases I get cryptic errors which seem like "Ogsa-dai-" followed by a timestamp.. which are thoroughly unhelpful.

I am guessing this all boils down to some sort of classpath error, libraries not being found somewhere... especially because all this code seems to work but only from one directory and only from the command line. If I try and run a modified class that worked perfectly from one directory in the servlet engine or eclipse... *bam* cryptic error.
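One way to chase a suspected classpath problem is to check which jar files actually contain a given class. The sketch below is only an illustration and is not part of the OGSA-DAI tooling: it relies on the fact that a .jar is a zip archive and reports every jar under a directory that holds the named class. The function name and the directories you point it at are assumptions.

```python
import zipfile
from pathlib import Path


def find_jars_with_class(lib_dir, class_name):
    """Return sorted paths of jars under lib_dir that contain class_name.

    class_name is a fully qualified Java name, for example
    "uk.org.ogsadai.client.toolkit.exception.RequestCompletedWithErrorException".
    """
    # Inside a jar, a class is stored as a '/'-separated path ending in .class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    matches.append(str(jar_path))
        except zipfile.BadZipFile:
            # Skip files that are named .jar but are not valid archives.
            continue
    return matches
```

Running it once against the directory that works from the command line and once against the web application's WEB-INF/lib should show whether a jar is missing from one of them.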

I am going to have to sit with the Edinburgh guys for some insights methinks.

Cheers,
Peter.

Grid Questions

Here are the answers to the Johns-Hopkins questions.

What services will be used on the grid?

We are currently using the following Globus services:

GridFTP - A file transfer protocol that provides secure, robust, fast and efficient transfer of data.

Grid Security Infrastructure OpenSSH (GSI-OpenSSH) - A modified version of OpenSSH that adds support for GSI (X.509 certificate) authentication.

Open Grid Services Architecture Data Access and Integration (OGSA-DAI) - A middleware product that supports the exposure of data resources, such as relational or XML databases, onto grids.

Reliable File Transfer Service (RFT) - A web service that provides "job scheduler"-like functionality for data movement. RFT should be used for extremely large data transfers.

Replica Location Service (RLS) - A registry that keeps track of where one or more copies, or replicas, of files exists on physical storage systems in a Grid environment.

Simple Certificate Authority (SimpleCA) - A package that provides a simplified certification authority for the purpose of issuing credentials to Globus Toolkit users and services. This package is often used for testing Globus installations.

Web Services Grid Resource Allocation and Management (WS GRAM) - A Unix server suite that enables users to submit, monitor, and cancel jobs on Grid computing resources.

Web Monitoring and Discovery System (WebMDS) - A web-based interface to WSRF resource property information that is available as a user-friendly front-end to the Index Service.

Coming Soon:

Data Replication Service (DRS) - A web service that combines RFT and RLS to provide a pull-based replication capability that ensures a specified set of files exists on a storage site.

Java Commodity Grid Kit (Java CoG kit) - A Jglobus library that provides a client-side API and limited server side functionality to the GT2-based services such as GRAM and MDS. It also provides a client-side API for GridFTP, MyProxy and has extensive GSI support.

MyProxy - An online credential repository used to eliminate the need for manually copying private key and certificate files between machines.

How will you demonstrate grid functionality at the PHIN conference?

The method for demonstrating grid functionality is still being decided.

Will multiple grid user accounts be created for the PHIN conference?

No, we will create a single demo account that will have controlled access to grid data.

Monday, April 28, 2008

Tomcat in code alley

I have started to actually write some JSP code. The first one is going to be a simple query that can be run through a webpage... then I am going to start trying more dynamic pieces where different resources are selected and polled.

I am also discussing the best way to discover OGSA-DAI resources on the grid. We were thinking of perhaps building a MonALISA extension to poll the OD on a given box and then sending that back to a repository, which would reduce the hits on a network since it would be passive polling.
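The repository idea can be sketched without any MonALISA code at all. In the purely illustrative Python below, a node-side agent reports the data resources it hosts, and clients read the repository instead of contacting every node. The class name, method names, and the five-minute freshness window are all assumptions, not anything MonALISA or OGSA-DAI provides.

```python
import time


class ResourceRegistry:
    """In-memory stand-in for a central repository of data resources per node."""

    def __init__(self, max_age_seconds=300.0):
        self.max_age_seconds = max_age_seconds
        self._reports = {}  # node name -> (timestamp, sorted resource names)

    def report(self, node, resources, now=None):
        """Record the resources a node says it hosts (called by the node-side agent)."""
        timestamp = time.time() if now is None else now
        self._reports[node] = (timestamp, sorted(resources))

    def lookup(self, now=None):
        """Return {node: resources} for nodes that reported recently enough."""
        current = time.time() if now is None else now
        return {
            node: resources
            for node, (timestamp, resources) in self._reports.items()
            if current - timestamp <= self.max_age_seconds
        }
```

A node that stops reporting simply ages out of the lookup, so a client never has to probe it directly.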

Tomorrow, I hope to complete a query through a webpage... but I am anticipating that a lot of classpath resolution errors will hold me up (as I have very rarely worked on JSP and Servlet projects where this didn't occur).

Using an existing Certificate Authority (CA) within Globus

Question:

Hi Dan,

I was wondering if there was any support for using existing
authentication sources with the PHGRID. I am interested in using the
University of Washington's Kerberos and Shibboleth services to
authenticate against PHGRID services.

Answer:
Yes, Globus can be configured to trust X.509 certificates issued by a third-party CA. This is accomplished by copying the third-party CA's certificate hash file and signing policy to the /etc/grid-security/certificates directory.

Example Hash Files:
31f15ec4.0
31f15ec4.signing_policy

Note: The certificate hash is located by running the following command:
$GLOBUS_LOCATION/bin/openssl x509 -hash -noout < ca_certificate_file

The certificate's distinguished name must match the pattern found in the signing policy file.
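A quick sanity check after copying the files is to confirm that both of them are present for a given hash. The Python below is a minimal sketch under that assumption only: it checks that the files exist, not that the signing policy pattern actually matches the certificate's distinguished name. The function name is made up for illustration.

```python
from pathlib import Path


def missing_ca_files(cert_dir, ca_hash):
    """Return the names of expected CA trust files that are absent from cert_dir."""
    # The two file names follow the pattern shown above: <hash>.0 and <hash>.signing_policy.
    expected = [f"{ca_hash}.0", f"{ca_hash}.signing_policy"]
    directory = Path(cert_dir)
    return [name for name in expected if not (directory / name).is_file()]
```

For example, missing_ca_files("/etc/grid-security/certificates", "31f15ec4") returns an empty list when both example files are in place.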

Friday, April 25, 2008

Daily Lab / POC Activities

Extramural:

Configured MyProxy server on lab1002. Further testing needs to be done in order to determine the best practices for implementing security and scalability within MyProxy (e.g., accepted credentials, renewal policy, passphrase enforcement, certificate map).

The CaBIG developers are currently working on installing a CaBIG grid node on lab1004. I provided them with a history of the previous configuration efforts. The current developers ended the day by configuring /cacoresdk/conf/deploy.properties. Progress will resume on Monday.

Thursday, April 24, 2008

OGSA-DAI over large sets

I had a few discussions with Alastair over how we think OGSA-DAI would work over large sets... like hundreds of thousands of distributed nodes.

Some queries would be able to just crawl... namely things like aggregations (counts of an infection by zip code) where the set wouldn't get large just the counts would increase... but if you were doing joins on national or global data... it would probably need some sort of tiered architecture.
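To make the aggregation case concrete, here is a small Python sketch (not OGSA-DAI code) of merging per-node counts keyed by zip code. The function name and the shape of the input are assumptions. The point is that the merged result is bounded by the number of distinct zip codes, not by the number of nodes contributing.

```python
from collections import Counter


def merge_counts(per_node_counts):
    """Sum per-node {zip_code: count} mappings into one combined Counter."""
    total = Counter()
    for counts in per_node_counts:
        # Counter.update adds counts for existing keys instead of replacing them.
        total.update(counts)
    return total
```

A join over row-level data has no such bound, which is why it would likely need the tiered approach.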

In my mind I see a MonALISA extension being built to monitor and manage OGSA-DAI instances... handling the delegation of what collects what data from 100 or so nodes and then propagates it up... Alastair pointed out that there were already plans for OD to manage itself in an asymmetrical tree concept. It would probably be merged.. with MonALISA providing feedback about which nodes were reachable and their lag times... and OD then selecting to pull the queries through its more efficient channels.

Otherwise, I have since started to focus on discovery concepts for my demonstrations, being able to use the APIs of globus and OGSA-DAI to figure out what data resources are available and at what locations.

Wednesday, April 23, 2008

More complex data

I have started using more complex data (sample data from OpenMRS) in my tuple merge and join tests... I might supplement that with RODS test data soon too.

I am also starting to consider the scalability and distributions of the grid when it starts to get Big. I think that OGSA-DAI will oversee most of the data pulls and merging, but only across about a hundred nodes which would then store their results in repositories. That way any polling algorithms will only have to pull from about a hundred "super" nodes instead of tens of thousands of regular nodes. The same would go for resource discovery.
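The "super" node idea can be illustrated with a two-level roll-up. This is a rough Python sketch under my own assumptions (a flat list of per-node counts, fixed-size groups of about a hundred), not a description of how OGSA-DAI would actually partition the grid.

```python
from collections import Counter


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def tiered_totals(node_counts, group_size=100):
    """Roll per-node counts up through super nodes, then into one total.

    node_counts is a list of {key: count} dicts, one per regular node.
    Returns (super_node_totals, grand_total).
    """
    super_node_totals = []
    for group in chunk(node_counts, group_size):
        subtotal = Counter()
        for counts in group:
            subtotal.update(counts)
        super_node_totals.append(subtotal)

    # The top level only touches one subtotal per super node.
    grand_total = Counter()
    for subtotal in super_node_totals:
        grand_total.update(subtotal)
    return super_node_totals, grand_total
```

With 10,000 regular nodes and groups of 100, the top-level poll reads 100 subtotals instead of 10,000 individual results.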

Tuesday, April 22, 2008

OGSA-DAI update

I have installed the extensions for OGSA-DAI that allow more complicated joins and such... unfortunately when trying to modify the example client I start getting an error about not being able to find the newly installed resource.

After some research it appears that the new resources are being installed on the Globus container but not the tomcat container and I am testing with the tomcat container.

Thus, tomorrow morning will either be help in getting the new bits deployed to the tomcat container, or help debugging the Globus container connection.

Tomorrow I will probably start pulling from similar schemas on different computers and trying more complicated things like aggregation, while also branching out to dynamically built workflows and more secure communications, assuming we get this bug out of the way quickly.

Daily Lab / POC Activities

Extramural:

VM Appliance DVD created for Pittsburgh
Updated MonALISA installation document
Researched exchange network
Modified MonALISA module config on lab 1002

Update on NAPHIT Conversation

Following up the conversation held with NAPHIT, the following is a proposed approach to collaborate.

Pre-Public Health Information Network (PHIN) conference

Target a webinar for July / Aug for the NAPHIT membership to more concretely describe PH Research Grid activities to date, and possibilities for the future.

PHIN

Hold meeting to scope potential pilots, (e.g. for the security model/ administration).

Post-PHIN

Conduct pilot project with NAPHIT membership.

Monday, April 21, 2008

code pulling from two databases

They aren't integrating fully yet (I need to ask about join/aggregate/merge commands tomorrow morning), but I have info coming in from different databases displaying to one screen.

I spent a lot of today reading through documentation about grids, and updating the OGSA-DAI use case document and Proof of Concept, and then reading through some of the client and developer documentation.

Tomorrow I hope to get into the nitty-gritty of local data sources, merging data, and maybe even secure transactions.

Friday, April 18, 2008

HealthGrid 2008 Conference - Press Release

==================================

PRESS RELEASE

For Immediate Release

For more information: info@healthgrid.us

=================================

Program Finalized for First International HealthGrid meeting to be held in U.S., June 2-4, 2008

The HealthGrid U.S. Alliance has finalized the program for the sixth annual International HealthGrid conference - the first one to be held in the United States, “Global HealthGrid: eScience Meets Biomedical Informatics.”

"As the first HealthGrid conference in the Americas, this is an historic event," says Jonathan Silverstein, M.D., President, HealthGrid.US, and Associate Director, Computation Institute, Argonne/University of Chicago. "The program will appeal broadly to the interdisciplinary eScience and biomedical informatics communities, including physicians, medical educators, students, epidemiologists, biomedical informaticians, military medicine specialists, computer scientists, security and policy makers, economists, and futurists."

The conference begins on June 2 with a day of workshops and tutorials which will provide training and demonstrations, including basic Grid concepts, case studies and the most advanced topics on infrastructure and applications for computational biologists and public health informaticians.

The formal conference will kick-off on June 3, with welcoming remarks by Robert J. Zimmer, Ph.D., President of the University of Chicago. Following the welcoming ceremonies, Ian Foster, Ph.D., will provide a keynote presentation on "eScience meets Biomedical Informatics." Scientific papers will be complemented with a roundtable discussion from U.S., European, and Asian government leaders on "Government eScience and Cyberinfrastructure Programs for HealthGrid," moderated by Michael Cowan, M.D., Chief Medical Officer, Bearing Point, and former U.S. Navy Surgeon General. Cowan will precede the government roundtable discussion with a keynote presentation on "The Role of Government in the Future Knowledge Society."

"The main importance of the meeting, apart from communicating information about research programs, is for partners to meet and network. The annual HealthGrid conference provides an opportunity to plan and prepare for future joint collaborations of global scale." says Vincent Breton, Centre National de la Recherche Scientifique and co-chair of the Program Committee.

The final day of the conference provides a major networking opportunity, starting with a keynote presentation by Peter Hunter, Ph.D., director, The Bioengineering Institute, University of Auckland, on the "Physiome Project," a worldwide public domain effort to provide a computational framework for understanding human and other eukaryotic physiology, i.e., organisms whose cells are organized into complex structures enclosed within membranes. It aims to develop integrative models at all levels of biological organization, from genes to the whole organism via gene regulatory networks, protein pathways, integrative cell function, and tissue and whole organ structure/function relations.

Because public private partnerships become increasingly important to global science, an industry roundtable will provide business models for HealthGrid in the emerging knowledge economy and the impact of industry innovation.

The closing session will highlight perspectives from GridAsia, by Simon Lin, Ph.D., Academia Sinica, Taiwan, as the HealthGrid continues to expand globally. "The international networking aspects of the HealthGrid benefit a broad range of biomedical informatics programs," says Mary Kratz, Executive Vice President of the HealthGrid.US Alliance and Senior Information Services Specialist, University of Michigan. "We are delighted with the strong work of the Program Committee to bring together an outstanding program for HealthGrid 2008." The selected papers will be published in the series /Studies in Health Technology and Informatics/, IOS Press (http://www.iospress.nl/), and referenced in MEDLINE.

The HealthGrid conference is the premier conference on the transformation of biomedical research, education and medical care through the application of Grid technologies. HealthGrid is dedicated to enhancing biomedical research and healthcare delivery, creating an open collaborative virtual community, and communicating the collective knowledge of the HealthGrid. Scheduling and other conference details can be found at http://chicago2008.healthgrid.org/.

Daily Lab / POC Activities

Extramural:

Updated the VDT installation document to reflect the updated certificate authority information
Created a MonALISA installation document
Installed MonALISA on Lab 1001 based on document content
Issued user and host certificates for Washington and Johns-Hopkins
Started researching Exchange Network nodes.

Recruiting the hinterlands

On Wednesday, April 15, Tom Savel, Ken Hall, and John Stinn spoke with Tim Stephens and Mike Hill from the National Association of Public Health Information Technology (www.naphit.org), a national non-profit organization that provides leaders in public health information technology (IT) with a venue to exchange ideas and experiences, discuss and shape current public health information policy, and learn about tools and technologies that help them better support public health.

The discussion centered on the background of the PH Research Grid, the public health and business drivers for the CDC for exploring federated architectures, and the research approach the CDC is sponsoring in standing up nodes, and exploring the technology stack. The NAPHIT group was very intrigued by the possibilities, and is eager to help in defining the administration and security models the grid would entail.

The next steps may include a webinar to the NAPHIT membership, and a definition of prospective projects to collaborate on in the future. More to come....

Thursday, April 17, 2008

2008 PHIN Conference Submissions & PH Grid

Greetings all... just a quick update:

1. Kudos to everyone working on PH Grid Research- the advances being made are amazing!

2. Submissions are in process to request at least 2 full sessions on Grid-related research activities. My sense is, before all is said and done, there may end up being 6 to 9 separate grid-related presentations at the conference - only time will tell.

3. Our NCPHI OD public health informatics fellow, Dr. Muzna Mirza, will now be spending much of her time focusing on PH Grid research as well. Welcome Muzna!

Make that three working nodes

I got in early today to have a bit more time with the OGSA-DAI folks at Edinburgh, and the results are wonderful.

I found the configuration file that was improperly set up with their help and now local OD calls are working on all three nodes. Furthermore, I can ping nodes on other computers and access their resources... so I am now ready to start writing clients.

Alastair gave the sage advice that I should probably work on building clients before I start focusing heavily on security... because security is pretty much changing the calls and the hosts and adds a whole new level of complexity and I should get a better feel for how the clients operate when I am not worrying about everything being validated on the grid.

Thus, the next step is to pore over the included source examples to see how various things are called and then to start writing my own little clients that put various activities in the pipeline.

Otherwise, I am very excited and look forward to making some neat little clients and some snappy JSPs to show off the various functionality of OD.

Cheers!

Way Cool from Ian Foster

Clouds over Chicago

Way before clouds were popular (remember then?) my colleagues Kate Keahey and Tim Freeman started work on their workspace service, a system for on-demand creation and management of virtual machines on remote computing systems. They now have an implementation that interfaces both to clusters running conventional schedulers and to Amazon EC2. It's distributed as part of the Globus software, or you can download it separately. Kate and Tim have recently established a deployment of the workspace service in the Computation Institute at U.Chicago and Argonne. With a nod to the new cloud meme, they've named it Nimbus. They say:

The University of Chicago Science Cloud, codenamed "Nimbus", is a web service that delivers compute capacity in the cloud for scientific communities. The Nimbus' simple client allows you to obtain customized compute nodes (that we call "workspaces") that you have full control over quickly, easily, and in ways that can be fully automated. Using the Nimbus cloud you can request the exact compute capability you currently need for your application and scale it up or down as your needs dictate.

Nimbus provides compute capability in the form of Xen virtual machines (VMs) that are deployed on physical nodes of the University of Chicago TeraPort cluster using the workspace service. We currently make 16 nodes of the TeraPort cluster available for cloud computing. Nimbus is available for members of scientific community wanting to run in the cloud. To obtain access you will need to provide a justification (a few sentences explaining your science project) and a valid grid credential (If you don't have a credential, email us. We can help). Based on the project, you will be given an allocation on the cloud. Send your requests, demands and cries of anguish to workspace-user@globus.org (for cries of anguish mp3 format is acceptable).

In a typical session you will make a request to deploy a workspace based on a specified VM image. You can either use one of the VM images already available on the cloud (we provide a command that allows you to see what's already there) or upload your own VM image. On deployment, the image will be configured with an ssh public key you provide -- in this way once the workspace is deployed, you will be able to ssh into it and configure it further, upload data, or run your applications.

Wednesday, April 16, 2008

Daily Lab / POC Activities

Extramural:

Built and tested the new NCPHI Certificate Authority
Testing of Globus 4.0.7 halted due to MySQL errors thrown during the container installation (missing file libssl.so.4). This file is part of the OpenSSL package, but Globus is unable to locate it. A possible fix is creating symbolic links to the file's current location. I will resume testing tomorrow.
Updated VDT installation document based on the new NCPHI CA.
An updated Certificate Authority file and new install instructions have been distributed to the new grid sites.
The NCPHI Certificate Authority has been configured as a Trusted_Authority on Dallas and Tarrant.

Make that two non-functional OD boxes

Well, I got the Ubuntu box set up for running Globus and Tomcat... downloaded and built OGSA-DAI, deployed it to tomcat and the Globus container... created a resource, deployed the resource, and tried to run a SQLClient against the resource... and got this..

Exception in thread "main" uk.org.ogsadai.client.toolkit.exception.RequestCompletedWithErrorException: [1208374501596:4] uk.org.ogsadai.client.toolkit.REQUEST_COMPLETED_WITH_ERROR : ogsadai-1195888958d
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.checkSynchronousExecutionIsComplete(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
at SQLClient.executePipeline(SQLClient.java:167)
at SQLClient.main(SQLClient.java:290)

It is the same error I was getting with the second SuSE box that I gave up on... let me tell you what this error does not tell me:

* I do not know what the error really was, because I don't know what error "ogsadai-1195888958d" is and there is no documentation for it.
* This error does not tell me what broke.
* I am apparently incapable of turning on the logging so that better errors are thrown closer to the problem source.
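One small thing the message may still give away is a time. The bracketed "[1208374501596:4]" prefix looks like milliseconds since the Unix epoch followed by a sequence number; read that way it decodes to 16 April 2008, the day of this post. That reading is an inference, not documented OGSA-DAI behaviour. Under that assumption, a Python sketch for pulling the tag out, so it can be lined up against whatever server-side logs do exist:

```python
import re
from datetime import datetime, timezone

# Matches the bracketed "[<digits>:<digits>]" prefix seen in the exception message.
REQUEST_TAG = re.compile(r"\[(\d+):(\d+)\]")


def parse_request_tag(message):
    """Return (utc_datetime, sequence) from a "[millis:n]" tag, or None if absent.

    Assumption: the first number is milliseconds since the Unix epoch.
    """
    match = REQUEST_TAG.search(message)
    if match is None:
        return None
    millis, sequence = int(match.group(1)), int(match.group(2))
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment, sequence
```

This does not explain what "ogsadai-1195888958d" means; it only narrows down where in the Tomcat or container logs to look.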

After narrowing down that other things work (the DRER is accessible via tomcat, MySQL is up and running, the logins.txt file has all the appropriate information, the jdbc file I used worked during the data creation, and the MySQLDataResource file looks to be correct) I have no idea what is broken. I mean maybe something was written wrong in the MySQL resource, but I thought I fixed it and restarted tomcat and it should work and it doesn't.

I think I am going to have to get with the Edinburgh folks early tomorrow, start up a webex and just have them look at the boxes to see what was mis-configured or allow them to try setting up a resource and get the logging working... because I am running out of ideas that aren't just pure churn. I will also talk to them about making a more robust resource making client that will test resources being created for validity and connectivity before being deployed.

RODS Dataset for OGSA-DAI

Today, I provided a synthetic dataset of emergency department visits from Allegheny County Pennsylvania to the OGSA-DAI group. The dataset contains the same table and fields used by RODS to store emergency department visit information. The fields include date of visit, age, gender, zipcode, and chief complaint. It comprises 53020 records for a period of 1 month.

Unlike the OpenMRS datasets, the chief complaint/reason for visit field is not coded (i.e., uses a vocabulary) and is free text. This is the way that most emergency department surveillance systems collect their chief complaint data (i.e., non-coded free text). This presents interesting challenges for federation.

At RODS, we've solved the free text chief complaint challenge before using text indexing and preclassification of chief complaints into syndrome categories. We'll need to use similar solutions, but this time under a federated API.
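To show what "preclassification into syndrome categories" means in the simplest possible terms, here is a naive keyword-matching sketch in Python. It is not the RODS method (which the post describes only as text indexing and preclassification); the keyword lists and category names are invented for illustration.

```python
# Hypothetical keyword lists; a real classifier would be curated or trained.
SYNDROME_KEYWORDS = {
    "respiratory": ["cough", "shortness of breath", "wheez"],
    "gastrointestinal": ["vomit", "diarrhea", "nausea"],
    "fever": ["fever", "chills"],
}


def classify_complaint(text):
    """Return the sorted syndrome categories whose keywords appear in the free text."""
    lowered = text.lower()
    return sorted(
        syndrome
        for syndrome, keywords in SYNDROME_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )
```

Classifying at each node before federation would let the grid exchange category counts rather than raw free text, which fits the aggregation-style queries discussed elsewhere on this blog.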

Tuesday, April 15, 2008

PHGrid on Ubuntu

Yesterday was spent trying to get the postgresql instance on the second computer running through ogsa-dai, and it failed. Luckily, we found that the original install of OGSA-DAI was working and a few servers just hadn't been restarted since the last time it was shut down.

Thus, today, I went ahead and tried some of the other example clients on that server and actually got the secure communications set up with the help of Dan and Alastair.

The other big news is that we got the Ubuntu box installed yesterday, so today we made headway in getting a VDT install of Globus on Ubuntu. It is almost up and running; now we are just dealing with all of the certificate and authentication stuff. After that it will be setting up tomcat, deploying OGSA-DAI, and probably setting up a little MySQL database.

As for the "second computer" I am going to try and get MySQL running on it too.. then we can start trying to code OGSA-DAI proofs of concept that pull from a PostgreSQL database and two MySQL databases on three different computers.

Cheers,

Monday, April 14, 2008

Daily Lab / POC Activities

Extramural:

Installed Globus 4.0.7 on lab 1003
Updated VDT installation document based on 1.10.0 changes
Contacted Laura about HealthGrid CA details
When installing VDT 1.10.0 Globus-WS(Web Services Container) the following error is returned:

Downloading [vdt_globus_rft_server-VDT1.10.0-x86_rhas_3.tar.gz] from [http://vdt.cs.wisc.edu/software//globus/4.0.7_VDT-1.10.0]...
Command failed:
/opt/vdt/mysql/bin/mysqladmin -S /opt/vdt/vdt-app-data/mysql/var/mysql.sock -u root create rft_database
Exited with value 127
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-WS] not [installed]:
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-WS-Server] not [installed]:
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-Base-RFT-Server] not [installed]:
Shell command [. /opt/vdt/vdt-questions.sh; . /opt/vdt/globus/etc/globus-user-env.sh; vdt/setup/configure_rft] returns with an error code.

Currently researching the error.

Note: The Globus package completed without error, but Globus-WS did not.

Friday, April 11, 2008

Errors on the original OGSA-DAI box

So, I came back in today thinking I would try some of the more advanced clients on the original box I have gotten OGSA-DAI working on.

Unfortunately, it's not working either, and I am getting the same error.

Also, I have an appointment Monday morning and I anticipate I will not be able to get in before Alastair sets off for home... so I imagine any debugging will probably have to be put off until Tuesday.

Thus, I will spend the rest of today and probably a significant portion of Monday reading up on documentation and javadoc and updating the PHGrid - OGSA-DAI document so we can better map the application.

Dan has updated the spreadsheet and is still researching CA.

PHGrid / OGSA-DAI Conference Call April 10, 2008

To read the minutes from this discussion click here

Thursday, April 10, 2008

OGSA-DAI almost set up on another computer.

I have OGSA-DAI in a similar position on a second server.... where it all seems to be installed properly but whenever I try to run the simple client I get cryptic errors like this:

java SQLClient
Exception in thread "main" uk.org.ogsadai.client.toolkit.exception.ResourceUnknownException: The resource null is unknown.
at uk.org.ogsadai.client.toolkit.presentation.gt.GTDataRequestExecutionResource.mapExcpetion(Unknown Source)
at uk.org.ogsadai.client.toolkit.presentation.gt.GTDataRequestExecutionResource.executeRequest(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
at SQLClient.executePipeline(SQLClient.java:167)
at SQLClient.main(SQLClient.java:290)

That error doesn't help me right now; I am not familiar enough with how OD does things to be able to go "oh, it's failing looking up this"... and I spent most of the day before that trying to just get the heartbeat jsp to stop throwing errors... and all the "unknown source" means I can't delve further into code to see what is being cantankerous.

I have tried double checking the resources and the logins.txt file (which was the problem I was having with the first installation) and I am out of ideas, so I am going home and I won't be able to catch the guys in Edinburgh to help troubleshoot because I won't be able to get in early in the morning. Thus, I will probably spend tomorrow researching and documenting how two PHGrid nodes running OD will talk to each other in the hopes of eventually having two of them functioning properly.

Wednesday, April 9, 2008

Working with the OGSA-DAI folks

I have started putting together a document and am collaborating on it with the folks over at University of Edinburgh who are writing OGSA-DAI code as we read.

The focus is to establish a project for OGSA-DAI on Grid nodes, and I envision that to be a polling framework (so that other programs can select a series of nodes, say "get me data from databases connected to those nodes, merge them (via join or aggregation or union) and send me the results").

More simply, and much sooner, I am documenting and hope to establish a Proof of Concept that will just get data from two test databases on two nodes merged and displayed. The issuing of commands and data transfer needs to be done through Globus (so that the grid security paradigm is maintained). If I can sort out the commands and the workflow for that, it will be the perfect "Hello World" application that shows the essential power of OGSA-DAI and Grid.

Then I can start thinking about how to handle the sort of dynamic commands that will likely need to be built on the fly, and how to handle data resource discovery across different nodes on the grid. And before you know it, grid nodes start getting OGSA-DAI and DBAs at the medical centers start setting up views for data polling and we are able to go "this is the count of various [infection] in [region]".

Cheers!

SimpleCA Explained

SimpleCA is used to create X.509 certificates locally instead of using a remote trusted Certificate Authority (e.g., HealthGrid, VeriSign). A SimpleCA is primarily used to issue X.509 certificates for testing purposes. For example, PHGrid is currently using a SimpleCA to issue host and user certificates for an internal grid in support of OGSA-DAI.

Using a SimpleCA:
A SimpleCA can be created by running the following command:

$GLOBUS_LOCATION/setup/globus/setup-simple-ca

This command will generate the file, globus_simple_ca_hash_setup-0.19.tar.gz in the ~/.globus/simpleCA directory. This file needs to be distributed to each grid node that will be using the new SimpleCA. Each node will need to run the following commands in order to recognize the new SimpleCA:

$GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_hash_setup-0.19.tar.gz
$GLOBUS_LOCATION/sbin/gpt-postinstall
$GLOBUS_LOCATION/setup/globus_simple_ca_hash_setup/setup-gsi

You may request host and user certificates from the new SimpleCA after running the above commands.

Post Glossary:
Certificate - A public key and information about the certificate owner bound together by the
digital signature of a CA. In the case of a CA certificate the certificate is self signed,
i.e. it was signed using its own private key.

Certificate Authority - An entity that issues certificates.

Host certificate - A certificate belonging to a host (i.e., a grid node). Host certificates are typically stored in the /etc/grid-security/hostcert.pem file.

SimpleCA - Simple Certificate Authority

Trusted CA - A CA trusted by the grid node. Trusted CAs are found in the /etc/grid-security/certificates directory.

User certificate - A certificate belonging to a user (e.g., Globus, Bubba, Jenny, Forest). User certificates are typically stored in the $HOME/.globus/usercert.pem file.

Tuesday, April 8, 2008

Daily Lab / POC Activities

Extramural:

Was unable to copy VM to Jeremy's portable drive due to the lab security policy. Chris informed us that we have to get approvals from Toby before any data can be taken from the lab.

Worked on configuring internal grid for OGSA-DAI testing. Currently dealing with clock-skew errors between 1001 and 1003. Configuring NTP does not seem to help in this case. I need to research it further.

A port scan performed by Chris revealed an open VNC port on lab 1002. To correct the violation, I shut down VNC and uninstalled the following packages:

xorg-x11-Xvnc-6.9.0-50.45
tightvnc-1.2.9-201.12

It appears VNC was part of the default O/S installation. It has been removed and lab1002 is no longer listening on that port.

Monday, April 7, 2008

OGSA-DAI 3.0 is alive, now to make it dance.

With the help of the wonderful people at U of Edinburgh, I got the SQLClient that had been written to successfully poke a database using the OGSA-DAI tools.

I have since spent the rest of the day getting another grid node set up locally so that I could start pursuing clients that access databases on other grid nodes. The "Hello World" of this app would be the ability to enter a query and have the results from two different grid nodes' data resources concatenated (or merged in some other way) and returned.

I feel that the Killer App version that would result from such studies would be something similar to globus-url-copy: A command line tool (or web page, or java program) that would take a query, a series of nodes, and then run the queries on the data resources of those given nodes.
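To make the idea concrete, here is a rough sketch of such a tool's core loop. It is not OGSA-DAI code: the connect callable, the demo_connect stand-in, and the in-memory SQLite tables are all assumptions made for illustration, standing in for whatever would actually reach each node's data resource.

```python
import argparse
import sqlite3


def run_on_nodes(query, nodes, connect):
    """Run the same query on every node and concatenate the rows.

    `connect` is a hypothetical callable that maps a node name to a
    DB-API connection; a real tool would go through OGSA-DAI instead.
    """
    merged = []
    for node in nodes:
        conn = connect(node)
        try:
            merged.extend(tuple(row) for row in conn.execute(query))
        finally:
            conn.close()
    return merged


def demo_connect(node):
    """Stand-in data resource: one in-memory SQLite table per node."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (id INTEGER, site TEXT)")
    conn.execute("INSERT INTO patients VALUES (1, ?)", (node,))
    return conn


def main(argv=None):
    """Command-line entry point: a query followed by one or more nodes."""
    parser = argparse.ArgumentParser(description="Run one query on many nodes")
    parser.add_argument("query")
    parser.add_argument("nodes", nargs="+")
    args = parser.parse_args(argv)
    for row in run_on_nodes(args.query, args.nodes, demo_connect):
        print(row)
```

For example, main(["SELECT id, site FROM patients", "lab1001", "lab1003"]) prints one row per node. Keeping the connection step behind a callable means the merge logic would not need to change once a real grid client replaces the stand-in.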

Right now I am in that phase where I have seen several possibilities but not gauged the limitations... and I am not quite sure what all the commands are or what they do or how far they can span. I am also not quite sure how Globus fits into the OGSA-DAI toolkit and what would be the best strategy for minimizing the amount of work that would need to be done by a grid-node installer to get a medical center's information as accessible, yet secure, as possible.

Saturday, April 5, 2008

Added feedback and glossary to blog!

Ok... they may not be perfect questions... but any feedback is better than no feedback. If you have ideas about changing/adding questions...just share them. And yes, I was the one who already voted on the "Great Scott" option.... how could I not?

Dr. Tom

p.s. Could someone add a one-line post explaining what "SimpleCA" is? Better yet, I added a glossary tab to the grid status spreadsheet... let's see how that works.

tgs

Friday, April 4, 2008

Daily Lab / POC Activities

Extramural:

Configured a SimpleCA on lab 1001 to test the OGSA-DAI installation. We are testing OGSA-DAI on an internal grid before rolling it out on PHGrid.
Analyzed hack attacks on lab servers and plugged security holes:

It is recommended that strong security measures be put in place to fend off hacker attacks like the examples listed below. The first two attacks were conducted before the server was protected by hardened security measures. The third attack was conducted after the server was hardened. I did a trace on the source of the hack and included the hacker information and method of attack below.

Hacker Info:
IP address: 203.144.221.26
Host server: 203-144-221-26.static.asianet.co.th
Network: TRUENET-TH
ISP/organization: True Internet Co., Ltd.
ISP/organization address: Internet Service Provider, Bangkok, Thailand.
Geographical location: Thailand
Email: abuse@trueinternet.co.th
Phone: +662 6411800
Fax: +662 6421557

Attack Method:
Attempted to compromise the server using a dictionary hack on common system accounts and common user names. This hacker made hundreds of attempts and was clearly using an automated tool to generate so many in a short amount of time. Below is an excerpt of the attack.

Feb 28 15:53:29 gump sshd[5368]: Invalid user admin from 203.144.221.26
Feb 28 15:53:32 gump sshd[5370]: Invalid user guest from 203.144.221.26
Feb 28 15:53:35 gump sshd[5373]: Invalid user master from 203.144.221.26
Feb 28 15:53:56 gump sshd[5385]: Invalid user admin from 203.144.221.26
Feb 28 15:53:58 gump sshd[5387]: Invalid user admin from 203.144.221.26
Feb 28 15:54:01 gump sshd[5389]: Invalid user admin from 203.144.221.26
Feb 28 15:54:04 gump sshd[5392]: Invalid user admin from 203.144.221.26
Feb 28 15:54:19 gump sshd[5402]: Invalid user webmaster from 203.144.221.26
Feb 28 15:54:22 gump sshd[5404]: Invalid user username from 203.144.221.26
Feb 28 15:54:25 gump sshd[5406]: Invalid user user from 203.144.221.26
Feb 28 15:54:30 gump sshd[5410]: Invalid user admin from 203.144.221.26
Feb 28 15:54:44 gump sshd[5424]: Invalid user danny from 203.144.221.26
Feb 28 15:54:47 gump sshd[5426]: Invalid user alex from 203.144.221.26
Feb 28 15:54:50 gump sshd[5428]: Invalid user brett from 203.144.221.26
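
A log like the one above is easier to judge once it is tallied. The following is a minimal sketch, assuming the log lines follow the sshd "Invalid user NAME from IP" format shown in the excerpt; the function name summarize_invalid_logins is made up for this example.

```python
import re
from collections import Counter

# Matches the sshd lines in the excerpt, e.g.
# "... sshd[5368]: Invalid user admin from 203.144.221.26"
INVALID_USER = re.compile(r"sshd\[\d+\]: Invalid user (\S+) from (\S+)")


def summarize_invalid_logins(lines):
    """Count 'Invalid user' attempts per source IP and per user name."""
    by_ip = Counter()
    by_user = Counter()
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            user, ip = match.groups()
            by_ip[ip] += 1
            by_user[user] += 1
    return by_ip, by_user


sample = [
    "Feb 28 15:53:29 gump sshd[5368]: Invalid user admin from 203.144.221.26",
    "Feb 28 15:53:32 gump sshd[5370]: Invalid user guest from 203.144.221.26",
    "Feb 28 15:53:56 gump sshd[5385]: Invalid user admin from 203.144.221.26",
]
by_ip, by_user = summarize_invalid_logins(sample)
print(by_ip.most_common(1))    # [('203.144.221.26', 3)]
print(by_user.most_common(1))  # [('admin', 2)]
```

Feeding it the full auth log instead of the three sample lines would show which addresses and account names account for most of the attempts.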

Hacker Info:
IP address: 202.63.185.230
Host server: 202-63-185-230.static.exatt.net
Network: EXATT
ISP/organization: Exatt Technologies Pvt. Ltd.
ISP/organization address: 510 Akruti Arcade, Opp Wadia School, J. P. Road, Andheri (W), Mumbai, Maharashtra, India (Internet Service Provider)
Geographical location: India
Name: IP-Admin NOC
Email: noc_mum@exatt.com
Phone: +91-022-5645-0200
Fax: +91-022-5691-9342

Attack Method:
Spoofing while attempting to compromise the server using a dictionary hack on common system accounts and common user names.

- POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:43 gump sshd[4986]: Invalid user sara from 202.63.185.230
Apr 2 11:02:43 gump sshd[4986]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:53 gump sshd[4990]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:55 gump sshd[4992]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:57 gump sshd[4994]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:00 gump sshd[4996]: Invalid user ftpuser from 202.63.185.230
Apr 2 11:03:00 gump sshd[4996]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:02 gump sshd[4998]: Invalid user uid from 202.63.185.230
Apr 2 11:03:02 gump sshd[4998]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:04 gump sshd[5000]: Invalid user gid from 202.63.185.230
Apr 2 11:03:04 gump sshd[5000]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:06 gump sshd[5002]: Invalid user shell from 202.63.185.230
Apr 2 11:03:06 gump sshd[5002]: Address 202.63.185.230 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
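
The "does not map back to the address" message is what sshd logs when the reverse DNS name for the client does not resolve back to the same IP address. The check can be sketched roughly as follows. This is an illustration of the idea rather than sshd's own implementation, and the function name reverse_maps_back and its injectable resolver parameters are assumptions added so the logic can be tried without touching the network.

```python
import socket


def reverse_maps_back(ip,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if the reverse DNS name for `ip` resolves back to `ip`."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup: IP -> name
        addresses = forward(hostname)[2]   # forward lookup: name -> IPs
    except OSError:
        # No PTR record, or the name does not resolve at all.
        return False
    return ip in addresses
```

A False result is a reason for suspicion, not proof of spoofing on its own: misconfigured reverse DNS at an ISP produces the same message.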

Hacker Info:
IP address: 218.207.69.139
Host server: 218.207.69.139
Network: CMNET
ISP/organization: China Mobile Communications Corporation
ISP/organization address: Mobile Communications Network Operator in China, Internet Service Provider in China
Geographical location: China
Name: Jinxia Sun
Email: abuse@chinamobile.com
Phone: +86-10-66006688-1755
Fax: +86-10-66006012

Attack Method:
Attempted to breach the server via SSH, but the server has been modified to reject unauthorized users. The hacker tried to breach the server twice and moved on.

Apr 4 06:26:41 gump sshd[12345]: refused connect from ::ffff:218.207.69.139 (::ffff:218.207.69.139)
Apr 4 06:32:33 gump sshd[12390]: refused connect from ::ffff:218.207.69.139 (::ffff:218.207.69.139)

OGSA-DAI 3.0 is fibrillating

Today I spent the better part of the day reading through the documentation configuration guide, and then reading through the client configuration guide. The end result is I had a small Java client program written mostly for me, with a few modifications by me to adjust hostnames and the like.

The program is designed to send a small query out to the OGSA-DAI attached database, and then transform the results and perform a task that would enhance efficiency. The program as written in the documentation assumes a MySQL-based data resource, so I adjusted the program to use the PostgreSQL-based data resource I had deployed earlier that morning.

When I run it, I get an error. I don't know what is causing the error since it seems to happen in another thread and the cause is cited as "unknown source". I am guessing I either borked the deploy to Tomcat/Globus, or I am heading to the wrong URL in the program. It's strange because the dai-manager.jsp is telling me the services are there... but when I try to access their WSDLs I get AXIS errors.

At this point I am at a loss for ideas, so I will reach out to some of the OGSA-DAI contacts about some help checking my configuration and start to request more information about how they set up the distributed query application seen running through the link Ken posted yesterday.

Cheers, Peter

Thursday, April 3, 2008

Grid Node Status Document Moved...

Moved to the upper left corner of the page - for fast and visible access. Ok, that was a short and sweet update... go team!

OGSA-DAI 3.0 has a heartbeat...

Today I spent most of the morning and a bit of the afternoon finishing the installation (lots of builds) for OGSA-DAI on my main environment. I was hampered a bit by silly things like setting the classpath, due to my occasional memory lapses with Linux conventions (remember, it's 'source ./setenv.sh'... not just './setenv.sh'). I then spent the second part of the day installing OGSA-DAI on another lab node. Both installations seem to be running well enough between the two nodes.
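
For anyone else who trips over the same thing: running a script starts a child process, so anything it exports disappears when it exits, while sourcing runs it in the current shell. The small demonstration below is only a sketch. It assumes a POSIX sh is available, writes its own throwaway setenv.sh (not the real OGSA-DAI one), uses a made-up CLASSPATH value, and invokes the script with sh rather than ./ to avoid needing execute permission.

```python
import os
import subprocess
import tempfile


def classpath_after(command_template):
    """Run a shell snippet against a throwaway setenv.sh and return $CLASSPATH."""
    with tempfile.TemporaryDirectory() as workdir:
        script = os.path.join(workdir, "setenv.sh")
        with open(script, "w") as handle:
            handle.write("export CLASSPATH=/opt/demo/lib.jar\n")
        snippet = command_template.format(script=script) + '; echo "$CLASSPATH"'
        # Start from an environment without CLASSPATH so the result is clean.
        env = {k: v for k, v in os.environ.items() if k != "CLASSPATH"}
        result = subprocess.run(["sh", "-c", snippet], env=env,
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()


executed = classpath_after("sh {script}")  # child process sets it, then exits
sourced = classpath_after(". {script}")    # current shell sets it and keeps it
print(repr(executed), repr(sourced))       # '' '/opt/demo/lib.jar'
```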

Tomorrow I expect to read through the documentation for creating clients that use OGSA-DAI, and then I will hopefully run a few across the two machines. I also think there will be some deployment of the OGSA-DAI code to other globus nodes, perhaps some of them on the external phgrid networks.

After that comes accessing the client via a JSP, and then that goal will be complete. I imagine the next steps will involve using different databases at different Globus nodes, seeing how well the data integrates, and showing that we can do it.

The site IT team still hasn't installed Ubuntu on my dev box, so that is on hold.

Cheers,
Peter

Daily Lab / POC Activities

Extramural:

Updated Spreadsheet
Installed and configured prerequisite OGSA-DAI software on lab 1003 and 1001 (i.e., PostgreSQL, Tomcat, Ant, Java)
Installed Globus Web Services on lab 1001
Contacted grid sites for node installation. See spreadsheet for details.

OGSA-DAI Demo

This is to inform the list that a demo has been put up which implements a basic and optimised version of the example query:

Select postal_code, count(patients) from global_db where diagnosis = 'X' order by count

where global_db is a federation of two databases. In this case, the databases are the split version of the OpenMRS demo data with dummy post codes.

The demo is accessed through JSP pages which will return the results as an HTML page. The JSP page is based around an OGSA-DAI client program. This is to allow users with OGSA-DAI client libraries to access the demo at this stage.

The basic version of the example query operates without any optimisation to manage the amount of data transferred by the database resources and OGSA-DAI. The optimised version uses optimisations on the OGSA-DAI query plan to push operations to the databases. This reduces the amount of data transfer needed, while retaining the ability to do federated queries on multiple resources using OGSA-DAI and allowing different database schemas to be used. The demo is set up as JSP pages:

Basic:
http://test.ogsadai.org.uk:8080/dai/cdc-query.jsp
Optimised:
http://test.ogsadai.org.uk:8080/dai/cdc-s-query.jsp

These are available for use now. If the URLs do not work, it is possible the OGSA-DAI instance is down for updating or maintenance. An email will be sent when the URLs are removed from use permanently.

If possible, could a more realistic data set be made available which can be integrated into the demo and used for further development?
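
For readers who want a feel for the difference between the basic and optimised versions described above, here is a rough sketch using two in-memory SQLite databases as stand-ins for the federated sites. It is not the OGSA-DAI query plan or client API: the patients table, its columns, the sample postal codes, and the function names are all assumptions made for illustration. The point is only that pushing the count down to each database means fewer rows cross the wire.

```python
import sqlite3
from collections import Counter


def make_db(rows):
    """Stand-in for one site database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (postal_code TEXT, diagnosis TEXT)")
    conn.executemany("INSERT INTO patients VALUES (?, ?)", rows)
    return conn


def ordered(counts):
    """Order by count, then postal code, so results are deterministic."""
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))


def basic(dbs, diagnosis):
    """Pull every matching row to the client, then count centrally."""
    counts, transferred = Counter(), 0
    for db in dbs:
        rows = db.execute(
            "SELECT postal_code FROM patients WHERE diagnosis = ?",
            (diagnosis,)).fetchall()
        transferred += len(rows)
        counts.update(code for (code,) in rows)
    return ordered(counts), transferred


def optimised(dbs, diagnosis):
    """Push the GROUP BY to each database and merge the partial counts."""
    counts, transferred = Counter(), 0
    for db in dbs:
        rows = db.execute(
            "SELECT postal_code, COUNT(*) FROM patients "
            "WHERE diagnosis = ? GROUP BY postal_code",
            (diagnosis,)).fetchall()
        transferred += len(rows)
        for code, partial in rows:
            counts[code] += partial
    return ordered(counts), transferred


site_a = make_db([("30301", "X"), ("30301", "X"), ("30302", "X"), ("30302", "Y")])
site_b = make_db([("30301", "X"), ("30303", "X"), ("30303", "X"), ("30303", "X")])
print(basic([site_a, site_b], "X"))      # same counts, 7 rows transferred
print(optimised([site_a, site_b], "X"))  # same counts, 4 rows transferred
```

Both functions return the same per-postal-code counts; only the number of rows fetched from the sites differs, and the gap grows with the number of patients per postal code.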

Wednesday, April 2, 2008

Greetings from the new guy!

Greetings everyone, I am the new developer to the project, Peter M White.

I have been around for about a week and a half now and have spent my time familiarizing myself with all the different facets of the following:

The Globus grid toolkit (which contains all the bits and pieces to communicate between grid nodes)
The MonALISA distributed monitoring toolkit, which basically gives you many interfaces and ways to monitor the health/capabilities of various nodes on a grid.
The Java Globus CoG toolkit (which is a series of Java GUI front ends and tools for the Globus library of commands... there is also a CoG for Python.)
The OGSA-DAI toolkit... which is described on the website as "[a] project [...] to develop middleware to assist with access and integration of data from separate sources via the grid." I will now characterize it as "distributed database processing using webservices" and proceed to make many, many apologies for the oversimplification.

When I first came to this project, I threw myself into building an environment and reading all the documentation of the above projects, quickly finding that they were all open source and had carefully constructed areas for plugging in modules to extend functionality. They also sprouted millions of ideas in my head for things to try, so for a little while I was at a complete loss as to where to start.

Luckily, after some conversations with Ken and Dan, I have a few goals. The main goal right now is to get OGSA-DAI up and running to a point where I can pull up a jsp-page (or some other gui) with a google-esque text box and a run-button. When one enters a query and hits "run" it should run the query and report back the results. More specifically, OGSA-DAI should query its test database and go out to other OGSA-DAI services which will query their respective test databases and then all the results will get compiled into one big set and then that set should be displayed.

Here is a list of steps towards our main goal.

Finish setting up the OGSA-DAI code on the node I am working on (which includes finishing a test database and running some test and monitoring commands)
Install OGSA-DAI on other nodes in the grid, and set up similar test databases there.
Run some distributed query commands, and debug them.
Evaluate the currently coded JSP pages that come with OGSA-DAI for expandability and make a JSP front end for running a distributed query.

The other, more minor goal is to see if it is possible to set up a Globus node (and eventually the CoG tools and the OGSA-DAI services) on an Ubuntu box. Right now, all this code has been set up on SuSE boxes. While SuSE is powerful and wonderful and YaST is really cool, it is my experience that Ubuntu is still more user-friendly. Thus, being able to say "you can run [grid] on Ubuntu too" (or even "we tried to set [grid] up on Ubuntu and it spit fire and tacks before shutting down the system, so you should stick with SuSE") would be beneficial.

Right now I am waiting on an Ubuntu install from our hosts and when that is set up I shall start trying to set up a node and asking Dan (who has already set up so much and I am grateful for it) for help when I get stuck.

Otherwise, I am incredibly excited to be working on this project. I think there is a lot of cool stuff that can be done with grid technology, and we are also in a position to make a lot of contributions to the technologies involved.

Cheers,

Peter

Tuesday, April 1, 2008

Daily Lab / POC Activities

Extramural:

Installed dependencies for Subversion 1.4.6 on lab 1003
Worked on RLS configuration of Dallas and Tarrant
Peter currently researching OGSA-DAI
