Wednesday, April 30, 2008
The first is that I keep getting cryptic OD errors, and for some reason my setup is immune to logging... and the second is that I cannot debug using my favorite IDE (Eclipse), because it just seems to not want to run any OD queries from the workspace.
In both cases I get cryptic errors that look like "ogsadai-" followed by a timestamp... which are thoroughly unhelpful.
I am guessing this all boils down to some sort of classpath error, libraries not being found somewhere... especially because all this code seems to work, but only from one directory and only from the command line. If I try to run a modified class that worked perfectly from one directory in the servlet engine or Eclipse... *bam*, cryptic error.
I am going to have to sit with the Edinburgh guys for some insights methinks.
What services will be used on the grid?
We are currently using the following Globus services:
GridFTP - A file transfer protocol that provides secure, robust, fast and efficient transfer of data.
Grid Security Infrastructure OpenSSH (GSI-OpenSSH) - A modified version of OpenSSH that adds support for GSI authentication.
Open Grid Services Architecture Data Access and Integration (OGSA-DAI) - A middleware product which supports the exposure of data resources, such as relational or XML databases, on to grids.
Reliable File Transfer Service (RFT) - A web service that provides "job scheduler"-like functionality for data movement. RFT should be used for extremely large data transfers.
Replica Location Service (RLS) - A registry that keeps track of where one or more copies, or replicas, of files exists on physical storage systems in a Grid environment.
Simple Certificate Authority (SimpleCA) - A package that provides a simplified certification authority for the purpose of issuing credentials to Globus Toolkit users and services. This package is often used for testing Globus installations.
Web Services Grid Resource and Allocation Management (WS GRAM) - A Unix server suite that enables users to submit, monitor, and cancel jobs on Grid computing resources.
Web Monitoring and Discovery System (WebMDS) - A web-based interface to WSRF resource property information that is available as a user-friendly front-end to the Index Service.
Data Replication Service (DRS) - A web service that combines RFT and RLS to provide a pull-based replication capability that ensures a specified set of files exists on a storage site.
Java Commodity Grid Kit (Java CoG Kit) - A JGlobus library that provides a client-side API and limited server-side functionality for GT2-based services such as GRAM and MDS. It also provides a client-side API for GridFTP and MyProxy, and has extensive GSI support.
MyProxy - An online credential repository used to eliminate the need for manually copying private key and certificate files between machines.
How will you demonstrate grid functionality at the PHIN conference? The method for demonstrating grid functionality is still being decided.
Will multiple grid user accounts be created for the PHIN conference? No, we will create a single demo account that will have controlled access to grid data.
Monday, April 28, 2008
I am also discussing the best way to discover OGSA-DAI resources on the grid. We were thinking of perhaps building a MonALISA extension to poll the OD on a given box and then sending that back to a repository, which would reduce the hits on a network since it would be passive polling.
Tomorrow, I hope to complete a query through a webpage... but I am anticipating that a lot of classpath resolution errors will hold me up (as I have very rarely worked on JSP and servlet projects where this didn't occur).
I was wondering if there was any support for using existing authentication sources with the PHGRID. I am interested in using the University of Washington's Kerberos and Shibboleth services to authenticate against PHGRID services.
Yes, Globus can be configured to trust X.509 certificates issued by a third-party CA. This is accomplished by copying the third-party CA's certificate hash file and signing policy into the /etc/grid-security/certificates directory.
Example Hash Files:
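The original example files are not reproduced here; as a hedged illustration (the hash value 1a2b3c4d below is made up), a trusted CA entry in /etc/grid-security/certificates consists of a pair of files named after the certificate hash:

```text
/etc/grid-security/certificates/1a2b3c4d.0               # the CA certificate (PEM-encoded)
/etc/grid-security/certificates/1a2b3c4d.signing_policy  # the CA's signing policy
```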
Note: The certificate hash is located by running the following command:
$GLOBUS_LOCATION/bin/openssl x509 -hash -noout < ca_certificate_file
The certificate's distinguished name must match the pattern found in the signing policy file.
Friday, April 25, 2008
Configured MyProxy server on lab1002. Further testing needs to be done to determine the best practices for implementing security and scalability within MyProxy
(i.e., accepted credentials, renewal policy, passphrase enforcement, certificate map, etc.)
The caBIG developers are currently working on installing a caBIG grid node on lab1004. I provided them with a history of the previous configuration efforts. The current developers ended the day by configuring /cacoresdk/conf/deploy.properties. Progress will resume on Monday.
Thursday, April 24, 2008
Some queries would be able to just crawl... namely things like aggregations (counts of an infection by zip code), where the result set wouldn't get large, just the counts would increase... but if you were doing joins on national or global data, it would probably need some sort of tiered architecture.
In my mind I see a MonALISA extension being built to monitor and manage OGSA-DAI instances... handling the delegation of what collects what data from 100 or so nodes and then propagates it up... Alastair pointed out that there were already plans for OD to manage itself in an asymmetrical tree concept. The two would probably be merged, with MonALISA providing feedback about which nodes were reachable and their lag times... and OD then selecting to pull the queries through its more efficient channels.
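A toy version of that selection step (my own sketch; the class and field names are mine and have nothing to do with the real MonALISA or OD APIs): given reachability and lag reports per node, route the pull through the reachable node with the lowest lag.

```java
import java.util.Arrays;
import java.util.List;

public class ChannelPick {
    /** A node's status as a monitoring layer might report it (hypothetical shape). */
    public static class NodeStatus {
        public final String host;
        public final boolean reachable;
        public final long lagMillis;

        public NodeStatus(String host, boolean reachable, long lagMillis) {
            this.host = host;
            this.reachable = reachable;
            this.lagMillis = lagMillis;
        }
    }

    /** Pick the reachable node with the smallest reported lag, or null if none. */
    public static String bestChannel(List<NodeStatus> statuses) {
        NodeStatus best = null;
        for (NodeStatus s : statuses) {
            if (!s.reachable) continue; // monitoring feedback: skip dead nodes
            if (best == null || s.lagMillis < best.lagMillis) best = s;
        }
        return best == null ? null : best.host;
    }

    public static void main(String[] args) {
        List<NodeStatus> grid = Arrays.asList(
            new NodeStatus("lab1001", true, 120),
            new NodeStatus("lab1002", false, 5),   // unreachable, ignored
            new NodeStatus("lab1003", true, 40));
        System.out.println(bestChannel(grid)); // lab1003
    }
}
```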
Otherwise, I have since started to focus on discovery concepts for my demonstrations, being able to use the APIs of globus and OGSA-DAI to figure out what data resources are available and at what locations.
Wednesday, April 23, 2008
I am also starting to consider the scalability and distributions of the grid when it starts to get Big. I think that OGSA-DAI will oversee most of the data pulls and merging, but only across about a hundred nodes which would then store their results in repositories. That way any polling algorithms will only have to pull from about a hundred "super" nodes instead of tens of thousands of regular nodes. The same would go for resource discovery.
Tuesday, April 22, 2008
After some research, it appears that the new resources are being installed on the Globus container but not the Tomcat container, and I am testing with the Tomcat container.
Thus, tomorrow morning will be spent either getting the new bits deployed to the Tomcat container, or debugging the Globus container connection.
Tomorrow I will probably start pulling from similar schemas on different computers and trying more complicated things like aggregation. At the same time, I will branch out to dynamically built workflows and more secure communications, assuming we get this bug out of the way quickly.
Pre-Public Health Information Network (PHIN) conference
- Target a webinar for July / Aug for the NAPHIT membership to more concretely describe PH Research Grid activities to date, and possibilities for the future.
- Hold meeting to scope potential pilots, (e.g. for the security model/ administration).
- Conduct pilot project with NAPHIT membership.
Monday, April 21, 2008
I spent a lot of today reading through documentation about grids, and updating the OGSA-DAI use case document and Proof of Concept, and then reading through some of the client and developer documentation.
Tomorrow I hope to get into the nitty-gritty of local data sources, merging data, and maybe even secure transactions.
Friday, April 18, 2008
"As the first HealthGrid conference in the Americas, this is an historic event," says Jonathan Silverstein, M.D., President, HealthGrid.US, and Associate Director, Computation Institute, Argonne/University of Chicago. "The program will appeal broadly to the interdisciplinary eScience and biomedical informatics communities, including physicians, medical educators, students, epidemiologists, biomedical informaticians, military medicine specialists, computer scientists, security and policy makers, economists, and futurists."
The formal conference will kick-off on June 3, with welcoming remarks by Robert J. Zimmer, Ph.D., President of the University of Chicago. Following the welcoming ceremonies, Ian Foster, Ph.D., will provide a keynote presentation on "eScience meets Biomedical Informatics." Scientific papers will be complemented with a roundtable discussion from U.S., European, and Asian government leaders on "Government eScience and Cyberinfrastructure Programs for HealthGrid," moderated by Michael Cowan, M.D., Chief Medical Officer, Bearing Point, and former U.S. Navy Surgeon General. Cowan will precede the government roundtable discussion with a keynote presentation on "The Role of Government in the Future Knowledge Society."
The HealthGrid conference is the premier conference on the transformation of biomedical research, education and medical care through the application of Grid technologies. HealthGrid is dedicated to enhancing biomedical research and healthcare delivery, creating an open collaborative virtual community, and communicating the collective knowledge of the HealthGrid. Scheduling and other conference details can be found at http://chicago2008.healthgrid.org/.
- Updated the VDT installation document to reflect the updated certificate authority information
- Created a MonALISA installation document
- Installed MonALISA on Lab 1001 based on document content
- Issued user and host certificates for Washington and Johns Hopkins
- Started researching Exchange Network nodes.
The discussion centered on the background of the PH Research Grid, the public health and business drivers for the CDC for exploring federated architectures, and the research approach the CDC is sponsoring in standing up nodes, and exploring the technology stack. The NAPHIT group was very intrigued by the possibilities, and is eager to help in defining the administration and security models the grid would entail.
The next steps may include a webinar to the NAPHIT membership, and a definition of prospective projects to collaborate on in the future. More to come....
Thursday, April 17, 2008
With their help, I found the configuration file that was improperly set up, and now local OD calls are working on all three nodes. Furthermore, I can ping nodes on other computers and access their resources... so I am now ready to start writing clients.
Alastair gave the sage advice that I should probably work on building clients before I start focusing heavily on security... because security pretty much changes the calls and the hosts and adds a whole new level of complexity, and I should get a better feel for how the clients operate before I worry about everything being validated on the grid.
Thus, the next step is to pore over the included source examples to see how various things are called and then to start writing my own little clients that put various activities in the pipeline.
Otherwise, I am very excited and look forward to making some neat little clients and some snappy JSPs to show off the various functionality of OD.
Way before clouds were popular (remember then?) my colleagues Kate Keahey and Tim Freeman started work on their workspace service, a system for on-demand creation and management of virtual machines on remote computing systems. They now have an implementation that interfaces both to clusters running conventional schedulers and to Amazon EC2. It's distributed as part of the Globus software, or you can download it separately. Kate and Tim have recently established a deployment of the workspace service in the Computation Institute at U.Chicago and Argonne. With a nod to the new cloud meme, they've named it Nimbus. They say:
The University of Chicago Science Cloud, codenamed "Nimbus", is a web service that delivers compute capacity in the cloud for scientific communities. The Nimbus simple client allows you to obtain customized compute nodes (that we call "workspaces") that you have full control over quickly, easily, and in ways that can be fully automated. Using the Nimbus cloud you can request the exact compute capability you currently need for your application and scale it up or down as your needs dictate.
Nimbus provides compute capability in the form of Xen virtual machines (VMs) that are deployed on physical nodes of the University of Chicago TeraPort cluster using the workspace service. We currently make 16 nodes of the TeraPort cluster available for cloud computing. Nimbus is available for members of the scientific community wanting to run in the cloud. To obtain access you will need to provide a justification
(a few sentences explaining your science project) and a valid grid credential (If you don't have a credential, email us. We can help). Based on the project, you will be given an allocation on the cloud. Send your requests, demands and cries of anguish to firstname.lastname@example.org (for cries of anguish mp3 format is acceptable).
In a typical session you will make a request to deploy a workspace based on a specified VM image. You can either use one of the VM images already available on the cloud (we provide a command that allows you to see what's already there) or upload your own VM image. On deployment, the image will be configured with an ssh public key you provide -- in this way once the workspace is deployed, you will be able to ssh into it and configure it further, upload data, or run your applications.
Wednesday, April 16, 2008
- Built and tested the new NCPHI Certificate Authority
- Testing of Globus 4.0.7 halted due to MySQL errors thrown during the container installation (missing file libssl.so.4). The file is part of the OpenSSL package, but Globus is unable to locate it. A possible fix is creating a symbolic link to the file's current location. I will resume testing tomorrow.
- Updated VDT installation document based on the new NCPHI CA.
- An updated Certificate Authority file and new install instructions have been distributed to the new grid sites.
- The NCPHI Certificate Authority has been configured as a Trusted_Authority on Dallas and Tarrant.
Exception in thread "main" uk.org.ogsadai.client.toolkit.exception.RequestCompletedWithErrorException: [1208374501596:4] uk.org.ogsadai.client.toolkit.REQUEST_COMPLETED_WITH_ERROR : ogsadai-1195888958d
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.checkSynchronousExecutionIsComplete(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
It is the same error I was getting with the second SuSE box that I gave up on... let me tell you what this error does not tell me:
* I do not know what the error really was, because I don't know what error "ogsadai-1195888958d" is, and there is no documentation for it.
* This error does not tell me what broke.
* I am apparently incapable of turning on the logging so that better errors are thrown closer to the problem source.
After narrowing down that other things work (the DRER is accessible via Tomcat, MySQL is up and running, the logins.txt file has all the appropriate information, the JDBC file I used worked during the data creation, and the MySQLDataResource file looks to be correct), I have no idea what is broken. Maybe something was written wrong in the MySQL resource, but I thought I fixed it and restarted Tomcat, and it still doesn't work.
I think I am going to have to get with the Edinburgh folks early tomorrow, start up a WebEx, and just have them look at the boxes to see what was misconfigured, or let them try setting up a resource and getting the logging working... because I am running out of ideas that aren't just pure churn. I will also talk to them about building a more robust resource-creation client that tests resources for validity and connectivity before they are deployed.
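On the logging front: OGSA-DAI's server side logs through commons-logging backed by Apache Log4j, so (as an untested sketch; the exact category names are my assumption) a log4j.properties like the following on the webapp classpath should surface errors closer to their source:

```properties
# Hedged example - category names below are assumptions, not verified OGSA-DAI settings
log4j.rootCategory=WARN, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d %-5p %c - %m%n
# Turn the OGSA-DAI packages up to DEBUG
log4j.category.uk.org.ogsadai=DEBUG
```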
Unlike the OpenMRS datasets, the chief complaint/reason for visit field is not coded (i.e., uses a vocabulary) and is free text. This is the way that most emergency department surveillance systems collect their chief complaint data (i.e., non-coded free text). This presents interesting challenges for federation.
At RODS, we've solved the free-text chief complaint challenge before, using text indexing and preclassification of chief complaints into syndrome categories. We'll need to use similar solutions, but this time under a federated API.
Tuesday, April 15, 2008
Thus, today, I went ahead and tried some of the other example clients on that server and actually got the secure communications set up with the help of Dan and Alastair.
The other big news is that we got the Ubuntu box installed yesterday, so today we made headway in getting a VDT install of Globus on Ubuntu. It is almost up and running; now we are just dealing with all of the certificate and authentication stuff. Then it will be setting up Tomcat, deploying OGSA-DAI, and probably setting up a little MySQL database.
As for the "second computer," I am going to try to get MySQL running on it too... then we can start trying to code OGSA-DAI proofs of concept that pull from a PostgreSQL database and two MySQL databases on three different computers.
Monday, April 14, 2008
- Installed Globus 4.0.7 on lab 1003
- Updated VDT installation document based on 1.10.0 changes
- Contacted Laura about HealthGrid CA details
- When installing VDT 1.10.0 Globus-WS(Web Services Container) the following error is returned:
Downloading [vdt_globus_rft_server-VDT1.10.0-x86_rhas_3.tar.gz] from [http://vdt.cs.wisc.edu/software//globus/4.0.7_VDT-1.10.0]...
/opt/vdt/mysql/bin/mysqladmin -S /opt/vdt/vdt-app-data/mysql/var/mysql.sock -u root create rft_database
Exited with value 127
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-WS] not [installed]:
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-WS-Server] not [installed]:
Package [/opt/vdt:http://vdt.cs.wisc.edu/vdt_1100_cache:Globus-Base-RFT-Server] not [installed]:
Shell command [. /opt/vdt/vdt-questions.sh; . /opt/vdt/globus/etc/globus-user-env.sh; vdt/setup/configure_rft] returns with an error code.
- Currently researching the error. (Exit value 127 generally means the shell could not find the command, suggesting mysqladmin was missing or not on the path.)
Friday, April 11, 2008
Unfortunately, it's not working either, and I am getting the same error.
Also, I have an appointment Monday morning and I anticipate I will not be able to get in before Alastair sets off for home... so I imagine any debugging will have to be put off until Tuesday.
Thus, I will spend the rest of today and probably a significant portion of Monday reading up on documentation and javadoc and updating the PHGrid - OGSA-DAI document so we can better map the application.
Dan has updated the spreadsheet and is still researching CA.
Thursday, April 10, 2008
Exception in thread "main" uk.org.ogsadai.client.toolkit.exception.ResourceUnknownException: The resource null is unknown.
    at uk.org.ogsadai.client.toolkit.presentation.gt.GTDataRequestExecutionResource.mapExcpetion(Unknown Source)
    at uk.org.ogsadai.client.toolkit.presentation.gt.GTDataRequestExecutionResource.executeRequest(Unknown Source)
    at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
    at uk.org.ogsadai.client.toolkit.resource.BaseDataRequestExecutionResource.execute(Unknown Source)
    at SQLClient.executePipeline(SQLClient.java:167)
    at SQLClient.main(SQLClient.java:290)
That error doesn't help me right now; I am not familiar enough with how OD does things to be able to go "oh, it's failing looking up this"... and I spent most of the day before that just trying to get the heartbeat JSP to stop throwing errors... and all the "Unknown Source" lines mean I can't delve further into the code to see what is being cantankerous.
I have tried double checking the resources and the logins.txt file (which was the problem I was having with the first installation) and I am out of ideas, so I am going home and I won't be able to catch the guys in Edinburgh to help troubleshoot because I won't be able to get in early in the morning. Thus, I will probably spend tomorrow researching and documenting how two PHGrid nodes running OD will talk to each other in the hopes of eventually having two of them functioning properly.
Wednesday, April 9, 2008
The focus is to establish a project for OGSA-DAI on Grid nodes, and I envision that to be a polling framework (so that other programs can select a series of nodes, say "get me data from databases connected to those nodes, merge them (via join or aggregation or union) and send me the results").
More simply, and much sooner, I am documenting and hope to establish a Proof of Concept that will just get data from two test databases on two nodes merged and displayed. The issuing of commands and data transfer needs to be done through Globus (so that the grid security paradigm is maintained). If I can sort out the commands and the workflow for that, it will be the perfect "Hello World" application that shows the essential power of OGSA-DAI and Grid.
Then I can start thinking about how to handle the sort of dynamic commands that will likely need to be built on the fly, and how to handle data-resource discovery across different nodes on the grid. And before you know it, grid nodes start getting OGSA-DAI, DBAs at the medical centers start setting up views for data polling, and we are able to go "this is the count of various [infection] in [region]".
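As a toy illustration of that "Hello World" merge step (plain Java, no OGSA-DAI calls; the class and method names are my own), a union-merge is just concatenating the row lists each node returns for the same query, while the real work of issuing the per-node queries would go through Globus/OGSA-DAI:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FederatedUnion {
    /** Union-merge the result rows returned by several nodes for the same query. */
    public static List<String[]> unionAll(List<List<String[]>> perNodeResults) {
        List<String[]> merged = new ArrayList<String[]>();
        for (List<String[]> rows : perNodeResults) {
            merged.addAll(rows); // a real client would pull these via OGSA-DAI activities
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two hypothetical nodes each return one (diagnosis, postal_code) row
        List<String[]> node1 = new ArrayList<String[]>();
        node1.add(new String[] { "flu", "30333" });
        List<String[]> node2 = new ArrayList<String[]>();
        node2.add(new String[] { "flu", "98101" });
        List<String[]> merged = unionAll(Arrays.asList(node1, node2));
        System.out.println(merged.size()); // 2
    }
}
```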
Using a SimpleCA:
A SimpleCA can be created by running the following command:
This command will generate the file globus_simple_ca_hash_setup-0.19.tar.gz in the ~/.globus/simpleCA directory. This file needs to be distributed to each grid node that will use the new SimpleCA. Each node will need to run the following commands in order to recognize the new SimpleCA:
- $GLOBUS_LOCATION/sbin/gpt-build globus_simple_ca_hash_setup-0.19.tar.gz
You may request host and user certificates from the new SimpleCA after running the above commands.
Certificate - A public key and information about the certificate owner, bound together by the digital signature of a CA. In the case of a CA certificate, the certificate is self-signed, i.e. it was signed using its own private key.
Certificate Authority - An entity that issues certificates.
Host certificate - A certificate belonging to a host (e.g., a grid node). Host certificates are typically stored in the /etc/grid-security/hostcert.pem file.
SimpleCA - Simple Certificate Authority
Trusted CA – A CA trusted by the grid node. Trusted CAs are found in the /etc/grid-security/certificates directory.
User certificate – A certificate belonging to a user (e.g., Globus, Bubba, Jenny, Forest). User certificates are typically stored in the $HOME/.globus/usercert.pem file.
Tuesday, April 8, 2008
- Was unable to copy the VM to Jeremy's portable drive due to the lab security policy. Chris informed us that we have to get approval from Toby before any data can be taken from the lab.
- Worked on configuring internal grid for OGSA-DAI testing. Currently dealing with clock-skew errors between 1001 and 1003. Configuring NTP does not seem to help in this case. I need to research it further.
- A port scan performed by Chris revealed an open VNC port on lab 1002. To correct the violation, I shut down VNC and uninstalled it.
- It appears VNC was part of the default O/S installation. It has been removed and lab1002 is no longer listening on that port.
Monday, April 7, 2008
I have since spent the rest of the day getting another grid node set up locally so that I could start pursuing clients that access databases on other grid nodes. The "Hello World" of this app would be the ability to enter a query and have the results of two different grid nodes' data resources concatenated (or processed in some other way that otherwise merges them) and returned.
I feel that the Killer App version that would result from such studies would be something similar to globus-url-copy: A command line tool (or web page, or java program) that would take a query, a series of nodes, and then run the queries on the data resources of those given nodes.
Right now I am in that phase where I have seen several possibilities but not gauged the limitations... and I am not quite sure what all the commands are, what they do, or how far they can span. I am also not quite sure how Globus fits into the OGSA-DAI toolkit, or what the best strategy would be for minimizing the amount of work a grid-node installer would need to do to make a medical center's information as accessible, yet secure, as possible.
Saturday, April 5, 2008
Friday, April 4, 2008
- Configured a SimpleCA on lab 1001 to test OGSA-DAI installation. We are testing OGSA-DAI on an internal grid before we do a roll out on PHGRID.
- Analyzed hack attacks on lab servers and plugged security holes:
IP address: 184.108.40.206
Host server: 203-144-221-26.static.asianet.co.th
ISP/organization: True Internet Co., Ltd.
ISP/organization address: Internet Service Provider, Bangkok, Thailand.
Geographical location: Thailand
Phone: +662 6411800
Fax: +662 6421557
Attempted to compromise the server using a dictionary hack on common system accounts and common user names. This attack was attempted hundreds of times by this hacker, who was clearly using an automated tool to generate so many attacks in a short amount of time. Below is an excerpt of the attack.
Feb 28 15:53:29 gump sshd: Invalid user admin from 220.127.116.11
Feb 28 15:53:32 gump sshd: Invalid user guest from 18.104.22.168
Feb 28 15:53:35 gump sshd: Invalid user master from 22.214.171.124
Feb 28 15:53:56 gump sshd: Invalid user admin from 126.96.36.199
Feb 28 15:53:58 gump sshd: Invalid user admin from 188.8.131.52
Feb 28 15:54:01 gump sshd: Invalid user admin from 184.108.40.206
Feb 28 15:54:04 gump sshd: Invalid user admin from 220.127.116.11
Feb 28 15:54:19 gump sshd: Invalid user webmaster from 18.104.22.168
Feb 28 15:54:22 gump sshd: Invalid user username from 22.214.171.124
Feb 28 15:54:25 gump sshd: Invalid user user from 126.96.36.199
Feb 28 15:54:30 gump sshd: Invalid user admin from 188.8.131.52
Feb 28 15:54:44 gump sshd: Invalid user danny from 184.108.40.206
Feb 28 15:54:47 gump sshd: Invalid user alex from 220.127.116.11
Feb 28 15:54:50 gump sshd: Invalid user brett from 18.104.22.168
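A trivial way to tally an attack like the one above (my own snippet, not part of our tooling; it just counts "Invalid user" lines per source IP in standard sshd log output):

```java
import java.util.HashMap;
import java.util.Map;

public class SshdTally {
    /** Count "Invalid user" attempts per source IP across sshd log lines. */
    public static Map<String, Integer> tally(String[] logLines) {
        Map<String, Integer> byIp = new HashMap<String, Integer>();
        for (String line : logLines) {
            if (!line.contains("Invalid user ")) continue;
            // sshd ends these lines with "... from <ip>"
            String ip = line.substring(line.lastIndexOf(" from ") + " from ".length());
            Integer soFar = byIp.get(ip);
            byIp.put(ip, (soFar == null ? 0 : soFar) + 1);
        }
        return byIp;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Feb 28 15:53:29 gump sshd: Invalid user admin from 10.0.0.1",
            "Feb 28 15:53:32 gump sshd: Invalid user guest from 10.0.0.1",
            "Feb 28 15:53:35 gump sshd: refused connect from 10.0.0.2"
        };
        System.out.println(tally(lines).get("10.0.0.1")); // 2
    }
}
```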
IP address: 22.214.171.124
Host server: 202-63-185-230.static.exatt.net
ISP/organization: Exatt Technologies Pvt. Ltd.
ISP/organization address: 510 Akruti Arcade,, Opp Wadia School,, J. P. Road., Andheri (W), Mumbai, Maharashtra, India., Internet Service Provider
Geographical location: India
Name: IP-Admin NOC
Spoofing while attempting to compromise the server using a dictionary hack on common system accounts and common user names.
Apr 2 11:02:43 gump sshd: Invalid user sara from 126.96.36.199
Apr 2 11:02:43 gump sshd: Address 188.8.131.52 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:53 gump sshd: Address 184.108.40.206 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:55 gump sshd: Address 220.127.116.11 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:02:57 gump sshd: Address 18.104.22.168 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:00 gump sshd: Invalid user ftpuser from 22.214.171.124
Apr 2 11:03:00 gump sshd: Address 126.96.36.199 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:02 gump sshd: Invalid user uid from 188.8.131.52
Apr 2 11:03:02 gump sshd: Address 184.108.40.206 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:04 gump sshd: Invalid user gid from 220.127.116.11
Apr 2 11:03:04 gump sshd: Address 18.104.22.168 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
Apr 2 11:03:06 gump sshd: Invalid user shell from 22.214.171.124
Apr 2 11:03:06 gump sshd: Address 126.96.36.199 maps to 202-63-185-230.static.exatt.net, but this does not map back to the address - POSSIBLE BREAKIN ATTEMPT!
IP address: 188.8.131.52
Host server: 184.108.40.206
ISP/organization: China Mobile Communications Corporation
ISP/organization address: Mobile Communications Network Operator in China, Internet Service Provider in China
Geographical location: China
Name: Jinxia Sun
Attempted to breach the server via SSH, but the server has been modified to reject unauthorized users. The hacker tried to breach the server twice and moved on.
Apr 4 06:26:41 gump sshd: refused connect from ::ffff:220.127.116.11 (::ffff:18.104.22.168)
Apr 4 06:32:33 gump sshd: refused connect from ::ffff:22.214.171.124 (::ffff:126.96.36.199)
Today I spent the better part of the day reading through the documentation's configuration guide, and then the client configuration guide. The end result is that I had a small client Java program written mostly for me, with a few modifications by me to adjust hostnames and the like.
The program is designed to send a small query out to the OGSA-DAI-attached database, and then transform the results and perform a task that would enhance efficiency. The program as written in the documentation assumes a MySQL-based data resource, so I adjusted it to use the PostgreSQL-based data resource I had deployed earlier that morning.
When I run it, I get an error. I don't know what is causing it, since it seems to happen in another thread and the cause is cited as "Unknown Source". I am guessing I either borked the deploy to Tomcat/Globus, or I am pointing at the wrong URL in the program. It's strange, because dai-manager.jsp is telling me the services are there... but when I try to access their WSDLs I get Axis errors.
At this point I am at a loss for ideas, so I will reach out to some of the OGSA-DAI contacts about some help checking my configuration and start to request more information about how they set up the distributed query application seen running through the link Ken posted yesterday.
Thursday, April 3, 2008
Tomorrow I expect to read through the documentation for creating clients that use OGSA-DAI, and then I will hopefully run a few across the two machines. I also think there will be some deployment of the OGSA-DAI code to other globus nodes, perhaps some of them on the external phgrid networks.
After that, accessing the client via a JSP and then that goal will have been complete. I imagine the next steps will involve using different databases at different globus nodes and seeing how well the data integrates and showing that we can do it.
The site IT team still hasn't installed Ubuntu on my dev box, so that is on hold.
- Updated Spreadsheet
- Installed and configured prerequisite OGSA-DAI software on lab 1003 and 1001 (i.e., PostgreSQL, Tomcat, Ant, Java)
- Installed Globus Web Services on lab 1001
- Contacted grid sites for node installation. See spreadsheet for details.
Select postal_code, count(patients) as count from global_db where diagnosis = 'X' group by postal_code order by count
is federated across two databases. In this case, the databases are the split version of the OpenMRS demo data with dummy post codes.
The demo is accessed through JSP pages, which return the results as an HTML page. The JSP page is based around an OGSA-DAI client program; this allows users without the OGSA-DAI client libraries to access the demo at this stage.
The basic version of the example query operates without any optimisation to manage the amount of data transferred by the database resources and OGSA-DAI. The optimised version applies optimisations to the OGSA-DAI query plan to push operations down to the databases, reducing the amount of data transfer needed while still retaining the ability to run federated queries on multiple resources, and allowing different database schemas to be used. The demo is set up as JSP pages.
This are available for use now. If the URLs do not work, it is possible the OGSA-DAI instance is down for updating or maintenance. An email will be sent when the URLs are removed from use permenantly.
If possible, could a more realistic data set be made available which can be integrated into the demo and used for further development?
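A rough sketch of what the pushed-down plan buys you (again plain Java standing in for the real OGSA-DAI query-plan machinery, which I have not looked inside yet): each database executes its own GROUP BY locally, so only a small map of partial counts per node crosses the network, and the coordinator merely sums the partial counts for matching postal codes.

```java
import java.util.HashMap;
import java.util.Map;

public class PushedDownFederation {
    // Merge per-node partial counts (postal_code -> count) by summing.
    // Each map is the result of a GROUP BY executed at the database,
    // so it is tiny compared with shipping the raw patient rows.
    @SafeVarargs
    public static Map<String, Long> mergePartialCounts(
            Map<String, Long>... perNodeCounts) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> node : perNodeCounts) {
            node.forEach((postal, n) -> total.merge(postal, n, Long::sum));
        }
        return total;
    }
}
```

Summing works here because COUNT is decomposable; an aggregate like a median could not be pushed down this simply, which is presumably why the planner has to decide per operation.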
Wednesday, April 2, 2008
I have been around for about a week and a half now and have spent my time familiarizing myself with all the different facets of:
- The Globus grid toolkit (which contains all the bits and pieces to communicate between grid nodes)
- The MonALISA distributed monitoring toolkit, which basically gives you many interfaces and ways to monitor the health/capabilities of various nodes on a grid.
- The Java Globus CoG toolkit (which is a series of Java GUI front ends and tools for the Globus library of commands... there is also a CoG for Python.)
- The OGSA-DAI toolkit... which is described on the website as "[a] project [...] to develop middleware to assist with access and integration of data from separate sources via the grid." I will now characterize it as "distributed database processing using webservices" and proceed to make many, many apologies for the oversimplification.
When I first came to this project I threw myself into building an environment and reading all the documentation for the above projects. I quickly found that they were all open source and had carefully constructed areas for plugging in modules to extend functionality. They also sprouted millions of ideas in my head for things to try, so for a little while I was at a complete loss as to where to start.
Luckily, after some conversations with Ken and Dan, I have a few goals. The main goal right now is to get OGSA-DAI up and running to the point where I can pull up a JSP page (or some other GUI) with a Google-esque text box and a run button. When one enters a query and hits "run," it should run the query and report back the results. More specifically, OGSA-DAI should query its test database and go out to other OGSA-DAI services, which will query their respective test databases; then all the results will be compiled into one big set, and that set will be displayed.
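The flow I just described could be sketched roughly like this. The QueryService interface below is hypothetical, a stand-in for the real OGSA-DAI client calls I have not written yet; the sketch only shows the fan-out shape: send the query to every node in parallel, wait for each node's rows, and compile them into one combined result set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FanOutQuery {
    // Stand-in for one OGSA-DAI service endpoint: takes a query string
    // and returns rows. (Hypothetical interface, not the real client API.)
    public interface QueryService {
        List<String> run(String query) throws Exception;
    }

    // Send the query to every node concurrently, then concatenate
    // the per-node result lists into one big set.
    public static List<String> runEverywhere(String query,
            List<QueryService> nodes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (QueryService node : nodes) {
                futures.add(pool.submit(() -> node.run(query)));
            }
            List<String> combined = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                combined.addAll(f.get()); // blocks until that node replies
            }
            return combined;
        } finally {
            pool.shutdown();
        }
    }
}
```

A JSP front end would then only need to call something like runEverywhere with the text-box contents and render the combined list as HTML.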
Here is a list of steps towards our main goal.
- Finish setting up the OGSA-DAI code on the node I am working on (which includes finishing a test database and running some test and monitoring commands)
- Install OGSA-DAI on other nodes in the grid, and set up similar test databases there.
- Run some distributed query commands, and debug them.
- Evaluate the currently coded JSP pages that come with OGSA-DAI for expandability and make a JSP front end for running a distributed query.
The other, more minor, goal is to see whether it is possible to set up a Globus node (and eventually the CoG tools and the OGSA-DAI services) on an Ubuntu box. Right now, all this code has been set up on SuSE boxes. While SuSE is powerful and wonderful and YaST is really cool, in my experience Ubuntu is still more user-friendly. Thus, being able to say "you can run [grid] on Ubuntu too" (or even "we tried to set [grid] up on Ubuntu and it spit fire and tacks before shutting down the system, so you should stick with SuSE") would be beneficial.
Right now I am waiting on an Ubuntu install from our hosts and when that is set up I shall start trying to set up a node and asking Dan (who has already set up so much and I am grateful for it) for help when I get stuck.
Otherwise, I am incredibly excited to be working on this project. I think there is a lot of cool stuff that can be done with grid technology, and we are also in a position to make a lot of contributions to the technologies involved.