Wednesday, April 8, 2009

AMDS Store for BioSense data

So, part of the BioSense program's strategic plan is to share national data sources in a federated manner. To support this using the grid and AMDS, we've started working with the BioSense Data Quality team (they run the analysis on the Data Mart) to generate sample files that we can use to populate an AMDS data store with the aggregate counts of BioSense syndromes by zip code and by day.

So far, we've got a sample file covering all the zip codes in the VA, DoD and Real-Time data. Of course, all the counts are redacted, since we don't work with real data in the lab. But this will let me start on an ETL process to load it into the AMDS extract database that Vaughn is developing against.

So far it's only about 5 MB/day of data, and that's before I normalize out much of the content (we won't actually store syndrome names in the tables, etc.), so this is still a good sign.
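To make the "normalize out the syndrome names" step concrete, here is a minimal ETL sketch using Python's built-in `csv` and `sqlite3` modules. It assumes, hypothetically, that the sample file is comma-delimited with `date`, `zip`, `syndrome` and `count` columns; the post doesn't describe the real layout, and the table names (`syndrome`, `daily_count`) and the made-up counts are illustrations only, not the AMDS extract schema Vaughn is building.

```python
import csv
import io
import sqlite3

# Hypothetical extract layout: one row per (day, zip code, syndrome) with an
# aggregate count. The values below are made up for illustration.
SAMPLE = """date,zip,syndrome,count
2009-04-01,30301,Respiratory,12
2009-04-01,30301,GI,4
2009-04-01,30305,Respiratory,7
2009-04-02,30301,Respiratory,9
"""


def create_schema(conn):
    # Syndrome names are stored once in a lookup table; the fact table
    # holds only the integer key, which is where the size savings come from.
    conn.executescript("""
        CREATE TABLE syndrome (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE daily_count (
            day         TEXT    NOT NULL,
            zip         TEXT    NOT NULL,
            syndrome_id INTEGER NOT NULL REFERENCES syndrome(id),
            case_count  INTEGER NOT NULL,
            PRIMARY KEY (day, zip, syndrome_id)
        );
    """)


def syndrome_id(conn, name):
    # Add the name if it's new, then return its surrogate key.
    conn.execute("INSERT OR IGNORE INTO syndrome(name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT id FROM syndrome WHERE name = ?", (name,)
    ).fetchone()[0]


def load(conn, text):
    # Extract rows from the delimited text, transform names to IDs, load counts.
    reader = csv.DictReader(io.StringIO(text))
    rows = [
        (r["date"], r["zip"], syndrome_id(conn, r["syndrome"]), int(r["count"]))
        for r in reader
    ]
    conn.executemany("INSERT INTO daily_count VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
create_schema(conn)
loaded = load(conn, SAMPLE)
print(loaded, conn.execute("SELECT COUNT(*) FROM syndrome").fetchone()[0])  # prints: 4 2
```

Four fact rows end up sharing two syndrome entries; with a full day of zip codes, the repeated names collapse the same way.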

Also, it looks like Tom has sometimes started calling AMDS the Population Data Object (PDO), so we may be relabeling soon. This actually makes more sense, as we're talking less about a data set than about a specification for sharing population data (as opposed to the patient-specific data that is NHIN's currency).

1 comment:

Pema Rigdzin said...

You and Vaughn should take a look at how I designed the RODS 6 data model. One thing you'll notice is that the table names have no notion of the data to be stored in them; i.e., you can add new data types without adding new tables or columns.
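A rough illustration of "tables that don't know what data they hold": the kind of data becomes a row in a type table, so a new source is an `INSERT` rather than a schema change. This is a hypothetical schema written with Python's `sqlite3`; it is not the actual RODS 6 model, and the table names and type labels (`time_series_count`, `biosense.respiratory`, `otc.cough_cold`) are invented for the example.

```python
import sqlite3

# Hypothetical generic schema: no table or column name mentions syndromes,
# ED visits, OTC sales, etc. NOT the real RODS 6 schema.
SCHEMA = """
CREATE TABLE data_type (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE time_series_count (
    data_type_id INTEGER NOT NULL REFERENCES data_type(id),
    location     TEXT    NOT NULL,
    day          TEXT    NOT NULL,
    value        INTEGER NOT NULL,
    PRIMARY KEY (data_type_id, location, day)
);
"""


def add_data_type(conn, name):
    # A new kind of data is a new row, not a new table or column.
    cur = conn.execute("INSERT INTO data_type(name) VALUES (?)", (name,))
    return cur.lastrowid


def record(conn, type_id, location, day, value):
    conn.execute(
        "INSERT INTO time_series_count VALUES (?, ?, ?, ?)",
        (type_id, location, day, value),
    )


def series(conn, type_name, location):
    # The same query shape serves every data type.
    return conn.execute(
        """SELECT c.day, c.value
           FROM time_series_count c JOIN data_type t ON t.id = c.data_type_id
           WHERE t.name = ? AND c.location = ?
           ORDER BY c.day""",
        (type_name, location),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
resp = add_data_type(conn, "biosense.respiratory")
record(conn, resp, "30301", "2009-04-01", 12)
# Later, a brand-new source arrives: no CREATE TABLE or ALTER TABLE needed.
otc = add_data_type(conn, "otc.cough_cold")
record(conn, otc, "30301", "2009-04-01", 31)
```

The trade-off is that per-type constraints and column meanings move out of the schema and into the type rows and application code.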

We did the same thing with the spatial data types: add another layer without adding more tables. Spatial hierarchies are also supported without adding tables. The model has a complete API for interacting with the database, plus ETL tools that already have extractors for external database tables and comma-delimited files. Help me write an AMDS facade around LocalTimeSeriesCountsResource and we are done.
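One common way to get layers and hierarchies without new tables is an adjacency list: every spatial unit at every layer is a row in one `location` table, and a `parent_id` column encodes nesting. The sketch below assumes that design (the comment doesn't say how RODS 6 does it) and uses a SQLite recursive CTE to roll zip-level counts up to any ancestor. The state/zip3/zip5 hierarchy and the counts are made up.

```python
import sqlite3

# Hypothetical adjacency-list design: layers are rows, locations at every
# layer share one table, and parent_id expresses the hierarchy.
SCHEMA = """
CREATE TABLE layer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE location (
    id        INTEGER PRIMARY KEY,
    layer_id  INTEGER NOT NULL REFERENCES layer(id),
    code      TEXT    NOT NULL,
    parent_id INTEGER REFERENCES location(id)
);
CREATE TABLE daily_count (
    location_id INTEGER NOT NULL REFERENCES location(id),
    day         TEXT    NOT NULL,
    value       INTEGER NOT NULL
);
"""

# Sum counts for a location and everything beneath it, at any depth.
ROLLUP = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM location WHERE code = ?
    UNION ALL
    SELECT l.id FROM location l JOIN subtree s ON l.parent_id = s.id
)
SELECT COALESCE(SUM(c.value), 0)
FROM daily_count c JOIN subtree s ON c.location_id = s.id
WHERE c.day = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO layer(id, name) VALUES (?, ?)",
    [(1, "state"), (2, "zip3"), (3, "zip5")],
)
# Made-up hierarchy: one state, one zip3 prefix, two zip codes.
conn.executemany(
    "INSERT INTO location(id, layer_id, code, parent_id) VALUES (?, ?, ?, ?)",
    [(1, 1, "GA", None), (2, 2, "303", 1), (3, 3, "30301", 2), (4, 3, "30305", 2)],
)
conn.executemany(
    "INSERT INTO daily_count VALUES (?, ?, ?)",
    [(3, "2009-04-01", 12), (4, "2009-04-01", 7), (3, "2009-04-02", 9)],
)


def rollup(conn, code, day):
    # Total for the named unit plus all of its descendants on one day.
    return conn.execute(ROLLUP, (code, day)).fetchone()[0]
```

Adding a new layer (say, hospital service areas) is then one row in `layer` plus rows in `location`; the query doesn't change.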