Monday, January 12, 2009

Oh look, Zip3s have multiple polygons in them.

I am beginning to think there is a new adage: New Polygon, new thing discovered about your system that doesn't support the new polygon.

So, ZIP3s do not form one big contiguous block of area in all cases. In some cases, they form multiple polygons because some zip code changed from 61299 to 61304 or whatever.

But the polygon system I had set up was expecting only one polygon per region. Thus, I will need to do a bit of an overhaul (it actually won't be too bad, yay object-oriented coding), and I will also need to massage a new list of zip3s (and probably zip5s and states) out of the KML files.
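To make the "many polygons per region" idea concrete, here is a rough Python sketch of pulling regions out of a KML file so that each region name maps to a list of polygons rather than a single one. It is not the actual code from my system. It assumes each Placemark's name holds the region code (a zip3, say), it only reads outer boundaries (holes are ignored), and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip the XML namespace so the KML version does not matter."""
    return tag.rsplit("}", 1)[-1]


def parse_ring(text):
    """Turn a KML <coordinates> string into a list of (lon, lat) tuples."""
    points = []
    for chunk in text.split():
        lon, lat = chunk.split(",")[:2]  # drop the optional altitude
        points.append((float(lon), float(lat)))
    return points


def regions_from_kml(kml_text):
    """Map each Placemark name to a list of polygons (outer rings only).

    Assumption: the Placemark <name> is the region code, e.g. a zip3.
    Placemarks that share a name are merged into one region.
    """
    regions = defaultdict(list)
    root = ET.fromstring(kml_text)
    for placemark in root.iter():
        if local(placemark.tag) != "Placemark":
            continue
        # Use the first <name> found inside the Placemark.
        name = None
        for el in placemark.iter():
            if local(el.tag) == "name":
                name = (el.text or "").strip()
                break
        # A Placemark may hold one Polygon or several in a MultiGeometry.
        for polygon in placemark.iter():
            if local(polygon.tag) != "Polygon":
                continue
            for outer in polygon.iter():
                if local(outer.tag) != "outerBoundaryIs":
                    continue
                for coords in outer.iter():
                    if local(coords.tag) == "coordinates":
                        regions[name].append(parse_ring(coords.text))
    return dict(regions)
```

Anything that used to ask a region for "its polygon" would then loop over the list instead, which is the part of the overhaul that object-oriented code keeps contained.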

Then, hopefully, I'll have entire states full of zip3s, and then be able to select zip5s within zip3s, and then comes the PoiConDai refurb.
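Selecting zip5s within a zip3 comes down to prefix matching, since a zip3 is just the first three digits of a zip5. A minimal sketch, assuming the zip5s arrive as strings or integers (the function name is hypothetical):

```python
from collections import defaultdict


def group_zip5_by_zip3(zip5_codes):
    """Group zip5 codes under their three-digit prefix.

    Codes are zero-padded to five digits first, because a zip such as
    02134 loses its leading zero if it was ever stored as a number.
    """
    groups = defaultdict(list)
    for code in zip5_codes:
        code = str(code).zfill(5)
        groups[code[:3]].append(code)
    return dict(groups)
```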


Jim Tobias said...

Hi Peter,
Zipcodes are truly awful units.
The worst part is that they may change at any time (not stable) and are not really population units but rather are postal units.
The zipcodes are based upon mail routes and not specifically based upon population units such as Census tracts. Census tracts have their own problems but are at least stable for one decade and do conform to county boundaries in a hierarchical manner.
Another issue with Zip 3:
All of Manhattan is contained within 1 polygonal zip3. So, if I know that there is a terrible case of disease within the zip3 of Manhattan... does this really help public health?
Suppose that I have one smallpox case within all of NYC...
At least zip5 gives a little better resolution; it seems like zip3 is almost completely useless as a unit.

Jim Tobias

Brian Alexander Lee said...

Jim, I think that everyone will agree enthusiastically with your comments. That being said, given that the only level of geography that our biosurveillance systems have is zip5, the next level up is zip3.

When George Mallory was asked why he wanted to climb Mt. Everest, he responded, "Because it's there." That's kind of why we are drawing zip3s. Not because we think they have epidemiological value, but because they are one of the few geographic boundaries that we can draw based on how the data is collected and stored.