-
Notifications
You must be signed in to change notification settings - Fork 6
Add Djibouti local shapefile configuration and updates #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for removing the data files. Sorry for the confusion there. Please put back the configs for other countries that you removed from fbfmaproom-sample.yaml. Only make changes to the djibouti section for now. Let's keep the shapefiles alongside the data files, e.g. in /data/aaron/fbf-candidate/djibouti/shapes. We may have to deal with version updates, so let's use a subdir for each version, e.g. /data/aaron/fbf-candidate/djibouti/shapes/2025-08. If you have a version number that comes from the provider, you could use that instead of 2025-08. Then specify the shapefile directory in the config file as a relative path, relative to the data root, just as we have done for zarr files. Remove all country-specific information from |
…l country-specific information from fbfmaproom.py
I've updated it based on your feedback. I've also tested it on the server and it works. Let me know if you face any issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I would like to make some changes that I feel will simplify the code and make it more maintainable. I'm entering my comments here both as notes for myself, and to explain the changes to you. I'm not asking you to make these changes--I will do it.
In addition to the line-by-line comments I made below, I have this general comment: there are now two different ways to get shape data, and the knowledge about how to query each one is spread out throughout the program. I will centralize that knowledge in two classes, one for db and one for files, and have both of them implement the same interface, so that code that uses them doesn't need to know which kind of shape source it's using.
# Build path | ||
data_root = CONFIG["data_root"] | ||
shapes_version = config.get("shapes_version") | ||
shapefile_path = f"{data_root}/{country_key}/shapes/{shapes_version}/{shape_config['local_file']}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
country_key
isn't actually appropriate here, because we ended up having multiple country_key
s for a single country, e.g. djibouti
and djibouti-mam
, but there's only going to be one set of shapes per country. Let's handle shapefile paths the same way we handle zarr paths, i.e. the config file specifies a path that's relative to data_root
. (That will also eliminate the shapes_version
field.)
from shapely import wkb | ||
from shapely.geometry import Polygon, Point | ||
from shapely.geometry.multipoint import MultiPoint | ||
import geopandas as gpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked into eliminating geopandas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked into it. Shapely doesn't speak shapefiles, so we do need to add a dependency.
If we're going to use geopandas for the shapefiles, I will also use it for the db shapes. Having two different interfaces depending on the source of the shapes is an unnecessary complication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
geopandas is an insanely expensive dependency to add. Adding geopandas to the project pulls in the following other packages that we weren't previously using:
ranca, brotli, charset-normalizer, contourpy, cycler, fiona, folium, fonttools, h2, hypack, hyperframe, idna, importlib-resources, joblib, kiwisolver, lcms2, libgfortran-ng (in addition to libgfortran and libgfortran5 that we already had), libspatialindex, mapclassify, matplotlib-base, munkres, networkx, openjpeg, pillow, pyproj (surprised we didn't have that), pysocks, qhull, requests, rtree, scikit-learn, scipy, threadpoolctl, unicodedata2, urllib3, xyzservices.
fiona is the library geopandas uses for reading shapefiles. All of fiona's dependencies are already present in our environment, so it's just one package to add. I'll try using fiona without the rest of geopandas.
'the_geom': gdf.geometry # Keep as shapely geometry objects | ||
}) | ||
|
||
return df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally try to avoid having multiple returns in the same function, because I find that when I edit a function that has multiple returns, I often change one and forget to change the other.
Suggestion:
try:
df = ...
except Exception as e:
...
df = ...
return df
# Try local shapefile first | ||
if 'local_file' in sc: | ||
try: | ||
return retrieve_shapes_local(country_key, mode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, I would prefer not to have multiple returns.
if 'local_file' in sc: | ||
try: | ||
return retrieve_shapes_local(country_key, mode) | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would make most sense to remove the error-handling code from retrieve_shapes_local
, let it throw an exception if something goes wrong, and have a single try/except
in this function that handles exceptions for both cases. Less duplication, less risk of changing one and forgetting to change the other.
try: | ||
return retrieve_shapes_local(country_key, mode) | ||
except Exception as e: | ||
print(f"Local shapefile failed, trying database: {e}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't agree with this error handling strategy. I think the config file should specify either shapefiles or db queries, but not both; and if it specifies shapefiles, and we get an error while trying to load those shapefiles, then it should fail, not fall back to something else.
if mode == "pixel": | ||
[[y0, x0], [y1, x1]] = json.loads(geom_key) | ||
shape = Polygon([(x0, y0), (x0, y1), (x1, y1), (x1, y0)]) | ||
return shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid multiple returns.
|
Superseded by #570 . |
Code for Local Shapefile Support.