Add Tests for Transit Matching Logic and Rule Engine #1039
Conversation
Force-pushed from f467042 to 6128094
The match_stops module implements a file-based caching system to avoid making redundant API calls to the Overpass service. Here's how it works:

Caching implementation
- Responses from the Overpass API are stored as CSV files
- When get_stops_near() is called, it first computes a hash of the query parameters

The test verifies the caching system works by issuing the same query twice and confirming that the second call used the cached file instead of making another API call (see the sketch after this comment).

TestMatchStopsWithSavedData - uses saved TestOverpass data and stores it
TestPipelineIntegration is a debug test; not intended for the final PR -- kept for debugging purposes
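A minimal sketch of that style of caching check, assuming hypothetical module paths, function signatures, and coordinates (the actual tests live in the files discussed below):

```python
import os
import unittest
from unittest import mock

# Assumed module path and function signatures; the real code may differ
import emission.net.ext_service.transit_matching.match_stops as matcher

class TestOverpassCaching(unittest.TestCase):
    def test_second_call_uses_cache(self):
        # First call is allowed to hit the API and write a cache file
        first = matcher.get_stops_near({"lat": 37.87, "lon": -122.27}, 150)
        cached_before = sorted(os.listdir(matcher.OVERPASS_CACHE_DIR))

        # Second identical call should be served from the cache file,
        # so the underlying request function must not be called again
        with mock.patch.object(matcher, "make_request_and_catch") as mocked:
            second = matcher.get_stops_near({"lat": 37.87, "lon": -122.27}, 150)
            mocked.assert_not_called()

        self.assertEqual(first, second)
        self.assertEqual(cached_before, sorted(os.listdir(matcher.OVERPASS_CACHE_DIR)))
```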
The caching changes appear to be the right idea but there are a lot of extra changes. Please remove all unneeded changes.

This PR should include:
- the change from eacimp to eacimr in common.py
- the changes in match_stops.py to use a local filesystem cache for the Overpass queries when using the public Overpass API (i.e. when not in production). If you run all tests locally, it should perform the API calls and create a bunch of cache files. If you run all tests a second time, it should use the cache and not perform any API calls. If that is working, commit the cache files so that when the GH workflows run all the tests, they do not call the API repeatedly
- a test file emission/individual_tests/TestMatchStops.py that ensures that the filesystem caching is working; you could even just include this as an additional test in the existing file TestOverpass.py rather than making a new file
- a test file emission/tests/analysisTests/modeInferTests/TestRuleEngine.py that ensures that the MODE_INFERENCE stage gives valid/consistent results, at least at a basic level

The rest of the changes are unnecessary.
TODO: Update 13 tests that display errors due to switching to the rule engine:
Great improvements! I would suggest one or two adjustments in match_stops.py to make it easier for future maintainers to understand.

- At the top of the module, we already check if GEOFABRIK_OVERPASS_KEY is configured, so we can just do our "is production" check there. I would just define OVERPASS_CACHE_DIR conditionally; make the path when using the public Overpass API (i.e. when not in production), else let it be None.

```python
OVERPASS_CACHE_DIR = None
try:
    GEOFABRIK_OVERPASS_KEY = os.environ.get("GEOFABRIK_OVERPASS_KEY")
    url = 'https://overpass.geofabrik.de/' + GEOFABRIK_OVERPASS_KEY + '/'
    print("overpass configured")
except:
    print("overpass not configured, falling back to public overpass api")
    url = "https://lz4.overpass-api.de/"
    # Enable cache when using the public API
    OVERPASS_CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".overpass_cache")
    os.makedirs(OVERPASS_CACHE_DIR, exist_ok=True)
```

- Avoid having too much complexity in make_request_and_catch, because it now handles R/W from the cache on top of making the requests and handling errors. I recommend making a wrapper around it (query_overpass). This way, make_request_and_catch stays exactly the same as it was before and all the caching business can be handled in query_overpass. If we are in production and don't have to worry about caching, we can just return early.

```python
def query_overpass(overpass_query):
    if OVERPASS_CACHE_DIR is None:
        return make_request_and_catch(overpass_query)

    # Create a unique filename based on the query hash
    query_hash = hashlib.md5(overpass_query.encode()).hexdigest()
    cache_file = os.path.join(OVERPASS_CACHE_DIR, f"{query_hash}.json")

    # If the cached response exists, use it
    if os.path.exists(cache_file):
        logging.info(f"Using cached response from {cache_file}")
        with open(cache_file, 'r') as f:
            all_results = json.load(f)
        return all_results

    # Else, make the request and cache the response before returning
    all_results = make_request_and_catch(overpass_query)
    with open(cache_file, 'w') as f:
        json.dump(all_results, f)
    logging.info(f"Cached API response to {cache_file}")
    return all_results
```
The test files themselves look pretty good to me
Changes to existing tests that failed recently due to the migration to the rule engine:

The tests were failing because they were comparing different string representations of the same mode types. For example, a mode might be represented as PredictedModeTypes.BICYCLING in one place but MotionTypes.BICYCLING in another, or SUBWAY vs. IN_VEHICLE.

The rule engine now populates fields like inferred_section_summary and ble_sensed_summary with actual values (e.g., distance and duration for modes like "WALKING" or "UNKNOWN"), while the test expectations had empty dictionaries.

The original code expected at most one inferred section per cleaned section, but the pipeline reset tests encountered multiple entries, causing assertion failures. Now, when multiple entries are found, we sort them by timestamp and take the most recent one, logging a warning instead of failing (see the sketch below). Thoughts on these changes @JGreenlee? -- it is quite a 'patchy' fix
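A minimal sketch of that most-recent-entry fallback, using hypothetical helper and field names rather than the actual section_queries.py code:

```python
import logging

def _pick_most_recent_inference(entries):
    # The original logic assumed at most one inferred section per cleaned section.
    # If several entries share the same section id, warn and keep the one with
    # the latest write timestamp instead of failing the assertion.
    if not entries:
        return None
    if len(entries) > 1:
        logging.warning("Found %d inference entries for one section; using the most recent",
                        len(entries))
    return max(entries, key=lambda e: e["metadata"]["write_ts"])
```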
Well, it seems I ALMOST have it; 1 test remains:
Will be fixing that now
Test Overpass changes: Improved the reliability of the TestOverpass tests, especially for GitHub Actions workflows, as @JGreenlee suggested.
This ensures that misconfigured GitHub secrets won't go unnoticed.

TestPipelineReset.py and TestPipelineRealData fixes: Reverted previous checks after regenerating ground truths, fixed minor bugs.
testResetToPastWithCrash: This prevents KeyError exceptions when the key doesn't exist.
testNormalizeWithACursor:

TODO: Extraneous whitespace changes, commit churn
Instead of silently skipping tests when GEOFABRIK_OVERPASS_KEY is not set, we now log clear warnings and still test the public API
This ensures that misconfigured GitHub secrets won't go unnoticed
That doesn't really solve our problem because we don't routinely check the GH Actions logs; we will only do that when a workflow is failing.
In the scenario I described where the GH Secret is misconfigured, the workflow would use the public API, so it would still pass, and we would have no idea anything was wrong
I gave it some thought and I think the only clean way to do this is to split it up into separate test files.

Recall that the original TestOverpass failed when run locally. That was on purpose, so that if GEOFABRIK_OVERPASS_KEY is misconfigured in GH actions, we will notice immediately. Since we expect it to fail locally, it only runs during the test-overpass workflow. It does not run during runAllTests.sh.

The new tests you added, which test the caching in match_stops, should be run as part of runAllTests.sh. So I think you should create another test file, perhaps called TestMatchStops, and add your new tests there.

You can then expect that GEOFABRIK_OVERPASS_KEY will not be defined in TestMatchStops and it will be defined in TestOverpass.
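A minimal sketch of how those expectations could be asserted in each file's setUp; the file names are the ones discussed above, but the checks themselves are illustrative, not code from the PR:

```python
import os
import unittest

class TestMatchStops(unittest.TestCase):
    def setUp(self):
        # Runs under runAllTests.sh, where the GH secret should not be configured,
        # so the public API (and the filesystem cache) is exercised
        self.assertNotIn("GEOFABRIK_OVERPASS_KEY", os.environ)

class TestOverpass(unittest.TestCase):
    def setUp(self):
        # Runs only in the test-overpass workflow; failing fast here makes a
        # misconfigured GEOFABRIK_OVERPASS_KEY secret immediately visible
        self.assertIn("GEOFABRIK_OVERPASS_KEY", os.environ)
```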
TestMatchStops should only include the new tests you added, not the tests that were already in TestOverpass (you have duplicated test_get_stops_near and test_get_predicted_transit_mode).

Also, it should be located somewhere in emission/tests/ so that it gets picked up by runAllTests.sh. To match the existing structure I'd recommend emission/tests/netTests/extServiceTests/TestMatchStops.py
Force-pushed from 1d14db9 to e5e25d5
Current test fails because of changes to be made in another PR:
I have a few comments, and I'm sending you shankari_2023-07-18_xmastrain separately on Teams.
I tried to upload it here, but it's ~50MB, which is a lot bigger than all the other days we have. There must be a lot of travel during that day
Did you try to get tests to pass without these patches?
If yes and you couldn't get it to work, please explain what you tried, how the patches fix it and why you think the patches are necessary
The tests I added (in rule engine) are specifically designed to verify the functionality implemented in these patches. They weren't written to pass without the patches - they're intended to validate that the patches correctly implement the required behavior.
Patches are removed now that we have the xmas train file; I changed the whole test case entirely.
I'm referring to the patches in TestPipelineReset:

```diff
@@ -42,6 +42,7 @@
 class TestPipelineReset(unittest.TestCase):
     def setUp(self):
         np.random.seed(61297777)
+        etc.set_analysis_config("analysis.result.section.key", "analysis/cleaned_section")
```
Is this one necessary?
Yes, without this, it would fail like:

```
Traceback (most recent call last):
  File "/Users/aatrox/e-mission-server/emission/tests/pipelineTests/TestPipelineReset.py", line 295, in testResetToTsInMiddleOfTrip
    self.compare_result(ad.AttrDict({'result': api_result}).result,
  File "/Users/aatrox/e-mission-server/emission/tests/pipelineTests/TestPipelineReset.py", line 117, in compare_result
    self.assertEqual(rs.features[0].properties.sensed_mode, es.features[0].properties.sensed_mode)
AssertionError: 'PredictedModeTypes.BICYCLING' != 'MotionTypes.BICYCLING'
- PredictedModeTypes.BICYCLING
+ MotionTypes.BICYCLING
```
see #1039 (comment)
How did you regenerate the ground truth? What was analysis.result.section.key set to?

If it was set to "analysis/cleaned_section", the ground truth should be expecting MotionTypes.*
If it was set to "analysis/inferred_section", the ground truth should be expecting PredictedModeTypes.*
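As a small illustration of that mapping (an assumed summary, not code from the repo):

```python
# Which mode enum the ground truth files should contain, depending on the config key
EXPECTED_MODE_ENUM = {
    "analysis/cleaned_section": "MotionTypes.*",          # sensed modes
    "analysis/inferred_section": "PredictedModeTypes.*",  # rule-engine predictions
}
```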
see #1039 (comment)
In that comment, I was giving you 2 options to try; I didn't mean you should do both. Sorry if I was unclear
Since it's set to "analysis/cleaned_section", the ground truth files should contain sensed_mode values with "MotionTypes." format, rather than "PredictedModeTypes." (which it does -- see shankari_2016-07-25 and shankari_2016-07-27 ground truth files)
I regenerated them via the save_ground_truth script in the bin directory.
The default value for analysis.result.section.key is "analysis/inferred_section".

When you remove this line:
etc.set_analysis_config("analysis.result.section.key", "analysis/cleaned_section")
it uses the default value and we start getting PredictedModeTypes.*

So I think PredictedModeTypes.* is the expected result and what should be in the ground truth files.
I thought for TestPipelineReset we ought to set the analysis config to cleaned_section rather than the default inferred_section, per the previous comment; I can regenerate the ground truths, I just had some confusion regarding the back and forth between the config settings.

If we omit the ret_list asserts in section_queries.py:
we get:
which passes the test. The question is, what is ret_list?
Force-pushed from 50f3a84 to f10818e
This is the current output:
My theories:
Hmm, or is it because there are 2 predictions? In rule_engine
From my understanding, the rule engine creates TWO objects for each section:

Both of these share the same section ID that gets used in query parameters. The _get_inference_entry_for_section function in section_queries.py wasn't updated to handle the case where multiple entries exist for a section. With that said, I can think of 3 possible solutions:

Thoughts on this @JGreenlee?
Force-pushed from 9ff711d to b30d827
Force-pushed from b30d827 to 06b3332
- Created TestMatchStops.py with full test coverage for transit stop matching functionality
- Updated match_stops.py to incorporate caching of Overpass API results
- Implemented TestRuleEngine.py with test cases for mode inference rules. Originally there was no test coverage; this PR seeks to rectify that. See: e-mission#1026 (comment).
- Added additional cases based on 'bad' labels, see e-mission/e-mission-docs#1124 (comment)
- Regenerated ground truths now that we are using the rule engine.

For Transit Matching Logic Tests:
- TestOverpass already tests get_stops_near and predicted_transit_modes. TestMatchStops focuses on the caching mechanism to validate that it works.

For RuleEngine Tests:
- Seeks to test several mode predictions such as walking, cycling, driving, etc. based on different factors.
- Cases include empty sections, AirOrHSR, Motorized, Unknown, Combination
- Added tests based on prefixed modes like 'XMAS:Train'; added shankari xmas real data to test behavior on prefix modes like XMAS:Train
Force-pushed from 06b3332 to 8e4fac2
@TeachMeTW @JGreenlee I don't see why the code fix should be in the same PR as the tests, given that this is an expansion of the tests and is not focused on the new functionality in the code. I'm going to include the code change in a separate PR, where I will be fixing some corner cases, in #1051. Please pull after that is merged and resolve the conflict.
Force-pushed from a5f7da1 to 8e4fac2
I decided to merge #1051 without this commit because I am not sure whether this actually creates an inferred section (with an UNKNOWN mode) or whether it skips the section. We should create an inferred section with an UNKNOWN mode. Please verify that this successfully does that before I add that change.
I modified a test (not pushed yet):
I believe this confirms that:
The rule engine now correctly handles errors by:
I modified rule_engine by making it UNKNOWN:
Should now be in #1058. Also added the updated test, now in #1057.
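A minimal sketch of the UNKNOWN fallback described above, with hypothetical function names and error handling rather than the actual rule_engine.py change (PredictedModeTypes is assumed to live in emission.core.wrapper.modeprediction):

```python
import logging
import emission.core.wrapper.modeprediction as ecwm

def get_predicted_transit_mode_for(section):
    # Hypothetical predictor standing in for the real transit-matching call
    raise RuntimeError("transit matching failed")

def predict_mode_or_unknown(section):
    # If transit matching blows up for a section, fall back to UNKNOWN instead
    # of skipping it, so an inferred section with an UNKNOWN mode is still created.
    try:
        return get_predicted_transit_mode_for(section)
    except Exception:
        logging.exception("Transit matching failed; marking section as UNKNOWN")
        return ecwm.PredictedModeTypes.UNKNOWN

print(predict_mode_or_unknown(section=None))  # -> PredictedModeTypes.UNKNOWN
```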
Transit Mode Prediction Mocking: mock_get_predicted_transit_mode uses the side_effects list to:
This test:
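A minimal sketch of that side_effect-based mocking pattern, with a hypothetical predictor and made-up return values (the real mock lives in the test file):

```python
from unittest import mock

# Hypothetical stand-in for the predictor that the test patches out
def get_predicted_transit_mode(start_stops, end_stops):
    raise RuntimeError("should be mocked in tests")

# side_effect consumes this list, one element per successive call, so each
# section exercised by the test can receive a different predicted mode
side_effects = [["BUS"], None, ["TRAIN"]]

with mock.patch(__name__ + ".get_predicted_transit_mode",
                side_effect=side_effects) as mock_get_predicted_transit_mode:
    assert get_predicted_transit_mode([], []) == ["BUS"]
    assert get_predicted_transit_mode([], []) is None
    assert get_predicted_transit_mode([], []) == ["TRAIN"]
    assert mock_get_predicted_transit_mode.call_count == 3
```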
#1058 has one commit with 141 files. I have not yet reviewed the design of the cache or the tests, and I don't have the time to do so now. I don't want to merge random code or files that I haven't reviewed. Please pull out only the change in
In that PR, please indicate the "Testing done" with a manual run of the pipeline instead of a new test case, since I don't have the time to review the philosophy of your new test cases right now.
Moved to #1059. Tried testing with manual pipeline runs but have been running into trouble all night (unrelated to these changes, as my master branch seems to not work as intended). I will follow up on this debugging process later today, but I do not think it's related to my changes.
As it turns out, this pipeline issue was noted in e-mission/e-mission-docs#1126 (comment). I implemented a fix: #1061. This allowed me to do testing on the rule engine with a manual pipeline run.
Currently, we test the API calls to Overpass, but we don't test the logic that comes after those calls. Specifically, we're missing test coverage for:
- The match_stops module's transit mode prediction logic
- The rule_engine module's transit classification logic (bus vs. car disambiguation)
This lack of test coverage makes it difficult to ensure these components work correctly, especially when making changes to the codebase.
Changes Made
Two new test files have been added to provide comprehensive test coverage for the transit matching logic. They are basically the same; I included both so we can decide which one to go with, or keep both:
How are they different?
Semi-mocked Approach (TestMatchStopsWithSavedData):
Fully Mocked Approach (TestMatchStopsWithMockData)
Areas Tested
The test files cover these critical components:
Test Scenarios
Tests include various scenarios: