-
Notifications
You must be signed in to change notification settings - Fork 1k
Performance enhancements for Model.agents #2251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for more information, see https://pre-commit.ci
Performance benchmarks:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks interesting!
@quaquel can you reproduce the speedup of WolfSheep locally? Edit: on of the benchmark models also gives a warning now:
|
I'll try to figure out what is happening with the init times here. Its probably overhead from setting up the additional datastructures, which we get back when running the model (at least for wolf sheep?). |
Thanks for this PR, I like the way it encapsulates the logic much better. I think it is a good default to return the actual agentset and not just a copy. I think this is actually more intuitive. Regarding the copy function, doesn't |
Yes, select does this as a corner case. But fair enought to seperate |
Reran benchmarks locally multiple times. Behavior is rather varied. In general inits are allways up, but not by the same percentage as shown above. Typicaly it is between 5% and 15% slower init times. I cannot see the massive speedups on wolf sheep. Rather, I see sometimes mosest increases or modest decreases. I am going to dig a bit more to see what is going on. |
Performance benchmarks:
|
@property | ||
def agent_types(self) -> list[type]: | ||
"""Return a list of different agent types.""" | ||
return list(self.agents_.keys()) | ||
return list(self._agents_by_type.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous and the new change have different meaning altogether (list of all type of each agents vs list(set(agent types))). I suppose the docstring is ambiguous even though it is closer to the former meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not correct. In the old version self.agents_
was the exact same datastructure as what is now _agents_by_type
. In the old version: self.agents_: defaultdict[type, dict] = defaultdict(dict)
. In the new version: self._agents_by_type: dict[type, AgentSet] = {}
. So, in either case, keys is a list of types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That means it still stands that the docstring is not clear enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring has not been changed by me in this PR, so it is already in the existing code. If you suggest changing it, that would be fine, but why would that hold up this PR?
Also, what do you suggest changing? The return is a list, and this list contains the types of the agents in the model. The only, minor, ambiguity is that it is a list of the unique types of agents, not the type for each agent in the model.
Let me review it in depth tonight (I'm trying to stay in the model-building flow) |
Do we need to modernize |
This version of the model is on a grid. #2224 is with all agents in the model. So, I am not sure what to expect in the first place. |
Yeah so the Grid operations might be the bottleneck. Sounds logical. I will try to update the benchmarks later tonight or tomorrow, and then we can bring this one in. I'm also curious if we can speed up |
In the case of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I really like the model.register / deregister. It's very clear and cleanly separated, and the right way around.
model.agents
now is the ground truth right?- I would really appreciate an overview of all the variables and private variables we now keep around agents, agent types and AgentSets.
mesa/model.py
Outdated
self._agents = {} | ||
self._agents_by_type: dict[type, AgentSet] = {} | ||
self._all_agents = AgentSet([], self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so now we're keeping 4 records of all our agents (if we count self.agents
)? Could you explain a bit why each is needed?
I think this can be a place where some performance is lost, in models that frequently kill and create agents (like wolf-sheep)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._agents contains the hard refs, so this is essential.
self._all_agents is the agentset with all agents. The main performance motivation for this PR. So again essential.
The only one you could debate is agents_by_type. This you could also do via self._all_agents.groupby(types).groups
and thus as a property. What is better from a performance standpoint is hard to say. adding and removing stuff from dicts is relatively cheap to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, appreciated. Could you add this in a comment or docstring behind/above each?
self.agents
is the only thing we formally expose, all others are private and use-at-you-own risk.
For now I see the use case of a separate _agents_by_type
. We can dive into the performance later.
And at least we got rid of self.agents_
!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.agents is the only thing we formally expose, all others are private and use-at-you-own risk.
Which is exactly why I moved everything else into private. It keeps the public side the same as before.
I'll add a few comments on the datastructures later today
no, model._agents is the hardref dict and the ground truth.
can we wait with that until we remove all the old 2.x stuff from the model? |
Performance benchmarks:
|
for more information, see https://pre-commit.ci
This PR is a performance enhancement for Model.agents. It emerged from a discussion on [the weird scaling performance of the Boltzman wealth model](projectmesa#2224). model.agents now returns the agentset as maintained by the model, rather than a new copy based on the hard references agent registration and deregistration have been moved from the Agent into the model. The agent now calls model.register and model.deregister. This encapsulates everything cleanly inside the model class and makes Agent less dependent on the inner details of how Model manages the hard references to agents the setup of the relevant datastructures is moved into its own helper method, again, this cleans up code.
Should |
If you want to get technical, I guess in a java sense they would be protected rather than private. They are not private because that would mean that agent could not call model.register_agent. They are not public in the sense of user facing. |
In Python speak, however, it is either a public or an internal method. Public is everything user-faced, documented and guaranteed to be stable. So if there isn't any user facing use case for these functions I think they should indeed be made non-public by a leading underscore. This means its meant to be used internally, which is exactly whats happening here. |
Thanks for the explanation Jan, and I agree with Corvince. |
This PR is a performance enhancement for Model.agents. It emerged out of a discussion on the weird scaling performance of the Boltzman wealth model.
Key changes
model.agents
now returns the agentset as maintained by the model, rather than a new copy based on the hard referencesmodel.register
andmodel.deregister
. This encapsulates everything cleanly inside the model class and makes Agent less dependent on the inner details of how Model manages the hard references to agents