Stop clustering
In many cases, GTFS data does not aggregate related stops into a single station, or related stops come from distinct GTFS feeds and therefore have no relationship to each other. For this use case, one would like to cluster stops into groups based on spatial proximity.
To group stops by spatial proximity, one can use the provided SpatialClusterizer class. Example of use:
from gtfslib.spatial import SpatialClusterizer
# dao: the DAO holding the loaded GTFS data
sc = SpatialClusterizer(100) # 100 meters
sc.add_points(dao.stops())
sc.clusterize()
for cluster in sc.clusters():
    print(cluster.id)
    for stop in cluster.items:
        print(stop.stop_name)
There are various methods on the SpatialClusterizer class:
- add_point(p) - Add a new point-like object
- add_points(points) - Add a collection of points
- clusterize() - Aggregate added points into clusters. Note: after this method has been called, you can't add new points anymore.
- cluster_of(p) - Return the cluster of a given point
- in_same_cluster(p1, p2) - Return true if p1 and p2 are in the same cluster
- clusters() - Return the collection of all clusters. Each cluster contains the following:
  - id - A unique numeric ID (arbitrary and not stable across runs)
  - items - A collection of points in this cluster
Note: A point-like object is anything that has lat() and lon() methods.
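As a minimal sketch of what "point-like" means in practice, the class below exposes lat() and lon() methods. Both GeoPoint and is_point_like() are hypothetical names introduced here for illustration and are not part of gtfslib. The sketch assumes the clusterizer relies only on those two methods.

```python
class GeoPoint:
    """Hypothetical point-like object: anything exposing lat() and lon()."""

    def __init__(self, name, lat, lon):
        self.name = name
        self._lat = lat
        self._lon = lon

    def lat(self):
        # Latitude in decimal degrees
        return self._lat

    def lon(self):
        # Longitude in decimal degrees
        return self._lon


def is_point_like(obj):
    # Illustrative duck-typing check: lat and lon must both be callable
    return callable(getattr(obj, "lat", None)) and callable(getattr(obj, "lon", None))


poi = GeoPoint("Town hall", 44.8378, -0.5792)
print(is_point_like(poi), poi.lat(), poi.lon())
```

Under that assumption, objects of this kind could be passed to add_point() alongside regular stops.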
Clusters are stored in memory, not in the database along with the loaded GTFS data.
For more information, see the code.
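To make the idea of grouping by spatial proximity concrete, here is a self-contained sketch that does not use gtfslib and is not its implementation. It assumes a plain haversine distance and a naive single-linkage grouping with a union-find structure. The names haversine_m and cluster_by_distance are hypothetical.

```python
import math
from collections import defaultdict


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def cluster_by_distance(points, d0):
    """Group (name, lat, lon) tuples linked by a chain of neighbors closer than d0 meters."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) pairwise comparison; a spatial index would be needed at scale
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            _, lat1, lon1 = points[i]
            _, lat2, lon2 = points[j]
            if haversine_m(lat1, lon1, lat2, lon2) < d0:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, (name, _, _) in enumerate(points):
        groups[find(i)].append(name)
    return sorted(sorted(g) for g in groups.values())


stops = [
    ("A", 44.8400, -0.5800),
    ("B", 44.8404, -0.5800),  # roughly 45 m north of A
    ("C", 44.8500, -0.5800),  # more than 1 km north of A
]
print(cluster_by_distance(stops, 100))  # A and B end up together, C alone
```

With a 100 meter threshold, A and B fall into one group and C stays on its own. The threshold plays the same role as the distance passed to SpatialClusterizer.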
You can customize the heuristic that determines whether two stops should belong to the same cluster by passing a comparator parameter to the clusterize() method:
def in_same_cluster(d, d0, stop1, stop2):
    # Apply a penalty if the stops do not have the same name
    if stop1.stop_name == stop2.stop_name:
        return True
    return d < d0 / 2.0
sc = SpatialClusterizer(500) # 500 meters
sc.add_points(dao.stops())
sc.clusterize(comparator=in_same_cluster)
This example groups two stops into the same cluster when they are closer than 500 meters if they have the same name, and only when they are closer than 250 meters if they do not.
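The comparator above can be exercised on its own, without a database, to check the thresholds. The sketch below assumes that d is the distance between the two stops in meters, that d0 is the base distance given to the clusterizer, and that the comparator is only consulted for pairs already closer than d0. FakeStop is a hypothetical stand-in for a real stop.

```python
from collections import namedtuple

# Hypothetical stand-in for a stop; only stop_name is used by the comparator
FakeStop = namedtuple("FakeStop", ["stop_name"])


def in_same_cluster(d, d0, stop1, stop2):
    # Same name: accept (the caller is assumed to have already checked d < d0)
    if stop1.stop_name == stop2.stop_name:
        return True
    # Different names: require half the base distance
    return d < d0 / 2.0


a = FakeStop("Central")
b = FakeStop("Central")
c = FakeStop("Market")

print(in_same_cluster(400, 500, a, b))  # same name, 400 m apart
print(in_same_cluster(400, 500, a, c))  # different names, 400 m is not below 250 m
print(in_same_cluster(200, 500, a, c))  # different names, 200 m is below 250 m
```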
A helper factory method, make_comparator(), lets you build the most common comparators (forcing the same name and/or adding a penalty if stops do not belong to the same station):
sc.clusterize(comparator=sc.make_comparator(same_name=True, different_station_penalty=0.3))
This example groups two stops into the same cluster only if 1) they have the same name, and 2) they are closer than 500 meters when they belong to the same station, or closer than 150 meters (500 * 0.3) otherwise.
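For readers who want to see how such a factory could work, here is an illustrative closure that reproduces the behavior described above. It is not gtfslib's make_comparator(). The names make_comparator_sketch and FakeStop, and the station_id attribute, are assumptions made for this sketch only.

```python
from collections import namedtuple

# Hypothetical stand-in for a stop; station_id is an assumed attribute name
FakeStop = namedtuple("FakeStop", ["stop_name", "station_id"])


def make_comparator_sketch(same_name=True, different_station_penalty=1.0):
    """Illustrative closure, not gtfslib's make_comparator()."""

    def comparator(d, d0, stop1, stop2):
        # Optionally refuse to merge stops whose names differ
        if same_name and stop1.stop_name != stop2.stop_name:
            return False
        # Same (known) station: keep the full base distance
        if stop1.station_id is not None and stop1.station_id == stop2.station_id:
            return d < d0
        # Otherwise shrink the allowed distance by the penalty factor
        return d < d0 * different_station_penalty

    return comparator


compare = make_comparator_sketch(same_name=True, different_station_penalty=0.3)
s1 = FakeStop("Central", "ST1")
s2 = FakeStop("Central", "ST1")
s3 = FakeStop("Central", "ST2")
s4 = FakeStop("Market", "ST1")

print(compare(400, 500, s1, s2))  # same name, same station, below 500 m
print(compare(400, 500, s1, s3))  # same name, other station, not below 150 m
print(compare(100, 500, s1, s3))  # same name, other station, below 150 m
print(compare(100, 500, s1, s4))  # different names are never merged
```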