Stop clustering

Caveat

In many cases the GTFS data does not aggregate related stops into a single station, or several related stops come from distinct GTFS feeds and thus have no relationship with each other. For this use case, one would like to cluster stops into groups based on spatial proximity.

The `SpatialClusterizer` class

To group stops by spatial proximity, one can use the provided SpatialClusterizer class. Example of use:

from gtfslib.spatial import SpatialClusterizer

sc = SpatialClusterizer(100) # 100 meters
sc.add_points(dao.stops())
sc.clusterize()

for cluster in sc.clusters():
    print(cluster.id)
    for stop in cluster.items:
        print(stop.stop_name)

There are various methods on the SpatialClusterizer class:

add_point(p) - Add a new point-like object
add_points(points) - Add a collection of points
clusterize() - Aggregate added points into clusters. Note: after this method has been called you can't add new points anymore.
cluster_of(p) - Return the cluster of a given point
in_same_cluster(p1, p2) - Return true if p1 and p2 are in the same cluster
clusters() - Return the collection of all clusters. Each cluster contains the following:
- id - A unique numeric ID (arbitrary and not stable across runs)
- items - A collection of points in this cluster

Note: A point-like object is anything that has both a lat() and a lon() method.
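As an illustration of the note above, here is a minimal sketch of a custom point-like object. The `NamedPoint` class and the `is_point_like()` helper are hypothetical names introduced for this example only; they are not part of gtfslib, and the coordinates are arbitrary.

```python
class NamedPoint:
    """Hypothetical point-like object: it exposes lat() and lon() methods."""

    def __init__(self, name, lat, lon):
        self.name = name
        self._lat = lat
        self._lon = lon

    def lat(self):
        # Latitude in decimal degrees
        return self._lat

    def lon(self):
        # Longitude in decimal degrees
        return self._lon


def is_point_like(obj):
    """Duck-typing check (illustrative helper, not a gtfslib function)."""
    return callable(getattr(obj, "lat", None)) and callable(getattr(obj, "lon", None))


depot = NamedPoint("Depot", 44.8378, -0.5792)
print(is_point_like(depot))          # a NamedPoint has both methods
print(is_point_like((44.8, -0.58)))  # a plain tuple does not
```

Given the description of add_point(p), an object of this shape should be acceptable to the clusterizer next to regular stops, although this has not been verified against the library code.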

Clusters are kept in memory only; they are not stored in the database alongside the loaded GTFS data.

For more information, see the code.

Customizing the heuristic

You can customize the heuristic to determine if two stops should belong to the same cluster, by passing a comparator parameter to the clusterize() method:

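def in_same_cluster(d, d0, stop1, stop2):
    # d is the distance between the two stops, d0 the clustering radius
    if stop1.stop_name == stop2.stop_name:
        # Same name: no penalty
        return True
    # Different names: apply a penalty by halving the allowed distance
    return d < d0 / 2.0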

sc = SpatialClusterizer(500) # 500 meters
sc.add_points(dao.stops())
sc.clusterize(comparator=in_same_cluster)

This example groups two stops in the same cluster when they are closer than 500 meters if they have the same name, and only when they are closer than 250 meters if they do not.
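To make the two thresholds concrete, the sketch below exercises a comparator of this kind on stand-in stops, without needing gtfslib or a loaded feed. It assumes that `d` is the distance in meters between the two stops and `d0` the radius given to the constructor, and it spells out the `d < d0` check explicitly. The function name `same_name_or_half_radius` and the `SimpleNamespace` stops are illustrative only.

```python
from types import SimpleNamespace


def same_name_or_half_radius(d, d0, stop1, stop2):
    """Illustrative comparator: full radius for same-named stops, half otherwise."""
    if stop1.stop_name == stop2.stop_name:
        return d < d0
    return d < d0 / 2.0


# Stand-in stops: only the stop_name attribute is needed here
a = SimpleNamespace(stop_name="Central Station")
b = SimpleNamespace(stop_name="Central Station")
c = SimpleNamespace(stop_name="Market Square")

RADIUS = 500  # meters, as in the example above

print(same_name_or_half_radius(400, RADIUS, a, b))  # same name, under 500 m
print(same_name_or_half_radius(400, RADIUS, a, c))  # different names, over 250 m
print(same_name_or_half_radius(200, RADIUS, a, c))  # different names, under 250 m
```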

A helper factory method, make_comparator(), lets you build the most common comparators (requiring the same name and/or adding a penalty if the stops do not belong to the same station):

sc.clusterize(comparator=sc.make_comparator(same_name=True, different_station_penalty=0.3))

This example groups two stops in the same cluster only if 1) they have the same name, and 2) they are closer than 500 meters when they belong to the same station, or closer than 150 meters (500 * 0.3) when they do not.
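The rule described above can be written out as a plain comparator factory. This is only a sketch of the described behavior, not the library's implementation of make_comparator(): the name `make_comparator_sketch` is hypothetical, and the `parent_station` attribute used to decide whether two stops share a station is an assumption borrowed from the GTFS field name.

```python
from types import SimpleNamespace


def make_comparator_sketch(same_name=True, different_station_penalty=0.3):
    """Build a comparator(d, d0, stop1, stop2) following the rule described above."""

    def comparator(d, d0, stop1, stop2):
        # Optionally require identical names
        if same_name and stop1.stop_name != stop2.stop_name:
            return False
        # Assumption: stops expose their station through `parent_station`
        same_station = (
            stop1.parent_station is not None
            and stop1.parent_station == stop2.parent_station
        )
        # Shrink the allowed distance when the stops are not in the same station
        limit = d0 if same_station else d0 * different_station_penalty
        return d < limit

    return comparator


compare = make_comparator_sketch(same_name=True, different_station_penalty=0.3)

s1 = SimpleNamespace(stop_name="Central", parent_station="ST1")
s2 = SimpleNamespace(stop_name="Central", parent_station="ST1")
s3 = SimpleNamespace(stop_name="Central", parent_station=None)
s4 = SimpleNamespace(stop_name="Harbour", parent_station="ST1")

print(compare(400, 500, s1, s2))  # same name, same station, under 500 m
print(compare(400, 500, s1, s3))  # same name, different station, over 150 m
print(compare(100, 500, s1, s3))  # same name, different station, under 150 m
print(compare(50, 500, s1, s4))   # different names are rejected outright
```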
