Overture Instead of OSM – Easy Pipeline¶
Can Overture data replace OSM here? In a nutshell: yes, absolutely. Could it be better integrated? Of course, there is always room for that.
The following notebook shows how to use the UrbanMapper library to process and visualise building counts along road segments in Manhattan, NYC, using data sourced entirely from Overture. It follows a structured pipeline approach, including data loading, filtering, enrichment, and visualisation.
Setup¶
First of all, let's initialise an UrbanMapper instance, which sets the foundation for the pipeline.
import urban_mapper as um
import geopandas as gpd
from urban_mapper.pipeline import UrbanPipeline
mapper = um.UrbanMapper()
Pre-Requisites –– Data Preparation¶
Since the goal is to use Overture data, we first need to get hold of it. To do so, either (1) follow https://docs.overturemaps.org/getting-data/overturemaps-py/ or (2), assuming the overturemaps package is already installed among your pip packages, run the following in your CLI:
overturemaps download --bbox=-74.257159,40.495992,-73.699215,40.915568 -f geojson --type=segment -o nyc_segments.geojson
overturemaps download --bbox=-74.016367,40.702726,-73.934212,40.821589 -f geoparquet --type=building -o manhattan_buildings.parquet
These commands do nothing more than download the right information (roads and buildings) for the right location from Overture, ready to proceed with UrbanMapper.
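If you would rather keep both bounding boxes in one place, the two commands can also be assembled programmatically. The following is only a sketch: `overture_download_cmd` is a hypothetical helper name, and it relies on nothing beyond the flags already shown in the commands above.

```python
import shlex

# Bounding boxes as (west, south, east, north), the order used by --bbox above.
NYC_BBOX = (-74.257159, 40.495992, -73.699215, 40.915568)
MANHATTAN_BBOX = (-74.016367, 40.702726, -73.934212, 40.821589)


def overture_download_cmd(bbox, fmt, feature_type, output):
    """Hypothetical helper: build one `overturemaps download` command line."""
    bbox_arg = ",".join(str(value) for value in bbox)
    return shlex.join([
        "overturemaps", "download",
        f"--bbox={bbox_arg}",
        "-f", fmt,
        f"--type={feature_type}",
        "-o", output,
    ])


segments_cmd = overture_download_cmd(
    NYC_BBOX, "geojson", "segment", "nyc_segments.geojson"
)
buildings_cmd = overture_download_cmd(
    MANHATTAN_BBOX, "geoparquet", "building", "manhattan_buildings.parquet"
)
print(segments_cmd)
print(buildings_cmd)
```

The printed strings can be pasted into a terminal, or passed to `subprocess.run(shlex.split(cmd), check=True)` if you prefer to stay in the notebook.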
Next, we clip the segments acquired from Overture to Manhattan to keep the computation light, but feel free to explore a larger area!
from shapely.geometry import Polygon
west, south, east, north = -74.016367, 40.702726, -73.934212, 40.821589
bbox = Polygon([(west, south), (east, south), (east, north), (west, north)])
roads_gdf = gpd.read_file("./nyc_segments.geojson")
if roads_gdf.crs != "EPSG:4326":
    roads_gdf = roads_gdf.to_crs("EPSG:4326")
road_subtype_gdf = roads_gdf[  # Keeping only the essential!
    (roads_gdf['subtype'] == 'road') &
    (roads_gdf['class'].isin(['motorway', 'residential', 'living_street', 'primary', 'secondary']))
]
filtered_roads = gpd.clip(road_subtype_gdf, bbox)
filtered_roads.reset_index(drop=True, inplace=True)
filtered_roads.to_file("manhattan_roads.geojson", driver="GeoJSON")
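To make the attribute filter concrete, here is a minimal pure-Python sketch of the same rule applied to a few made-up records. The sample rows and the `keep_segment` helper are illustrative assumptions, not Overture data; only the `subtype` and `class` values being kept come from the code above.

```python
# Road classes kept by the filter above.
KEPT_CLASSES = {"motorway", "residential", "living_street", "primary", "secondary"}


def keep_segment(record):
    """Hypothetical helper mirroring the boolean mask: a road of a kept class."""
    return record.get("subtype") == "road" and record.get("class") in KEPT_CLASSES


# Made-up sample rows, for illustration only.
sample_segments = [
    {"id": "a", "subtype": "road", "class": "primary"},
    {"id": "b", "subtype": "rail", "class": "standard_gauge"},
    {"id": "c", "subtype": "road", "class": "footway"},
    {"id": "d", "subtype": "road", "class": "residential"},
]

kept_ids = [row["id"] for row in sample_segments if keep_segment(row)]
print(kept_ids)
```

Both conditions must hold at once, which is why the rail segment and the footway are dropped even though each satisfies one half of the rule or neither.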
Pre-Requisites –– Transforming the buildings into Shapefile¶
The following step converts the building data from a Parquet file to a shapefile, as the UrbanMapper API currently requires shapefile input for longitude and latitude to be inferred automatically. Both are heavily relied upon later on.
If the Parquet buildings file already had longitude and latitude columns, this step would not be required.
Meanwhile, note that the mechanism behind our ShapefileLoader still needs to be replicated in the Parquet loader and the others, so that input files without longitude and latitude columns can have them inferred automatically from the geometry coordinates. The mechanism already exists; it simply needs to be extended to more primitives.
tmp_nyc_buildings = gpd.read_parquet("./manhattan_buildings.parquet")
tmp_nyc_buildings.to_file("./manhattan_buildings.shp")
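As an illustration of what inferring longitude and latitude from a geometry can look like, the sketch below computes a polygon centroid with the shoelace formula. This is not UrbanMapper's actual implementation, which is not shown in this notebook; the `polygon_centroid` helper and the toy footprint are assumptions made purely for illustration.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon ring of (lon, lat) pairs.

    Hypothetical helper. It treats the coordinates as planar, which is a
    reasonable approximation for building-sized footprints.
    """
    ring = list(ring)
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the repeated closing vertex
    twice_area = 0.0
    cx = cy = 0.0
    # Walk over consecutive vertex pairs, wrapping around to the first vertex.
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area == 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Toy rectangular footprint, closed ring, for illustration only.
footprint = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
lon, lat = polygon_centroid(footprint)
print(lon, lat)
```

With geopandas, the comparable one-liner would be reading `.centroid.x` and `.centroid.y` from the geometry column, ideally after projecting to a suitable CRS.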
Component Instantiation: Loader¶
The loader component is defined to read the preprocessed building data from the shapefile. This makes the primitive ready for use in the pipeline later on.
loader = (
    mapper.loader
    .from_file("./manhattan_buildings.shp")
    .build()
)
Component Instantiation: Urban Layer¶
The urban layer component uses the filtered road segments and maps each building's coordinates to the nearest road. This makes the primitive ready for use in the pipeline later on.
urban_layer = (
    mapper.urban_layer
    .with_type("custom_urban_layer")
    .from_file("./manhattan_roads.geojson")
    .with_mapping(
        longitude_column="temporary_longitude",
        latitude_column="temporary_latitude",
        output_column="nearest_road"
    )
    .build()
)
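Conceptually, the mapping step assigns each building point to the closest road. The sketch below shows that idea with plain point-to-segment distances in a toy planar coordinate system. It is an assumption-laden illustration, not UrbanMapper's internals: the helper names and the toy roads are hypothetical, and real data would call for a projected CRS and a spatial index.

```python
import math


def point_segment_distance(point, start, end):
    """Planar distance from a point to the segment start-end."""
    px, py = point
    ax, ay = start
    bx, by = end
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Projection parameter, clamped to [0, 1] so we stay on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_road(point, roads):
    """Return the id of the road whose segment is closest to the point."""
    return min(
        roads,
        key=lambda road_id: point_segment_distance(point, *roads[road_id]),
    )


# Made-up roads: id -> (start, end), for illustration only.
toy_roads = {
    "road_a": ((0.0, 0.0), (10.0, 0.0)),
    "road_b": ((0.0, 5.0), (10.0, 5.0)),
}
first = nearest_road((3.0, 1.0), toy_roads)
second = nearest_road((3.0, 4.5), toy_roads)
print(first, second)
```

The id returned here plays the role of the `nearest_road` output column configured above.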
Component Instantiation: Imputer¶
The imputer fills in missing longitude and latitude values to ensure data integrity. This makes the primitive ready for use in the pipeline later on.
imputer = (
    mapper.imputer
    .with_type("SimpleGeoImputer")
    .on_columns("temporary_longitude", "temporary_latitude")
    .build()
)
Component Instantiation: Filter¶
The filter applies a bounding box to refine the dataset spatially, so that no buildings from Brooklyn get attached to a road in Manhattan. This makes the primitive ready for use in the pipeline later on.
filter_step = (
    mapper.filter
    .with_type("BoundingBoxFilter")
    .build()
)
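The idea behind a bounding-box filter can be sketched in a few lines of plain Python. This is only an illustration of the concept, not the BoundingBoxFilter implementation: the `inside_bbox` helper and the two sample points are hypothetical, and the box reuses the Manhattan extent from the data preparation step.

```python
# Manhattan extent from the data preparation step: west, south, east, north.
MANHATTAN_BBOX = (-74.016367, 40.702726, -73.934212, 40.821589)


def inside_bbox(lon, lat, bbox):
    """Hypothetical helper: True when the point lies within the box (inclusive)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# Made-up sample points, for illustration only.
sample_buildings = [
    {"name": "midtown", "lon": -73.9857, "lat": 40.7484},
    {"name": "brooklyn", "lon": -73.9442, "lat": 40.6782},
]

kept_names = [
    b["name"]
    for b in sample_buildings
    if inside_bbox(b["lon"], b["lat"], MANHATTAN_BBOX)
]
print(kept_names)
```

The Brooklyn point falls south of the box and is dropped, which is exactly the kind of stray building the filter is meant to keep away from Manhattan roads.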
Component Instantiation: Enricher¶
The following enricher counts buildings per road segment, providing the key analytical output. This makes the primitive ready for use in the pipeline later on.
building_count = (
    mapper.enricher
    .with_data(group_by="nearest_road")
    .count_by(output_column="building_count")
    .build()
)
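Grouping by `nearest_road` and counting boils down to a tally per road id. The sketch below shows the equivalent operation with `collections.Counter` on made-up records; the sample rows and the `counts_per_road` name are assumptions for illustration, not output from the pipeline.

```python
from collections import Counter

# Made-up mapped records: each building carries the id of its nearest road.
mapped_buildings = [
    {"building": "b1", "nearest_road": 0},
    {"building": "b2", "nearest_road": 2},
    {"building": "b3", "nearest_road": 0},
    {"building": "b4", "nearest_road": 2},
    {"building": "b5", "nearest_road": 2},
]

counts_per_road = Counter(row["nearest_road"] for row in mapped_buildings)

# Roads that attracted no building simply report a count of zero.
for road_id in range(3):
    print(road_id, counts_per_road[road_id])
```

With pandas, the comparable call would be `df.groupby("nearest_road").size()`.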
Component Instantiation: Visualiser¶
The visualiser sets up a basic static matplotlib figure. This makes the primitive ready for use in the pipeline later on.
visualiser = (
    mapper.visual
    .with_type("Static")
    .build()
)
Pipeline Assembly¶
The pipeline combines all pre-instantiated components in a logical sequence for processing.
pipeline = UrbanPipeline([
    ("loader", loader),
    ("urban_layer", urban_layer),
    ("impute", imputer),
    ("filter", filter_step),
    ("enrich_building_count", building_count),
    ("visualiser", visualiser),
])
Pipeline Execution¶
This step runs the pipeline, transforming the data and generating the enriched layer. Note that a nice animation plays during the pipeline execution so you can follow what's going on!
mapped_data, enriched_layer = pipeline.compose_transform()
Visualisation¶
The enriched layer is visualised, showing building counts along road segments statically.
fig = pipeline.visualise([
    "building_count",
])
Export Results¶
Finally, the processed data is saved to a JupyterGIS file for future analysis and real-time collaboration.
pipeline.to_jgis(
    filepath="new_york_city_overture_easy_pipeline.JGIS",
    urban_layer_name="NYC Overture Roads & Buildings – Easy Pipeline"
)