Filter¶
In this notebook, we’ll zoom in on the important bits of your data and make sure that only the data points within the urban_layer you just queried remain!
Data source used:
- PLUTO data from NYC Open Data. https://www.nyc.gov/content/planning/pages/resources/datasets/mappluto-pluto-change
import urban_mapper as um
# Get UrbanMapper rolling
mapper = um.UrbanMapper()
Loading Data and Creating a Layer¶
First, let’s load some data and create a layer for, say, Downtown Brooklyn.
Note that:
- A Loader example can be found in examples/Basics/loader.ipynb, which shows how to load your own data.
- An Urban Layer example can be found in examples/Basics/urban_layer.ipynb, which shows how to query your layer, e.g. the street intersections of Downtown Brooklyn.
# Load data
# Note: For the documentation interactive mode, we only query 5000 records from the dataset. Feel free to remove for a more realistic analysis.
data = (
mapper
.loader
.from_huggingface("oscur/pluto", number_of_rows=5000, streaming=True).with_columns("longitude", "latitude").load()
# From the loader module, from the following file within the HuggingFace OSCUR datasets hub and with the `longitude` and `latitude` or only `geometry`
)
# Create urban layer
layer = (
mapper.urban_layer.with_type("streets_intersections") # From the urban_layer module and with type streets_intersections
.from_place("Downtown Brooklyn, New York City, USA") # From a place
.build()
)
Applying the Filter¶
Now that we've got all the ingredients, let’s use the BoundingBoxFilter to keep only the data points within our layer’s bounds. It’s like putting a spotlight on Downtown Brooklyn when your data covers the whole of New York City.
# Apply filter
filtered_data = (
mapper
.filter # From the filter module
.with_type("BoundingBoxFilter") # With type BoundingBoxFilter which is a filter that filters out your data points based on the bounding box of the layer
.transform(data, layer) # Transform the data with the layer previously queried
)
filtered_data
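To make the idea behind a bounding-box filter concrete, here is a minimal, library-free sketch. It is not UrbanMapper's implementation: the `Point` and `BoundingBox` types, the helper functions, and the coordinates are all hypothetical and only illustrate the general technique of keeping points that fall inside the rectangle spanned by a layer.

```python
from typing import Iterable, List, NamedTuple


class Point(NamedTuple):
    longitude: float
    latitude: float


class BoundingBox(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, point: Point) -> bool:
        # A point is inside when both coordinates fall within the box limits.
        return (
            self.min_lon <= point.longitude <= self.max_lon
            and self.min_lat <= point.latitude <= self.max_lat
        )


def bounding_box_of(points: Iterable[Point]) -> BoundingBox:
    """Smallest axis-aligned rectangle that contains every given point."""
    points = list(points)
    if not points:
        raise ValueError("cannot compute a bounding box of zero points")
    lons = [p.longitude for p in points]
    lats = [p.latitude for p in points]
    return BoundingBox(min(lons), min(lats), max(lons), max(lats))


def filter_to_bounds(records: Iterable[Point], box: BoundingBox) -> List[Point]:
    """Keep only the records located inside the bounding box."""
    return [record for record in records if box.contains(record)]


# Hypothetical layer nodes, roughly around Downtown Brooklyn.
layer_nodes = [
    Point(-73.992, 40.688),
    Point(-73.980, 40.697),
    Point(-73.985, 40.690),
]

# Hypothetical records: one inside the box, two outside.
records = [
    Point(-73.986, 40.692),   # inside the box
    Point(-73.9855, 40.7580),  # too far north
    Point(-73.950, 40.690),   # too far east
]

box = bounding_box_of(layer_nodes)
kept = filter_to_bounds(records, box)
print(kept)
```

Note that a bounding box is a rectangle, so a filter of this kind can keep points that sit inside the rectangle but outside the actual outline of the neighbourhood.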
Be Able To Preview Your Filter¶
Curious about your filter? Use preview() to see its setup—super useful when you’re borrowing someone else’s analysis!
# Preview filter
print(mapper.filter.preview())
Provide many different datasets to the same filter¶
You can load many datasets and feed the filter with a dictionary. In that case, the output will also be a dictionary. See the simple example below.
If you want to apply the filter to a specific dataset of the dictionary, provide .with_data(data_id=...) to the filter.
# Load the first dataset (PLUTO)
data1 = (
mapper
.loader
.from_huggingface("oscur/pluto", number_of_rows=1000, streaming=True).with_columns("longitude", "latitude").load()
# From the loader module, from the following file and with the `longitude` and `latitude` or only `geometry`
)
# Load the second dataset (taxi data)
data2 = (
mapper
.loader
.from_huggingface("oscur/taxisvis1M", number_of_rows=1000, streaming=True) # To update with your own path
.with_columns("pickup_longitude", "pickup_latitude").load() # Inform your long and lat columns or only geometry
)
data = {
"pluto_data": data1,
"taxi_data": data2,
}
# Apply filter
filtered_data = (
mapper
.filter # From the filter module
.with_type("BoundingBoxFilter") # With type BoundingBoxFilter which is a filter that filters out your data points based on the bounding box of the layer
.transform(data, layer) # Transform the data with the layer previously queried
)
filtered_data["pluto_data"]
filtered_data["taxi_data"]
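The dictionary behaviour described above can be pictured with a small, library-free sketch. Everything in it is hypothetical (the `filter_datasets` helper, its `data_id` argument, the bounds, and the coordinates), and it makes one assumption the notebook does not confirm: when a single `data_id` is targeted, the other datasets are returned unchanged.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[float, float]                # (longitude, latitude)
Bounds = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def keep_in_bounds(records: List[Record], bounds: Bounds) -> List[Record]:
    """Keep only the records located inside the bounds."""
    min_lon, min_lat, max_lon, max_lat = bounds
    return [
        (lon, lat)
        for lon, lat in records
        if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
    ]


def filter_datasets(
    datasets: Dict[str, List[Record]],
    bounds: Bounds,
    data_id: Optional[str] = None,
) -> Dict[str, List[Record]]:
    """Filter every dataset, or only `data_id` when one is given.

    Assumption: datasets that are not targeted pass through untouched.
    """
    return {
        key: keep_in_bounds(records, bounds) if data_id in (None, key) else records
        for key, records in datasets.items()
    }


# Hypothetical inputs mirroring the dictionary keys used in the notebook.
datasets = {
    "pluto_data": [(-73.986, 40.692), (-73.950, 40.690)],
    "taxi_data": [(-73.990, 40.690), (-73.9855, 40.7580)],
}
bounds = (-73.992, 40.688, -73.980, 40.697)

all_filtered = filter_datasets(datasets, bounds)
only_taxi = filter_datasets(datasets, bounds, data_id="taxi_data")
print(all_filtered)
print(only_taxi)
```

The output keeps the same keys as the input, which is why the notebook can index the result with `"pluto_data"` and `"taxi_data"`.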
More Geo Filter primitives?¶
Want more? Come and tell us on https://github.com/VIDA-NYU/UrbanMapper/issues/5
Wrapping Up¶
Well done, you star! You’ve filtered your data to focus on what matters. Next stop: try enricher or visualiser.