Auctus Mixin
Optional module
Install the auctus_mixins extra to enable these features:
pip install "urban-mapper-community[auctus_mixins]"
uv add "urban-mapper-community[auctus_mixins]"
What is the Auctus Mixin?
The Auctus mixin provides access to the Auctus Dataset Search API services from within the UrbanMapper
workflow. It offers a set of methods to search for datasets, get dataset details, and download datasets.
In this context, a mixin is simply a class that wraps an external library so that it can be used
directly within the UrbanMapper workflow.
Documentation Under Alpha Construction
This documentation is in its early stages and still being developed. The API may therefore change, and some parts might be incomplete or inaccurate.
Use at your own risk, and please report anything you find that seems incorrect or outdated.
auctus
Classes
AuctusSearchMixin
Bases: _AuctusSearchBase
Mixin for searching, exploring, and loading datasets from the Auctus data discovery service.
This mixin extends AuctusSearch to provide a simplified interface for discovering
and working with datasets from the Auctus data discovery service. It allows users
to search for relevant datasets, explore their metadata, and load them directly
into their urban data analysis workflows.
What is Auctus? What is Auctus Search?
Auctus is a web crawler and search engine for datasets, specifically meant for data augmentation tasks in
machine learning. It finds datasets across different repositories and indexes them for later retrieval.
Auctus paper's citation:
Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, and Juliana Freire. 2021. Auctus: a dataset search engine for data discovery and augmentation. Proc. VLDB Endow. 14, 12 (July 2021), 2791–2794. https://doi.org/10.14778/3476311.3476346
Auctus official website:
https://auctus.vida-nyu.org/
Find more in the Auctus GitHub repository.
–––
Auctus Search, on the other hand, is a wrapper around the Auctus API that can be used straightforwardly
from a Jupyter notebook cell.
Find more in the Auctus Search GitHub Repository.
What is a mixin?
A mixin is a class that contributes methods to another class without being considered a base class in its own right. Think of it as a set of helpers brought in from external sources.
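To make the idea concrete, here is a minimal, generic Python sketch of the mixin pattern. The class names SearchHelpersMixin and Mapper are hypothetical and are not part of UrbanMapper; note also that the examples on this page reach the Auctus helpers through an attribute (mapper.auctus), so UrbanMapper may wire its mixins up differently from this plain-inheritance illustration.

```python
from typing import List


class SearchHelpersMixin:
    """Hypothetical mixin: contributes a search() helper, never used on its own."""

    def search(self, query: str) -> List[str]:
        # Relies on the host class defining a `catalogue` attribute.
        query = query.lower()
        return [name for name in self.catalogue if query in name.lower()]


class Mapper(SearchHelpersMixin):
    """Hypothetical host class that gains search() by inheriting the mixin."""

    def __init__(self, catalogue: List[str]) -> None:
        self.catalogue = catalogue


mapper = Mapper(["NYC Taxi Trips 2019", "NYC Crashes", "Chicago Crimes"])
print(mapper.search("nyc"))  # ['NYC Taxi Trips 2019', 'NYC Crashes']
```

The mixin carries behaviour only; the host class supplies the state (here, catalogue) that the behaviour relies on.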
Examples:
>>> from urban_mapper import UrbanMapper
>>>
>>> # Initialise UrbanMapper
>>> mapper = UrbanMapper()
>>>
>>> # Search for datasets about NYC taxi trips
>>> results = mapper.auctus.explore_datasets_from_auctus(
... search_query="NYC taxi trips",
... display_initial_results=True
... )
>>>
>>> # Select a dataset from the results (interactive)
>>> # (This would be done through the UI that appears)
>>>
>>> # Load the selected dataset
>>> taxi_trips = mapper.auctus.load_dataset_from_auctus()
>>>
>>> # Profile the dataset to understand its characteristics
>>> mapper.auctus.profile_dataset_from_auctus()
Source code in src/urban_mapper/mixins/auctus.py
Functions
explore_datasets_from_auctus(search_query, page=1, size=10, display_initial_results=False)
Search for datasets in the Auctus data discovery service.
This method queries the Auctus data discovery service for datasets matching
the provided search query. Results can be paginated and optionally displayed
immediately for quick inspection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| search_query | Union[str, List[str]] | Search query string or list of strings to find datasets. | required |
| page | int | Page number for paginated results. Defaults to 1. | 1 |
| size | int | Number of results per page. Defaults to 10. | 10 |
| display_initial_results | bool | Whether to automatically display search results. Defaults to False. | False |
Returns:
| Type | Description |
|---|---|
| AuctusDatasetCollection | An object containing the search results, which can be further explored or used to select a dataset. |
Examples:
>>> results = mapper.auctus.explore_datasets_from_auctus(
... search_query="NYC crashes",
... size=20,
... display_initial_results=True
... )
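To illustrate how the page and size parameters typically interact, here is a minimal sketch of 1-indexed, offset-based pagination over an in-memory list. It assumes Auctus pages behave like a conventional offset/limit scheme (page 1 holds the first size results); the paginate helper is hypothetical and is not part of UrbanMapper or Auctus.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def paginate(items: Sequence[T], page: int = 1, size: int = 10) -> List[T]:
    """Return the items on a 1-indexed page (hypothetical helper)."""
    if page < 1 or size < 1:
        raise ValueError("page and size must both be >= 1")
    start = (page - 1) * size  # number of results skipped before this page
    return list(items[start:start + size])


results = [f"dataset-{i}" for i in range(1, 26)]  # 25 fake result identifiers
print(paginate(results, page=3, size=10))  # last, partial page: dataset-21 .. dataset-25
```

Under this assumption, a request with size=20 and page=2 would cover results 21 to 40.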
load_dataset_from_auctus(display_table=True)
Load the selected dataset from Auctus search results.
This method loads the dataset that was selected after calling
explore_datasets_from_auctus(). It can handle both tabular and geographic data,
returning a pandas DataFrame or geopandas GeoDataFrame accordingly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| display_table | bool | Whether to display a preview of the loaded data. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| Union[pd.DataFrame, gpd.GeoDataFrame] | The loaded dataset, either as a pandas DataFrame or a geopandas GeoDataFrame. |
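Examples:
>>> # Load the dataset selected after explore_datasets_from_auctus(), without a preview
>>> taxi_trips = mapper.auctus.load_dataset_from_auctus(display_table=False)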
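This method only works once a dataset has been selected from the search results, and profile_dataset_from_auctus() in turn expects data loaded by this method. The sketch below models the first part of that ordering (search, select, load) with a hypothetical, in-memory stand-in class, FakeAuctusSession; it does not contact Auctus, and the real mixin's internal state handling and error messages may differ.

```python
from typing import Dict, List, Optional


class FakeAuctusSession:
    """Hypothetical stand-in mimicking the explore -> select -> load order."""

    def __init__(self, catalogue: Dict[str, List[dict]]) -> None:
        self._catalogue = catalogue          # dataset name -> rows
        self._results: List[str] = []       # names returned by the last search
        self._selected: Optional[str] = None

    def explore(self, search_query: str) -> List[str]:
        # Keep every dataset whose name contains the query (case-insensitive).
        query = search_query.lower()
        self._results = [name for name in self._catalogue if query in name.lower()]
        return self._results

    def select(self, name: str) -> None:
        # Stands in for the interactive selection UI shown after a search.
        if name not in self._results:
            raise ValueError(f"{name!r} is not among the current search results")
        self._selected = name

    def load(self) -> List[dict]:
        # Loading only makes sense once a dataset has been selected.
        if self._selected is None:
            raise RuntimeError("No dataset selected; search and select one first.")
        return self._catalogue[self._selected]


session = FakeAuctusSession({
    "NYC Taxi Trips": [{"fare": 12.5}, {"fare": 8.0}],
    "NYC Crashes": [{"injured": 1}],
})
session.explore("taxi")
session.select("NYC Taxi Trips")
print(session.load())  # [{'fare': 12.5}, {'fare': 8.0}]
```

Keeping the selection as explicit state on the object is what lets a later call (load, then profile) operate without being handed the dataset again.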
profile_dataset_from_auctus()
Generate and display a profile report for the selected Auctus dataset.
This method creates a comprehensive profile of the dataset loaded using
load_dataset_from_auctus(). The profile includes statistics, distributions,
and insights into the dataset's characteristics, aiding in data understanding
and preparation for analysis.
Returns:
| Type | Description |
|---|---|
| None | This method does not return anything; it displays the profile report. |
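Examples:
>>> taxi_trips = mapper.auctus.load_dataset_from_auctus()
>>> mapper.auctus.profile_dataset_from_auctus()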
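To give a flavour of the kind of statistics a profile report covers, here is a minimal standard-library sketch that summarises each column of a small table: non-missing count, missing count, distinct values, and min/max for fully numeric columns. It is an illustration only; the report produced by profile_dataset_from_auctus() is richer, and its exact contents are not described here. The profile_rows function is hypothetical.

```python
from typing import Any, Dict, List


def profile_rows(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Summarise each column of a list-of-dicts table (hypothetical helper)."""
    columns = sorted({key for row in rows for key in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]  # None counts as missing
        stats: Dict[str, Any] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Add a numeric range only when every present value is a number.
        numeric = [v for v in present if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
        report[col] = stats
    return report


trips = [
    {"borough": "Manhattan", "fare": 12.5},
    {"borough": "Brooklyn", "fare": 8.0},
    {"borough": "Manhattan", "fare": None},
]
for column, stats in profile_rows(trips).items():
    print(column, stats)
```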