The 2025 EY Open Science AI and Data Challenge: Cooling Urban Heat Islands (EY Participants)


The EY Open Science AI & Data Challenge calls for innovators to address the Urban Heat Island effect using AI. Develop ML models to predict city temperatures and aid urban design for cooler, sustainable environments. Contribute to global efforts against climate change and enhance urban resilience. Join us in shaping a livable future for city dwellers.

  • Challenge start: Monday, January 20, 2025
    Challenge begins.
  • Challenge end: Thursday, March 20, 2025
    Challenge ends.
  • Finalists announcement: Friday, April 11, 2025
    Finalists announced. Finalists' content package development.
  • Finalists' content due: Sunday, May 4, 2025
    Judging panel review of finalists' content packages.
  • Winners announcement: Thursday, May 22, 2025
    Winners announced.

Data Description

Target Dataset:

Near-surface air temperature data in an index format was collected on 24 July 2021 across the Bronx and Manhattan regions of New York City in the United States. The data was collected in the afternoon between 3:00 pm and 4:00 pm. This dataset includes time stamps, traverse points (latitude and longitude) and the corresponding Urban Heat Island (UHI) Index values for 11229 data points. These UHI Index values are the target parameters for your model.

Please find the dataset here.

Note: Participants are strictly prohibited from using Longitude and Latitude values as features in building their machine learning models. Submissions that employ longitude and latitude values as model features will be disqualified. These values should only be utilized for understanding the attributes and characteristics of the locations.

Incorporating latitude and longitude data in their raw forms or through any form of manipulation—including multiplication, embedding, or conversion to polar coordinates—as predictive features in your model is strictly prohibited, as it can compromise the adaptability of your model across diverse scenarios. This prohibition extends to calculating the distance from a reference point and using it as a feature, which is essentially a transformation of the original geographical coordinates into a new feature form. Submissions that include these types of features will be considered non-compliant and will be disqualified.
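As an illustration of the rule above, the following minimal sketch filters coordinate columns out of a candidate feature list and raises an error if any slip through. The column names ("Longitude", "Latitude", "UHI Index") and the example feature names ("B04", "NDVI") are assumptions for illustration; match them to the actual dataset header. A name check like this only catches raw coordinate columns. It cannot detect derived features such as a distance from a reference point, which remain prohibited.

```python
# Assumed column names; adjust to the actual CSV header of the target dataset.
COORDINATE_COLUMNS = {"longitude", "latitude"}
TARGET_COLUMN = "UHI Index"


def select_feature_columns(columns, target=TARGET_COLUMN):
    """Return the column names that are neither coordinates nor the target."""
    return [
        name
        for name in columns
        if name.strip().lower() not in COORDINATE_COLUMNS and name != target
    ]


def assert_no_coordinates(feature_names):
    """Raise ValueError if a raw coordinate column is among the features."""
    leaked = [
        name for name in feature_names
        if name.strip().lower() in COORDINATE_COLUMNS
    ]
    if leaked:
        raise ValueError(f"Coordinate columns used as features: {leaked}")


# Hypothetical header: coordinates, target, and two satellite-derived columns.
header = ["Longitude", "Latitude", "UHI Index", "B04", "NDVI"]
features = select_feature_columns(header)
assert_no_coordinates(features)
```

Calling `assert_no_coordinates` just before model training is a cheap guard against accidentally reintroducing the coordinate columns after a merge or join.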

Feature Datasets:

Participants can draw on many datasets for their models. How well they identify the most important datasets and parameters will determine model performance. The following are the recommended satellite datasets:

These datasets can be extracted from Microsoft Planetary Computer Portal's data catalog. Please see the sample notebooks for more details.

Additional Datasets:

Participants can also explore the following datasets in their model development journey:

Participants may also use other datasets for their models, provided those datasets are open and available to all public users and their sources are referenced in the model.

Validation Dataset:

After building the machine learning model, you need to predict the UHI index values on the locations identified in the validation dataset. Predictions on the validation dataset need to be saved in a CSV file and uploaded to the challenge platform to get a score on the ranking board.
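The final step above, saving predictions to a CSV file, might look like the sketch below. The exact submission layout is not specified here, so the column names ("Longitude", "Latitude", "UHI Index") and the helper name `write_submission` are assumptions; check the sample notebooks for the required format. The coordinates appear only as row identifiers for the validation locations, not as model features.

```python
import csv


def write_submission(path, locations, predictions):
    """Write one row per validation location with its predicted UHI index.

    locations   -- sequence of (longitude, latitude) pairs from the validation set
    predictions -- sequence of predicted UHI index values, in the same order
    """
    if len(locations) != len(predictions):
        raise ValueError("locations and predictions must have the same length")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        # Assumed header; the platform's required column names may differ.
        writer.writerow(["Longitude", "Latitude", "UHI Index"])
        for (lon, lat), uhi in zip(locations, predictions):
            writer.writerow([lon, lat, uhi])
```

A call such as `write_submission("submission.csv", validation_points, model_outputs)` would then produce the file to upload, where `validation_points` and `model_outputs` are placeholders for your own data.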

Supporting Material:

Participants can refer to the following material before starting model development:

This ZIP file contains all of the required content mentioned above. You will find datasets, sample notebooks and documentation to support the data challenge.

Terms of Use and Licensing requirements for the datasets:

Training Data:

Satellite Data (Sentinel-2 Sample Output)

Building Footprint Data

Weather Data