
Dataset Highlights

NYC Open Data would not exist without the work of Open Data Coordinators: City staff at each agency who are responsible for identifying, structuring, documenting, publishing, maintaining, and sharing their agency’s public datasets. As New York City evolves, so does the data used in its operations, and these changes often mean publishing a new dataset to share more about the work an agency is doing. This year, we’ve published more than 200 new datasets, all of which are available to view in our asset inventory, and some of which are highlighted below:

Workforce Development Metrics

Agency: Mayor’s Office for Economic Opportunity (NYC Opportunity)
Dataset: Performance Metrics for Workforce Development Programs


New York City’s Workforce Data Portal represents a wide-scale effort by the Mayor’s Office for Economic Opportunity (NYC Opportunity) to integrate and standardize data reporting across the City’s workforce development system. The system encompasses the multiple City agencies, community-based organizations, and activities that prepare NYC residents for employment, help workers advance in their careers, and ensure a skilled workforce. More than 20 City agencies manage their workforce programs, but no single “workforce development data system” exists in NYC. As a result, program data, including data about participants, performance metrics, and outcomes, have been fragmented, making it challenging to develop insights based on a comprehensive picture.

Currently, the Portal incorporates data from 18 workforce development programs managed by five distinct City agencies and data from the New York State Department of Labor’s wage reporting system. The report contains thirteen performance metrics (e.g., clients served, full-time job placement, training credential attainment, academic degree program enrollment) for the City’s workforce development programs. Each metric can be broken down by three demographic types (gender, race/ethnicity, and age group) and by program target population (e.g., youth and young adults, NYCHA communities).

For Open Data Week 2022, NYC Opportunity hosted the virtual presentation “Building a Workforce Data Portal: From Data Silos to an Integrated Data Platform.” The goals of the event were to share approaches to building an integrated data platform that turns raw data into meaningful insights, to advocate for data standards and definitions when conducting analyses across multiple City agencies, and to discuss how researchers, policymakers, and data scientists can use the portal.

Visit the Workforce Data Portal to learn about the metrics used in this dataset and to explore the dataset with interactive visualizations. For more resources related to NYC’s workforce development system, see WorkingNYC (NYC’s front door to programs that help residents prepare for and find a job) and Rebuild, Renew, Reinvent: A Blueprint for NYC’s Economic Recovery (a strategic plan from the Adams administration on economic recovery, including the workforce development system).

Outdoor Public Art Inventory

Agency: Public Design Commission (PDC)
Dataset: Public Design Commission Outdoor Public Art Inventory


The Public Design Commission (PDC) has jurisdiction over permanent structures, landscape architecture, and art proposed on or over City-owned property. Its approval is required for any permanent installation of artwork on City-owned property, as well as any relocation, conservation, or removal of City-owned artworks. It created the Public Design Commission Outdoor Public Art Inventory dataset in response to a 2018 report from the Mayoral Advisory Commission on City Art, Monuments, and Markers. In September 2019, the PDC completed the first phase of the inventory with a focus on outdoor artwork, including monuments and memorials. As part of this project, two archivists were hired to create a dataset and two professional photographers were hired to document the artworks, beginning with Manhattan and Staten Island. PDC staff have continued to compile and review the data as more artworks are added to the collection.

The resulting dataset includes geodata on the locations of these artworks, artist information, artwork typology, material, and subject matter, as well as PDC approval, creation, and dedication dates. This data can be used to locate City-owned outdoor artworks installed by various City agencies and to learn additional context regarding the artist and subject of the artwork. This dataset can be cross-referenced with the NYC Parks Monuments and Completed Percent for Art projects with Artist Information datasets on NYC Open Data. Visit the Public Design Commission website for more information about guidelines for artwork, monuments and memorials.

Watershed Water Quality

Agency: Department of Environmental Protection (DEP)
Dataset: Watershed Water Quality - Hydrology


The Department of Environmental Protection (DEP) maintains a Watershed Water Quality data collection containing water quality testing data within the New York City Water Supply System, which comprises 19 reservoirs and three controlled lakes and spreads across a 2,000-square-mile watershed. The water supply system provides approximately one billion gallons of safe drinking water every day to New York City’s 8.8 million residents, as well as about 110 million gallons a day to one million people living in Westchester, Putnam, Orange, and Ulster counties. Its watershed is located in portions of the Hudson Valley and Catskill Mountains and stretches as far as 125 miles north of New York City.

Each year, DEP scientists collect thousands of water samples that are analyzed hundreds of thousands of times in water quality laboratories to ensure that the highest quality water is delivered to consumers. They monitor the quality of the water starting from the feeder streams that supply the reservoirs all the way to the nearly 1,000 street-side sampling stations located in every neighborhood in New York City. Analyses are conducted for microbiological, chemical, and physical parameters. Water quality data is stored in and reported from a laboratory information management system (LIMS) that maintains records of when and where each sample was collected, how it was processed, and which water quality parameters were tested. The LIMS also stores any additional relevant information related to the sample or the analyzed result. Collected samples are categorized by type, including hydrology (watershed stream samples), limnology (reservoir sampling), and keypoints (locations within aqueducts or tunnels). DEP publishes the datasets as separate tables, one per sample type.

To use this data successfully, users should have a basic understanding of what the reported parameters (e.g., turbidity, carbon, nitrogen) represent. Resources to help users understand these parameters are available in the Annual Watershed Water Quality Report.

DEP is currently working to restructure these datasets by combining the different sample types, qualifiers, and sites into a single table, which is intended to make the dataset easier to use and visualize.
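As a rough illustration of what combining per-sample-type tables into a single table can look like, the sketch below stacks several tables and records the sample type as a column. The field names (`site`, `parameter`, `value`) and the sample rows are hypothetical and are not taken from DEP's published schema or from its planned restructuring.

```python
def combine_sample_tables(tables):
    """Stack per-sample-type tables into one list of rows.

    `tables` maps a sample type name to a list of row dictionaries.
    Each output row carries a `sample_type` field so the source table
    can still be identified after the merge.
    """
    combined = []
    for sample_type, rows in tables.items():
        for row in rows:
            combined.append({"sample_type": sample_type, **row})
    return combined


# Hypothetical rows; real column names and values will differ.
tables = {
    "hydrology": [
        {"site": "S1", "parameter": "Turbidity", "value": 1.2},
        {"site": "S1", "parameter": "Nitrogen", "value": 0.3},
    ],
    "limnology": [{"site": "R4", "parameter": "Turbidity", "value": 0.8}],
    "keypoints": [{"site": "K2", "parameter": "Carbon", "value": 1.5}],
}

combined = combine_sample_tables(tables)

# With one table, a single filter works across all sample types.
turbidity_rows = [row for row in combined if row["parameter"] == "Turbidity"]
```

A single table of this shape lets one filter or chart span stream, reservoir, and keypoint samples without repeating the same query against each table.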

Real-World Fuel Efficiency

Agency: Department of Citywide Administrative Services (DCAS)
Dataset: Real-World Fuel Efficiency


With more than 30,000 vehicles, over two-thirds of which are hybrid or alternative fuel vehicles, New York City’s municipal fleet is the largest in the United States. The fleet includes a wide range of vehicle types (sedans, SUVs, pickups, and vans) used by many different City offices and agencies. In anticipation of the City’s transition to an all-electric fleet by 2035, the Department of Citywide Administrative Services (DCAS) evaluated the potential environmental and financial impacts of this transition by analyzing the expected and actual fuel efficiency of City fleet vehicles. The data from this analysis makes up the Real-World Fuel Efficiency dataset.

Vehicle fuel economy labels report miles per gallon estimates based on a series of tests in controlled situations. However, especially in New York City, vehicles are subject to a wide range of conditions, including the effects of weather, heating and air conditioning, stalled traffic, and idling, making a comparison of published and real-world fuel efficiency both useful and informative. The origin of this comparison was the passage of two local laws: Local Law 34 of 2005, which required DCAS to report on the published fuel economy of non-NYPD purchased vehicle models, and Local Law 75 of 2013, which required DCAS to report on actual use-based (‘real-world’) fuel economy. DCAS upgraded its vehicle telematics (remote sensors and tracking) systems in 2019 to improve its ability to report on real-world fuel economy.

The Real-World Fuel Efficiency dataset reports both the Environmental Protection Agency (EPA) expected and the actual fuel economy of more than 4,000 non-policing fleet vehicles, including both hybrid and conventional gas vehicles and covering 106 vehicle makes, models, and years. These vehicles were tracked throughout all of 2019 as they traveled a combined total of over 18 million miles. Based on the fueling and use data, DCAS also estimated the fuel costs per gallon and per mile for each model of vehicle. DCAS found that while hybrid vehicles were expected by the EPA to be 118% more fuel-efficient than non-hybrids, they were actually 155% more fuel-efficient in the City fleet. The results indicate that switching to electric vehicles could have considerable environmental as well as financial benefits.
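To make the percentages above concrete, here is a minimal sketch of how a "percent more fuel-efficient" figure and a fuel cost per mile can be computed. It assumes the comparison is relative to the non-hybrid miles per gallon, so "118% more fuel-efficient" corresponds to 2.18 times the miles per gallon and "155% more" to 2.55 times. The function names and the example numbers are hypothetical and do not come from the DCAS dataset or describe its exact method.

```python
def percent_more_efficient(mpg_a: float, mpg_b: float) -> float:
    """Return how much more fuel-efficient A is than B, in percent."""
    if mpg_b <= 0:
        raise ValueError("mpg_b must be positive")
    return (mpg_a / mpg_b - 1.0) * 100.0


def fuel_cost_per_mile(cost_per_gallon: float, mpg: float) -> float:
    """Return the fuel cost of driving one mile."""
    if mpg <= 0:
        raise ValueError("mpg must be positive")
    return cost_per_gallon / mpg


# Hypothetical figures for illustration only.
hybrid_mpg = 40.0
conventional_mpg = 20.0
price_per_gallon = 3.0

advantage = percent_more_efficient(hybrid_mpg, conventional_mpg)
hybrid_cost = fuel_cost_per_mile(price_per_gallon, hybrid_mpg)
conventional_cost = fuel_cost_per_mile(price_per_gallon, conventional_mpg)
```

In this made-up example the hybrid gets twice the miles per gallon, which is "100% more fuel-efficient", and its fuel cost per mile is half that of the conventional vehicle.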

Learn more about the Real-World Fuel Efficiency data and view the data aggregated by vehicle make, model, and year in this report. The labor-intensive nature of this project makes it unlikely that the dataset will be updated, though changes in technology and resources may change plans.

Housing Database

Agency: Department of City Planning (DCP)
Dataset: Housing Database by Census Block


The Department of City Planning’s (DCP) Housing Database (Housing DB) contains approved Department of Buildings (DOB) applications, filed or completed after January 1, 2010, for new construction, alterations of existing buildings, and demolitions that add or remove residential units. Housing DB is built from DOB Job Applications Filings records and other open data inputs. DCP adds value to these existing open datasets and creates Housing DB by selecting the relevant subset of DOB Job Applications Filings, decoding DOB’s codes into more descriptive values, and adding geospatial data.

With each release of Housing DB, DCP’s Housing team conducts an extensive research process, which includes correcting data errors and allocating units between long-term residential use (Class A), hotels, and other Class B (temporary occupancy) units like SROs (single room occupancy units) and dormitories. This intensive research process improves the data quality and adds a level of detail to Housing DB. Furthermore, since many analysts are interested in how the number of housing units has changed over time by geographic area, DCP publishes unit change summary files that report the net change in Class A housing units by year at different geographic levels (e.g. Neighborhood Tabulation Area). People interested in analyzing the change in legal housing units across time and space should look no further than DCP’s Housing Database. For more information about Housing DB and related housing and economic analyses, visit Bytes of the Big Apple.
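The unit change summaries described above amount to grouping job records by geography and year and summing the net change in Class A units. The sketch below shows that aggregation on a few made-up records. The field names (`year`, `nta`, `class_a_net`) and area labels are hypothetical and may not match the columns in the published Housing DB files.

```python
from collections import defaultdict

# Hypothetical job records; positive values add units, negative values remove them.
jobs = [
    {"year": 2019, "nta": "NTA-A", "class_a_net": 12},
    {"year": 2019, "nta": "NTA-A", "class_a_net": -2},
    {"year": 2019, "nta": "NTA-B", "class_a_net": 5},
    {"year": 2020, "nta": "NTA-A", "class_a_net": 30},
]


def net_unit_change(records):
    """Sum the net change in Class A units for each (area, year) pair."""
    totals = defaultdict(int)
    for record in records:
        totals[(record["nta"], record["year"])] += record["class_a_net"]
    return dict(totals)


summary = net_unit_change(jobs)
for (area, year), change in sorted(summary.items()):
    print(f"{area} {year}: {change:+d} Class A units")
```

Keying the totals on an (area, year) pair keeps the same function usable for any geographic level, as long as each record carries an identifier for that level.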