Spatial Associations between Transportation and Commerce

An analysis of commercial areas in Manhattan in relation to subway lines and stations and to subway, car, and cycling volumes

Feiyang Ren, Xueliang Yang

December 18, 2023

Introduction

The interplay between economic activities and transportation forms a dynamic cycle, characterized by spatial and temporal shifts within both systems. Changes in the spatial distribution of economic activities reshape travel demand patterns, which in turn prompt modifications to the transportation infrastructure. Conversely, advancements in transportation infrastructure raise accessibility and thereby shift the spatial arrangement of economic activities. Notably, Ma et al. (2023) report that the proliferation of retail locations occurs predominantly within a 100-meter radius of rail transit stations, which underscores the 'corridor effect' of rail transit on socio-economic activities in urban areas.

The example from Changchun illustrates a strong correlation between the density of rail transit and the locations of dining, retail, and entertainment venues. In this case study, GIS technology was employed to map out and visualize the spatial distribution of both transportation routes and commercial activities within the central area of Changchun in 2020.

Spatial distribution of transport routes in Changchun's central city in 2020.

Spatial distribution of commercial activities in Changchun's central city

The Changchun case serves as a pivotal example: the strong correlation observed between rail transit density and commercial hubs, especially in the dining, retail, and entertainment sectors, suggests a pattern that may recur in other urban environments, including Manhattan.

In Manhattan, a unique urban landscape characterized by a dense network of subway tracks, roads, and bike paths, we expect to uncover similar trends. Our project aims to delve deep into these spatial relationships, focusing on how subway lines, along with vehicular and bicycle traffic flows, shape and are shaped by commercial trends. By integrating comprehensive ridership data with commercial activity patterns, we aim to provide a nuanced understanding of how transportation infrastructure not only supports but also drives economic activity.

Methodology

Data Preparation

When examining the land use map provided by New York City Planning, it is essential to appreciate its role as a guiding document in urban development. However, actual land use often differs from the map, as the New York University (NYU) campus shows. NYU's main campus is located at 44 West 4th Street, in an area the map marks as R7-2, a designation typically associated with medium-density apartment housing. The actual use of this land as a university campus departs from the planned residential character of the area.

NYC Planning ZoLa

PLUTO (Primary Land Use Tax Lot Output) data is therefore a better resource for in-depth analysis of land use in New York City. This dataset is particularly valuable because it provides comprehensive information at the tax lot level.

Data is meticulously gathered from sources including NYCOpenData, the Census, and NYCPLANNING. Such data undergoes a thorough cleaning and classification process to accurately extract and organize targeted fields, with a special focus on those containing geometric information.

Python is utilized to calculate the daily average ridership at each subway station in Manhattan for the year 2022.
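The script itself is not shown in this report, so the following is only a minimal sketch of how such an average could be computed in plain Python. The record layout (station, date, riders), the function name daily_average_by_station, and the sample figures are assumptions made for illustration and are not the project's data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (station name, service date, riders counted that day).
records = [
    ("Times Sq-42 St", date(2022, 1, 1), 1000),
    ("Times Sq-42 St", date(2022, 1, 2), 3000),
    ("14 St-Union Sq", date(2022, 1, 1), 800),
    ("14 St-Union Sq", date(2022, 1, 2), 1200),
    ("14 St-Union Sq", date(2021, 12, 31), 9999),  # outside 2022, ignored
]


def daily_average_by_station(rows, year):
    """Return {station: mean riders per observed day} for one calendar year."""
    totals = defaultdict(int)  # total riders per station
    days = defaultdict(set)    # distinct service dates per station
    for station, day, riders in rows:
        if day.year != year:
            continue
        totals[station] += riders
        days[station].add(day)
    return {station: totals[station] / len(days[station]) for station in totals}


averages = daily_average_by_station(records, 2022)
```

Dividing by the number of distinct observed dates, rather than by 365, keeps the average meaningful for stations with missing days; whether the project made the same choice is not stated.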

Similarly, the traffic counts are grouped to obtain the daily average traffic volume in Manhattan for 2019.

For the annual cycling ridership counts in Midtown, the dataset only includes year and month fields, so the yearly figures for 2022 are generated as follows.

From the cleaned Midtown cycling counts, the average cycling count at each address in 2022 is calculated, and the sheet is reshaped into the columns "Address" and "Counts" for subsequent geocoding.
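A minimal sketch of this step, assuming the cleaned table has one row per address and month, could look like the following. Only the output columns "Address" and "Counts" come from the text above; the "Year" and "Month" field names, the placeholder addresses, and the numbers are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical monthly records; addresses and numbers are placeholders.
monthly_rows = [
    {"Address": "Address A", "Year": 2022, "Month": 1, "Counts": 100},
    {"Address": "Address A", "Year": 2022, "Month": 2, "Counts": 200},
    {"Address": "Address B", "Year": 2022, "Month": 1, "Counts": 40},
    {"Address": "Address B", "Year": 2021, "Month": 12, "Counts": 999},  # not 2022
]


def average_counts_by_address(rows, year):
    """Average the monthly counts of one year for each address."""
    sums = defaultdict(float)
    months = defaultdict(int)
    for row in rows:
        if row["Year"] != year:
            continue
        sums[row["Address"]] += row["Counts"]
        months[row["Address"]] += 1
    return [{"Address": a, "Counts": sums[a] / months[a]} for a in sorted(sums)]


def to_csv(table):
    """Serialize the two-column sheet that would be passed on to geocoding."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["Address", "Counts"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


table_2022 = average_counts_by_address(monthly_rows, 2022)
sheet = to_csv(table_2022)
```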

Geoprocessing

This section introduces the methods used, focusing primarily on the Geoprocessing Tools applied within ArcGIS Pro.

XY Table to Point

Since all the cleaned tables contain latitude and longitude fields as geometry information, the Geoprocessing Tool 'XY Table to Point' is utilized for visualizing these tables on the map.

Moreover, the PLUTO 2018 data is available only as a shapefile of polygons, so it cannot be converted to points directly. Two solutions were considered. The first is geocoding based on addresses, which was ruled out for the reasons given in the Geocode Addresses section below. The adopted approach is therefore to derive XY coordinates for each lot and then apply XY Table to Point.
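The report does not say how the XY coordinates were derived from the polygons. One common choice is the polygon centroid, and the sketch below computes it with the shoelace formula. The function name polygon_centroid and the sample rectangle are illustrative assumptions, and the sketch handles a single simple ring without holes.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if the input left it open
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:]):
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area), and 6 * area equals 3 * twice_area.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangular lot in projected units.
lot = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
centroid = polygon_centroid(lot)
```

For strongly concave lots the centroid can fall outside the polygon, in which case an interior point would be a safer representative location.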

Project

The output of XY Table to Point is then projected into the appropriate coordinate system. In GIS, 'projecting' refers to transforming spatial data from one coordinate system to another. This ensures that the layers align with one another and with their geographic context, which is a precondition for accurate distance-based analysis and visualization.

Clip

After the projection process, it was observed that some points fell outside of Manhattan. To address this, the 'Clip' tool is employed to retain only those features within the desired region. In GIS, the 'Clip' tool is used to trim or cut out a portion of one data layer using the boundaries of another layer. This ensures that the analysis is focused exclusively on the specified area, in this case, Manhattan, by removing extraneous data points that do not contribute to the objectives.
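Conceptually, clipping a point layer amounts to a point-in-polygon test against the boundary. The sketch below uses the standard ray-casting test on a hypothetical square boundary; it illustrates the idea only and is not how the ArcGIS 'Clip' tool is implemented. The names point_in_polygon and clip_points are assumptions.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the simple polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside  # one more crossing to the right
    return inside


def clip_points(points, boundary):
    """Keep only the points that fall inside the boundary polygon."""
    return [p for p in points if point_in_polygon(p[0], p[1], boundary)]


# Hypothetical boundary and points in projected units.
boundary = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
points = [(5.0, 5.0), (15.0, 5.0), (2.0, 9.0), (-1.0, 3.0)]
kept = clip_points(points, boundary)
```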

Geocode Addresses

Processing the Midtown cycling counts table required the Geocode Addresses tool to map its points accurately. Geocoding in Geographic Information Systems (GIS) converts addresses into geographic coordinates, so that each record can be placed on a map from its address information. Geocoding the addresses associated with the cycling counts positions each count at its own location and supports a more granular, location-specific analysis.

Geocoding is used to place addresses that lack geospatial information on a map. Because no dedicated locator was available for this project, the ArcGIS World Geocoding Service locator by Esri was used. The locator matched poorly because the address information in the cycling counts is imprecise, so all cycling points had to be rematched manually. Given the 1900 points in PLUTO 2018, manual rematching at that scale would be an excessive workload, so geocoding was not used for the PLUTO 2018 data.

Select by Attributes

The emphasis of the study is on the distribution of buildings related to commerce in Manhattan for the year 2023. For this analysis, the key field used for grouping data is labeled 'BUILDING CLASS' (bldgclass), which categorizes buildings based on their commercial classification. This approach enables a detailed examination of the spatial patterns of commercial buildings across Manhattan.

The Department of Finance Property Tax System (PTS) groups building classes R5, R7, R8, RA, RB, RH, and RK as commercial buildings.
If the unit lots are a mixture of commercial building types, BUILDING CLASS = RC.
If the unit lots are a mixture of commercial and residential building types, BUILDING CLASS = RM.
If the unit lots are a mixture of commercial and industrial/warehouse building types, BUILDING CLASS = RI.
If the unit lots are a mixture of commercial, residential, and industrial/warehouse building types, BUILDING CLASS = RX.
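The selection described above can be sketched as a simple membership test on the building class code. The class codes are those listed above; treating the mixed classes (RC, RM, RI, RX) as commerce-related alongside the purely commercial ones is an assumption about the selection, and the helper names are hypothetical. The where_clause helper builds a standard SQL IN expression of the kind that can be entered in Select by Attributes.

```python
# Commercial unit classes named in the text.
COMMERCIAL = {"R5", "R7", "R8", "RA", "RB", "RH", "RK"}
# Mixed classes named in the text (assumed here to count as commerce-related).
MIXED = {"RC", "RM", "RI", "RX"}
SELECTED = COMMERCIAL | MIXED


def is_commercial_related(bldgclass):
    """True if the building class code is in the commerce-related selection."""
    return bldgclass.strip().upper() in SELECTED


def where_clause(field="bldgclass"):
    """Build an SQL IN expression listing the selected class codes."""
    codes = ", ".join(f"'{code}'" for code in sorted(SELECTED))
    return f"{field} IN ({codes})"


# Hypothetical lots: (lot id, building class).
lots = [("lot-1", "RM"), ("lot-2", "A1"), ("lot-3", "r5 "), ("lot-4", "RK")]
selected_ids = [lot_id for lot_id, code in lots if is_commercial_related(code)]
```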

Select by Attributes is also applied to the block shapefile so that only the blocks in Manhattan are selected.

Summarize Within

To generate a heatmap reflecting the concentration of commercial-related buildings, the 'Summarize Within' tool is employed. This tool aggregates the 'Count of Points', representing the number of commercial-related buildings, within the defined blocks of Manhattan. This method effectively integrates and quantifies commercial building data at the block level, facilitating the creation of a heatmap that visually represents the distribution and density of commercial structures across the area.
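The aggregation can be pictured as counting points per block. The sketch below simplifies each block to an axis-aligned rectangle, which is an assumption made to keep the example short; real blocks are polygons and would need a point-in-polygon test. Block identifiers and coordinates are hypothetical.

```python
# Hypothetical blocks as (xmin, ymin, xmax, ymax) rectangles in projected units.
blocks = {
    "block_A": (0.0, 0.0, 10.0, 10.0),
    "block_B": (10.0, 0.0, 20.0, 10.0),
    "block_C": (20.0, 0.0, 30.0, 10.0),
}

# Hypothetical commerce-related building points.
points = [(1.0, 1.0), (5.0, 9.0), (12.0, 3.0), (45.0, 5.0)]


def count_points_within(blocks, points):
    """Return {block id: number of points inside}, including empty blocks."""
    counts = {block_id: 0 for block_id in blocks}
    for x, y in points:
        for block_id, (xmin, ymin, xmax, ymax) in blocks.items():
            # Half-open intervals so a point on a shared edge is counted once.
            if xmin <= x < xmax and ymin <= y < ymax:
                counts[block_id] += 1
                break
    return counts


count_of_points = count_points_within(blocks, points)
```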

Buffer

A 500-meter buffer zone is established around the subway stations to visually depict the area typically regarded as a suitable walking distance. This specific distance is rooted in existing literature, identifying it as a reasonable range for pedestrian travel. Applying this buffer allows for an analysis of the accessibility of subway stations and the potential pedestrian catchment area.
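Asking whether a location falls inside a station's 500-meter buffer is equivalent to asking whether its distance to that station is at most 500 meters. The sketch below checks this with the haversine great-circle distance on latitude and longitude; the coordinates and function names are hypothetical, and a projected coordinate system with planar distances would serve equally well.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_buffer(point, stations, radius_m=500.0):
    """True if the point lies within radius_m of at least one station."""
    return any(haversine_m(*point, *station) <= radius_m for station in stations)


# Hypothetical station and building coordinates (lat, lon).
stations = [(40.7500, -73.9900)]
near_building = (40.7530, -73.9900)  # about 334 m north of the station
far_building = (40.7600, -73.9900)   # about 1.1 km north of the station

near_distance = haversine_m(*near_building, *stations[0])
```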

Kernel Density

The Kernel Density tool is well suited to evaluating and visualizing the density of a feature. Kernel Density analysis in GIS creates a smoothed density surface showing the concentration of a specific feature over an area. This technique is particularly useful for identifying hotspots, or areas of high concentration, and gives a clear visual representation of the feature's distribution across the study area.
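To make the idea concrete, the sketch below evaluates a weighted kernel density on a small grid with a quartic (biweight) kernel, scaled so that each point's kernel integrates to one over its search radius. The kernel choice, the scaling, the search radius, and the sample point are assumptions for illustration; the settings actually used in ArcGIS Pro are not stated in this report.

```python
import math


def quartic_kernel(dist, radius):
    """Quartic kernel value at a given distance; zero outside the radius."""
    if dist >= radius:
        return 0.0
    u = (dist / radius) ** 2
    return 3.0 / (math.pi * radius ** 2) * (1.0 - u) ** 2


def kernel_density(points, radius, xs, ys):
    """Density grid[row][col] for weighted points [(x, y, weight), ...]."""
    grid = []
    for y in ys:
        row = []
        for x in xs:
            value = sum(
                weight * quartic_kernel(math.hypot(x - px, y - py), radius)
                for px, py, weight in points
            )
            row.append(value)
        grid.append(row)
    return grid


# One hypothetical station at the origin, weighted by its ridership.
stations = [(0.0, 0.0, 2.0)]
grid = kernel_density(stations, radius=1.0, xs=[0.0, 0.5, 1.0], ys=[0.0])
```

The weight plays the role of a population field: a station with twice the ridership contributes twice the density around it, and the contribution fades smoothly to zero at the search radius.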

Take the feature representing the daily average ridership per subway station as an example.

Reclassify

Before the Raster Calculator can be applied, the raster layers must be pre-processed. The kernel densities of MTA ridership, traffic volume, and cycling ridership are not on the same scale, which necessitates a reclassification. Prior to reclassification, the symbology methods are reviewed and adjusted for clarity and consistency: all layers are converted to the 'Natural Breaks' classification method, and the number of classes is kept at its default setting. The reclassified raster layers are then produced for further analysis.
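The effect of reclassification can be sketched as mapping every cell value to a class number, given the upper bound of each class. In the sketch below the break values, the tiny rasters, and the use of None for NoData are all hypothetical; in the project the breaks come from the Natural Breaks classification in ArcGIS Pro.

```python
from bisect import bisect_left


def reclassify(raster, upper_bounds):
    """Map each cell to a class 1..len(upper_bounds); None (NoData) is kept."""
    n_classes = len(upper_bounds)
    result = []
    for row in raster:
        new_row = []
        for value in row:
            if value is None:
                new_row.append(None)
            else:
                # Index of the first class whose upper bound is >= value;
                # values above the last bound fall into the top class.
                index = min(bisect_left(upper_bounds, value), n_classes - 1)
                new_row.append(index + 1)
        result.append(new_row)
    return result


# Two hypothetical density rasters on very different scales.
subway_density = [[5.0, 40.0], [180.0, None]]
cycling_density = [[0.01, 0.2], [0.9, 0.05]]

subway_classes = reclassify(subway_density, [10.0, 50.0, 200.0])
cycling_classes = reclassify(cycling_density, [0.03, 0.3, 1.0])
```

After this step both layers hold values from 1 to 3, so they can be combined cell by cell even though the original densities differ by orders of magnitude.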

Raster Calculator

Once the raster layers representing the reclassified kernel densities are prepared, they are suitable for aggregation. The Raster Calculator tool is needed here because raster layers cannot simply be overlaid to combine their values. An expression entered in the Raster Calculator combines the layers cell by cell. In areas where the rasters overlap, the resultant values are higher, indicating greater density. Conversely, in areas without overlap, the original values are retained. This process results in a calculated raster layer that represents the combined kernel density of the entire traffic system.
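The behavior described above corresponds to a cell-by-cell sum in which a missing value contributes nothing. The sketch below reproduces that behavior on small hypothetical rasters, with None standing in for NoData; the exact expression entered in the Raster Calculator is not given in this report, so this is an interpretation rather than the project's expression.

```python
def combine(*rasters):
    """Cell-wise sum of equally sized rasters, skipping None (NoData) cells.

    A cell is None in the result only if it is None in every input.
    """
    n_rows = len(rasters[0])
    n_cols = len(rasters[0][0])
    combined = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            values = [ras[r][c] for ras in rasters if ras[r][c] is not None]
            row.append(sum(values) if values else None)
        combined.append(row)
    return combined


# Hypothetical reclassified layers (classes 1 to 3, None where no coverage).
subway = [[1, None], [3, None]]
traffic = [[1, None], [3, 2]]
cycling = [[None, None], [1, None]]

traffic_system = combine(subway, traffic, cycling)
```

Cells covered by all three layers reach the highest sums, while a cell covered by only one layer keeps that layer's value.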

Results and Discussion

Kernel Density Maps

This represents the kernel density of buildings related to commerce.

A notable area of focus, as indicated by the kernel density analysis, is the region stretching from West 15th to West 22nd Street, intersecting both 5th and 6th Avenues. This area shows a significant concentration of commercial buildings.
Another dense commercial zone is observed in the southern part of Manhattan.

This map demonstrates the kernel density of daily average ridership of subway stations in Manhattan.

The daily average car volume in Manhattan is then added to the map.

Finally, the yearly average cycling volume is added.

Comparisons

Focus and Station & Line

These two identified hotspots coincide with areas that are within a 500-meter buffer zone around subway stations and also within a 100-meter proximity to subway lines. This overlay suggests a strong correlation between the density of commercial buildings and the accessibility to subway infrastructure, emphasizing the impact of transit proximity on urban commercial distribution.

Heatmap VS Kernel Density of Commercial Related Buildings 2023

When comparing areas by density, the heatmap is usually the first tool that comes to mind because of its familiarity. This analysis therefore compares the heatmap with the kernel density map. The kernel density map appears to offer a clearer visualization of hotspot areas. Lower Manhattan exhibits the highest densities, with SoHo also showing a significant peak. Contrary to expectations, Midtown does not display a similar peak in density. This unexpected finding is explored in further detail in the subsequent discussion.

Point and Kernel Density of PLUTO 2023

When conducting a density analysis of point data, point density is often the initial method considered. This analysis features a comparison between point density and kernel density, using data from PLUTO 2023. It is observed that kernel density provides a smoother representation of areas, particularly those connecting hotspots. This smoother visualization is attributed to the inherent properties of kernel density, which considers the adjacent area around each point, in contrast to point density that focuses solely on the points themselves. The differing underlying principles of these two methods result in kernel density offering a more effective and cohesive view of spatial data.

2018 VS 2023 Commercial Related Buildings

Comparing historical data with current features is a common and insightful approach. The oldest accessible version of the official NYC PLUTO data is from 2018. Consequently, it is feasible to compare the distribution of PLUTO buildings from five years ago with the current distribution of commercial buildings. This comparative analysis reveals that the spatial distribution of commercial buildings has maintained a similar pattern over the past five years. However, upon a detailed examination, it is apparent that the count of commercial buildings was lower five years ago compared to the number in 2023. This observation underscores the evolution and growth in the commercial sector over this period.

2018 VS 2023 Kernel Density of Commercial Related Buildings

To gain deeper insights into the spatial pattern, kernel density analysis is employed in this comparison. As observed in the point comparison, the kernel density map confirms that Lower Manhattan exhibits the highest commercial density compared to other areas. However, it also highlights an interesting phenomenon in relatively lower density areas, such as Midtown and SoHo, where commercial distributions are more dispersed. This suggests that commercial density in Lower Manhattan may be approaching saturation, prompting the expansion of new commercial buildings into emerging areas. This dynamic reflects the evolving landscape of commercial development in the city.

Statistics and Analysis

2018 VS 2023 Count of Commercial Related Buildings Bar Chart

These bar charts illustrate the counts of commercial-related buildings, with the category of mixed-use for commerce and residence being the most prevalent. Notably, as previously observed, Lower Manhattan leads in terms of commercial building counts, which prompts the consideration that its higher density is partially attributed to a greater number of mixed-use buildings that cater to both commercial and residential purposes. This leads to the formulation of an additional hypothesis: that when solely computing the cumulative area of commercial and retail spaces, Midtown is expected to exhibit a comparable total area to that of Lower Manhattan.

Midtown VS Lower Manhattan Analysis

For clarity, "Lower Manhattan" here refers to the area south of 14th Street, while "Midtown" refers to the area between 14th Street and 59th Street.

Two areas have been grouped based on their respective zip codes:

Midtown: Encompassing Midtown Chelsea, Flatiron District, Kips Bay, Murray Hill, Gramercy, Midtown East, Turtle Bay, Garment District, Hell's Kitchen, Midtown West, Rockefeller Center, Midtown Manhattan, Theater District, Penn Station, and Hudson Yards. The corresponding zip codes for Midtown are 10001, 10010, 10016, 10017, 10022, 10018, 10019, 10020, 10036, and 10119.
Lower Manhattan: Including the Financial District, Battery Park City, South Street Seaport, Tribeca, Civic Center, SoHo, Greenwich Village, East Village, West Village, Chelsea, and Chinatown. The corresponding zip codes for Lower Manhattan are 10004, 10005, 10006, 10009, 10007, 10012, 10013, 10014, 10002, 10003, 10011, 10038, 10280, and 10282.
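The grouping above can be expressed as a lookup from zip code to region. The zip codes below are copied from the two lists; the function name region_for_zip and the choice to return None for any other zip code are assumptions.

```python
MIDTOWN_ZIPS = {
    "10001", "10010", "10016", "10017", "10022",
    "10018", "10019", "10020", "10036", "10119",
}
LOWER_MANHATTAN_ZIPS = {
    "10004", "10005", "10006", "10009", "10007", "10012", "10013",
    "10014", "10002", "10003", "10011", "10038", "10280", "10282",
}


def region_for_zip(zip_code):
    """Return 'Midtown', 'Lower Manhattan', or None for any other zip code."""
    code = str(zip_code).strip()[:5]  # tolerate integers and ZIP+4 strings
    if code in MIDTOWN_ZIPS:
        return "Midtown"
    if code in LOWER_MANHATTAN_ZIPS:
        return "Lower Manhattan"
    return None


examples = [region_for_zip(10036), region_for_zip("10013"), region_for_zip("10128")]
```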

PLUTO 2018 Manhattan Comparison

The attribute tables of the two regions from PLUTO 2018 are shown here.

PLUTO 2018 Lower Manhattan Attribute Table

A new field called ShapeArea, the sum of ComArea and RetailArea, is created in the attribute table; the bar charts and statistics below are based on it.
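The field calculation and the per-zip-code totals can be sketched as follows. ComArea, RetailArea, and ShapeArea are the field names used above; the ZipCode key, the helper names, and the sample lots are hypothetical.

```python
from collections import defaultdict

# Hypothetical lots; areas are in the dataset's floor-area units.
lots = [
    {"ZipCode": "10036", "ComArea": 5000, "RetailArea": 1000},
    {"ZipCode": "10036", "ComArea": 2000, "RetailArea": 500},
    {"ZipCode": "10013", "ComArea": 1500, "RetailArea": 700},
]


def add_shape_area(lot):
    """Return a copy of the lot with ShapeArea = ComArea + RetailArea."""
    return {**lot, "ShapeArea": lot["ComArea"] + lot["RetailArea"]}


def sum_area_by_zip(lots):
    """Total ShapeArea per zip code."""
    totals = defaultdict(int)
    for lot in lots:
        totals[lot["ZipCode"]] += add_shape_area(lot)["ShapeArea"]
    return dict(totals)


area_by_zip = sum_area_by_zip(lots)
```

If ComArea in the PLUTO version used already includes retail floor area, this sum counts retail space twice, so the field definitions are worth checking against the PLUTO data dictionary before interpreting the totals.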

Sum of Area of Midtown and Lower Manhattan by Zip Codes

Statistics of Area of Midtown and Lower Manhattan by Zip Codes

The observation reveals an intriguing trend: despite Lower Manhattan boasting a higher concentration of commercial-related structures compared to Midtown, the cumulative commercial and retail spaces in Lower Manhattan remain less extensive than those in Midtown. This observation not only elucidates the absence of a peak in kernel density within Midtown but also serves to corroborate the earlier hypothesis.

Discussion

Issues

Choice of Breaks

The left map shows the kernel density classified by Natural Breaks, while the right map uses Equal Intervals.

While the default kernel density symbology in ArcGIS Pro uses Equal Intervals, Equal Intervals is not always the most suitable choice. As the comparison shows, commercial kernel density is displayed more clearly with the Natural Breaks classification method. With Equal Intervals, many areas fall into the same class and receive nearly indistinguishable colors, which makes their densities hard to tell apart. This happens when a few areas exhibit extremely high density while most others do not.
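A small numeric example shows why the two methods differ on skewed data. The sketch below computes equal-interval bounds and, as a stand-in for Natural Breaks, an exact dynamic-programming split that minimizes the within-class squared deviation (the Fisher-Jenks criterion). The sample values are hypothetical, and the ArcGIS Pro implementation of Natural Breaks may differ in detail.

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k + 1)]


def natural_breaks(values, k):
    """Upper bounds of k classes minimizing within-class squared deviation."""
    v = sorted(values)
    n = len(v)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    s1, s2 = [0.0], [0.0]  # prefix sums of values and of squared values
    for x in v:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        """Sum of squared deviations from the mean for v[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: best cost of splitting v[:j] into c classes;
    # start[c][j]: index where the last of those classes begins.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i
    bounds, j = [], n
    for c in range(k, 0, -1):  # walk back from the last class to the first
        bounds.append(v[j - 1])
        j = start[c][j]
    return bounds[::-1]


def class_counts(values, bounds):
    """Number of values falling into each class defined by upper bounds."""
    counts = [0] * len(bounds)
    for x in values:
        counts[min(bisect_left(bounds, x), len(bounds) - 1)] += 1
    return counts


densities = [1, 2, 3, 10, 11, 12, 100]  # one extreme hotspot
equal_bounds = equal_interval_breaks(densities, 3)
natural_bounds = natural_breaks(densities, 3)
equal_counts = class_counts(densities, equal_bounds)
natural_counts = class_counts(densities, natural_bounds)
```

With one extreme value, equal intervals place six of the seven values in the lowest class and leave the middle class empty, whereas the natural-breaks split separates the low, medium, and extreme groups.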

Reclassify Issues

In the reclassification process, default reclassification classes are typically applied. However, determining optimal classes for reclassification can be challenging due to variations in scales and data characteristics. It becomes particularly challenging to decide on better classes and how to define them when dealing with different datasets.

One potential solution is to consider alternative reclassification methods proposed by other researchers. Such methods can offer a fresh perspective and may lead to more effective reclassification that enhances the clarity of kernel density maps. Improving the process of class definition and reclassification is an ongoing endeavor aimed at enhancing the visual representation and interpretability of spatial data.

Website Sharing Issues

During the process of sharing the map to a website, warnings related to the coordinate system have arisen. These warnings typically indicate a mismatch or inconsistency in the coordinate system settings between the map and the web platform. Resolving these warnings is essential to ensure accurate and seamless map display on the website. It involves aligning the coordinate system settings to ensure proper geospatial referencing and display.

Data Limitation

Because of data gaps, the three transportation datasets come from different years and differ in scale: the daily average ridership at subway stations in Manhattan for 2022, the daily average traffic volume in Manhattan for 2019, and the yearly cycling ridership in Midtown for 2022. The analysis assumes that integrating these datasets is a valid approach despite the differences in scale and year.

Expectations

The intention was to utilize the Raster Calculator to create a demand-supply map. However, challenges arose due to the partial coverage of transportation data and imperfections in the reclassification process. Efforts are underway to enhance this aspect in future iterations, aiming for a more comprehensive and accurate demand-supply mapping.

Conclusion

Informed by the article titled "The impact of transportation on commercial activities: The stories of various transport routes in Changchun, China" by Ma et al. (2023), this project is focused on the analysis of commercial patterns in correlation with traffic information. Employing methods such as data cleaning using Python and Excel, as well as Geoprocessing in ArcGIS Pro, the findings indicate that commercial density exhibits concentration in Lower Manhattan and Midtown, with the former displaying notably higher density than the latter. In parallel, traffic density is primarily centered in Midtown, extending along major arteries towards Lower Manhattan and establishing a new traffic hotspot in the Financial District of Lower Manhattan. These findings lend support to the concept of a cyclical feedback interaction between economic activities and transportation.