Evaluating Public Transit Effectiveness with GIS

Applying the Local Index of Transit Availability (LITA) to analyze transit access across transit modes

Helena Lindsay

June 13, 2024

Introduction

Urban centers around the world face significant challenges in delivering efficient and accessible public transportation. The capstone project, ‘Assessing the Effectiveness of Public Transport Systems through Performance Analysis Using GIS,’ used the Local Index of Transit Availability (LITA) to evaluate the ten U.S. transit agencies with the largest service-area populations among those operating both heavy rail and bus service. The analysis examined each mode separately and then compared them, allowing for a thorough assessment of each agency's strengths and weaknesses.

The main objective was to apply a universal framework for assessing transit effectiveness to a variety of agencies and modes, and to provide actionable recommendations for improvement. By analyzing transit access and performance across agencies and cities, the project aimed to reveal disparities in service quality and accessibility in diverse urban areas. The hypothesis was that transit agencies differ significantly in effectiveness and that access varies noticeably across modes of transportation. The findings identify high-performing agencies and areas needing improvement, supporting the creation of more equitable and efficient public transit systems.

Selection of Agencies

Only agencies that operate both Heavy Rail (HR) and Bus (MB) modes were considered, and among these, the agencies with the largest service area populations were selected to provide a representative sample of transit operations. This process identified 10 agencies, each offering both HR and MB service, for a total of 20 agency-mode data points. The R code below shows how the agencies were selected from the raw National Transit Database (NTD) data.

library(dplyr)

Agencies <- read.csv("/Users/helenalindsay/Documents/Spring_24/Capstone/Data/NTD_agencies.csv")

# Sort by service area population so the largest agencies come first
Agencies <- Agencies[order(Agencies$Service.Area.Population, decreasing = TRUE), ]
Agencies <- Agencies %>%
  filter(Type.Of.Service == 'DO') %>%           # directly operated service only
  filter(Time.Period == 'Annual Total') %>%
  group_by(Agency) %>%
  filter(all(c("HR", "MB") %in% Mode)) %>%      # keep agencies that run both modes
  ungroup() %>%
  filter(Mode %in% c("HR", "MB")) %>%           # drop other modes so 20 rows = 10 agencies x 2 modes
  slice_head(n = 20) %>%
  select(-c(NTD.ID, UZA.Name, Agency.VOMS, Organization.Type, Mode.VOMS.Questionable, UACE.Code, Reporter.Type, Type.Of.Service, Primary.UZA.Area.Sq.Miles, Time.Period, Time.Service.Begins, Time.Service.Ends)) %>%
  select(where(~all(!is.na(.))))                # drop columns that contain any missing values

GTFS Data

The project used GTFS data to gather transit service information from multiple agencies in a consistent format. The General Transit Feed Specification (GTFS) is an open standard that transit agencies use to publish transit data. It has two components: GTFS Schedule, which describes routes, schedules, fares, and geographic information in plain-text, comma-separated files, and GTFS Realtime, which provides trip updates, vehicle positions, and service alerts encoded as Protocol Buffers. Because GTFS is adopted worldwide, feeds from different agencies can be integrated and analyzed with the same tools.
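To make the GTFS Schedule format concrete, the sketch below counts how many scheduled stop events each stop receives by reading `stop_times.txt`, one of the plain-text files described above. A per-stop count like this is one possible input to a frequency measure; it is an illustration only, not the procedure the project ran (the project used ArcGIS Pro tools). The sample rows are made up, and a full analysis would also restrict trips to a single service day via `service_id`.

```python
import csv
import io
from collections import Counter

def count_stop_events(stop_times_csv: str) -> Counter:
    """Count scheduled vehicle visits per stop_id in a GTFS stop_times.txt file.

    stop_times.txt has one row per (trip, stop) visit, so counting rows
    per stop_id gives the number of scheduled visits at each stop.
    """
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    return Counter(row["stop_id"] for row in reader)

# Hypothetical sample: two trips serving three stops
sample_stop_times = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:05:00,08:05:00,S2,2
T2,08:10:00,08:10:00,S1,1
T2,08:15:00,08:15:00,S3,2
"""

counts = count_stop_events(sample_stop_times)
print(counts.most_common())
```

For real feeds, reading the file with `open(path, encoding="utf-8-sig", newline="")` avoids problems with the byte-order mark some agencies include.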

LITA Score

The Local Index of Transit Availability (LITA), developed by Rood (1998), provides a comprehensive measure of transit service intensity or accessibility within a specific area. It integrates three essential aspects of transit service (route coverage, frequency, and capacity) to offer a detailed understanding of how effectively public transit meets the needs of the population.
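As a rough illustration of how the three aspects can be combined, the sketch below standardizes each indicator across census tracts as a z-score and averages the three, which is one common way to build a composite index of this kind. The indicator definitions (stops per square mile for route coverage, daily vehicle visits for frequency, seat-miles per resident for capacity), the tract values, and the equal weighting are assumptions for illustration, not the project's exact scoring.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All tracts identical on this indicator: no relative difference to report
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def lita_composite(coverage, frequency, capacity):
    """Average the z-scores of the three indicators for each tract.

    Each argument is a list with one value per tract, in the same tract order.
    Equal weighting of the three indicators is an assumption of this sketch.
    """
    zc, zf, zk = z_scores(coverage), z_scores(frequency), z_scores(capacity)
    return [(c + f + k) / 3 for c, f, k in zip(zc, zf, zk)]

# Hypothetical values for three tracts
coverage = [12.0, 4.0, 8.0]    # stops per square mile (assumed definition)
frequency = [900, 150, 400]    # scheduled vehicle visits per day (assumed)
capacity = [35.0, 6.0, 15.0]   # seat-miles per resident (assumed)

for tract, score in zip(["A", "B", "C"], lita_composite(coverage, frequency, capacity)):
    print(tract, round(score, 2))
```

Standardizing first keeps an indicator with large raw numbers (such as daily vehicle visits) from dominating the composite.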

For this capstone project, LITA was selected as the primary metric for transit accessibility because it combines three readily calculated indicators into a single score, which makes it practical for a large-scale study. Its straightforward structure allowed the same procedure to be applied to two transit modes, Heavy Rail (HR) and Bus (MB), across ten cities, keeping the comparative evaluation methodologically clear and consistent.

The project began with a pilot analysis of the Metropolitan Atlanta Rapid Transit Authority's bus (MARTA MB) service at the census tract level. This initial analysis aimed to validate the workflow by collecting route coverage, frequency, and capacity data and using them to calculate the LITA index. The results are shown below.

After validating with MARTA MB, the project proceeded to apply the same analysis to Heavy Rail (HR) services to confirm the effectiveness of the LITA index across different transit modes. Subsequently, the scope expanded to include a comparative analysis across ten cities, evaluating both HR and MB services. In each city, the LITA index was calculated to assess transit accessibility, providing a comparative perspective on service intensity and availability.

MARTA Bus System (MB)

The analysis of the MARTA bus system revealed that transit accessibility is sparse for tracts at the periphery of Atlanta compared to those near the city center. It also showed that tracts with higher bus accessibility tend to align with the locations where MARTA rail operates, suggesting that the two transit modes are closely linked.

The same analysis was then applied to MARTA's heavy rail (HR) service.

MARTA Heavy Rail System (HR)

As demonstrated in the MARTA heavy rail analysis above, heavy rail accessibility in Atlanta is notably limited. The service primarily covers tracts along a north-south and east-west axis, leaving the majority of tracts without HR access. This results in significant gaps in transit coverage, particularly in areas that fall outside these main corridors. The limited reach of heavy rail underscores the importance of mode-based analysis and the necessity for comprehensive transit planning to achieve broader and more equitable accessibility throughout the city.

Following the pilot evaluation of both MARTA modes, LITA scores were calculated for HR and MB service in the remaining cities, allowing service intensity and availability to be compared across all ten.

Examples of Urban Centers

Below are some examples of urban centers that were analyzed using the LITA score.

Cleveland

Cleveland is characterized by a moderately extensive bus network that provides relatively widespread coverage throughout the city. However, its heavy rail system is sparse, providing limited coverage compared to the bus system. This disparity highlights how the LITA score can reveal differences in transit system intensity and accessibility, showing the strengths and limitations of various modes of transportation within a city.

Los Angeles

Los Angeles serves as a unique example where the primary rail service is light rail, with only 16 of the 107 rail stations being classified as heavy rail. This highlights a crucial aspect to consider when interpreting the findings from this project, which focuses on heavy rail (HR) and bus (MB) systems. In cities like Los Angeles, where light rail dominates the rail service, the analysis may not fully represent the overall accessibility and effectiveness of the entire transportation system.
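Separating heavy rail from light rail is possible directly in GTFS Schedule data, because each route in `routes.txt` carries a `route_type` code (0 = tram, streetcar, or light rail; 1 = subway or metro; 3 = bus). The sketch below, using made-up sample rows, collects the stops served by a chosen set of route types by joining `routes.txt`, `trips.txt`, and `stop_times.txt`. The source does not say whether the project separated modes this way; treat it as an illustration of the idea rather than the project's method.

```python
import csv
import io

def stops_for_route_types(routes_csv, trips_csv, stop_times_csv, route_types):
    """Return the set of stop_ids served by routes whose route_type is in route_types."""
    # routes.txt: keep route_ids with a matching route_type
    wanted_routes = {
        row["route_id"]
        for row in csv.DictReader(io.StringIO(routes_csv))
        if int(row["route_type"]) in route_types
    }
    # trips.txt: find the trips that run on those routes
    wanted_trips = {
        row["trip_id"]
        for row in csv.DictReader(io.StringIO(trips_csv))
        if row["route_id"] in wanted_routes
    }
    # stop_times.txt: collect the stops visited by those trips
    return {
        row["stop_id"]
        for row in csv.DictReader(io.StringIO(stop_times_csv))
        if row["trip_id"] in wanted_trips
    }

# Hypothetical mini-feed: one subway route (type 1) and one light rail route (type 0)
routes = "route_id,route_type\nR_SUB,1\nR_LRT,0\n"
trips = "route_id,service_id,trip_id\nR_SUB,WK,T1\nR_LRT,WK,T2\n"
stop_times = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "T1,08:00:00,08:00:00,S1,1\n"
    "T1,08:04:00,08:04:00,S2,2\n"
    "T2,08:00:00,08:00:00,S1,1\n"
    "T2,08:06:00,08:06:00,S3,2\n"
)

heavy_rail_stops = stops_for_route_types(routes, trips, stop_times, {1})
print(sorted(heavy_rail_stops))
```

A stop shared by both modes (S1 here) appears in both sets, which mirrors transfer stations where heavy and light rail meet.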

Miami

Miami’s extensive bus network provides broad coverage across the city, serving many areas effectively. However, its heavy rail service is limited in scope, covering fewer locations. This contrast highlights why it’s crucial to conduct mode-specific analyses. Such analyses help reveal how different transportation modes contribute to the overall system and address their respective strengths and weaknesses.

Philadelphia

In contrast to the examples above, Philadelphia's extensive heavy rail system covers much of the city. However, peripheral areas lack adequate service from both the rail and bus networks, highlighting coverage gaps that must be addressed to improve overall transit accessibility.

Washington, D.C.

Like Philadelphia, Washington, D.C. boasts a comprehensive transit network that includes both extensive bus and rail services. This well-rounded approach allows the city to balance different transportation modes, providing broad coverage and connectivity. However, even with this balanced network, some areas may still need improvement to ensure complete and equitable access across the city.

The remaining cities were analyzed using the same methodology, and the results are summarized in the scores presented below.

Results

The table below provides a detailed breakdown of the LITA score calculations, displaying the coverage, frequency, and capacity scores for each agency (city) and mode of transit. This table reveals that the rankings of agencies can vary significantly depending on the mode of transit being evaluated. For instance, a city with high scores in bus coverage might rank differently in its heavy rail performance. This discrepancy underscores the importance of evaluating each mode separately to gain a comprehensive understanding of transit effectiveness across different agencies.

Moreover, the table shows the specific components behind each final LITA score, making it possible to identify where a particular transit system falls short. For example, Boston's heavy rail system has a coverage score of 3, a frequency score of 7, and a capacity score of 9. While the system is strong in frequency and capacity, it clearly needs expanded coverage, and addressing this weakness could substantially improve the overall performance of Boston's heavy rail system.

Similarly, Cleveland's bus system shows a coverage score of 7, a frequency score of 1, and a capacity score of 6. These scores suggest that while the bus network covers a wide area and has adequate capacity, its frequency is notably poor. By focusing efforts on increasing the frequency of bus services, Cleveland could greatly improve the effectiveness of its bus transit system.
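The reasoning in the Boston and Cleveland examples, finding the component that holds a system back, is simple enough to automate across every agency and mode. The sketch below returns the lowest-scoring component for each system, using the Boston heavy rail and Cleveland bus scores reported above; the dictionary layout and the `weakest_component` function are hypothetical, not part of the project's scripts.

```python
def weakest_component(scores):
    """Return the (component, score) pair with the lowest score.

    Ties resolve to the first component listed in the dictionary.
    """
    return min(scores.items(), key=lambda item: item[1])

# Component scores quoted in the text
systems = {
    ("Boston", "HR"): {"coverage": 3, "frequency": 7, "capacity": 9},
    ("Cleveland", "MB"): {"coverage": 7, "frequency": 1, "capacity": 6},
}

for (city, mode), scores in systems.items():
    component, value = weakest_component(scores)
    print(f"{city} {mode}: weakest component is {component} ({value})")
```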

This detailed breakdown not only highlights the variability in transit system performance across different modes but also provides actionable insights for transit agencies to target specific areas for improvement. By addressing these weaknesses, agencies can work towards creating more balanced and effective transit systems that better serve their communities.

LITA Score Breakdown for Each Agency and Mode

In addition to evaluating each mode individually, the project also compared the performance of each mode across different agencies and calculated an overall ranking of the best and worst performing agencies. This comprehensive analysis considered both heavy rail and bus systems, revealing that Brooklyn and Chicago excelled in both modes. Their transit systems demonstrated high coverage, frequency, and capacity, contributing to their strong overall performance. In contrast, Baltimore had the lowest overall performance, indicating significant challenges in both its heavy rail and bus systems.

The analysis also highlighted instances of imbalanced performance across modes within the same city. For example, Washington DC scored a high 7 for its heavy rail services, reflecting strong performance in this mode. However, its bus services only scored 4, indicating room for improvement in coverage, frequency, or capacity in this mode. This discrepancy underscores the importance of considering multiple modes of transit when assessing overall system performance, as strengths in one mode may not compensate for weaknesses in another.
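A cross-mode imbalance like Washington DC's can be flagged mechanically by comparing each city's HR and MB scores. In the sketch below, the two DC scores come from the text, while the 3-point threshold for calling a gap "imbalanced" is an arbitrary assumption chosen only for illustration.

```python
def mode_gap(hr_score, mb_score):
    """Signed difference between heavy rail and bus scores (positive = rail stronger)."""
    return hr_score - mb_score

def is_imbalanced(hr_score, mb_score, threshold=3):
    """Flag a city whose mode scores differ by at least `threshold` points (assumed cutoff)."""
    return abs(mode_gap(hr_score, mb_score)) >= threshold

dc_hr, dc_mb = 7, 4  # Washington DC scores reported in the text
print(mode_gap(dc_hr, dc_mb), is_imbalanced(dc_hr, dc_mb))
```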

By examining both heavy rail and bus systems together, the project was able to identify cities that provide well-rounded transit options and those that may need targeted improvements in specific areas. This mode-specific analysis is crucial for understanding the unique strengths and limitations of each city’s transit network and for developing strategies to enhance overall transit equity and accessibility.

Conclusions

This capstone project used the Local Index of Transit Availability (LITA) to evaluate heavy rail and bus service at the ten largest U.S. transit agencies (by service-area population) that operate both modes, examining each mode separately and comparatively to assess each agency's strengths and weaknesses.

The comprehensive analysis using the LITA score emphasized the importance of evaluating each mode of transportation independently. By breaking down coverage, frequency, and capacity scores for heavy rail and bus systems, the project revealed significant variability in performance across different agencies and modes, providing valuable insights into each city's transit network.

The findings from this analysis offer actionable recommendations for transit agencies. For example, Boston's heavy rail system needs expanded coverage despite strong frequency and capacity, while Cleveland's bus system would benefit from increased service frequency. By addressing these weaknesses, agencies can enhance their systems' overall effectiveness and equity.

The project also ranked the overall performance of each mode across different agencies, with Brooklyn and Chicago excelling in both modes and Baltimore showing the lowest overall performance. Instances of imbalanced performance, such as Washington DC's high heavy rail score versus its lower bus score, underscore the need for mode-specific analysis.

In conclusion, this comprehensive analysis using the LITA score underscores the critical importance of a nuanced, multifaceted approach to transit planning and evaluation. By addressing identified weaknesses, transit agencies can improve the overall quality and equity of public transportation, ensuring all modes work together to provide comprehensive and efficient service to the community.

Scripts

Although most agencies provide static GTFS data for direct download, Washington DC only offers its GTFS data through an API. Custom scripts automated parts of the GTFS processing step and retrieved Washington DC's feeds. Because of differences in the format of Washington DC's GTFS data, the standard "Calculate Transit Service Frequency" tool in ArcGIS Pro did not work as expected, so a custom script estimated the frequency for DC Metro from station information instead. Finally, the LITA scores were normalized into classes using Natural Breaks (Jenks) with the jenkspy library in Python.

Python Script for Processing GTFS Data

import os

import arcpy

# NAD83 geographic coordinate system (EPSG:4269), used for all outputs
NAD83 = arcpy.SpatialReference(4269)


# Convert GTFS shapes to line features and calculate geodesic length
def process_gtfs_shapes(gtfs_folder, feature_dataset):
    # feature_dataset is not used here: outputs go to the current workspace (the city geodatabase)
    city_name = os.path.basename(gtfs_folder)
    try:
        gtfs_shapes_file = os.path.join(gtfs_folder, 'shapes.txt')
        out_feature_class_shapes = f'shapes_{city_name}'

        # Delete the output feature class if it already exists
        if arcpy.Exists(out_feature_class_shapes):
            arcpy.Delete_management(out_feature_class_shapes)

        # Convert GTFS shapes to features
        arcpy.transit.GTFSShapesToFeatures(in_gtfs_shapes_file=gtfs_shapes_file,
                                           out_feature_class=out_feature_class_shapes)
        print(f'Conversion of GTFS shapes to features completed for {city_name}. Output: {out_feature_class_shapes}')

        # Add a LENGTH field and fill it with the geodesic length in statute miles
        field_name_length_shapes = 'LENGTH'
        arcpy.AddField_management(out_feature_class_shapes, field_name_length_shapes, 'DOUBLE')
        arcpy.CalculateField_management(out_feature_class_shapes, field_name_length_shapes,
                                        '!shape.geodesicLength@MILES!', 'PYTHON3')
        print(f'Geodesic length calculation completed for {city_name}. Field added: {field_name_length_shapes}')

        # Label the output as NAD83. DefineProjection only rewrites the coordinate
        # system metadata; it does not transform the (WGS84) GTFS coordinates.
        arcpy.management.DefineProjection(out_feature_class_shapes, NAD83)

    except arcpy.ExecuteError:
        print(arcpy.GetMessages(2))
    except Exception as e:
        print(f'Failed to process GTFS shapes for {city_name} with error: {e}')


# Convert GTFS stops to point features
def process_gtfs_stops(gtfs_folder, feature_dataset):
    # feature_dataset is not used here: outputs go to the current workspace (the city geodatabase)
    city_name = os.path.basename(gtfs_folder)
    try:
        gtfs_stops_file = os.path.join(gtfs_folder, 'stops.txt')
        out_feature_class_stops = f'stops_{city_name}'

        # Delete the output feature class if it already exists
        if arcpy.Exists(out_feature_class_stops):
            arcpy.Delete_management(out_feature_class_stops)

        # Convert GTFS stops to features
        arcpy.transit.GTFSStopsToFeatures(in_gtfs_stops_file=gtfs_stops_file,
                                          out_feature_class=out_feature_class_stops)
        print(f'Conversion of GTFS stops to features completed for {city_name}. Output: {out_feature_class_stops}')

        # Label the output as NAD83 (metadata only, as above)
        arcpy.management.DefineProjection(out_feature_class_stops, NAD83)

    except arcpy.ExecuteError:
        print(arcpy.GetMessages(2))
    except Exception as e:
        print(f'Failed to process GTFS stops for {city_name} with error: {e}')


# Convert one GTFS folder into the Public Transit Data Model in the target feature dataset
def run_gtfs_to_transit_data_model(gtfs_folder, target_feature_dataset):
    try:
        arcpy.transit.GTFSToPublicTransitDataModel(
            in_gtfs_folders=gtfs_folder,
            target_feature_dataset=target_feature_dataset,
            interpolate="INTERPOLATE",
            append="NO_APPEND"
        )
        print(f'GTFSToPublicTransitDataModel completed for {os.path.basename(gtfs_folder)}.')
    except arcpy.ExecuteError:
        # Print detailed error messages
        print(arcpy.GetMessages(2))
    except Exception as e:
        print(f'Failed to process {os.path.basename(gtfs_folder)} with error: {e}')


# Main function: process the GTFS folder for each city
def process_gtfs_data(base_workspace, gtfs_directory):
    for city_folder in os.listdir(gtfs_directory):
        gtfs_folder = os.path.join(gtfs_directory, city_folder)

        if not os.path.isdir(gtfs_folder):
            print(f'Skipping {city_folder} - Not a valid directory.')
            continue

        # Require shapes.txt and stops.txt before processing
        files = os.listdir(gtfs_folder)
        if 'shapes.txt' not in files or 'stops.txt' not in files:
            print(f'Skipping {city_folder} - Missing shapes.txt or stops.txt.')
            continue

        # Create a file geodatabase for the city if it doesn't exist
        gdb_path = os.path.join(base_workspace, f'{city_folder}.gdb')
        if not arcpy.Exists(gdb_path):
            arcpy.CreateFileGDB_management(base_workspace, f'{city_folder}.gdb')
            print(f'Created geodatabase: {gdb_path}')

        # Write subsequent outputs into the city geodatabase
        arcpy.env.workspace = gdb_path

        # Create a NAD83 feature dataset for the transit data model if it doesn't exist
        feature_dataset_name = 'GTFS'
        feature_dataset_path = os.path.join(gdb_path, feature_dataset_name)
        if not arcpy.Exists(feature_dataset_path):
            arcpy.CreateFeatureDataset_management(gdb_path, feature_dataset_name, NAD83)
            print(f'Created feature dataset: {feature_dataset_path}')

        # Process shapes and stops, then build the transit data model
        process_gtfs_shapes(gtfs_folder, feature_dataset_path)
        process_gtfs_stops(gtfs_folder, feature_dataset_path)
        run_gtfs_to_transit_data_model(gtfs_folder, feature_dataset_path)

    print('GTFS data processing completed.')


if __name__ == "__main__":
    # Base workspace where all geodatabases will reside
    base_workspace = r'M:\Documents\ArcGIS\Projects\ALL'

    # Directory containing one GTFS folder per city
    gtfs_directory = r'M:\Documents\ArcGIS\Projects\ALL\GTFS'

    process_gtfs_data(base_workspace, gtfs_directory)
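The script above checks only for `shapes.txt` and `stops.txt`, while converting a feed into the Public Transit Data Model also relies on the schedule tables. A stricter pre-check could be sketched as follows. The file list follows the core files of a fixed-route GTFS Schedule feed (`agency.txt`, `stops.txt`, `routes.txt`, `trips.txt`, `stop_times.txt`, plus at least one of `calendar.txt` or `calendar_dates.txt`); `shapes.txt` is optional in GTFS but needed here because the script converts it. The `missing_gtfs_files` function is hypothetical.

```python
import os
import tempfile

# Core GTFS Schedule files for a fixed-route feed
CORE_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these must be present to define service days
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}

def missing_gtfs_files(gtfs_folder, extra_required=("shapes.txt",)):
    """Return a list of missing files for a GTFS folder (empty list = OK).

    extra_required lists files this workflow needs beyond the GTFS core,
    e.g. shapes.txt, which the specification treats as optional.
    """
    present = set(os.listdir(gtfs_folder))
    missing = sorted((CORE_FILES | set(extra_required)) - present)
    if not CALENDAR_FILES & present:
        missing.append("calendar.txt or calendar_dates.txt")
    return missing

# Example with a temporary folder holding an incomplete feed
with tempfile.TemporaryDirectory() as folder:
    for name in ["agency.txt", "stops.txt", "routes.txt", "shapes.txt", "calendar.txt"]:
        open(os.path.join(folder, name), "w").close()
    print(missing_gtfs_files(folder))
```

Inside `process_gtfs_data`, a call such as `if missing_gtfs_files(gtfs_folder): continue` could replace the two-file check.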

Retrieving GTFS Data for Washington DC

import io
import zipfile

import requests

# Transitland API key (insert your own key)
transitland_api_key = ""

# Transitland feed keys (Onestop IDs) for WMATA rail and bus GTFS
feed_keys = {
    "rail": "f-dqc-wmata~rail",
    "bus": "f-dqc-wmata~bus"
}

# Base URL for downloading the latest feed version
base_url = "https://transit.land/api/v2/rest/feeds/{feed_key}/download_latest_feed_version"


# Download a GTFS feed and extract the ZIP archive into a folder named after the mode
def download_and_extract_gtfs(mode, feed_key):
    url = base_url.format(feed_key=feed_key)
    headers = {"apikey": transitland_api_key}

    # Make the request (time out rather than hang if the server does not respond)
    response = requests.get(url, headers=headers, timeout=120)

    if response.status_code == 200:
        # Extract the contents of the ZIP file
        extract_path = f"path_to_extract_folder/{mode}"  # Replace with your desired extraction path
        with zipfile.ZipFile(io.BytesIO(response.content)) as z:
            z.extractall(extract_path)
        print(f"{mode.capitalize()} GTFS feed downloaded and extracted successfully to {extract_path}.")
    else:
        print(f"Failed to download {mode} GTFS feed. Status code: {response.status_code}")
        try:
            error_message = response.json().get('error')
            print(f"Error message: {error_message}")
        except Exception as e:
            print(f"Failed to parse error response: {e}")


# Download each GTFS feed
for mode, feed_key in feed_keys.items():
    download_and_extract_gtfs(mode, feed_key)

# Output:
# Rail GTFS feed downloaded and extracted successfully to path_to_extract_folder/rail.
# Bus GTFS feed downloaded and extracted successfully to path_to_extract_folder/bus.
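Because the DC Metro frequency estimate below is based on a Tuesday, it can be worth confirming that the downloaded feed actually has service on the chosen date. The sketch reads `calendar.txt` (weekday flags plus `start_date` and `end_date` in YYYYMMDD form, per the GTFS Schedule reference) and returns the `service_id`s active on a given date. It ignores the exceptions listed in `calendar_dates.txt` for brevity; the sample rows and the `active_service_ids` function are hypothetical.

```python
import csv
import io
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

def active_service_ids(calendar_csv, on_date):
    """Return service_ids in calendar.txt that run on on_date.

    Simplification: additions/removals in calendar_dates.txt are not applied.
    """
    day_column = WEEKDAYS[on_date.weekday()]   # date.weekday(): Monday == 0
    target = on_date.strftime("%Y%m%d")        # GTFS dates are YYYYMMDD strings
    active = set()
    for row in csv.DictReader(io.StringIO(calendar_csv)):
        # Fixed-width YYYYMMDD strings compare correctly as plain strings
        in_range = row["start_date"] <= target <= row["end_date"]
        if in_range and row[day_column] == "1":
            active.add(row["service_id"])
    return active

# Hypothetical calendar.txt with a weekday and a weekend service pattern
sample_calendar = (
    "service_id,monday,tuesday,wednesday,thursday,friday,saturday,sunday,start_date,end_date\n"
    "WEEKDAY,1,1,1,1,1,0,0,20240101,20241231\n"
    "WEEKEND,0,0,0,0,0,1,1,20240101,20241231\n"
)

print(active_service_ids(sample_calendar, date(2024, 6, 11)))  # a Tuesday
```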

Calculating Transit Service Frequency for DC Metro

Station Information from: https://www.wmata.com/rider-guide/stations/index.cfm

# Station data with lines serviced
station_data = {
	'Addison Road-Seat Pleasant': ['Silver Line', 'Blue Line'],
	'Anacostia': ['Green Line'],
	'Archives-Navy Memorial-Penn Quarter': ['Green Line', 'Yellow Line'],
	'Arlington Cemetery': ['Blue Line'],
	'Ashburn': ['Silver Line'],
	'Ballston-MU': ['Orange Line', 'Silver Line'],
	'Benning Road': ['Silver Line', 'Blue Line'],
	'Bethesda': ['Red Line'],
	'Braddock Road': ['Blue Line', 'Yellow Line'],
	'Branch Ave': ['Green Line'],
	'Brookland-CUA': ['Red Line'],
	'Capitol Heights': ['Silver Line', 'Blue Line'],
	'Capitol South': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Cheverly': ['Orange Line'],
	'Clarendon': ['Orange Line', 'Silver Line'],
	'Cleveland Park': ['Red Line'],
	'College Park-U of Md': ['Green Line'],
	'Columbia Heights': ['Green Line'],
	'Congress Heights': ['Green Line'],
	'Court House': ['Orange Line', 'Silver Line'],
	'Crystal City': ['Blue Line', 'Yellow Line'],
	'Deanwood': ['Orange Line'],
	'Downtown Largo': ['Silver Line', 'Blue Line'],
	'Dunn Loring-Merrifield': ['Orange Line'],
	'Dupont Circle': ['Red Line'],
	'East Falls Church': ['Orange Line', 'Silver Line'],
	'Eastern Market': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Eisenhower Avenue': ['Yellow Line'],
	'Farragut North': ['Red Line'],
	'Farragut West': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Federal Center SW': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Federal Triangle': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Foggy Bottom-GWU': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Forest Glen': ['Red Line'],
	'Fort Totten': ['Green Line', 'Red Line'],
	'Franconia-Springfield': ['Blue Line'],
	'Friendship Heights': ['Red Line'],
	'Gallery Pl-Chinatown': ['Green Line', 'Yellow Line', 'Red Line'],
	'Georgia Ave-Petworth': ['Green Line'],
	'Glenmont': ['Red Line'],
	'Greenbelt': ['Green Line'],
	'Greensboro': ['Silver Line'],
	'Grosvenor-Strathmore': ['Red Line'],
	'Herndon': ['Silver Line'],
	'Huntington': ['Yellow Line'],
	'Hyattsville Crossing': ['Green Line'],
	'Innovation Center': ['Silver Line'],
	'Judiciary Square': ['Red Line'],
	'King St-Old Town': ['Blue Line', 'Yellow Line'],
	'L\'Enfant Plaza': ['Orange Line', 'Silver Line', 'Blue Line', 'Yellow Line', 'Green Line'],
	'Landover': ['Orange Line'],
	'Loudoun Gateway': ['Silver Line'],
	'McLean': ['Silver Line'],
	'McPherson Square': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Medical Center': ['Red Line'],
	'Metro Center': ['Red Line', 'Orange Line', 'Silver Line', 'Blue Line'],
	'Minnesota Ave': ['Orange Line'],
	'Morgan Boulevard': ['Silver Line', 'Blue Line'],
	'Mt Vernon Sq 7th St-Convention Center': ['Green Line', 'Yellow Line'],
	'Navy Yard-Ballpark': ['Green Line'],
	'Naylor Road': ['Green Line'],
	'New Carrollton': ['Orange Line'],
	'NoMa-Gallaudet U': ['Red Line'],
	'North Bethesda': ['Red Line'],
	'Pentagon': ['Blue Line', 'Yellow Line'],
	'Pentagon City': ['Blue Line', 'Yellow Line'],
	'Potomac Ave': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Potomac Yard': ['Blue Line', 'Yellow Line'],
	'Reston Town Center': ['Silver Line'],
	'Rhode Island Ave-Brentwood': ['Red Line'],
	'Rockville': ['Red Line'],
	'Ronald Reagan Washington National Airport': ['Blue Line', 'Yellow Line'],
	'Rosslyn': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Shady Grove': ['Red Line'],
	'Shaw-Howard U': ['Green Line'],
	'Silver Spring': ['Red Line'],
	'Smithsonian': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Southern Avenue': ['Green Line'],
	'Spring Hill': ['Silver Line'],
	'Stadium-Armory': ['Orange Line', 'Silver Line', 'Blue Line'],
	'Suitland': ['Green Line'],
	'Takoma': ['Red Line'],
	'Tenleytown-AU': ['Red Line'],
	'Twinbrook': ['Red Line'],
	'Tysons': ['Silver Line'],
	'U Street/African-Amer Civil War Memorial/Cardozo': ['Green Line'],
	'Union Station': ['Red Line'],
	'Van Dorn Street': ['Blue Line'],
	'Van Ness-UDC': ['Red Line'],
	'Vienna/Fairfax-GMU': ['Orange Line'],
	'Virginia Square-GMU': ['Orange Line', 'Silver Line'],
	'Washington Dulles International Airport': ['Silver Line'],
	'Waterfront': ['Green Line'],
	'West Falls Church': ['Orange Line'],
	'West Hyattsville': ['Green Line'],
	'Wheaton': ['Red Line'],
	'Wiehle-Reston East': ['Silver Line'],
	'Woodley Park-Zoo/Adams Morgan': ['Red Line']
}

# Dictionary to count lines per station
lines_per_station = {}

# Count lines per station
for station, lines in station_data.items():
	lines_per_station[station] = len(lines)

# Calculate average number of lines per station
total_stations = len(station_data)
total_lines = sum(lines_per_station.values())
average_lines_per_station = total_lines / total_stations

# Print results
print(f"Average number of lines per station: {average_lines_per_station:.2f}")


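The next script uses the average number of lines per station, together with peak service frequencies for each line, to estimate the total number of daily train arrivals across all stations.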

# Average lines per station (computed above) and total station count
average_lines_per_station = 1.52
total_stations = 91  # note: station_data above lists 98 stations

# Weekday (Tuesday) operating hours in minutes: 5:00 AM to 12:00 AM is 19 hours
operating_hours_tuesday = {
    'Tuesday': 19 * 60
}
service_hours = operating_hours_tuesday['Tuesday'] / 60

# Peak service frequencies: minutes between trains on each line
peak_frequencies = {
    'Red Line': 8,
    'Orange Line': 12,
    'Silver Line': 12,
    'Blue Line': 12,
    'Yellow Line': 12,
    'Green Line': 8
}

# Average peak headway across the six lines
peak_frequency_minutes = sum(peak_frequencies.values()) / len(peak_frequencies)
print(f"Average peak frequency: {peak_frequency_minutes:.2f} minutes")

# Trains per hour at a station, rounded down to whole trains by floor division;
# peak headways are assumed to hold for the whole service day
trains_per_hour = 60 // peak_frequency_minutes
total_trains_per_day = trains_per_hour * service_hours

# Total station-line pairs across the system
total_lines = average_lines_per_station * total_stations

# Total trains through all stations combined per day
total_trains_combined = total_lines * total_trains_per_day
print(f"Total trains coming through all DC stations combined per day: {total_trains_combined:.2f}")
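The estimate above averages the six line headways before converting them to trains per hour. An alternative, sketched below, converts each line's headway to trains per hour first and then sums over every station-line pair, so stations on the more frequent Red and Green lines count for more. It keeps the original simplifying assumption that peak headways hold for the whole service day; the `daily_station_arrivals` function and the two-station excerpt are hypothetical, and the full `station_data` dictionary could be passed in instead.

```python
def daily_station_arrivals(station_lines, headways_min, service_hours):
    """Sum daily train arrivals over every (station, line) pair.

    station_lines: dict of station name -> list of line names serving it
    headways_min:  dict of line name -> minutes between trains (assumed all day)
    service_hours: hours of service in the day
    """
    total = 0.0
    for lines in station_lines.values():
        for line in lines:
            trains_per_hour = 60 / headways_min[line]  # e.g. 8-minute headway -> 7.5 trains/hour
            total += trains_per_hour * service_hours
    return total

peak_headways = {
    'Red Line': 8, 'Orange Line': 12, 'Silver Line': 12,
    'Blue Line': 12, 'Yellow Line': 12, 'Green Line': 8,
}

# Two-station excerpt of station_data for illustration
sample_stations = {
    'Fort Totten': ['Green Line', 'Red Line'],
    'Rosslyn': ['Orange Line', 'Silver Line', 'Blue Line'],
}

print(daily_station_arrivals(sample_stations, peak_headways, 19))
```

This version also avoids the floor division in the original, so fractional trains per hour (7.5 for an 8-minute headway) are kept.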

Normalizing the Scores Using Natural Breaks (Jenks)

import jenkspy
import numpy as np

# Given LITA values
lita_values = [
    8.5, 7.5, 11.17, 14.17, 15.17,
    10.17, 9.83, 12.17, 11.17, 10.17
]

# Step 1: Sort the data (jenkspy also sorts internally)
sorted_lita = sorted(lita_values)

# Step 2: Use one class per distinct value, up to a maximum of 10 classes
n_classes = min(len(np.unique(sorted_lita)), 10)

# Step 3: Calculate Jenks natural breaks (n_classes + 1 values, starting with the minimum)
breaks = jenkspy.jenks_breaks(sorted_lita, n_classes)

# Step 4: Assign each value the number (1..n_classes) of the first class
# whose upper break it does not exceed
def assign_class(value, breaks):
    for i in range(1, len(breaks)):
        if value <= breaks[i]:
            return i
    return len(breaks) - 1

# Assign classes to each LITA value
classes = [assign_class(value, breaks) for value in lita_values]

# Print results
print("LITA Values\tClass")
for value, cls in zip(lita_values, classes):
    print(f"{value}\t\t{cls}")
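The script sets the number of classes to one per distinct value (capped at 10). A common way to judge whether a given number of Jenks classes is adequate is the goodness of variance fit (GVF): one minus the ratio of squared deviations from the class means to squared deviations from the overall mean, with values closer to 1 indicating tighter classes. The sketch below computes GVF for a list of breaks in the format returned by `jenkspy.jenks_breaks`, grouping values with the same rule as `assign_class` above. It is an optional check, not a step the project reports, and `goodness_of_variance_fit` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean

def goodness_of_variance_fit(values, breaks):
    """GVF = (SDAM - SDCM) / SDAM for a Jenks-style list of breaks.

    breaks = [min, upper_1, upper_2, ..., upper_k]; a value belongs to the
    first class whose upper break it does not exceed.
    """
    # Group values by class using the same rule as assign_class
    groups = defaultdict(list)
    for v in values:
        cls = next((i for i in range(1, len(breaks)) if v <= breaks[i]), len(breaks) - 1)
        groups[cls].append(v)

    # SDAM: squared deviations from the overall (array) mean
    overall = mean(values)
    sdam = sum((v - overall) ** 2 for v in values)

    # SDCM: squared deviations from each class mean
    sdcm = 0.0
    for g in groups.values():
        g_mean = mean(g)
        sdcm += sum((v - g_mean) ** 2 for v in g)

    return 1.0 if sdam == 0 else (sdam - sdcm) / sdam

# Small example: two well-separated clusters split by a break at 2
print(round(goodness_of_variance_fit([1, 2, 10, 11], [1, 2, 11]), 4))
```

With the script's variables, `goodness_of_variance_fit(lita_values, breaks)` reports the fit of the chosen classes. When every class holds only identical values, as generally happens with one class per distinct value, GVF is 1 by construction, so the measure is more informative when comparing smaller class counts.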