NYC bike share: Tourist versus native (round 2)

Investigating New York’s Citi Bike system data with SQL, Python, and QGIS.


One in a series of projects highlighting my progress as a self-taught programmer.

I began teaching myself Python, SQL, and GIS in spring 2023 — starting from zero. I therefore welcome feedback on these projects and review for errors. And I’d be interested in taking a crack at your own data and geospatial questions too. Please get in touch.


3 September 2023

Background

My previous post used 2022 New York City Citi Bike data to map and compare where bike share system members and non-members most frequently start their rides. (The visualization revealed a non-member preference for picking up bikes near Central Park.) In this post, I’m wondering about popular routes—and whether members (natives) or non-members (tourists) traverse them faster.

Project outputs

With some 1,800 bike share stations across New York, riders could theoretically cycle over 3 million distinct “routes” (i.e. start- and end-station combinations). I ran several SQL queries on my dataset (which, as explained in the previous post, comprised all 29.6 million rides taken in 2022 that lasted between 60 seconds and 2 hours), focusing on non-loop trips, and found the following:
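The "over 3 million" figure is simple arithmetic, sketched here. The station count is the rounded 1,800 from the text, not an exact number taken from the dataset.

```python
# Rounded station count from the text; the exact number in the data may differ.
stations = 1_800

# Every ordered (start, end) pair, including loops that return to the start station.
all_routes = stations * stations

# Non-loop routes: the end station must differ from the start station.
non_loop_routes = stations * (stations - 1)

print(f"{all_routes:,} ordered pairs, {non_loop_routes:,} without loops")
```

Either way of counting lands a little above 3.2 million, since loop trips account for only one route per station.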

  • Citi Bike membership holders and non-members rode 529,019 routes in common.

  • Comparing the 100 routes most frequently cycled by members with the top 100 for non-members, only 7 routes appear on both lists. A couple of those are simply reversals of the same start and end stations, so a QGIS map of the most popular common start- and end-point combinations shows only 5 routes:

  • West St & Chambers St to/from Pier 40-Hudson River Park

  • Motorgate to/from Roosevelt Island Tramway

  • West St & Liberty St to West St & Chambers St

  • Little West St & 1 Pl to West St & Liberty St

  • Milton St & Franklin St to Kent Ave & N 7 St
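The top-100 comparison and the collapsing of reversed routes can be sketched in plain Python. This is an illustration rather than the SQL I actually ran: the function names and the tiny sample of rides are made up, and the "member" and "casual" labels are assumed to match the rider-type values in the trip data.

```python
from collections import Counter

def top_routes(rides, rider_type, n):
    """Return the n most frequent directed (start, end) routes for one rider type."""
    counts = Counter(
        (start, end)
        for kind, start, end in rides
        if kind == rider_type and start != end  # skip loop trips
    )
    return {route for route, _ in counts.most_common(n)}

def common_routes(rides, n):
    """Directed routes that appear in both groups' top-n lists."""
    return top_routes(rides, "member", n) & top_routes(rides, "casual", n)

def undirected(routes):
    """Collapse A->B and B->A into a single unordered station pair."""
    return {frozenset(route) for route in routes}

# Made-up sample: (rider type, start station, end station)
rides = [
    ("member", "A", "B"), ("member", "A", "B"),
    ("member", "B", "A"), ("member", "B", "A"),
    ("member", "C", "D"), ("member", "C", "C"),
    ("casual", "A", "B"), ("casual", "A", "B"),
    ("casual", "B", "A"), ("casual", "B", "A"),
    ("casual", "E", "F"),
]

shared = common_routes(rides, n=2)
print(len(shared), "directed routes in common,", len(undirected(shared)), "after merging reversals")
```

In the sample, A to B and B to A both make each group's top 2, so two directed routes shrink to one station pair, the same effect that turns 7 common routes into 5 on the map.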

The riverside location of these routes suggests to me that the thousands of bike share system users who rode them generally had leisure rather than commuting in mind, regardless of whether they were members or non-members. (The start and end stations of each route are also about 3/4 of a mile apart or less, a quite walkable distance, which possibly also suggests dallying over destination.)

Still, after using SQL and Python to pull the data and run statistics, I used the pandas, Seaborn, and Matplotlib Python libraries to create density plots of trip durations over these routes for both members and non-members. Who rides faster? Natives. Every time. (Statistically speaking, on average, that is.)

With the distribution of member trip durations clustering higher and to the left of non-member durations, it appears that natives, on the most popular common routes, typically ride their Citi Bikes faster… or simply for shorter periods of time (despite having a longer window of time before having to pay additional per-minute rental fees).

When I asked ChatGPT what it thought about these results, the large language model noted the “consistency” and “purpose” exhibited by natives. Regarding the West St & Chambers St to Pier 40-Hudson River Park route, it said:

Techniques

As mentioned above, most of my work on this project involved SQL queries and sub-queries, including COUNT, AVG, MIN, MAX, GROUP BY, and UNION functions and clauses.
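To give a flavor of those queries, here is a small self-contained sketch that uses Python's built-in sqlite3 module. The table name `trips`, the `duration_s` column, and the sample rows are assumptions for illustration only. My actual schema and queries differed, and the durations below are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trips (
        member_casual      TEXT,
        start_station_name TEXT,
        end_station_name   TEXT,
        duration_s         INTEGER  -- hypothetical precomputed ride length in seconds
    )
""")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?, ?)",
    [
        ("member", "Motorgate", "Roosevelt Island Tramway", 300),
        ("member", "Motorgate", "Roosevelt Island Tramway", 360),
        ("casual", "Motorgate", "Roosevelt Island Tramway", 420),
        ("casual", "Motorgate", "Roosevelt Island Tramway", 600),
        ("casual", "Motorgate", "Motorgate", 900),  # loop trip, excluded below
    ],
)

# Ride count and duration statistics per rider type and directed route.
query = """
    SELECT member_casual,
           start_station_name,
           end_station_name,
           COUNT(*)        AS rides,
           AVG(duration_s) AS avg_s,
           MIN(duration_s) AS min_s,
           MAX(duration_s) AS max_s
    FROM trips
    WHERE start_station_name <> end_station_name
      AND duration_s BETWEEN 60 AND 7200
    GROUP BY member_casual, start_station_name, end_station_name
    ORDER BY rides DESC, member_casual
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The `WHERE` clause mirrors the two filters described earlier: non-loop trips only, and rides between 60 seconds and 2 hours.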

In addition to the initial dataset preparations mentioned in my previous post, I also used Python and the pandas, Seaborn, and Matplotlib libraries in a script to make calculations and create and modify the density plots.

I used QGIS for the map, filtering spatial layers and employing a plugin to communicate with the Open Route Service API for bike route directions.

Data sources

I downloaded and processed trip data from the Citi Bike website and found various New York City shapefiles for my map at NYC Open Data.
