import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
A Report by …
@Author: Kashish Mukheja
Introduction
In this report, we analyze weather data from three major airports in the New York City area (LGA, JFK, and EWR) to determine monthly mean wind speeds and identify any outliers.
This report presents an analysis of monthly mean wind speeds at LGA, JFK, and EWR airports using historical weather data. The data is fetched from nycflights13_weather.csv
file present publically on the internet. 1. We start with importing the data and adjusting the columns to the proper format in their SI unit. 2. After preprocessing the dataset and addressing outliers, we calculate the monthly mean wind speeds for each airport. 3. Subsequently, we visualize the monthly mean wind speeds for all three airports on a single plot, providing insights into wind speed variations over time.
The analysis aims to provide valuable information for stakeholders involved in aviation, urban planning, and renewable energy sectors.
Data Preparation
To begin, let’s import the necessary libraries, and load the data from the provided CSV file and examine its structure
# Load the data
= pd.read_csv('nycflights13_weather.csv')
df_weather
# Display the first few rows of the DataFrame
df_weather.head()
origin | year | month | day | hour | temp | dewp | humid | wind_dir | wind_speed | wind_gust | precip | pressure | visib | time_hour | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EWR | 2013 | 1 | 1 | 0 | 37.04 | 21.92 | 53.97 | 230.0 | 10.35702 | 11.918651 | 0.0 | 1013.9 | 10.0 | 2013-01-01 1:00:00 |
1 | EWR | 2013 | 1 | 1 | 1 | 37.04 | 21.92 | 53.97 | 230.0 | 13.80936 | 15.891535 | 0.0 | 1013.0 | 10.0 | 2013-01-01 2:00:00 |
2 | EWR | 2013 | 1 | 1 | 2 | 37.94 | 21.92 | 52.09 | 230.0 | 12.65858 | 14.567241 | 0.0 | 1012.6 | 10.0 | 2013-01-01 3:00:00 |
3 | EWR | 2013 | 1 | 1 | 3 | 37.94 | 23.00 | 54.51 | 230.0 | 13.80936 | 15.891535 | 0.0 | 1012.7 | 10.0 | 2013-01-01 4:00:00 |
4 | EWR | 2013 | 1 | 1 | 4 | 37.94 | 24.08 | 57.04 | 240.0 | 14.96014 | 17.215830 | 0.0 | 1012.8 | 10.0 | 2013-01-01 5:00:00 |
Data Conversion to Metric Units
This section converts temperature, precipitation, visibility, and wind speed measurements from imperial to metric (SI) units. This is crucial to maintain uniformity and consistency at a global level for better interpretability. We convert the following columns:
temp
anddewp
to Celsiusprecip
to millimetresvisib
to metreswind_speed
andwind_gust
to metres per second
Note: We will replace all data in-place i.e., overwrite existing columns with new ones
Let’s start with converting the temperature from Fahrenheit to Celsius:
# Convert temperature from Fahrenheit to Celsius
'temp'] = (df_weather['temp'] - 32) * 5/9
df_weather['dewp'] = (df_weather['dewp'] - 32) * 5/9 df_weather[
Next, we’ll convert precipitation from inches to millimeters:
# Convert precipitation from inches to millimeters
'precip'] = df_weather['precip'] * 25.4 df_weather[
Then, we’ll convert visibility from miles to meters:
# Convert visibility from miles to meters
'visib'] = df_weather['visib'] * 1609.34 df_weather[
Now, let’s convert wind speed and gust from miles per hour to meters per second:
# Convert wind speed and gust from miles per hour to meters per second
'wind_speed'] = df_weather['wind_speed'] * 0.44704
df_weather['wind_gust'] = df_weather['wind_gust'] * 0.44704 df_weather[
Finally, let’s correct the time_hour column by shifting it back by one hour:
# Correct the time_hour column by shifting it back by one hour
'time_hour'] = pd.to_datetime(df_weather['time_hour']) - pd.Timedelta(hours=1) df_weather[
10) df_weather.head(
origin | year | month | day | hour | temp | dewp | humid | wind_dir | wind_speed | wind_gust | precip | pressure | visib | time_hour | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | EWR | 2013 | 1 | 1 | 0 | 2.8 | -5.6 | 53.97 | 230.0 | 4.630002 | 5.328114 | 0.0 | 1013.9 | 16093.4 | 2013-01-01 00:00:00 |
1 | EWR | 2013 | 1 | 1 | 1 | 2.8 | -5.6 | 53.97 | 230.0 | 6.173336 | 7.104152 | 0.0 | 1013.0 | 16093.4 | 2013-01-01 01:00:00 |
2 | EWR | 2013 | 1 | 1 | 2 | 3.3 | -5.6 | 52.09 | 230.0 | 5.658892 | 6.512139 | 0.0 | 1012.6 | 16093.4 | 2013-01-01 02:00:00 |
3 | EWR | 2013 | 1 | 1 | 3 | 3.3 | -5.0 | 54.51 | 230.0 | 6.173336 | 7.104152 | 0.0 | 1012.7 | 16093.4 | 2013-01-01 03:00:00 |
4 | EWR | 2013 | 1 | 1 | 4 | 3.3 | -4.4 | 57.04 | 240.0 | 6.687781 | 7.696165 | 0.0 | 1012.8 | 16093.4 | 2013-01-01 04:00:00 |
5 | EWR | 2013 | 1 | 1 | 6 | 3.9 | -3.3 | 59.37 | 270.0 | 4.630002 | 5.328114 | 0.0 | 1012.0 | 16093.4 | 2013-01-01 06:00:00 |
6 | EWR | 2013 | 1 | 1 | 7 | 3.9 | -2.8 | 61.63 | 250.0 | 3.601113 | 4.144089 | 0.0 | 1012.3 | 16093.4 | 2013-01-01 07:00:00 |
7 | EWR | 2013 | 1 | 1 | 8 | 3.9 | -2.2 | 64.43 | 240.0 | 5.144447 | 5.920127 | 0.0 | 1012.5 | 16093.4 | 2013-01-01 08:00:00 |
8 | EWR | 2013 | 1 | 1 | 9 | 4.4 | -2.2 | 62.21 | 250.0 | 5.658892 | 6.512139 | 0.0 | 1012.2 | 16093.4 | 2013-01-01 09:00:00 |
9 | EWR | 2013 | 1 | 1 | 10 | 3.9 | -2.2 | 64.43 | 260.0 | 5.658892 | 6.512139 | 0.0 | 1011.9 | 16093.4 | 2013-01-01 10:00:00 |
Daily Mean Wind Speed Calculation for LGA Airport
We now move further in our journey of exploring this data by calculating the average wind speed for each day at the LGA airport, based on the provided weather data. We will group the data by year, month and day.
# Filter data for LGA airport
= df_weather[df_weather['origin'] == 'LGA']
df_lga
# Convert time_hour column to datetime using .loc
'time_hour'] = pd.to_datetime(df_lga['time_hour'])
df_lga.loc[:,
# Group data by year, month, and day and calculate daily mean wind speed
= df_lga.groupby([df_lga['time_hour'].dt.year,
daily_mean_wind_speed 'time_hour'].dt.month,
df_lga['time_hour'].dt.day])['wind_speed'].mean()
df_lga[
# Display the daily mean wind speeds
print(daily_mean_wind_speed)
time_hour time_hour time_hour
2013 1 1 6.687781
2 6.430559
3 4.908660
4 6.880698
5 5.144447
...
12 26 3.301020
27 5.401669
28 4.672873
29 3.794030
30 6.001855
Name: wind_speed, Length: 364, dtype: float64
Visualization of Daily Mean Wind Speed at LGA Airport
This section presents a plot illustrating the daily mean wind speeds recorded at the LGA airport over time. The x-axis displays dates in a human-readable format for better interpretation. We use the daily_mean_wind_speed
computed in the previous section to write the code for visualisation.
# Convert the index to datetime
= pd.to_datetime(daily_mean_wind_speed.index.map(lambda x: '-'.join(map(str, x))))
daily_mean_wind_speed.index
# Create a plot
=(10, 6))
plt.figure(figsize='blue', marker='o', linestyle='-')
plt.plot(daily_mean_wind_speed.index, daily_mean_wind_speed.values, color'Daily Mean Wind Speed at LGA Airport')
plt.title('Date')
plt.xlabel('Wind Speed (m/s)')
plt.ylabel(True)
plt.grid(=45)
plt.xticks(rotation
plt.tight_layout() plt.show()
Top Ten Windiest Days at LGA Airport
This section creates a DataFrame containing the dates and corresponding wind speeds for the top ten windiest days at LGA airport, based on the provided weather data.
To identify the ten windiest days at LGA airport, we’ll first calculate the daily mean wind speeds for all days. Then, we’ll sort the data to find the ten days with the highest mean wind speeds.
# Calculate daily mean wind speeds for LGA airport
= df_lga.groupby([df_lga['time_hour'].dt.date])['wind_speed'].mean()
daily_mean_wind_speed
# Sort the data to identify the ten windiest days
= daily_mean_wind_speed.nlargest(10) top_ten_windiest_days
Let’s create a DataFrame with the date and corresponding wind speeds for the top ten windiest days
# Create a DataFrame with date and wind speed
= pd.DataFrame({'date': top_ten_windiest_days.index, 'wind_speed': top_ten_windiest_days.values})
top_ten_df
# Print the DataFrame
print(top_ten_df)
date wind_speed
0 2013-11-24 11.317783
1 2013-01-31 10.717598
2 2013-02-17 10.010236
3 2013-02-21 9.192903
4 2013-02-18 9.174264
5 2013-03-14 9.109958
6 2013-11-28 8.938477
7 2013-05-26 8.852736
8 2013-05-25 8.766995
9 2013-02-20 8.659819
Monthly Mean Wind Speeds for All Three Airports
This section calculates and presents the monthly mean wind speeds for all three airports (LGA, JFK, and EWR), after addressing outliers in the data. The results are displayed in a DataFrame for easy interpretation and analysis.
To compute the monthly mean wind speeds for all three airports and address the outlier in the data, the following steps are followed:
- Filter the data for all three airports: LGA, JFK, and EWR.
- Replace the outlier with
np.nan
(NaN). - Calculate the monthly mean wind speeds for each airport.
# Filter data for all three airports
= df_weather[(df_weather['origin'] == 'LGA') | (df_weather['origin'] == 'JFK') | (df_weather['origin'] == 'EWR')]
df_all_airports
# Identify and replace the outlier with np.nan (NaN)
= df_all_airports['wind_speed'].idxmax() # Find the index of the outlier
outlier_index 'wind_speed'] = np.nan # Replace the outlier with NaN
df_all_airports.loc[outlier_index,
# Calculate monthly mean wind speeds for all three airports
= df_all_airports.groupby([df_all_airports['time_hour'].dt.year,
monthly_mean_wind_speeds 'time_hour'].dt.month,
df_all_airports['origin']])['wind_speed'].mean()
df_all_airports[
# Display the result
print(monthly_mean_wind_speeds)
time_hour time_hour origin
2013 1 EWR 4.328121
JFK 5.378474
LGA 5.068977
2 EWR 4.731659
JFK 5.954831
LGA 5.522218
3 EWR 5.144447
JFK 6.209389
LGA 5.848862
4 EWR 4.265604
JFK 5.584480
LGA 4.931524
5 EWR 3.709672
JFK 4.600270
LGA 4.197841
6 EWR 4.202727
JFK 4.884367
LGA 4.462808
7 EWR 4.022527
JFK 4.498271
LGA 4.181334
8 EWR 3.349452
JFK 4.305163
LGA 3.734075
9 EWR 3.574639
JFK 4.359919
LGA 3.934073
10 EWR 3.650740
JFK 4.615364
LGA 4.568659
11 EWR 4.599020
JFK 5.752821
LGA 5.416842
12 EWR 3.911640
JFK 4.861502
LGA 4.526399
Name: wind_speed, dtype: float64
Let’s create a DataFrame with the year, month, airport and the corresponding monthly wind speeds. This will help to view the data in a more readable format.
# Create a DataFrame for monthly mean wind speeds
= monthly_mean_wind_speeds.to_frame(name='monthly_mean_wind_speed')
monthly_mean_wind_speeds_df
# Rename index levels for better readability
= ['year', 'month', 'airport']
monthly_mean_wind_speeds_df.index.names
# Display the DataFrame
monthly_mean_wind_speeds_df
monthly_mean_wind_speed | |||
---|---|---|---|
year | month | airport | |
2013 | 1 | EWR | 4.328121 |
JFK | 5.378474 | ||
LGA | 5.068977 | ||
2 | EWR | 4.731659 | |
JFK | 5.954831 | ||
LGA | 5.522218 | ||
3 | EWR | 5.144447 | |
JFK | 6.209389 | ||
LGA | 5.848862 | ||
4 | EWR | 4.265604 | |
JFK | 5.584480 | ||
LGA | 4.931524 | ||
5 | EWR | 3.709672 | |
JFK | 4.600270 | ||
LGA | 4.197841 | ||
6 | EWR | 4.202727 | |
JFK | 4.884367 | ||
LGA | 4.462808 | ||
7 | EWR | 4.022527 | |
JFK | 4.498271 | ||
LGA | 4.181334 | ||
8 | EWR | 3.349452 | |
JFK | 4.305163 | ||
LGA | 3.734075 | ||
9 | EWR | 3.574639 | |
JFK | 4.359919 | ||
LGA | 3.934073 | ||
10 | EWR | 3.650740 | |
JFK | 4.615364 | ||
LGA | 4.568659 | ||
11 | EWR | 4.599020 | |
JFK | 5.752821 | ||
LGA | 5.416842 | ||
12 | EWR | 3.911640 | |
JFK | 4.861502 | ||
LGA | 4.526399 |
EDA - Monthly Mean Wind Speeds Comparison
This section compares the monthly mean wind speeds for LGA, JFK, and EWR airports, plotting the data on the same graph. The x-axis
represents the date, and the y-axis
represents the mean wind speed in meters per second. Each airport is differentiated by a distinct color, with a legend provided for clarity.
# Reset index to use year and month as separate columns
=True)
monthly_mean_wind_speeds_df.reset_index(inplace
# Create a plot
=(10, 6))
plt.figure(figsizefor airport in monthly_mean_wind_speeds_df['airport'].unique():
= monthly_mean_wind_speeds_df[monthly_mean_wind_speeds_df['airport'] == airport]
data 'year'].astype(str) + '-' + data['month'].astype(str), data['monthly_mean_wind_speed'], label=airport)
plt.plot(data[
# Add labels and legend
'Monthly Mean Wind Speeds for LGA, JFK, and EWR Airports')
plt.title('Date')
plt.xlabel('Mean Wind Speed (m/s)')
plt.ylabel(
plt.legend()False)
plt.grid(=45)
plt.xticks(rotation
plt.tight_layout() plt.show()
Conclusion
In conclusion, our analysis reveals distinct patterns in monthly mean wind speeds at LGA, JFK, and EWR airports. While all three airports experience fluctuations in wind speeds throughout the year, certain months exhibit higher wind speeds compared to others.
By identifying and visualizing these trends, stakeholders can better understand the wind climate at each airport and make informed decisions regarding infrastructure development, flight operations, and energy generation projects. Additionally, the identification and handling of outliers ensure the accuracy and reliability of the analysis results.
Moving forward, further research could explore the impact of wind speed variations on specific sectors and evaluate potential mitigation strategies for adverse