Determine neighbourhood to open new restaurant using clustering
Objective: Determine the neighbourhood to open a new restaurant in order to expand business.
We want to open a new restaurant in New York similar to one we have in San Francisco. Firstly, we need to shortlist a few places where we can open up our new restaurant. We’ll perform K-Means Clustering in order to determine the place closest to our current location in terms of nearby venues. We’ll be using FourSquare API data and some web scraping to get details on the list of neighbourhoods in New York City.
The same approach can be extended to other decisions, such as opening a new office or play center, or buying a house.
#importing libraries to be used
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#view plots in jupyter notebook
%matplotlib inline
sns.set_style('whitegrid') #setting style for plots, optional
# Library for geographical identification of a location
from geopy.geocoders import Nominatim
# Library for using Kmeans method for clustering
from sklearn.cluster import KMeans
# Libraries to handle requests
import requests
from pandas import json_normalize  # moved to the top-level pandas namespace in pandas 1.0
# Libraries to plot and visualize locations on maps and also plotting other kmeans related data
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
# Library to import data from a website (web scraping)
from bs4 import BeautifulSoup as BS
Putting in details of current restaurant location in San Francisco
SF_restaurant = "Octavia St, San Francisco, CA 94102, United States"
# Getting the latitude and longitude of the restaurant
geolocator = Nominatim(user_agent="USA_explorer")
SF_res_location = geolocator.geocode(SF_restaurant)
SF_latitude = SF_res_location.latitude
SF_longitude = SF_res_location.longitude
print('The geographical coordinates are {}, {}.'.format(SF_latitude, SF_longitude))
The geographical coordinates are 37.7780777, -122.424924.
Populating New York City Neighbourhood information from: https://www.baruch.cuny.edu/nycdata/population-geography/neighborhoods.htm
URL = "https://www.baruch.cuny.edu/nycdata/population-geography/neighborhoods.htm"
r = requests.get(URL, verify = False)
soup = BS(r.text, "html.parser")
data = soup.find_all("tr")
# Locate the first table row that contains "Brooklyn" (the row of borough names)
start_index = 0
for i in range(len(data)):
    td = data[i].find_all("td")
    for j in range(len(td)):
        if td[j].text == "Brooklyn":
            start_index = i
            break
    if start_index != 0:
        break
# Locate the last table row that contains "Woodside"
end_index = 0
for i in range(len(data) - 1, 0, -1):
    td = data[i].find_all("td")
    for j in range(len(td)):
        if td[j].text == "Woodside":
            end_index = i
            break
    if end_index != 0:
        break
# Collect the five borough columns (cells 1 to 5 of each row)
list1 = []
list2 = []
list3 = []
list4 = []
list5 = []
for i in range(start_index, end_index + 1):
    td = data[i].find_all("td")
    list1.append(td[1].text)
    list2.append(td[2].text)
    list3.append(td[3].text)
    list4.append(td[4].text)
    list5.append(td[5].text)
final = []
final.append(list1)
final.append(list2)
final.append(list3)
final.append(list4)
final.append(list5)
df = pd.DataFrame(final)
df = df.transpose()
# Reshape the five columns into one (Borough, Neighbourhood) row per neighbourhood
rows = []
for i in range(5):
    borough = df[i][0]  # first cell of each column is the borough name
    for j in range(1, len(df)):
        if df[i][j] == '\xa0':  # padding cell: no more neighbourhoods in this column
            break
        rows.append({'Borough': borough, 'Neighbourhood': df[i][j]})
# DataFrame.append was removed in pandas 2.0, so the frame is built in one step
final_df = pd.DataFrame(rows, columns=['Borough', 'Neighbourhood'])
final_df
Borough | Neighbourhood | |
---|---|---|
0 | Brooklyn | Bath Beach |
1 | Brooklyn | Bay Ridge |
2 | Brooklyn | Bedford Stuyvesant |
3 | Brooklyn | Bensonhurst |
4 | Brooklyn | Bergen Beach |
... | ... | ... |
324 | Staten Island | Ward Hill |
325 | Staten Island | West Brighton |
326 | Staten Island | Westerleigh |
327 | Staten Island | Willowbrook |
328 | Staten Island | Woodrow |
329 rows × 2 columns
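The reshaping step above can be tried on a tiny hand-made table. This is a minimal sketch that assumes the same layout as the scraped data (the first cell of each column is the borough name, and shorter columns are padded with '\xa0'). The sample values and the helper name `columns_to_rows` are made up for illustration.

```python
NBSP = "\xa0"  # padding cell the scraped table uses for empty entries


def columns_to_rows(columns):
    """Turn per-borough columns into (borough, neighbourhood) pairs.

    Each column is a list whose first item is the borough name and whose
    remaining items are neighbourhoods, padded at the end with NBSP.
    """
    rows = []
    for column in columns:
        borough = column[0]
        for cell in column[1:]:
            if cell == NBSP:  # padding starts here, nothing more to read
                break
            rows.append((borough, cell))
    return rows


# Illustrative data only, not the real scraped table
sample_columns = [
    ["Brooklyn", "Bath Beach", "Bay Ridge"],
    ["Staten Island", "Woodrow", NBSP],
]
pairs = columns_to_rows(sample_columns)
print(pairs)
```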
Adding Latitude and Longitude information for each neighbourhood
final_df['Latitude'] = ""
final_df['Longitude'] = ""
# Creating the geocoder once, outside the loop
geolocator = Nominatim(user_agent="USA_explorer")
for i in range(len(final_df)):
    nyadd = str(final_df['Neighbourhood'][i]) + ', ' + str(final_df['Borough'][i]) + ', New York'
    location = geolocator.geocode(nyadd)
    try:
        latitude = location.latitude
        longitude = location.longitude
    except AttributeError:  # geocode() returned None
        latitude = 1000  # For those neighbourhoods whose latitude and longitude could not be fetched
        longitude = 1000
    # .loc avoids chained assignment, which pandas may silently ignore
    final_df.loc[i, 'Latitude'] = latitude
    final_df.loc[i, 'Longitude'] = longitude
final_df
Borough | Neighbourhood | Latitude | Longitude | |
---|---|---|---|---|
0 | Brooklyn | Bath Beach | 40.6018 | -74.0005 |
1 | Brooklyn | Bay Ridge | 40.634 | -74.0146 |
2 | Brooklyn | Bedford Stuyvesant | 40.6834 | -73.9412 |
3 | Brooklyn | Bensonhurst | 40.605 | -73.9934 |
4 | Brooklyn | Bergen Beach | 40.6204 | -73.9068 |
... | ... | ... | ... | ... |
324 | Staten Island | Ward Hill | 40.6329 | -74.0829 |
325 | Staten Island | West Brighton | 1000 | 1000 |
326 | Staten Island | Westerleigh | 40.6212 | -74.1318 |
327 | Staten Island | Willowbrook | 40.6032 | -74.1385 |
328 | Staten Island | Woodrow | 40.5434 | -74.1976 |
329 rows × 4 columns
Cleaning the dataset fetched from the external URL
final_df=final_df[final_df.Latitude!=1000]
final_df.reset_index(inplace=True)
final_df.drop('index',axis=1,inplace=True)
final_df
Borough | Neighbourhood | Latitude | Longitude | |
---|---|---|---|---|
0 | Brooklyn | Bath Beach | 40.6018 | -74.0005 |
1 | Brooklyn | Bay Ridge | 40.634 | -74.0146 |
2 | Brooklyn | Bedford Stuyvesant | 40.6834 | -73.9412 |
3 | Brooklyn | Bensonhurst | 40.605 | -73.9934 |
4 | Brooklyn | Bergen Beach | 40.6204 | -73.9068 |
... | ... | ... | ... | ... |
311 | Staten Island | Travis | 40.5932 | -74.1879 |
312 | Staten Island | Ward Hill | 40.6329 | -74.0829 |
313 | Staten Island | Westerleigh | 40.6212 | -74.1318 |
314 | Staten Island | Willowbrook | 40.6032 | -74.1385 |
315 | Staten Island | Woodrow | 40.5434 | -74.1976 |
316 rows × 4 columns
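A sentinel value such as 1000 works, but missing coordinates can also be stored as NaN and removed with `dropna`. The sketch below shows that alternative; it is not the method used in this notebook. `fake_geocode` is a hypothetical stand-in for the Nominatim call so that the example runs offline, and its coordinates are copied from the table above.

```python
import pandas as pd

# Hypothetical offline lookup standing in for geolocator.geocode()
KNOWN = {
    "Bath Beach, Brooklyn, New York": (40.6018, -74.0005),
    "Woodrow, Staten Island, New York": (40.5434, -74.1976),
}


def fake_geocode(address):
    """Return (lat, lon), or None when the address cannot be resolved."""
    return KNOWN.get(address)


places = pd.DataFrame({
    "Borough": ["Brooklyn", "Staten Island", "Staten Island"],
    "Neighbourhood": ["Bath Beach", "West Brighton", "Woodrow"],
})

coords = []
for borough, hood in zip(places["Borough"], places["Neighbourhood"]):
    hit = fake_geocode(f"{hood}, {borough}, New York")
    # Unresolved addresses get NaN instead of a magic number
    coords.append(hit if hit is not None else (float("nan"), float("nan")))

places["Latitude"] = [c[0] for c in coords]
places["Longitude"] = [c[1] for c in coords]

# Rows without coordinates are dropped and the index is renumbered
cleaned = places.dropna(subset=["Latitude", "Longitude"]).reset_index(drop=True)
print(cleaned)
```

With NaN there is no risk of a sentinel being mistaken for a real coordinate in later arithmetic or on a map.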
Adding in the location of the current restaurant (in San Francisco) so that it is also used in clustering along with NYC neighbourhoods
SF_rest_add={'Borough': 'Hayes Valley, SF','Neighbourhood':'Hayes Valley','Latitude':SF_latitude,'Longitude':SF_longitude}
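# DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
final_df = pd.concat([final_df, pd.DataFrame([SF_rest_add])], ignore_index=True)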
final_df.iloc[[-1]]
Borough | Neighbourhood | Latitude | Longitude | |
---|---|---|---|---|
316 | Hayes Valley, SF | Hayes Valley | 37.7781 | -122.425 |
Clustering the neighbourhoods of New York including the neighbourhood of San Francisco
Defining FourSquare credentials
CLIENT_ID = '*****************************************'
CLIENT_SECRET = '**************************************'
VERSION = '20180605'
LIMIT = 100
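Rather than pasting the credentials into the notebook, they can be read from environment variables. The following is a minimal sketch; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example, not names required by FourSquare.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping of environment variables.

    The variable names used here are assumptions made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("FourSquare credentials are not set in the environment")
    return client_id, client_secret


# Usage with an explicit mapping so the example runs without real secrets
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret",
}
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials(demo_env)
print(CLIENT_ID)
```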
Defining a function to get the venues from all neighbourhoods
def getNearbyVenues(borough, names, latitudes, longitudes, radius=500):
    venues_list = []
    # The loop variable is named boro so it does not shadow the borough argument
    for boro, name, lat, lng in zip(borough, names, latitudes, longitudes):
        # API request URL creation
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        # Making the request and keeping only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # Keeping only the relevant information for each nearby venue
        venues_list.append([(
            boro,
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # Flattening the per-neighbourhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 'Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
# Getting the venues for each neighbourhood
NewYork_venues = getNearbyVenues(borough=final_df['Borough'],names=final_df['Neighbourhood'],
latitudes=final_df['Latitude'],
longitudes=final_df['Longitude']
)
# Looking at the data received from FourSquare
NewYork_venues.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10720 entries, 0 to 10719
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Borough 10720 non-null object
1 Neighborhood 10720 non-null object
2 Neighborhood Latitude 10720 non-null float64
3 Neighborhood Longitude 10720 non-null float64
4 Venue 10720 non-null object
5 Venue Latitude 10720 non-null float64
6 Venue Longitude 10720 non-null float64
7 Venue Category 10720 non-null object
dtypes: float64(4), object(4)
memory usage: 670.1+ KB
NewYork_venues.head()
Borough | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | |
---|---|---|---|---|---|---|---|---|
0 | Brooklyn | Bath Beach | 40.60185 | -74.000501 | Lenny's Pizza | 40.604908 | -73.998713 | Pizza Place |
1 | Brooklyn | Bath Beach | 40.60185 | -74.000501 | King's Kitchen | 40.603844 | -73.996960 | Cantonese Restaurant |
2 | Brooklyn | Bath Beach | 40.60185 | -74.000501 | Delacqua | 40.604216 | -73.997452 | Spa |
3 | Brooklyn | Bath Beach | 40.60185 | -74.000501 | Lutzina Bar&Lounge | 40.600807 | -74.000578 | Hookah Bar |
4 | Brooklyn | Bath Beach | 40.60185 | -74.000501 | Planet Fitness | 40.604567 | -73.997861 | Gym / Fitness Center |
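To see how one API response turns into rows like these, here is a small offline sketch. The payload is hand-written in the shape that `getNearbyVenues` reads (`response`, then `groups[0]`, then `items`), with values copied from the preview above. The second venue's category list is emptied on purpose to show a guard that the original function does not have. The helper name `flatten_items` is hypothetical.

```python
def flatten_items(borough, name, lat, lng, items):
    """Build one tuple per venue, mirroring the fields the notebook keeps."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Guard against venues that come back without any category
        category = categories[0]["name"] if categories else None
        rows.append((
            borough, name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written sample payload, not a real API response
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lenny's Pizza",
               "location": {"lat": 40.604908, "lng": -73.998713},
               "categories": [{"name": "Pizza Place"}]}},
    {"venue": {"name": "Planet Fitness",
               "location": {"lat": 40.604567, "lng": -73.997861},
               "categories": []}},  # emptied on purpose for the guard
]}]}}

items = sample["response"]["groups"][0]["items"]
rows = flatten_items("Brooklyn", "Bath Beach", 40.60185, -74.000501, items)
print(rows)
```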
NewYork_venues["Venue Category"].unique()
array(['Pizza Place', 'Cantonese Restaurant', 'Spa', 'Hookah Bar',
'Gym / Fitness Center', 'Dessert Shop', 'Chinese Restaurant',
'Bakery', 'Italian Restaurant', 'Coffee Shop', 'Restaurant',
'Japanese Restaurant', 'Supplement Shop',
'Eastern European Restaurant', 'Dim Sum Restaurant', 'Tea Room',
'Ice Cream Shop', 'Peruvian Restaurant', 'Sandwich Place', 'Bank',
'American Restaurant', 'Shanghai Restaurant', 'Mobile Phone Shop',
'Kids Store', 'Gas Station', 'Middle Eastern Restaurant',
'Seafood Restaurant', 'Tennis Court', 'Vietnamese Restaurant',
'Noodle House', 'Rental Car Location', 'Park',
'Fried Chicken Joint', 'Hotpot Restaurant', 'Gift Shop',
'Irish Pub', 'Malay Restaurant', 'Bar', 'Playground', 'Donut Shop',
'Bubble Tea Shop', 'Nightclub', 'Cocktail Bar',
'New American Restaurant', 'Wine Shop', 'Boutique', 'Tiki Bar',
'Café', 'Taco Place', 'Mexican Restaurant', 'Gym', 'Wine Bar',
'Gourmet Shop', 'Bagel Shop', 'Lounge', 'Food', 'Deli / Bodega',
'Thrift / Vintage Store', 'Garden',
'Southern / Soul Food Restaurant', 'Caribbean Restaurant',
'Discount Store', 'Pharmacy', 'Farmers Market', 'Cosmetics Shop',
'Turkish Restaurant', 'Sushi Restaurant', 'Fast Food Restaurant',
'Salon / Barbershop', 'Video Game Store', 'Frozen Yogurt Shop',
'Asian Restaurant', 'Shoe Store', 'Clothing Store',
"Women's Store", 'Optical Shop', 'Accessories Store',
'Supermarket', 'Bus Station', 'Furniture / Home Store',
'Concert Hall', 'Antique Shop', 'Yoga Studio',
'Martial Arts School', 'Grocery Store', 'Indian Restaurant',
'Athletics & Sports', 'Burrito Place', 'Music Venue',
'Jewelry Store', 'French Restaurant', 'Thai Restaurant',
'Flower Shop', 'Dance Studio', 'Bookstore', "Men's Store",
'Theater', 'Korean Restaurant', 'Music Store',
'Health & Beauty Service', 'Cajun / Creole Restaurant',
'Garden Center', 'Arts & Crafts Store', 'Boxing Gym',
'Electronics Store', 'Dry Cleaner', 'Gastropub', 'Historic Site',
'Juice Bar', 'Burger Joint', 'Convenience Store',
'Bed & Breakfast', 'Hotel', 'Bistro', 'Bike Shop', 'Neighborhood',
'Russian Restaurant', 'Mediterranean Restaurant',
'Food & Drink Shop', 'Other Great Outdoors', 'Food Truck',
'Non-Profit', 'Diner', 'Varenyky restaurant', 'Karaoke Bar',
'Pool', 'Bus Line', 'Tunnel', 'Recording Studio', 'History Museum',
'Pet Store', 'Scenic Lookout', 'Beach', 'Falafel Restaurant',
'Pier', 'Indie Theater', 'Pilates Studio', 'Ramen Restaurant',
'Pub', 'Plaza', 'Chocolate Shop', 'Mattress Store',
'Spanish Restaurant', 'Moving Target', 'Vape Store',
'Fruit & Vegetable Store', 'Liquor Store', 'Lawyer',
'Metro Station', 'Bus Stop', 'Greek Restaurant', 'Record Shop',
'Beer Garden', 'Butcher', 'Event Space', 'Gaming Cafe',
'Herbs & Spices Store', 'Church', 'Filipino Restaurant',
'Latin American Restaurant', 'Wings Joint', 'Breakfast Spot',
'Brewery', 'Fish Market', 'Art Gallery', 'African Restaurant',
'Photography Studio', 'Sculpture Garden',
'Vegetarian / Vegan Restaurant', 'Pie Shop', 'Market',
'Waterfront', 'Ethiopian Restaurant', 'Yemeni Restaurant',
'Dumpling Restaurant', 'Indie Movie Theater',
'Sporting Goods Shop', 'Toy / Game Store', 'Speakeasy', 'Dive Bar',
'Harbor / Marina', 'Basketball Court', 'Candy Store', 'Museum',
'Drugstore', 'Department Store', 'Gun Range', 'Tibetan Restaurant',
'Health Food Store', 'Nail Salon', 'Tapas Restaurant',
'Salad Place', 'Climbing Gym', 'Theme Park Ride / Attraction',
'Roof Deck', 'Dog Run', 'Food Court', 'Trail', 'Boat or Ferry',
'Performing Arts Venue', 'Entertainment Service', 'Hotel Bar',
'Intersection', 'Poke Place', 'Moroccan Restaurant',
'Massage Studio', 'Flea Market', 'Cycle Studio', 'Perfume Shop',
'Residential Building (Apartment / Condo)', 'Whisky Bar', 'Winery',
'Factory', 'Miscellaneous Shop', 'Hobby Shop', 'High School',
'Rental Service', 'BBQ Joint', 'Israeli Restaurant', 'Opera House',
'German Restaurant', 'Beer Bar', 'Cupcake Shop', 'Steakhouse',
'Shipping Store', 'Board Shop', 'Bridge', 'Shopping Mall',
'Child Care Service', 'Skate Park', 'Soccer Field',
'Cuban Restaurant', 'Indoor Play Area', 'Baseball Field',
"Doctor's Office", 'Jewish Restaurant', 'Polish Restaurant',
'Sports Bar', 'Track', 'Cheese Shop', 'Bowling Alley',
'Laundromat', 'Austrian Restaurant', 'Organic Grocery', 'Farm',
'Gymnastics Gym', 'Halal Restaurant', 'Big Box Store',
'Kosher Restaurant', 'Tourist Information Center', 'Film Studio',
'IT Services', 'School', 'Comic Shop', 'Gym Pool',
'Colombian Restaurant', 'Soup Place', 'Used Bookstore',
'Business Service', 'North Indian Restaurant', 'Other Nightlife',
'Public Art', 'Field', 'Picnic Shelter', 'Waterfall',
'Amphitheater', 'Bike Trail', 'Hill', 'Snack Place', 'Sports Club',
'Video Store', 'Paper / Office Supplies Store', 'Lake',
'General Travel', 'Comfort Food Restaurant', 'Creperie',
'Szechuan Restaurant', 'Stadium', 'Community Center',
'Arepa Restaurant', 'Brazilian Restaurant', 'Football Stadium',
'Laundry Service', 'Theme Park', 'Aquarium', 'Exhibit', 'Arcade',
'Movie Theater', 'Beer Store', 'Udon Restaurant',
'South American Restaurant', 'Hardware Store', 'Gay Bar',
'Outdoor Gym', 'Picnic Area', 'Storage Facility', 'Tattoo Parlor',
'Smoke Shop', 'Piano Bar', 'Train Station', 'Print Shop',
'Pool Hall', 'Zoo', 'Zoo Exhibit', 'Souvenir Shop',
'Warehouse Store', 'Check Cashing Service', 'Post Office',
'Jazz Club', 'Puerto Rican Restaurant', 'Eye Doctor', 'River',
'Outlet Store', 'Waste Facility', 'Tennis Stadium', 'Canal',
'Recreation Center', 'Social Club', 'Library', 'Shop & Service',
'Distillery', 'Home Service', 'Auto Dealership',
'Construction & Landscaping', 'Outdoors & Recreation', 'Building',
'Cooking School', 'Memorial Site', 'Auditorium', 'Tree',
'Lingerie Store', 'Monument / Landmark', 'Paella Restaurant',
'Japanese Curry Restaurant', 'Lebanese Restaurant',
'Peking Duck Restaurant', 'Art Museum', 'Smoothie Shop',
'Argentinian Restaurant', 'Comedy Club', 'Cha Chaan Teng',
'Taiwanese Restaurant', 'Sake Bar', 'Food Stand', 'Animal Shelter',
'Molecular Gastronomy Restaurant', 'Medical Center',
'Golf Driving Range', 'Outdoor Sculpture', 'Ukrainian Restaurant',
'Soba Restaurant', 'Shabu-Shabu Restaurant',
'Australian Restaurant', 'Coworking Space', 'Kebab Restaurant',
'General Entertainment', 'Office', 'Tex-Mex Restaurant',
'Fountain', 'Stationery Store', 'Adult Boutique',
'Leather Goods Store', 'Golf Course', 'Fondue Restaurant',
'Theme Restaurant', 'Veterinarian', 'Empanada Restaurant',
'College Academic Building', 'Czech Restaurant', 'Club House',
'Bridal Shop', 'Shoe Repair', 'College Arts Building', 'Circus',
'College Bookstore', 'Kitchen Supply Store', 'Newsstand',
'Pet Service', 'Hostel', 'Hawaiian Restaurant', 'College Theater',
'Churrascaria', 'Skating Rink', 'Luggage Store',
'College Cafeteria', 'Cultural Center', 'Resort', 'Watch Shop',
'Outdoor Supply Store', 'Street Art', 'Duty-free Shop',
'Scandinavian Restaurant', 'Pet Café', 'Swiss Restaurant',
'Tram Station', 'Persian Restaurant', 'Bike Rental / Bike Share',
'Tailor Shop', 'Pedestrian Plaza', 'Hot Dog Joint', 'Daycare',
'Tanning Salon', 'Train', 'Surf Spot', 'Parking',
'Indonesian Restaurant', 'Venezuelan Restaurant',
'Imported Food Shop', 'Bath House', 'Fish & Chips Shop',
'Afghan Restaurant', 'Automotive Shop', 'Beach Bar', 'Pop-Up Shop',
'Sri Lankan Restaurant', 'Portuguese Restaurant', 'Rest Area',
'Rock Club', 'Costume Shop', 'Government Building',
'Airport Lounge', 'Airport Terminal', 'Airport Food Court',
'Plane', 'Motorcycle Shop', 'Rock Climbing Spot', 'Cafeteria',
'Auto Garage', 'Romanian Restaurant', 'Go Kart Track',
'Professional & Other Places', 'Racetrack', 'Fishing Spot',
'Lighthouse', 'Nightlife Spot', 'Weight Loss Center', 'Buffet',
'Toll Plaza', 'Botanical Garden', 'Baseball Stadium',
'Outlet Mall', 'Souvlaki Shop', 'Camera Store'], dtype=object)
NewYork_venues["Venue Category"].nunique()
443
We are getting 443 unique venue categories from the FourSquare data
NewYork_venues=NewYork_venues[NewYork_venues['Venue Category']!='Neighborhood'] # Code adjusted to remove Neighborhood
# One - hot encoding to handle categorical data for clustering
NY_onehot = pd.get_dummies(data=NewYork_venues[['Borough','Neighborhood','Venue Category']],columns=['Venue Category'],drop_first=True,prefix="", prefix_sep="")
NY_onehot.head()
Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | Amphitheater | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Brooklyn | Bath Beach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | Brooklyn | Bath Beach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | Brooklyn | Bath Beach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | Brooklyn | Bath Beach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | Brooklyn | Bath Beach | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 443 columns
Scoring each category by the mean frequency of its occurrence in each neighbourhood. These scores will help determine the similarities between neighbourhoods
NY_grouped = NY_onehot.groupby(['Borough','Neighborhood']).mean().reset_index()
NY_grouped.head()
Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | Amphitheater | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Bronx | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | Bronx | Bathgate | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | Bronx | Baychester | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | Bronx | Bedford Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | Bronx | Belmont | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.017241 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 443 columns
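The encode-then-average step is easier to follow on toy data. The sketch below uses invented venues for two made-up neighbourhoods; unlike the notebook call, it does not pass `drop_first=True`, so each row of frequencies sums to 1.

```python
import pandas as pd

# Toy data, invented for illustration
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Pizza Place", "Pizza Place", "Bar", "Coffee Shop", "Bar", "Bar"],
})

# One indicator column per category; dtype=float keeps the mean numeric
onehot = pd.get_dummies(venues, columns=["Venue Category"],
                        prefix="", prefix_sep="", dtype=float)

# Mean of 0/1 indicators = share of the neighbourhood's venues in that category
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Neighbourhood A has four venues, two of them pizza places, so its "Pizza Place" score is 0.5; neighbourhood B has only bars, so its "Bar" score is 1.0.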
A function to determine the most common venues in each neighbourhood. We are reducing the venues we do our analysis on to cut noise from the data and get more precise results
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[2:]  # skipping the Borough and Neighborhood columns
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]
num_top_venues = 15 # selecting top 15 venues for our analysis
indicators = ['st', 'nd', 'rd']
# creating columns according to number of top venues
columns = ['Borough','Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and beyond
        columns.append('{}th Most Common Venue'.format(ind+1))
# new dataframe to hold the top 15 venues for each of the neighbourhoods
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = NY_grouped['Borough']
neighborhoods_venues_sorted['Neighborhood'] = NY_grouped['Neighborhood']
# calling the function to get the top 15 venues for each neighbourhood
for ind in np.arange(NY_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 2:] = return_most_common_venues(NY_grouped.iloc[ind, :], num_top_venues)
neighborhoods_venues_sorted.head()
Borough | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Bronx | Allerton | Discount Store | Sandwich Place | Fast Food Restaurant | Pizza Place | Pharmacy | Donut Shop | Storage Facility | Bike Trail | Soccer Field | Seafood Restaurant | Bar | Bank | Clothing Store | Mobile Phone Shop | Trail |
1 | Bronx | Bathgate | Italian Restaurant | Pizza Place | Deli / Bodega | Spanish Restaurant | Liquor Store | Bank | Bakery | Grocery Store | Dessert Shop | Mexican Restaurant | Sandwich Place | Food & Drink Shop | Coffee Shop | Shoe Store | Donut Shop |
2 | Bronx | Baychester | Pharmacy | Italian Restaurant | Grocery Store | Bike Trail | Liquor Store | Historic Site | Pizza Place | Sandwich Place | Donut Shop | Print Shop | Mobile Phone Shop | Bus Station | Bus Line | Deli / Bodega | Playground |
3 | Bronx | Bedford Park | Diner | Pizza Place | Deli / Bodega | Mexican Restaurant | Supermarket | Pharmacy | Chinese Restaurant | Sandwich Place | Spanish Restaurant | Bus Station | Grocery Store | Food Truck | Smoke Shop | Baseball Field | Bakery |
4 | Bronx | Belmont | Italian Restaurant | Pizza Place | Bakery | Deli / Bodega | Dessert Shop | Restaurant | Fish Market | Food & Drink Shop | Cheese Shop | Chinese Restaurant | Tattoo Parlor | Grocery Store | Mexican Restaurant | Fast Food Restaurant | Mediterranean Restaurant |
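As a side note, the ranking and the column labels can be written more compactly. This is a sketch with invented frequencies, not the notebook's code: `Series.nlargest` picks the top categories, and the hypothetical helper `ordinal` also handles cases such as 11th and 21st, which the 'st'/'nd'/'rd' list above does not need for 15 columns but would get wrong beyond 20.

```python
import pandas as pd


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Invented frequencies for one neighbourhood
row = pd.Series({"Pizza Place": 0.30, "Bar": 0.10, "Coffee Shop": 0.25, "Park": 0.05})

# Labels of the three largest values, in descending order
top3 = row.nlargest(3).index.tolist()
labels = [f"{ordinal(i + 1)} Most Common Venue" for i in range(3)]
print(dict(zip(labels, top3)))
```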
Determining the K value (using elbow method)
K = range(1, 25)
# axis must be passed by keyword in pandas 2.0 and later
NY_grouped_clustering = NY_grouped.drop(['Borough','Neighborhood'], axis=1)
WCSS = [] # Model performance indicator --- Within Cluster Sum of Squares
for k in K:
    kmeans = KMeans(n_clusters=k, random_state=0).fit(NY_grouped_clustering)
    WCSS.append(kmeans.inertia_)
print(WCSS)
[30.047043953503767, 28.142337991544935, 26.84013025212907, 25.956055283198904, 24.8308555411957, 24.19646209527639, 23.195158562596642, 22.935235144735433, 22.330608681361483, 22.097341077068755, 21.759179057646183, 21.06650654574184, 20.813200614316383, 20.438676231526742, 19.913500964265413, 19.51207544053955, 19.414458087213852, 19.1960151544852, 18.751570710040756, 18.449575691630375, 18.142105128663783, 17.802439692856836, 17.593415639791107, 17.267414917310226]
We have used the within-cluster sum of squares (inertia) to determine the best value of k for our analysis
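For reference, the value scikit-learn reports as `inertia_` is the standard within-cluster sum of squares. In the notation below (introduced here, not taken from the notebook), $C_j$ is the set of points assigned to cluster $j$ and $\mu_j$ is that cluster's centroid:

```latex
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```

The best achievable WCSS never increases as k grows, so the elbow method looks for the k after which further reductions become small rather than for a minimum.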
#Plotting the graph of K vs WCSS to determine "k"
plt.figure(figsize=(20,10))
plt.plot(K,WCSS)
plt.xlabel("k")
plt.ylabel("Sum of Squares")
plt.title("Determining K-value")
plt.show()
We’ll use k = 7 as per the above graph. This seems to be the closest thing to an elbow in the curve, though there’s no clear-cut point for our analysis.
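Since the plot has no sharp elbow, it can help to look at the numbers directly. The sketch below takes the WCSS values printed above (rounded to three decimals) and computes how much each additional cluster reduces WCSS. It is a rough reading aid, not a formal selection rule.

```python
# WCSS values printed above, rounded to three decimals (k = 1 .. 24)
wcss = [30.047, 28.142, 26.840, 25.956, 24.831, 24.196, 23.195, 22.935,
        22.331, 22.097, 21.759, 21.067, 20.813, 20.439, 19.914, 19.512,
        19.414, 19.196, 18.752, 18.450, 18.142, 17.802, 17.593, 17.267]

# Drop in WCSS gained by going from k-1 to k clusters
drops = {k: round(wcss[k - 2] - wcss[k - 1], 3) for k in range(2, len(wcss) + 1)}

for k, drop in drops.items():
    print(f"k={k:2d}  drop={drop:.3f}")
```

The drop at k = 7 (about 1.0) is followed by a much smaller one at k = 8 (about 0.26), which is consistent with choosing 7, although a similar pattern appears again around k = 12.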
Running the clustering algorithm
k = 7
kmeans = KMeans(n_clusters=k, random_state=0).fit(NY_grouped_clustering)
# adding clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
NY_clustered = final_df
# Adding latitude/longitude for each neighborhood with the cluster labels
NY_clustered = NY_clustered.merge(neighborhoods_venues_sorted.set_index(['Borough','Neighborhood']), left_on=['Borough','Neighbourhood'],right_on=['Borough','Neighborhood'])
NY_clustered.head()
Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Brooklyn | Bath Beach | 40.6018 | -74.0005 | 1 | Chinese Restaurant | Cantonese Restaurant | Supplement Shop | Pizza Place | Bank | Italian Restaurant | Japanese Restaurant | Dessert Shop | Gas Station | Bakery | Tea Room | Sandwich Place | Eastern European Restaurant | Peruvian Restaurant | Middle Eastern Restaurant |
1 | Brooklyn | Bay Ridge | 40.634 | -74.0146 | 2 | Chinese Restaurant | Dessert Shop | Seafood Restaurant | Playground | Irish Pub | Vietnamese Restaurant | Bubble Tea Shop | Noodle House | Nightclub | Tea Room | Tennis Court | Gift Shop | Park | Fried Chicken Joint | Malay Restaurant |
2 | Brooklyn | Bedford Stuyvesant | 40.6834 | -73.9412 | 2 | Coffee Shop | Pizza Place | Café | Bar | Fried Chicken Joint | Deli / Bodega | Playground | Gym | Lounge | Gym / Fitness Center | Cocktail Bar | Tiki Bar | Seafood Restaurant | Thrift / Vintage Store | Gourmet Shop |
3 | Brooklyn | Bensonhurst | 40.605 | -73.9934 | 1 | Chinese Restaurant | Bakery | Bank | Cantonese Restaurant | Japanese Restaurant | Mobile Phone Shop | Bubble Tea Shop | Pizza Place | Supplement Shop | Kids Store | Gourmet Shop | Coffee Shop | Pharmacy | Turkish Restaurant | Sushi Restaurant |
4 | Brooklyn | Bergen Beach | 40.6204 | -73.9068 | 1 | Chinese Restaurant | American Restaurant | Donut Shop | Pizza Place | Bus Station | Gym | Peruvian Restaurant | Sushi Restaurant | Supermarket | Deli / Bodega | Italian Restaurant | Field | Flea Market | Filipino Restaurant | Film Studio |
Here’s the fun part: visualizing the clusters in New York City!
# latitude and longitude still hold the values of the last neighbourhood geocoded earlier;
# they are used here only to centre the map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NY_clustered['Latitude'], NY_clustered['Longitude'], NY_clustered['Neighbourhood'], NY_clustered['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters
Let’s take a closer look
Zooming in a little more
Getting the cluster for our current restaurant
NY_clustered.loc[NY_clustered['Neighbourhood'] == "Hayes Valley"]
Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
315 | Hayes Valley, SF | Hayes Valley | 37.7781 | -122.425 | 2 | Clothing Store | Wine Bar | French Restaurant | Boutique | Mexican Restaurant | Pizza Place | Performing Arts Venue | Cocktail Bar | Sushi Restaurant | Coffee Shop | Park | Bakery | Optical Shop | Café | Juice Bar |
curr_rest_cluster = NY_clustered.loc[NY_clustered['Neighbourhood'] == "Hayes Valley","Cluster Labels"].item()
print(curr_rest_cluster)
2
# Extracting all the neighbourhoods falling under the cluster 2 - same as that of our restaurant in San Francisco
Cluster0=NY_clustered[NY_clustered['Cluster Labels']==curr_rest_cluster]
Cluster0.shape
(161, 20)
The cluster has 161 rows, one of which is Hayes Valley itself, so we have 160 options to open our new restaurant. This means that 160 neighbourhoods in New York City are similar to our current locality. Though this gives us a lot of options to choose from, let’s try to narrow down our choices by doing another round of clustering on these locations!
NY1_grouped = NY_grouped
# Adding cluster labels to our original dataframe on which the Kmeans clustering was done
NY1_grouped = NY1_grouped.merge(neighborhoods_venues_sorted[['Borough','Neighborhood','Cluster Labels']].set_index(['Borough','Neighborhood']), left_on=['Borough','Neighborhood'],right_on=['Borough','Neighborhood'])
NY1_grouped.head()
Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | Amphitheater | ... | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | Cluster Labels | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Bronx | Allerton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
1 | Bronx | Bathgate | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.010000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
2 | Bronx | Baychester | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
3 | Bronx | Bedford Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
4 | Bronx | Belmont | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.017241 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1 |
5 rows × 444 columns
# Extracting data for the neighbourhoods which have the same cluster as our restaurant
# (.copy() avoids pandas' SettingWithCopyWarning when the frame is modified later)
SF_cluster = NY1_grouped[NY1_grouped['Cluster Labels'] == curr_rest_cluster].copy()
SF_cluster.shape
(161, 444)
Preparing the revised dataframe for reclustering to identify the neighbourhood which is most similar to our current neighbourhood
SF_grouped_clustering = SF_cluster.drop(['Borough','Neighborhood','Cluster Labels'], axis=1)
Determining which neighbourhood is most similar to our restaurant’s neighbourhood by increasing the K value. Using a large K in this manner is an indirect way to measure how far the other neighbourhoods are from a particular neighbourhood: with many small clusters, only the closest neighbourhoods end up sharing a cluster with it.
k=75
# Kmeans clustering
kmeans = KMeans(n_clusters=k, random_state=0).fit(SF_grouped_clustering)
# Inserting the revised clusters in the extracted dataframe
SF1_cluster=SF_cluster
SF1_cluster.drop('Cluster Labels',inplace=True,axis=1)
SF1_cluster.insert(0, 'Cluster Labels', kmeans.labels_)
SF1_cluster
Cluster Labels | Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | 38 | Bronx | Bronx Park South | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.035714 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.107143 | 0.250000 |
6 | 38 | Bronx | Bronx River | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.037037 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.111111 | 0.259259 |
9 | 50 | Bronx | City Island | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.037037 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
11 | 45 | Bronx | Clason Point | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.083333 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
13 | 40 | Bronx | Concourse | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
294 | 68 | Staten Island | Pleasant Plains | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
297 | 58 | Staten Island | Randall Manor | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
298 | 36 | Staten Island | Richmond Town | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.166667 | 0.000000 | 0.000000 |
304 | 43 | Staten Island | South Beach | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
305 | 11 | Staten Island | St. George | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.051282 | ... | 0.0 | 0.0 | 0.025641 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 |
161 rows × 444 columns
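Because a large K only approximates nearness, the similarity can also be measured directly. The following is a minimal sketch and not part of the original analysis: it ranks neighbourhoods by the Euclidean distance between their venue-frequency vectors and the reference neighbourhood's vector. The toy dataframe, its made-up values, and the helper name rank_by_distance are assumptions for illustration only; the same idea could be applied to the extracted dataframe above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the grouped venue-frequency table (values are made up for illustration)
toy = pd.DataFrame({
    'Neighborhood': ['Hayes Valley', 'A', 'B', 'C'],
    'Wine Bar':     [0.05, 0.04, 0.00, 0.01],
    'Pizza Place':  [0.03, 0.03, 0.10, 0.00],
    'Zoo Exhibit':  [0.00, 0.00, 0.00, 0.25],
})

def rank_by_distance(df, reference, name_col='Neighborhood'):
    """Hypothetical helper: sort neighbourhoods by Euclidean distance to `reference`."""
    # Numeric feature matrix, one row per neighbourhood
    features = df.drop(columns=[name_col]).to_numpy(dtype=float)
    # Feature vector of the reference neighbourhood
    ref_vector = features[(df[name_col] == reference).to_numpy()][0]
    # Distance from every neighbourhood to the reference
    distances = np.linalg.norm(features - ref_vector, axis=1)
    ranked = df[[name_col]].assign(Distance=distances)
    # Drop the reference itself and list the nearest neighbourhoods first
    ranked = ranked[ranked[name_col] != reference]
    return ranked.sort_values('Distance').reset_index(drop=True)

ranking = rank_by_distance(toy, 'Hayes Valley')
print(ranking)
```

A direct ranking like this gives an explicit ordering of candidates and does not depend on the choice of K or on the random initialisation of K-Means.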
# Identifying the new cluster label
SF1_cluster[SF1_cluster['Neighborhood']=='Hayes Valley']
Cluster Labels | Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
130 | 11 | Hayes Valley, SF | Hayes Valley | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.05 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
1 rows × 444 columns
# Extracting the neighbourhoods in the new cluster (label 11)
Cluster_Label1=SF1_cluster.loc[SF1_cluster['Neighborhood']=='Hayes Valley','Cluster Labels'].item()
SF1_cluster[SF1_cluster['Cluster Labels']==Cluster_Label1]
Cluster Labels | Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | 11 | Bronx | Fordham | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
24 | 11 | Bronx | Kingsbridge | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.025641 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.025641 | 0.0 | 0.0 |
63 | 11 | Brooklyn | Brighton Beach | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
95 | 11 | Brooklyn | Homecrest | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.025000 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.025000 | 0.0 | 0.0 |
130 | 11 | Hayes Valley, SF | Hayes Valley | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.05 | 0.010000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.010000 | 0.0 | 0.0 |
145 | 11 | Manhattan | Harlem (Central) | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.010000 | ... | 0.0 | 0.01 | 0.010000 | 0.0 | 0.01 | 0.0 | 0.0 | 0.010000 | 0.0 | 0.0 |
154 | 11 | Manhattan | Manhattanville | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.00 | 0.033333 | 0.0 | 0.00 | 0.0 | 0.0 | 0.033333 | 0.0 | 0.0 |
179 | 11 | Queens | Auburndale | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.037037 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
217 | 11 | Queens | Kew Gardens | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.020833 | 0.0 | 0.0 |
305 | 11 | Staten Island | St. George | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.051282 | ... | 0.0 | 0.00 | 0.025641 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
10 rows × 444 columns
The 9 New York neighbourhoods above (the tenth row is Hayes Valley itself) are the closest matches to our current restaurant’s locality and can be shortlisted for opening our new restaurant to expand the business!
Finalizing the output
SF2_cluster=SF1_cluster[SF1_cluster['Cluster Labels']==Cluster_Label1].copy()
SF2_cluster.drop("Cluster Labels",inplace=True,axis=1)
SF2_cluster
Borough | Neighborhood | Adult Boutique | Afghan Restaurant | African Restaurant | Airport Food Court | Airport Lounge | Airport Terminal | American Restaurant | Amphitheater | ... | Whisky Bar | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yemeni Restaurant | Yoga Studio | Zoo | Zoo Exhibit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | Bronx | Fordham | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
24 | Bronx | Kingsbridge | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.025641 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.025641 | 0.0 | 0.0 |
63 | Brooklyn | Brighton Beach | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
95 | Brooklyn | Homecrest | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.025000 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.025000 | 0.0 | 0.0 |
130 | Hayes Valley, SF | Hayes Valley | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.00 | ... | 0.0 | 0.05 | 0.010000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.010000 | 0.0 | 0.0 |
145 | Manhattan | Harlem (Central) | 0.0 | 0.0 | 0.03 | 0.0 | 0.0 | 0.0 | 0.010000 | 0.01 | ... | 0.0 | 0.01 | 0.010000 | 0.0 | 0.01 | 0.0 | 0.0 | 0.010000 | 0.0 | 0.0 |
154 | Manhattan | Manhattanville | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.00 | ... | 0.0 | 0.00 | 0.033333 | 0.0 | 0.00 | 0.0 | 0.0 | 0.033333 | 0.0 | 0.0 |
179 | Queens | Auburndale | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.037037 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
217 | Queens | Kew Gardens | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.00 | ... | 0.0 | 0.00 | 0.000000 | 0.0 | 0.00 | 0.0 | 0.0 | 0.020833 | 0.0 | 0.0 |
305 | Staten Island | St. George | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 0.051282 | 0.00 | ... | 0.0 | 0.00 | 0.025641 | 0.0 | 0.00 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.0 |
10 rows × 443 columns
columns1 = ['Borough','Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns1.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except (IndexError, KeyError):
        # no ordinal suffix left in indicators, so fall back to 'th'
        columns1.append('{}th Most Common Venue'.format(ind+1))
# create a new dataframe
Final_venues_sorted = pd.DataFrame(columns=columns1)
Final_venues_sorted['Borough']=SF2_cluster['Borough']
Final_venues_sorted['Neighborhood'] = SF2_cluster['Neighborhood']
for ind in np.arange(SF2_cluster.shape[0]):
    Final_venues_sorted.iloc[ind, 2:] = return_most_common_venues(SF2_cluster.iloc[ind, :], num_top_venues)
Final_venues_sorted
Borough | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue | 11th Most Common Venue | 12th Most Common Venue | 13th Most Common Venue | 14th Most Common Venue | 15th Most Common Venue | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
21 | Bronx | Fordham | Shoe Store | Fast Food Restaurant | Coffee Shop | Sandwich Place | Clothing Store | Spanish Restaurant | Supplement Shop | Bank | Pharmacy | Gym / Fitness Center | Pizza Place | Café | Deli / Bodega | Miscellaneous Shop | Mobile Phone Shop |
24 | Bronx | Kingsbridge | Supermarket | Café | Gym | Pizza Place | Donut Shop | Mexican Restaurant | Sandwich Place | Spanish Restaurant | Grocery Store | Thrift / Vintage Store | Gourmet Shop | Breakfast Spot | Supplement Shop | Steakhouse | Burger Joint |
63 | Brooklyn | Brighton Beach | Supermarket | Bakery | Eastern European Restaurant | Health & Beauty Service | Grocery Store | Donut Shop | Theater | Mobile Phone Shop | Café | Flower Shop | Gourmet Shop | Bar | Food Truck | Lounge | Bus Line |
95 | Brooklyn | Homecrest | Sushi Restaurant | Café | Pizza Place | Bagel Shop | Mobile Phone Shop | Ice Cream Shop | Market | Seafood Restaurant | Bank | Bar | Sandwich Place | Gym | Coffee Shop | Mediterranean Restaurant | Eastern European Restaurant |
130 | Hayes Valley, SF | Hayes Valley | Clothing Store | Wine Bar | French Restaurant | Boutique | Mexican Restaurant | Pizza Place | Performing Arts Venue | Cocktail Bar | Sushi Restaurant | Coffee Shop | Park | Bakery | Optical Shop | Café | Juice Bar |
145 | Manhattan | Harlem (Central) | Southern / Soul Food Restaurant | Mobile Phone Shop | Clothing Store | Theater | Pizza Place | Burger Joint | African Restaurant | Cosmetics Shop | Sandwich Place | Café | Arts & Crafts Store | Mexican Restaurant | Japanese Restaurant | Jazz Club | Deli / Bodega |
154 | Manhattan | Manhattanville | Art Gallery | Fried Chicken Joint | Coffee Shop | Boutique | Seafood Restaurant | Chinese Restaurant | Sandwich Place | Lounge | Ethiopian Restaurant | Bank | Public Art | College Theater | Park | Spa | Food & Drink Shop |
179 | Queens | Auburndale | Pharmacy | Café | Hookah Bar | Sandwich Place | Train Station | Train | Lounge | Pizza Place | Miscellaneous Shop | Greek Restaurant | Athletics & Sports | Toy / Game Store | Donut Shop | Korean Restaurant | Vietnamese Restaurant |
217 | Queens | Kew Gardens | Pizza Place | Metro Station | Café | Coffee Shop | Mediterranean Restaurant | Donut Shop | Cosmetics Shop | Deli / Bodega | Nail Salon | Bagel Shop | Supplement Shop | Bank | Train Station | Gym / Fitness Center | Grocery Store |
305 | Staten Island | St. George | Clothing Store | Italian Restaurant | Sporting Goods Shop | American Restaurant | Bar | Bakery | Museum | Monument / Landmark | Deli / Bodega | Tapas Restaurant | Pharmacy | Theater | Farmers Market | Bus Stop | Seafood Restaurant |
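The table above is built by sorting each neighbourhood's venue frequencies in descending order. As a minimal sketch of that step (the helper top_venues below is hypothetical and is not the notebook's return_most_common_venues, whose definition lies outside this section), assuming the identifier columns are Borough and Neighborhood and every other column is a venue frequency:

```python
import pandas as pd

def top_venues(row, n, id_cols=('Borough', 'Neighborhood')):
    """Hypothetical helper: return the n venue categories with the highest frequency."""
    # Keep only the venue-frequency entries and make sure they are numeric
    freqs = row.drop(labels=list(id_cols)).astype(float)
    # Highest frequency first, then take the first n category names
    return freqs.sort_values(ascending=False).index[:n].tolist()

# Toy row; the frequencies are illustrative, not taken from the real data
toy_row = pd.Series({
    'Borough': 'Hayes Valley, SF',
    'Neighborhood': 'Hayes Valley',
    'Clothing Store': 0.06,
    'Wine Bar': 0.05,
    'French Restaurant': 0.04,
    'Yoga Studio': 0.01,
})

top3 = top_venues(toy_row, 3)
print(top3)
```

Categories with equal frequencies may come out in either order, so ties in the lower ranks of such a table should not be over-interpreted.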