As I neared the end of my first semester at Carnegie Mellon University, I had to come up with an idea for the term project in my computer science course. I decided to use my interest in data to learn web scraping and combine it with Python and Tkinter. Web scraping is a popular way to gather data online. Its importance has been especially visible during the ongoing pandemic, since it has allowed data scientists to gather statistics efficiently, which helped many organizations and the government take the necessary actions. While I did not use web scraping to gather data about COVID-19, I was able to use this technique to help students living in the Pittsburgh area.
The name of my term project is “Carnegie Mellon Personalized Menu”. The user is prompted to answer several questions, such as what type of food they would like to eat, how much money they want to spend, and where they want to eat. Once these questions have been answered, a menu is created from nearby restaurants located in Oakland, Shadyside, and Squirrel Hill. The menu displays the restaurant name, where it is located, food names, and food prices. From the menu, there are additional options such as more information about the restaurant, similar restaurants the user might like, or the user’s past menu. This program helps students decide what they want to eat more quickly, without the hassle of looking up nearby restaurants and finding their menus.
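The question-to-menu step described above can be sketched roughly as a filter over restaurant data. This is only an illustration: the `RESTAURANTS` list, its field names, and the `build_menu` function are hypothetical, and the data is hand-written here rather than scraped.

```python
# Hypothetical, hand-written data standing in for the scraped restaurant data.
RESTAURANTS = [
    {"name": "Restaurant A", "neighborhood": "Oakland", "food_type": "pizza",
     "items": [("Cheese Slice", 3.50), ("Large Pie", 16.00)]},
    {"name": "Restaurant B", "neighborhood": "Shadyside", "food_type": "sushi",
     "items": [("California Roll", 7.00), ("Sashimi Platter", 22.00)]},
    {"name": "Restaurant C", "neighborhood": "Oakland", "food_type": "pizza",
     "items": [("Calzone", 9.00)]},
]


def build_menu(restaurants, food_type, budget, neighborhood):
    """Return (restaurant, neighborhood, food name, price) rows matching the answers."""
    menu = []
    for restaurant in restaurants:
        # Skip restaurants that do not match the food type or location answers.
        if restaurant["food_type"] != food_type:
            continue
        if restaurant["neighborhood"] != neighborhood:
            continue
        # Keep only the items the user can afford.
        for food_name, price in restaurant["items"]:
            if price <= budget:
                menu.append((restaurant["name"], neighborhood, food_name, price))
    return menu


menu = build_menu(RESTAURANTS, food_type="pizza", budget=10.00, neighborhood="Oakland")
for row in menu:
    print(row)
```

Each row carries the four things the menu displays: restaurant name, location, food name, and food price.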
- Beautiful Soup
What Did I Web Scrape?
- Food names
- Food prices
- Food images
A general overview of the finalized project:
➔ A home page that has three options the user can choose from: sign in, create an account, and a help button. The sign-in class lets the user log in if they already have an account. If the user does not have an account, they are prompted to create one, which is handled by a separate class (the registration class). This lets the user create a username and password and then saves their details in a file so that they can log in at any time after successfully registering. The help button opens a page that quickly explains to the user how to use the interface and how the menu is created.
➔ Once the user has successfully logged in, they are prompted to answer several food-related questions. The answers to these questions determine which restaurant has the food they feel like eating, and its menu is presented to the user. This is its own class, the Menu class.
➔ There is another class, the past menu. This lets the user view their past menu if they are interested. This is especially useful when the user logs back in after a long time and has forgotten what their last menu was.
➔ If the user wants to see restaurants similar to the one on their menu, they can click the suggested restaurants button, which is its own class, recommended restaurants.
➔ Each restaurant is its own function, placed in the appropriate class: Oakland, Shadyside, or Squirrel Hill. These functions web scrape the restaurant’s data, such as names, prices, and descriptions.
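As a rough idea of what one of these per-restaurant scraping functions might look like with Beautiful Soup, here is a minimal sketch. The HTML snippet, its class names, and the `scrape_menu` function are all hypothetical. A real restaurant page would have its own markup and would need to be downloaded first, while this sketch parses a hard-coded string so it makes no network request.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a restaurant's menu page.
SAMPLE_HTML = """
<ul class="menu">
  <li class="item">
    <span class="name">Margherita Pizza</span>
    <span class="price">$11.50</span>
  </li>
  <li class="item">
    <span class="name">Garlic Knots</span>
    <span class="price">$4.00</span>
  </li>
</ul>
"""


def scrape_menu(html):
    """Return a list of (food name, price as float) pairs found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for entry in soup.find_all("li", class_="item"):
        name = entry.find("span", class_="name").get_text(strip=True)
        price_text = entry.find("span", class_="price").get_text(strip=True)
        # Drop the leading dollar sign so the price can be compared numerically.
        items.append((name, float(price_text.lstrip("$"))))
    return items


scraped = scrape_menu(SAMPLE_HTML)
print(scraped)
```

Converting the price to a number at scrape time makes it easier to compare against the user's budget later.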
One of the trickiest parts of this project was suggesting other similar restaurants to the user. To create a recommender system, I used the KNN (k-nearest neighbors) machine learning algorithm. KNN is memory intensive because it stores the locations of all the cases instead of building a model, so there is no training step and the distance calculations happen when a recommendation is requested. Another crucial part of the project was displaying the user’s previous menu. I approached this by saving the restaurants given to the user in the same file that stores the username and password. My code then accesses only the k-th saved restaurant, where k depends on how many times the user has signed in.
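The past-menu idea can be sketched as one row per user in a single file: username, password, then every restaurant that user has been given. The CSV layout and the function names below are assumptions for illustration, not the project's actual file format, and how k is derived from the sign-in count is left to the caller.

```python
import csv
import os
import tempfile


def save_user(path, username, password):
    """Append a new user row: username, password, then (later) restaurants."""
    # Plain-text password only to keep the sketch short; a real app would hash it.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow([username, password])


def append_restaurant(path, username, restaurant):
    """Add a restaurant to the end of the matching user's row."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    for row in rows:
        if row and row[0] == username:
            row.append(restaurant)
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def past_menu(path, username, k):
    """Return the k-th restaurant (1-based) saved for the user, or None."""
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if row and row[0] == username:
                restaurants = row[2:]  # skip the username and password columns
                return restaurants[k - 1] if 1 <= k <= len(restaurants) else None
    return None


with tempfile.TemporaryDirectory() as tmp:
    users_file = os.path.join(tmp, "users.csv")
    save_user(users_file, "alice", "hunter2")
    append_restaurant(users_file, "alice", "Restaurant A")
    append_restaurant(users_file, "alice", "Restaurant B")
    second = past_menu(users_file, "alice", 2)
    missing = past_menu(users_file, "alice", 3)

print(second, missing)
```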
Machine Learning — Recommender System
Why Did I Choose KNN?
To create my recommender system, I chose KNN for the nearest neighbor search. This allowed me to grab the five restaurants most closely related to the user’s given restaurant. There are many other techniques that can power a recommender system, such as k-means or hierarchical clustering. However, I chose KNN because it evolves with new data: the programmer can add new data without retraining a completely new model. KNN also lets us choose the distance metric, which gives the programmer plenty of options. I decided to use the Euclidean distance because it is the most typical choice and it measures the true straight-line distance between two points.
How Did I Create the Recommender System?
I started off by creating three different data sets — the type of food, location, and price of food.
Next, I created another function that measured the Euclidean distance from the new data to the classified data.
Once this was programmed, I created another function that takes the user input, the values of the dataset, the target values, and k as parameters. This function checks the distance to each of the target values, and the five restaurants closest to the given restaurant are displayed to the user.
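The steps above can be sketched in a few lines of plain Python. The feature encoding (a food-type id, a neighborhood id, and an average price), the restaurant names, and the function names are all made up for illustration; only the parameter list (user input, dataset values, target values, k) follows the description.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(user_input, data, targets, k=5):
    """Return the k target labels whose feature rows are closest to user_input."""
    distances = [
        (euclidean_distance(user_input, row), target)
        for row, target in zip(data, targets)
    ]
    distances.sort(key=lambda pair: pair[0])  # closest first
    return [target for _, target in distances[:k]]


# Hypothetical features: (food type id, neighborhood id, average price in dollars).
data = [
    (0, 0, 10.0), (0, 1, 12.0), (1, 0, 25.0), (2, 2, 8.0),
    (0, 0, 11.0), (1, 1, 30.0), (2, 0, 9.0),
]
targets = [f"Restaurant {letter}" for letter in "ABCDEFG"]

recommended = k_nearest((0, 0, 10.2), data, targets, k=3)
print(recommended)
```

One thing to keep in mind with this kind of encoding is that price spans a much larger range than the category ids, so it tends to dominate the distance unless the features are scaled first.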
What is Tkinter?
Tkinter is a graphical user interface (GUI) library for Python. There are other libraries that can be used to create a GUI in Python, but Tkinter provides a fast and powerful object-oriented interface.
How Did I Use Tkinter in My Project?
Before I reached MVP (Minimum Viable Product), I had to use tkinter-with-cmu-112-graphics, which was the course’s modified version of Tkinter. This means that many of the graphical features I built, such as buttons and scrollbars, were created solely by me. I also used Tkinter to create different backgrounds for the menu, which gave the interface a more user-friendly feel.
As stated previously, only being able to use the cmu-112 graphics until I reached MVP made me think more creatively rather than just using Tkinter’s features. At first, this was a challenge because I wasn’t sure how to create buttons and scrollbars. After trying many different ideas, I figured that a solution to creating a “fake button” would be creating a rectangle and if the user pressed anywhere in the rectangle’s area, then it would act as a button and a new page would show up.
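The “fake button” idea boils down to a point-in-rectangle test that runs whenever the mouse is pressed. Here is a minimal sketch of that logic on its own, without any drawing code. The `FakeButton` class, its fields, and the page names are hypothetical; in the real project the click coordinates would come from the graphics framework's mouse event.

```python
from dataclasses import dataclass


@dataclass
class FakeButton:
    """A rectangle that acts as a button and leads to another page."""
    x0: float
    y0: float
    x1: float
    y1: float
    label: str
    target_page: str

    def contains(self, x, y):
        # True when the click landed inside (or on the edge of) the rectangle.
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def handle_click(buttons, x, y, current_page):
    """Return the page to show after a click at (x, y)."""
    for button in buttons:
        if button.contains(x, y):
            return button.target_page
    return current_page  # the click missed every button, so stay put


buttons = [
    FakeButton(100, 200, 300, 260, "Sign in", "sign_in"),
    FakeButton(100, 300, 300, 360, "Help", "help"),
]
page_after_hit = handle_click(buttons, 150, 230, "home")
page_after_miss = handle_click(buttons, 10, 10, "home")
print(page_after_hit, page_after_miss)
```

The same rectangle coordinates can be reused when drawing the button, so the visible shape and the clickable area always match.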
Nearing the deadline of my project, I wanted to add background music to the interface. While I was able to add one song that plays throughout the use of the interface, my goal was to change the type of music depending on the user’s choice. I was not able to accomplish this in the time given, but I do wish to add this feature.
I learned many useful techniques from this project about web scraping and Tkinter. As I look back on this project, one thing that kept me going was the idea that there is a solution to everything. There were times I was not allowed to use a specific library or certain features, which forced me to think outside the box. Programming is not just writing lines of code to solve a certain problem; it revolves around how you approach a problem and find the most efficient solution. Data is the core of our society. Web scraping is useful in any type of industry, and I wish to use the skills I learned from this project to get started on the path of using data to benefit different communities and industries.
If you are interested in the full code of the project, it is linked below: