PlaceInfo: Using Machine Learning to Analyze and Visualize Customer Preferences

Anna Manukyan, Assistant Professor
Physical Sciences Unit, Natural Sciences Department

 

I am an Assistant Professor of Chemistry at Hostos Community College. My background is in theoretical and computational chemistry, and I have extensive experience in scientific research, algorithm and software development, computer simulation, and data analysis and processing. My continued interest in computational methodologies led me to pursue a highly competitive data science fellowship at The Data Incubator (DI) in New York (https://en.wikipedia.org/wiki/The_Data_Incubator). Business Insider selected the program as one of 15 competitive programs in the US with more competitive admissions than Harvard.

 

As a fellow at DI, I worked on the design and implementation of an algorithmic trading strategy based on stock market data. This strategy was successfully used for medium- and low-frequency trading with real money. I also worked on projects using machine learning, natural language processing, graph theory, and statistical data analysis. This fellowship experience inspired and prepared me to challenge myself and to explore other areas of interest. In particular, using machine learning approaches, I developed an integrated web application for analyzing business performance data. The application allows customers to query for a place of business and receive a data analysis report that includes performance over time, sentiment analysis of customer reviews, the most and least popular features, available discounts, and more. For restaurants, it also provides the most recent health inspection grade along with the violation report. The generated report presents comprehensive information about the business in a user-friendly form that is easy to navigate. The initial version of the web application is available here: placeinfo.org.

 

Existing query services provide very limited information about a place or a business. In New York, for example, neither Yelp nor Foursquare provides the health inspection grade of a restaurant. Moreover, these services do not provide a thorough analysis of what is popular in each restaurant or of how a restaurant's rating changes from day to day. In my experience, this information can be crucial when selecting a place. PlaceInfo provides a concise, user-friendly report on what is popular in each restaurant and how its user rating changes daily, combined with the most recent health inspection report. A similar analysis applies to other types of businesses.

 

Preliminary analysis of the data generated by this application has shown that in a given place of business (e.g. restaurant, bakery, beauty salon), the service quality fluctuates significantly throughout the week or the month, ranging from very poor to excellent. Instead of reading all the reviews about the place, the customer can glance over a short sentiment analysis report that clarifies whether the place matches the customer’s needs. For example, based on reviews, the application provides a report in word cloud format (Fig. 1). This gives some insight into the most outstanding features of the restaurant and can be more informative than the rating alone. Stop words, that is, words that carry no meaningful information, are filtered out automatically. Words like “real”, “open”, etc., can be avoided by selecting an appropriate cutoff for word frequency.

 

 

 

 

Fig. 1. Word cloud from the 100 most recent Yelp reviews for a given restaurant.
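The counting step behind a word cloud such as Fig. 1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used by PlaceInfo: the function name `word_frequencies`, the deliberately tiny `STOP_WORDS` set, and the `min_count` parameter are all hypothetical, and the frequency cutoff is interpreted here as a minimum number of occurrences.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (assumption);
# a practical list would contain far more entries.
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "in", "i"}


def word_frequencies(reviews, stop_words=STOP_WORDS, min_count=1):
    """Count words across all reviews, skipping stop words and
    dropping words that occur fewer than min_count times."""
    counts = Counter()
    for review in reviews:
        # Lowercase the text and keep runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in stop_words)
    # Apply the frequency cutoff (assumed here to be a minimum count).
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample_reviews = [
        "The pizza was great",
        "Great pizza and real friendly staff",
        "The pizza is open late",
    ]
    print(word_frequencies(sample_reviews, min_count=2))
```

The resulting word-to-count mapping is the kind of input a word cloud renderer typically scales font sizes from. Raising `min_count` removes rare words, while a longer stop-word list removes common but uninformative ones.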

 

The application also provides the most frequent phrases used in the 100 most recent Yelp reviews (Fig. 2). This would be more informative when accompanied by the sentiment of each phrase.

 

 

 

 

 

Fig. 2. Frequently used (meaningful) word combinations are selected to help the customer to understand what is popular in the restaurant.
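One simple way to extract frequent word combinations like those in Fig. 2 is to count n-grams after removing stop words. The sketch below is only an assumption about how this could be done. The names `frequent_phrases`, `STOP_WORDS`, `n`, and `top` are hypothetical, and the method PlaceInfo uses to decide which combinations are meaningful is not described in the text.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (assumption).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "it", "to", "of", "in", "i"}


def frequent_phrases(reviews, n=2, top=5, stop_words=STOP_WORDS):
    """Return the `top` most common n-word phrases across the reviews."""
    counts = Counter()
    for review in reviews:
        tokens = [
            t for t in re.findall(r"[a-z']+", review.lower())
            if t not in stop_words
        ]
        # Stop words are removed before the n-grams are built, so a phrase
        # may join words that were separated by a stop word in the review.
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts.most_common(top)


if __name__ == "__main__":
    sample_reviews = [
        "The fried chicken was amazing",
        "Best fried chicken in town",
        "Loved the fried chicken",
    ]
    print(frequent_phrases(sample_reviews, n=2, top=3))
```

Counting per review keeps phrases from spanning two different reviews. Attaching a sentiment score to each phrase, as suggested above, would be a separate step.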

 

The application also uses the available time series data to forecast the future performance of the business. The information generated by this application can help both clients and management learn whether there are recurrent problems in the service, or when the best time to visit a place is. Moreover, potential business investors can use these results to assess the value of the business and its future viability.
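The text does not say which forecasting method the application uses. Purely as an illustration of forecasting from a rating time series, the sketch below applies simple exponential smoothing, a standard baseline technique chosen here as an assumption. The function name `smoothed_forecast` and the smoothing weight `alpha` are hypothetical.

```python
def smoothed_forecast(ratings, alpha=0.3):
    """One-step-ahead forecast using simple exponential smoothing.

    ratings: chronologically ordered average ratings (e.g. one per day).
    alpha:   weight given to the newest observation, with 0 < alpha <= 1.
    """
    if not ratings:
        raise ValueError("ratings must contain at least one value")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    level = ratings[0]
    for rating in ratings[1:]:
        # Blend the newest observation with the running smoothed level.
        level = alpha * rating + (1 - alpha) * level
    return level


if __name__ == "__main__":
    daily_ratings = [2.0, 4.0, 4.0]
    print(smoothed_forecast(daily_ratings, alpha=0.5))
```

A larger `alpha` makes the forecast react faster to recent ratings, while a smaller one smooths out day-to-day fluctuations. This baseline does not model weekly patterns, so a seasonal method would be needed to capture the within-week variation mentioned earlier.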
