APSC 1863 Lecture Notes - Lecture 2: Recommender System, Big Data, NaN
Document Summary
The whole world seems to have heard about your amazing new abilities to analyze big data and build useful systems with it! You've just taken on a contract with a new online food delivery company. The company is trying to differentiate itself by recommending new meals to customers based on other customers' ratings. Your final result should be a function that takes a Spark DataFrame of a single customer's ratings for various meals and outputs their top 3 suggested meals.

In [33]: df.describe().transpose()
Out[33]:
         count   mean       std        min  25%  50%  75%  max
movieid  1501.0  49.405730  28.937034

In [34]: df.corr()
Out[34]:
         movieid   rating    userid
movieid  1.000000  0.036569  0.003267
rating   0.036569  1.000000  0.056411
userid   0.003267  0.056411  1.000000

In [35]:
import numpy as np
# Replace meal ids above 31 with NaN (missing); ids up to 31 are kept unchanged.
df["mealskew"] = df["movieid"].apply(lambda mid: np.nan if mid > 31 else mid)

/users/marci/anaconda/lib/python3.5/site-packages/numpy/lib/function_base.py:3834: RuntimeWarning: invalid value encountered in percentile

Out[11]:
         count   mean       std        min  25%  50%  75%  max
movieid  1501.0  49.405730  28.937034
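The task asks for a function that turns one customer's ratings into their top 3 suggested meals. One common way to do this is item-based collaborative filtering, sketched minimally below. It uses pandas instead of Spark purely to keep the example self-contained, and it assumes the ratings table has the columns userid, movieid and rating, as in df. The helper name top_n_meals and the cosine-similarity scoring are illustrative assumptions, not the course's reference solution. Meals a user never rated show up as NaN in the pivoted user-by-meal matrix, which is where missing values enter this problem.

```python
import numpy as np
import pandas as pd


def top_n_meals(ratings: pd.DataFrame, customer: pd.DataFrame, n: int = 3) -> list:
    """Hypothetical helper: item-based collaborative filtering sketch.

    ratings:  every customer's ratings, columns userid, movieid, rating
    customer: one customer's ratings, columns movieid, rating
    Returns the ids of up to n meals the customer has not rated,
    ordered by predicted rating (highest first).
    """
    # User x meal matrix; meals a user never rated become NaN.
    matrix = ratings.pivot_table(index="userid", columns="movieid", values="rating")

    # Treat NaN as 0 only for the cosine-similarity computation.
    filled = matrix.fillna(0.0).to_numpy()
    norms = np.linalg.norm(filled, axis=0)
    norms[norms == 0] = 1.0  # guard against division by zero
    sim = (filled.T @ filled) / np.outer(norms, norms)
    sim = pd.DataFrame(sim, index=matrix.columns, columns=matrix.columns)

    # Keep only the customer's meals that exist in the training data.
    rated = customer.set_index("movieid")["rating"]
    rated = rated[rated.index.isin(sim.index)]
    candidates = [m for m in sim.index if m not in rated.index]

    scores = {}
    for meal in candidates:
        # Similarity of this candidate to each meal the customer rated.
        weights = sim.loc[meal, rated.index]
        total = weights.abs().sum()
        if total > 0:
            # Similarity-weighted average of the customer's own ratings.
            scores[meal] = float((weights * rated).sum() / total)

    # Highest predicted rating first; ties broken by smaller meal id.
    ranked = sorted(scores, key=lambda m: (-scores[m], m))
    return ranked[:n]


# Small made-up example: the customer loved meal 10 and disliked meal 30.
ratings = pd.DataFrame({
    "userid":  [1, 1, 1, 2, 2, 2, 3, 3],
    "movieid": [10, 20, 30, 10, 20, 40, 30, 40],
    "rating":  [5, 5, 1, 5, 4, 1, 5, 5],
})
customer = pd.DataFrame({"movieid": [10, 30], "rating": [5, 1]})
print(top_n_meals(ratings, customer))
```

Meal 20 ranks first because the users who rated it rated it much like meal 10, which this customer loved. In the Spark setting, a small per-customer DataFrame could be pulled into pandas with toPandas() before calling such a helper; for the full ratings table, Spark's pyspark.ml.recommendation.ALS (alternating least squares) is the more scalable choice.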