ISM4011 Final: MIS Exam 3 CH9-CH12

Document Summary

3 primary activities: acquire data (what needs to be done), analyze data (how it is made useful), publish results (how it is given to users).
Published data is made available by push or pull: push delivers content without any request from users; pull requires the user to request BI results.
Data warehouse functions: extract data from operational, internal, and external databases; cleanse data; organize data; publish data.
Dirty data: data with obviously invalid values, e.g. a phone number like 999-999-9999 or an email like poop@poop.com.
Granularity: data can be too fine or not fine enough (too coarse).
Curse of dimensionality: huge sources with many attributes bring problems of their own.
Data marts: smaller than a data warehouse; each addresses a particular function of the enterprise, a department, etc. A data mart is like a retail store supplied by a distributor (the data warehouse).
Big Data: volume (petabyte and larger), velocity (generated rapidly), variety (different formats of web server and database log files; streams of data about user responses to page content; graphics, audio, and video files).
MapReduce: hundreds of computers working in parallel. Hadoop: the framework that integrates this. Pig: its query language.
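The MapReduce idea in the notes can be illustrated with a small word count. This is only a sketch that runs on one machine: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample chunks are made up for illustration and are not Hadoop's API. On a real cluster, each chunk would be handed to a different computer.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    """Run map, shuffle, and reduce in sequence over all chunks."""
    mapped = []
    for chunk in chunks:  # assumption: a real cluster maps chunks in parallel
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


# Hypothetical input: three chunks standing in for pieces of a large file.
chunks = ["big data big volume", "data velocity", "big variety"]
counts = word_count(chunks)
print(counts)
```

The map step only looks at its own chunk, which is why the work can be split across many computers; only the shuffle and reduce steps need to bring results together.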