ADM 1370 Study Guide - Final Guide: Data Deduplication, Data Warehouse, Eventual Consistency
Document Summary
To obtain actionable information (trends or relationships to act on), you need: data analytics, human expertise, and high-quality data.

Petabyte (PB) = 1,000 terabytes (TB) = 1 million gigabytes (GB).

Centralized database: data is stored in a single location and is accessible to multiple computers (e.g., a search engine). Advantages: better control of data quality and better IT security. Disadvantages: transmission delays when users are geographically dispersed, and the need for more powerful hardware and a faster network.

Distributed database: the database is split into groups, accessible to multiple groups of …

ETL (extract, transform, load): data is extracted from databases, transformed into a standardized format, then loaded into data warehouses at specific times. Once loaded, the data becomes non-volatile and is ready for analysis, which turns it into information. Change data capture (CDC) and data deduplication are also used with ETL.

Data marts are subsets of data warehouses, designed for analysis and quick responses to queries, whereas databases are for storing data. Subject-oriented: similar data is linked together (a characteristic of data warehouses and data marts).
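A quick sketch of the storage-unit arithmetic, assuming the decimal (SI) convention the guide uses, where each step is a factor of 1,000. (Binary units such as the pebibyte, PiB, use 1,024 instead.) The function and constant names are illustrative only.

```python
# Decimal (SI) storage units, as used in the guide: each step is a factor of 1,000.
GB_PER_TB = 1_000
TB_PER_PB = 1_000

def pb_to_tb(pb: float) -> float:
    """Convert petabytes to terabytes."""
    return pb * TB_PER_PB

def pb_to_gb(pb: float) -> float:
    """Convert petabytes to gigabytes (PB -> TB -> GB)."""
    return pb_to_tb(pb) * GB_PER_TB

print(pb_to_tb(1))  # 1000
print(pb_to_gb(1))  # 1000000
```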
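The ETL flow, deduplication, and data marts can be illustrated with a small in-memory sketch. Everything here is hypothetical: the two source lists stand in for operational databases, a plain Python list stands in for the data warehouse, and "region" is an assumed subject for the data mart. Real ETL tools and warehouses work very differently at scale; this only shows the order of the steps.

```python
from datetime import date

# Hypothetical source rows from two operational databases with inconsistent formats.
SOURCE_A = [
    {"customer": " Alice ", "region": "east", "amount": "120.50", "day": "2024-03-01"},
    {"customer": "Bob", "region": "WEST", "amount": "80", "day": "2024-03-01"},
]
SOURCE_B = [
    # Same sale as Alice's row in SOURCE_A, written differently.
    {"customer": "alice", "region": "East", "amount": "120.5", "day": "2024-03-01"},
    {"customer": "Carol", "region": "east", "amount": "45.25", "day": "2024-03-02"},
]

def extract(*sources):
    """Extract: read every row from every source."""
    for source in sources:
        yield from source

def transform(row):
    """Transform: convert a raw row into one standardized tuple format."""
    return (
        row["customer"].strip().title(),
        row["region"].strip().lower(),
        round(float(row["amount"]), 2),
        date.fromisoformat(row["day"]),
    )

def deduplicate(rows):
    """Drop rows that are identical after standardization, keeping the first copy."""
    seen = set()
    for row in rows:
        if row not in seen:
            seen.add(row)
            yield row

def load(warehouse, rows):
    """Load: append rows to the warehouse; existing rows are never modified (non-volatile)."""
    warehouse.extend(rows)

def build_mart(warehouse, region):
    """Data mart: a subject-specific subset of the warehouse (here, one region)."""
    return [row for row in warehouse if row[1] == region]

warehouse = []
load(warehouse, deduplicate(transform(r) for r in extract(SOURCE_A, SOURCE_B)))
east_mart = build_mart(warehouse, "east")
print(len(warehouse), len(east_mart))  # 3 2
```

Deduplication runs after the transform step on purpose: the two "Alice" rows only become identical once they share a standardized format.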
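Change data capture means sending only the rows that changed since the last ETL run, rather than copying the whole source again. The sketch below uses the simplest form, comparing two snapshots keyed by primary key. Production CDC is often log-based or trigger-based instead, and the snapshot data and function name here are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return only what changed.

    previous, current: dicts mapping a primary key to that row's values.
    """
    inserted = {k: v for k, v in current.items() if k not in previous}
    deleted = {k: v for k, v in previous.items() if k not in current}
    updated = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    return inserted, updated, deleted

# Hypothetical snapshots of one source table taken at two ETL runs.
yesterday = {1: ("Alice", "east"), 2: ("Bob", "west"), 3: ("Carol", "east")}
today = {1: ("Alice", "east"), 2: ("Bob", "north"), 4: ("Dan", "south")}

inserted, updated, deleted = capture_changes(yesterday, today)
print(inserted)  # {4: ('Dan', 'south')}
print(updated)   # {2: ('Bob', 'north')}
print(deleted)   # {3: ('Carol', 'east')}
```

Row 1 is unchanged, so it is not sent to the warehouse again. That is the saving CDC is meant to provide.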