INFO 340 Lecture Notes - Lecture 7: Data Warehouse, Star Schema, Fact Table
![](https://new-preview-html.oneclass.com/Yaxkv7zyB48qN4baX96zm20dL5M6XOgn/bg1.png)
INFO 340 Lecture 7
Concepts
o Distinguishing characteristics from relational database
▪ Subject-oriented, rather than all aspects of business
• E.g., customers
▪ Integrated – multiple data sources
• Likely have different formatting → Make consistent to present
unified view of data
• E.g., insurance companies, municipal courts, etc.
▪ Time-variant – depth over time
• Accurate for a very specific point in time; records based on fact
▪ Non-volatile – storable + doesn’t change over time
• New data added to supplement, but not replace
• Read only
- Queries – range from simple to complex
- Problems
o Could be incomplete
o Data ownership
o High maintenance
o LEADS TO: increased risk + cost of data warehouses
- Star schema (dimensional model)
o Star shape allows data to be easily accessible; only one join
o Simple queries against complex data
- Data Warehouse Tools: ETL Processes
o Extraction – identify where data is
▪ 2 methods
• Top down - identify what is needed first, then go out + obtain
what is missing
• Bottom down – look at available data + use it
▪ Data sources normally internal
o Transformation – find data, assemble/aggregate it, and summarize it
▪ Implemented w series of rules/functions
Document Summary
Concepts: distinguishing characteristics from relational database, subject-oriented, rather than all aspects of business, e. g. , customers. Queries range from simple to complex. Problems: could be incomplete, data ownership, high maintenance, leads to: increased risk + cost of data warehouses. Star schema (dimensional model: star shape allows data to be easily accessible; only one join, simple queries against complex data. Implemented w series of rules/functions: loading load data + load any previously defined constraints. A su(cid:271)set; (cid:862)(cid:373)i(cid:374)i (cid:449)arehouse(cid:863), (cid:272)e(cid:374)tered arou(cid:374)d a depart(cid:373)e(cid:374)t or di(cid:448)isio(cid:374: not a data warehouse, support analytical requirements of a particular business unit (e. g. , sales department) Improve response time due to less data: why, future users more easily defined, lower cost, drawback less robust data for answering higher level questions, less complex, less expensive, see efficiency gains sooner. I(cid:374)(cid:373)o(cid:374)"s corporate i(cid:374)for(cid:373)atio(cid:374) fa(cid:272)tory (cid:894)cif(cid:895: ki(cid:373)(cid:271)all"s busi(cid:374)ess di(cid:373)e(cid:374)sio(cid:374)al life(cid:272)y(cid:272)le, most common bc less risky + easier to implement.