DATA WAREHOUSING AND DATA MINING Study Guide - Midterm Guide: Amazon DynamoDB, BigQuery, Data Mining
1.What is a database?
In the context of data warehousing and data management, a database is a collection of structured
data that is stored and organized in a way that enables efficient access and retrieval of information.
In data warehousing, a database is typically used to store and manage large volumes of data from
various sources, which is then transformed and analyzed to support business intelligence and
decision-making processes.
In data management, a database is used to store and manage data that is critical to an organization's
operations, such as customer information, financial records, and inventory data. Databases can be
designed to support specific business processes or applications, and can be managed using various
software tools and technologies to ensure data quality, security, and availability.
2.What are the different database or backend tools?
There are many different types of databases and backend tools available, each with its own strengths
and weaknesses. Here are some examples:
1. Relational Databases: These are the most common type of database and are based on the
relational model. Examples include MySQL, Oracle, SQL Server, PostgreSQL, and SQLite.
2. NoSQL Databases: These databases are designed to handle unstructured or semi-structured
data and offer flexible schema designs. Examples include MongoDB, Cassandra, Couchbase,
and DynamoDB.
3. Graph Databases: These databases are designed to handle complex relationships between
data points and are optimized for graph queries. Examples include Neo4j, OrientDB, and
ArangoDB.
4. In-memory Databases: These databases store data in memory instead of on disk, allowing for
faster data access and processing. Examples include Redis, Memcached, and Hazelcast.
5. Search Engines: These tools are designed to allow for full-text search of large data sets.
Examples include Elasticsearch, Solr, and Amazon CloudSearch.
6. Data Warehousing Tools: These tools are designed to help store, manage, and analyze large
amounts of structured data. Examples include Amazon Redshift, Google BigQuery, and
Microsoft Azure Synapse Analytics.
7. Key-Value Stores: These databases store data as key-value pairs, allowing for fast data
retrieval. Examples include Riak, BerkeleyDB, and Amazon DynamoDB.
8. Object-Oriented Databases: These databases store data as objects, allowing for more complex
data structures and relationships. Examples include db4o, ObjectDB, and Versant.
These are just a few examples of the many different database and backend tools available. The choice
of tool depends on the specific needs and requirements of the application or organization.
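To make the differences between these data models more concrete, the sketch below stores the same
record in three ways. It is only an illustration: the customer record is hypothetical, SQLite
(through Python's built-in sqlite3 module) stands in for a relational database, and plain Python
dictionaries and lists stand in for a key-value store and a document store. It does not use the
API of DynamoDB, MongoDB, or any other product named above.

```python
import json
import sqlite3

# Hypothetical record used for all three models.
customer = {"id": 1, "name": "Asha", "city": "Pune", "orders": [101, 102]}

# Relational model: fixed columns, with orders in a second table
# linked back to the customer by a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers(id))"
)
conn.execute(
    "INSERT INTO customers VALUES (?, ?, ?)",
    (customer["id"], customer["name"], customer["city"]),
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(order_id, customer["id"]) for order_id in customer["orders"]],
)
relational_orders = [
    row[0]
    for row in conn.execute(
        "SELECT o.order_id FROM orders o"
        " JOIN customers c ON c.id = o.customer_id"
        " WHERE c.name = ? ORDER BY o.order_id",
        ("Asha",),
    )
]

# Key-value model: the whole record is an opaque value fetched by one key.
kv_store = {}
kv_store["customer:1"] = json.dumps(customer)
kv_value = json.loads(kv_store["customer:1"])

# Document model: nested records that can be filtered by any field
# (a real document database would use indexes instead of a scan).
documents = [customer, {"id": 2, "name": "Ben", "city": "Leeds", "orders": []}]
pune_names = [doc["name"] for doc in documents if doc["city"] == "Pune"]
```

The relational version needs a join to bring the orders back together with the customer, while the
key-value and document versions keep the orders inside the record itself.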
3.What is an Excel file?
An Excel file is a type of spreadsheet file created using Microsoft Excel, a popular software program
used for creating and managing spreadsheets.
While Excel files can be used to store and manage data, they are not typically used in data warehousing
and data management due to several limitations.
Firstly, Excel files are not designed for large-scale data management and analysis, and can become slow
and unwieldy when handling large amounts of data (a worksheet is limited to 1,048,576 rows). Although
Excel can filter, sort, and aggregate data, it is poorly suited to performing these transformations on
large or continuously updated data sets in real time.
Secondly, Excel files are not well-suited for collaborative data management. It can be difficult to track
changes made by multiple users, and it is easy to accidentally overwrite or delete data.
Finally, Excel files are not particularly secure or scalable, and can be susceptible to data corruption and
loss.
In data warehousing and data management, more robust and scalable database systems such as SQL
Server, Oracle, and MySQL are typically used to store and manage data. These systems offer advanced
functionality for data processing, security, and scalability, making them better suited for managing
large amounts of data and supporting business intelligence and decision-making processes.
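As a rough illustration of moving spreadsheet data into a database, the sketch below loads
spreadsheet-style rows into a table and aggregates them with a query. The CSV text stands in for a
sheet exported from Excel, and the column names (region, product, amount) are made up for this
example. SQLite is used only because it ships with Python; the same idea applies to the larger
systems named above.

```python
import csv
import io
import sqlite3

# Hypothetical spreadsheet export (CSV text); the columns are illustrative.
SHEET = """region,product,amount
North,Widget,120.50
South,Widget,80.00
North,Gadget,45.25
"""


def load_sales(csv_text):
    """Load CSV rows into a typed table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales ("
        " region TEXT NOT NULL,"
        " product TEXT NOT NULL,"
        " amount REAL NOT NULL)"
    )
    rows = [
        (record["region"], record["product"], float(record["amount"]))
        for record in csv.DictReader(io.StringIO(csv_text))
    ]
    with conn:  # commits if the inserts succeed, rolls back on error
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


conn = load_sales(SHEET)
# Aggregate inside the database instead of with spreadsheet formulas.
totals = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))
```

Declaring column types and NOT NULL constraints lets the database reject malformed rows at load
time, which is one of the data-quality checks a spreadsheet does not enforce by default.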
4.What is meant by SQL?
SQL stands for Structured Query Language, which is a standardized programming language used for
managing and manipulating relational databases.
SQL is used to create, modify, and query databases, allowing users to perform a wide range of data
management tasks, including creating tables, adding and deleting data, modifying table structures,
and retrieving data from databases using queries.
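The tasks listed above can be sketched with a few SQL statements. The example below runs them
against an in-memory SQLite database through Python's built-in sqlite3 module; the products table
and its contents are hypothetical, and other database systems may differ in details such as data
types and ALTER TABLE support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Create a table.
conn.execute(
    "CREATE TABLE products ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " price REAL NOT NULL)"
)

# Add data.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("pen", 1.5), ("notebook", 4.0), ("stapler", 9.0)],
)

# Modify the table structure by adding a column with a default value.
conn.execute("ALTER TABLE products ADD COLUMN in_stock INTEGER NOT NULL DEFAULT 1")

# Modify existing data.
conn.execute("UPDATE products SET price = 2.0 WHERE name = ?", ("pen",))

# Delete data.
conn.execute("DELETE FROM products WHERE name = ?", ("stapler",))

# Retrieve data with a query.
rows = conn.execute(
    "SELECT name, price, in_stock FROM products ORDER BY name"
).fetchall()
```

The `?` placeholders pass values separately from the SQL text, which avoids building statements by
string concatenation.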
SQL is a declarative language, which means that users specify what they want to do with the data,
rather than how to do it. SQL is designed to be both powerful and flexible, allowing users to manipulate
data in complex ways while maintaining data integrity and security.
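One way to see the difference between "what" and "how" is to compute the same result twice. In the
sketch below, a Python loop spells out each step of totalling orders per customer, while the SQL
query only states the desired result and leaves the method to the database. The order data is
hypothetical, and SQLite is used only because it is bundled with Python.

```python
import sqlite3

# Hypothetical order data: (customer, amount).
orders = [("Asha", 30.0), ("Ben", 12.5), ("Asha", 20.0), ("Ben", 7.5)]

# Imperative: describe HOW to build the totals, one step at a time.
totals_loop = {}
for customer, amount in orders:
    totals_loop[customer] = totals_loop.get(customer, 0.0) + amount

# Declarative: describe WHAT is wanted; the database decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
totals_sql = dict(
    conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
)
```

Both approaches produce the same totals, but only the loop fixes the order of operations; the query
leaves the database free to choose an execution plan.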
SQL is widely used in data warehousing and data management, and is supported by most relational
database management systems, including MySQL, SQL Server, Oracle, and PostgreSQL. Due to its
popularity and versatility, SQL is considered an essential skill for anyone working in the field of data
management or data analysis.
5.What are the different extensions found for image files?
There are many different file extensions associated with image files, each with its own characteristics
and uses. Here are some of the most common image file extensions:
1. JPEG/JPG (.jpg, .jpeg): A popular format for lossy-compressed digital images, commonly used for
photographs.
2. PNG (.png): A lossless format that supports transparency, commonly used for web graphics
and logos.
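As a small illustration of how software can use these extensions, the sketch below maps a file name
to an image format. The function name and the lookup table are hypothetical and cover only the
formats listed above, plus .jpeg as an assumed alternative spelling of .jpg. An extension is only a
naming hint: it does not guarantee what the file actually contains.

```python
from pathlib import Path

# Illustrative lookup table covering only the formats listed above.
IMAGE_FORMATS = {
    ".jpg": "JPEG",
    ".jpeg": "JPEG",  # assumed alternative spelling of the same format
    ".png": "PNG",
}


def image_format(filename):
    """Return the format name for a file's extension, or None if unknown."""
    # Lower-case the suffix so that "photo.JPG" and "photo.jpg" match.
    return IMAGE_FORMATS.get(Path(filename).suffix.lower())
```

For example, image_format("photo.JPG") returns "JPEG", and a name with an unlisted extension such as
"notes.txt" returns None.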