DATA WAREHOUSING AND DATA MINING Study Guide - Midterm Guide: Data Mining, Data Warehouse, Stream Processing
MODULE VI
1. What is a data stream? Give its characteristics.
A data stream is a continuous flow of data that is generated and processed in real time. In contrast to static datasets, data streams are dynamic, constantly changing, and are often produced by various sources, such as sensors, social media, or online transactions.
Here are some of the key characteristics of data streams:
1. Unbounded: Data streams are typically unbounded, meaning that the size of the data can grow infinitely over time, and there is no fixed endpoint for the stream.
2. Continuous: Data streams are generated continuously over time, without any pause or stop. Therefore, it is crucial to process them in real time or near real time to keep up with the pace of the data.
3. Fast-moving: Data streams are often high-speed and fast-moving, which means that they must be processed rapidly and efficiently to avoid data loss or latency.
4. Variable in volume: The volume of data generated in a data stream can vary significantly, depending on the source and the specific context of the data.
5. Noisy and incomplete: Data streams are often noisy, incomplete, and contain errors, which can make it challenging to extract meaningful insights from the data.
6. Potentially infinite: Since data streams are unbounded, they may continue to produce data indefinitely, making it impossible to analyze the entire dataset.
Overall, data streams present unique challenges and opportunities for real-time data processing and analysis, and require specialized techniques and tools to extract useful insights and knowledge from them.
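As a minimal illustration of these characteristics, the sketch below (all names are illustrative; a finite list stands in for an unbounded feed) maintains a sliding-window average that is updated as each item arrives, so a result is available at any moment without waiting for the stream to end:

```python
from collections import deque

def stream_window_avg(stream, window_size):
    """Yield a running average over the last `window_size` items of a stream."""
    window = deque(maxlen=window_size)   # oldest items fall out automatically
    for item in stream:
        window.append(item)
        yield sum(window) / len(window)

# A finite list stands in for an unbounded sensor feed.
readings = [10, 20, 30, 40]
averages = list(stream_window_avg(readings, window_size=2))
# averages -> [10.0, 15.0, 25.0, 35.0]
```

Because the window has a fixed size, memory use stays constant no matter how long the stream runs, which is the key requirement for unbounded data.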
2. What are the applications of data streams?
Data streams are used in data warehousing to improve the speed and efficiency of data processing, analysis, and decision-making.
Here are some examples of how data streams are applied in data warehousing:
1. Real-time data integration: Data streams can be used to integrate data from multiple sources in real time, allowing organizations to make faster and more accurate decisions based on the most up-to-date information.
2. Real-time analytics: Data streams can be used for real-time analytics, where queries are applied to the data stream in real time to identify patterns, trends, and anomalies. This approach enables organizations to detect and respond to emerging trends and issues in real time.
3. Event processing: Data streams can be used for event processing, where events or notifications are generated in real time based on predefined criteria. This approach can help organizations monitor critical business processes, detect anomalies, and trigger alerts or actions as needed.
4. Continuous data warehousing: Data streams can be used to continuously update data warehouses, allowing organizations to make decisions based on the most up-to-date information. This approach is particularly useful in fast-moving industries such as finance, retail, and healthcare.
5. Real-time reporting and dashboarding: Data streams can be used to provide real-time reporting and dashboarding, allowing organizations to monitor key performance indicators (KPIs) and make informed decisions based on real-time data.
Overall, data streams offer numerous opportunities to improve the efficiency and effectiveness of data warehousing, and to provide real-time insights and intelligence to support decision-making and improve business operations.
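The event-processing idea can be sketched in a few lines; the function name, the simulated (timestamp, amount) transactions, and the threshold are all illustrative assumptions, showing how alerts are generated when a value crosses a predefined criterion:

```python
def detect_events(stream, threshold):
    """Collect an alert for each reading that exceeds the predefined threshold."""
    alerts = []
    for timestamp, value in stream:
        if value > threshold:                # predefined criterion
            alerts.append((timestamp, value))
    return alerts

# Simulated stream of (timestamp, transaction amount) pairs.
transactions = [(1, 120.0), (2, 980.0), (3, 45.0), (4, 1500.0)]
alerts = detect_events(transactions, threshold=500.0)
# alerts -> [(2, 980.0), (4, 1500.0)]
```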
3. Give the architecture of stream query processing.
The architecture of stream query processing typically consists of several components that work together to process and analyze data streams in real time. Here are some of the key components:
1. Stream source: The data stream source is the initial source of the data, such as a sensor or a data feed. The data stream is generated from this source and is continuously fed into the system.
2. Stream processing engine: This component is responsible for processing the data stream in real time. The engine applies various transformations, filters, and aggregations to the data stream to extract meaningful insights and perform analysis.
3. Query language: A query language is used to express the stream processing logic and to specify the operations that should be performed on the data stream. Common query languages for stream processing include StreamSQL and CQL (Continuous Query Language), which extend SQL with windowing constructs for unbounded data.
4. Stream storage: The stream storage component is used to store and manage the incoming stream of data. The storage system must be able to handle high volumes of data, and provide fast retrieval and query processing capabilities.
5. Stream analytics: Stream analytics components use machine learning, statistical modeling, and other techniques to perform real-time analysis of the data stream. This component can be used to detect anomalies, predict outcomes, and perform other types of analysis on the data.
6. Stream visualization: Stream visualization components provide graphical representations of the real-time data stream, such as charts, graphs, and dashboards. This component can help users to quickly understand and visualize the stream data.
Overall, the architecture of stream query processing is designed to handle high-speed, high-volume, and constantly changing data streams, and to provide real-time analysis and insights to support decision-making and improve business operations.
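A toy sketch of how the first two components fit together (all names are hypothetical; Python generators stand in for the stream source and the processing engine) might look like this:

```python
def source(events):
    """Stream source: stands in for a sensor or data feed."""
    yield from events

def engine(stream):
    """Stream processing engine: filters and transforms records on the fly."""
    for event in stream:
        if event["type"] == "purchase":     # filter step
            yield float(event["amount"])    # transform step

events = [
    {"type": "purchase", "amount": 20},
    {"type": "view", "amount": 0},
    {"type": "purchase", "amount": 35},
]
running_total = sum(engine(source(events)))  # aggregation step -> 55.0
```

Because generators are lazy, each record flows through the filter, transform, and aggregation steps one at a time, mirroring how a real engine processes records as they arrive rather than after the stream ends.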
4. Explain random sampling and histograms.
Random sampling and histograms are two common techniques used in data warehousing and data mining to analyze large data sets.
1. Random Sampling: Random sampling is a statistical technique that involves selecting a subset of data from a larger data set at random. This technique is commonly used in data warehousing and data mining to obtain a representative sample of the data for analysis.
Random sampling is particularly useful when working with large data sets where it is impractical or time-consuming to analyze the entire data set. By selecting a smaller representative sample of the data, analysts can still obtain meaningful insights and make informed decisions based on the data.
Random sampling can be performed using various sampling techniques, such as simple random sampling, stratified random sampling, or cluster sampling. The choice of sampling technique will depend on the characteristics of the data set and the research question being addressed.
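For data streams in particular, reservoir sampling is a standard way to draw a uniform random sample when the total number of items is not known in advance. A minimal sketch (the function name is illustrative):

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = random.randint(0, i)     # item survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(100_000), 10)  # 10 items, one pass, O(k) memory
```

The key property is that the sample is maintained in a single pass with constant memory, which is exactly what an unbounded data stream requires.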
2. Histogram: A histogram is a graphical representation of the distribution of a data set. The data is grouped into intervals or bins, and the frequency of observations within each interval is plotted on the y-axis.
Histograms are commonly used in data warehousing and data mining to explore the distribution of a data set and to identify patterns or trends. They can be particularly useful when working with continuous or numerical data, such as sales data or customer demographics.
Histograms can help analysts to identify outliers, anomalies, or gaps in the data, as well as to identify trends or patterns in the data. By analyzing the histogram, analysts can gain a better understanding of the data and make more informed decisions based on the data.
In summary, random sampling and histograms are two important techniques used in data warehousing and data mining to analyze large data sets. These techniques can help analysts to obtain representative samples of the data and to explore its distribution to identify patterns, trends, and anomalies.
5. Explain multi-resolution models and randomized algorithms.
Multi-resolution models and randomized algorithms are two important techniques used in data warehousing and data mining to improve the efficiency and accuracy of data analysis.
1. Multi-Resolution Models: Multi-resolution models involve representing data at different levels of abstraction or detail. This technique is particularly useful when working with large, complex data sets, where it may be impractical or time-consuming to analyze the data at its full resolution.
Multi-resolution models can be used to simplify the data and focus on the most important features or patterns, while still retaining the overall structure of the data. This can help analysts to better understand the data and make more informed decisions based on the data.
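A simple way to illustrate representing data at different resolutions is a roll-up that aggregates the same records at monthly and yearly levels; the sketch below assumes hypothetical date keys in 'YYYY-MM-DD' form:

```python
def rollup(records, prefix_len):
    """Aggregate (date_string, value) pairs to a coarser resolution.
    prefix_len: leading characters of the date key to keep,
    e.g. 7 for 'YYYY-MM' (monthly), 4 for 'YYYY' (yearly)."""
    agg = {}
    for date, value in records:
        key = date[:prefix_len]
        agg[key] = agg.get(key, 0) + value
    return agg

daily_sales = [("2023-01-05", 100), ("2023-01-20", 50), ("2023-02-03", 75)]
monthly = rollup(daily_sales, 7)   # {'2023-01': 150, '2023-02': 75}
yearly = rollup(daily_sales, 4)    # {'2023': 225}
```

The coarser levels lose daily detail but preserve the overall structure, letting an analyst work with far fewer records until a drill-down to full resolution is needed.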