ACCT10003 Lecture Notes - Lecture 10: Unstructured Data, Block Chain, Zettabyte

Big Data
Data - raw facts/numbers that describe the characteristics of an event
Information - data organised in a meaningful way so that it is useful to the user
Big Data refers to datasets whose size is beyond the ability of typical database software tools to
capture, store, manage and analyse
-Requires new software/technology
The Four Big V’s of Big Data
1. Volume: exponential growth in generation, collection and storage of data
The amount of data generated worldwide in 2002 (5 billion gigabytes) is now generated every 2 years
90% of the world’s data was generated in the last 2 years
Terabytes, petabytes, zettabytes, brontobytes?
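To make the unit names concrete, the sketch below converts between decimal byte units, where each step up is a factor of 1,000, and shows that the 5 billion gigabytes quoted for 2002 equals 5 exabytes. It assumes SI decimal prefixes rather than binary ones (1 KiB = 1,024 bytes), and the function names are illustrative only. Brontobyte is left out because it is an informal term, not a standard unit.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
# "Brontobyte" is an informal, non-standard name, so it is not included here.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value in the given decimal unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def human_readable(num_bytes: float) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    index = 0
    while num_bytes >= 1000 and index < len(UNITS) - 1:
        num_bytes /= 1000
        index += 1
    return f"{num_bytes:g} {UNITS[index]}"


# The lecture's 2002 figure: 5 billion gigabytes.
total = to_bytes(5_000_000_000, "GB")
print(human_readable(total))  # 5 EB

# One zettabyte is 1,000 exabytes, i.e. 10**21 bytes.
print(human_readable(to_bytes(1, "ZB")))  # 1 ZB
```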
2. Velocity: the speed at which data is acquired and used
Goal: determine the meaning of data faster, ideally in real time
Estimated that in one minute:
-YouTube users upload 72 hours of new video content
-Google receives over 2 million search queries
-200 million email messages are sent
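As a rough illustration of velocity, the sketch below scales the per-minute estimates above to per-second and per-day rates. It assumes the rates stay constant around the clock, which real traffic does not, so the results only indicate the order of magnitude. The names used are illustrative.

```python
# Per-minute figures quoted in the lecture (estimates, not measurements).
PER_MINUTE = {
    "hours of video uploaded to YouTube": 72,
    "Google search queries": 2_000_000,
    "email messages sent": 200_000_000,
}

MINUTES_PER_DAY = 60 * 24


def per_second(per_minute: float) -> float:
    """Average rate per second, assuming a constant rate across the minute."""
    return per_minute / 60


def per_day(per_minute: float) -> float:
    """Daily total, assuming the per-minute rate holds all day."""
    return per_minute * MINUTES_PER_DAY


for item, rate in PER_MINUTE.items():
    print(f"{item}: {per_second(rate):,.1f} per second, {per_day(rate):,.0f} per day")
```

For example, 2 million queries a minute works out to roughly 33,000 queries every second.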
3. Variety: Data can be structured or unstructured
Structured Data
-has a defined length and format
-e.g. numbers, dates, strings
-usually stored in a traditional database; can be queried
-The type of data accountants usually deal with
-Computer or machine generated:
  -sensor data
  -web log data
  -point-of-sale data
  -financial data
-Human-generated:
  -input data
  -click-stream data
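Because structured data has a defined length and format, it fits naturally into a database table and can be queried. The sketch below uses Python's built-in sqlite3 module with a small, made-up set of point-of-sale records to total sales per store with SQL. The table name, column names and figures are hypothetical.

```python
import sqlite3

# Hypothetical point-of-sale records: every row has the same fields and types,
# which is what makes the data "structured".
SALES = [
    ("2024-03-01", "STORE-A", 120.50),
    ("2024-03-01", "STORE-B", 80.00),
    ("2024-03-02", "STORE-A", 45.25),
    ("2024-03-02", "STORE-B", 200.00),
]


def total_sales_by_store(rows):
    """Load rows into an in-memory SQLite table and total sales per store."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (sale_date TEXT, store TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT store, SUM(amount) FROM sales GROUP BY store ORDER BY store"
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(total_sales_by_store(SALES))  # [('STORE-A', 165.75), ('STORE-B', 280.0)]
```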
Unstructured Data
-Machine generated:
  -Satellite images, e.g. Google Earth, weather data
  -Scientific data, e.g. seismic, atmospheric and climate data
  -Photographs and video, e.g. CCTV, traffic videos
  -Radar or sonar data
-Human-generated:
  -Internal business textual data, e.g. emails, documents, reports
  -Social media data
  -Mobile data, e.g. text messages, location data
  -Website content
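Unstructured data has no defined length or format, so before it can be queried like structured data, the useful pieces usually have to be extracted first. A minimal sketch, assuming a made-up internal email and simple regular expressions chosen only for this example, pulls invoice numbers and amounts out of free text into structured records. Real text varies far more than these patterns allow for.

```python
import re

# Hypothetical internal email: the information is there, but not in fields.
EMAIL = """Hi team,
Invoice INV-1042 for $1,250.00 was paid on 2024-03-04.
Invoice INV-1043 ($980.50) is still outstanding.
Thanks"""

# Assumed patterns for this example only.
INVOICE_PATTERN = re.compile(r"INV-\d+")
AMOUNT_PATTERN = re.compile(r"\$([\d,]+\.\d{2})")


def extract_invoices(text):
    """Pair each invoice number with the first dollar amount on the same line."""
    records = []
    for line in text.splitlines():
        invoice = INVOICE_PATTERN.search(line)
        amount = AMOUNT_PATTERN.search(line)
        if invoice and amount:
            value = float(amount.group(1).replace(",", ""))
            records.append((invoice.group(0), value))
    return records


print(extract_invoices(EMAIL))  # [('INV-1042', 1250.0), ('INV-1043', 980.5)]
```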
4. Veracity: the truthfulness (reliability) of the data
Reliability is a fundamental characteristic of quality data
Data may:
-Come from untrusted sources
-Be dirty (inaccurate or incomplete)
-Have a low signal-to-noise ratio
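Some of these veracity problems can be screened for with simple rule-based checks. The sketch below is illustrative only: the records, the set of trusted sources and the rule that amounts must not be negative are all assumptions made for the example. It flags untrusted sources, incomplete values and inaccurate values; it does not attempt to measure signal-to-noise ratio.

```python
from datetime import date

# Hypothetical records from a mix of sources; some are dirty.
RECORDS = [
    {"id": 1, "amount": 150.0, "date": "2024-03-01", "source": "ERP"},
    {"id": 2, "amount": None, "date": "2024-03-02", "source": "ERP"},
    {"id": 3, "amount": -20.0, "date": "2024-03-02", "source": "web form"},
    {"id": 4, "amount": 75.0, "date": "2024-13-01", "source": "ERP"},
    {"id": 5, "amount": 60.0, "date": "2024-03-03", "source": "unknown"},
]

TRUSTED_SOURCES = {"ERP", "web form"}  # assumption for this example


def problems(record):
    """Return a list of reasons a record fails basic veracity checks."""
    issues = []
    if record["source"] not in TRUSTED_SOURCES:
        issues.append("untrusted source")
    if record["amount"] is None:
        issues.append("incomplete: missing amount")
    elif record["amount"] < 0:  # assumed rule: sale amounts cannot be negative
        issues.append("inaccurate: negative amount")
    try:
        date.fromisoformat(record["date"])
    except ValueError:
        issues.append("inaccurate: invalid date")
    return issues


for rec in RECORDS:
    found = problems(rec)
    print(rec["id"], "OK" if not found else "; ".join(found))
```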
Structured v unstructured data
-Structured data is typically more reliable because it has a defined length and format
Blockchain technology

Document Summary

New risk management skills:
-Use simple vendor risk dashboards and filters to minimise inefficiencies and human error
-Understand and apply advanced query languages
Issues with big data and predictive analytics in achieving value:
-Cost/benefit
