STAT C100 Lecture Notes - Lecture 2: Apache Spark, IPython, Data Science

13 Oct 2018
Data100 Lecture02
Goals For Today
Introduce Pandas, with emphasis on:
o Key Data Structures (data frames, series, indices).
o How to index into these structures.
o How to read files to create these structures.
o Other basic operations on these structures.
Go over some important and handy IPython features and concepts:
o Shell commands (e.g. !dir or !ls).
o Portable vs. operating system specific code.
o Shift-Tab (in Jupyter, shows the signature and docstring of the object under the cursor).
Solve some very basic data science problems using Jupyter/pandas.
Pandas Data Structures: Data Frames, Series, and Indices
There are three fundamental data structures in pandas:
Data Frame: 2D tabular data.
Series: 1D data. I usually think of it as columnar data.
Index: A sequence of row labels.
We can think of a Data Frame as a collection of Series that all share the same Index.
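A minimal sketch of that idea: two Series built on the same Index are combined into one Data Frame, with each Series becoming a column. The state labels, candidate names, and vote counts are made-up toy values for illustration only.

```python
import pandas as pd

# Hypothetical toy data: one shared Index of row labels.
index = pd.Index(["CA", "NY", "TX"], name="State")

# Two Series that share that Index.
candidate = pd.Series(["Alice", "Bob", "Carol"], index=index)
votes = pd.Series([100, 250, 175], index=index)

# Combining them: each Series becomes one column of the Data Frame.
df = pd.DataFrame({"Candidate": candidate, "Votes": votes})

print(df)
print(df.index.equals(votes.index))  # True: the columns share the same Index
```

Pulling a single column back out (for example `df["Votes"]`) returns a Series that still carries the shared Index.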
Indices Are Not Necessarily Row Numbers
Indices (a.k.a. row labels) can also:
Be non-numeric.
Have a name, e.g. “State”.
The row labels that constitute an index do not have to be unique.
Left: The index values are all unique and numeric, acting as a row number.
Right: The index values are named and non-unique.
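A small sketch of an index like the one on the right, assuming made-up rows where the row labels are state abbreviations and one label ("CA") appears twice:

```python
import pandas as pd

# Hypothetical rows; the Index is named "State" and "CA" is repeated.
df = pd.DataFrame(
    {"Candidate": ["Alice", "Bob", "Carol"], "Votes": [100, 250, 175]},
    index=pd.Index(["CA", "CA", "NY"], name="State"),
)

print(df.index.name)       # State
print(df.index.is_unique)  # False

# Looking up a repeated label with .loc returns every matching row.
ca_rows = df.loc["CA"]
print(ca_rows)
```

Because "CA" is not unique, `df.loc["CA"]` gives back a two-row Data Frame rather than a single row.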
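Column names in pandas are almost always unique in practice, though pandas does not strictly enforce this.
Example: Avoid having two columns named “Candidate”; with a duplicated name, selecting that column with [] returns a Data Frame instead of a Series.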
Indexing with the [] Operator
Given a Data Frame, it is common to extract a Series or a collection of Series. This process is also known as “Column Selection” or sometimes “indexing by column”.
Column name argument to [] yields Series.
List argument (even of one name) to [] yields a Data Frame.
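The two column-selection rules can be checked directly. This sketch uses a hypothetical three-column table; only the type of the result matters here.

```python
import pandas as pd

# Hypothetical toy table.
df = pd.DataFrame(
    {
        "Candidate": ["Alice", "Bob", "Carol"],
        "Party": ["X", "Y", "Z"],
        "Votes": [100, 250, 175],
    }
)

col = df["Candidate"]                  # a single column name -> Series
one_col = df[["Candidate"]]            # a list with one name -> Data Frame
two_cols = df[["Candidate", "Votes"]]  # a list of names -> Data Frame

print(type(col).__name__)         # Series
print(type(one_col).__name__)     # DataFrame
print(two_cols.columns.tolist())  # ['Candidate', 'Votes']
```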
We can also index by row numbers (positions) using the [] operator.
Numeric slice argument to [] yields rows.
Example: [0:3] yields rows 0 to 2; as with Python lists, the end of the slice is excluded.
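A minimal sketch of row slicing with [], assuming a toy Data Frame whose row labels are letters. Using non-numeric labels makes it visible that the integer slice counts positions, not labels, and that position 3 is excluded.

```python
import pandas as pd

# Hypothetical toy data with non-numeric row labels.
df = pd.DataFrame(
    {"Votes": [100, 250, 175, 90, 300]},
    index=["a", "b", "c", "d", "e"],
)

first_three = df[0:3]  # positions 0, 1, 2; position 3 is excluded
print(first_three.index.tolist())  # ['a', 'b', 'c']

# The same rows selected explicitly by position with .iloc.
print(first_three.equals(df.iloc[0:3]))  # True
```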
Summary