2.2 and 2.3 notes
2.2 notes
- The lesson describes data compression and how it reduces the number of bits needed to represent data, saving transmission time and storage space.
- It distinguishes between lossy and lossless data compression.
- It highlights the Image Lab Project’s data concepts, which involve compression and analyzing size.
- It explains the metadata and considerations when managing image files, including file type, size, height, width, number of pixels, visual perception, displaying images in Python Jupyter notebooks, and using Python libraries and concepts.
- It asks questions about accessing files in the terminal, working with different operating systems, and observations, struggles, or learnings while working with the code.
- It examines why the path is a big deal when working with images and how the metadata source and label relate to Unit 5 topics.
- It describes IPython and why it is interesting in Jupyter Notebooks for both Pandas and images.
- It outlines different ways to read and encode images, including using PIL, base64, and numpy.
- It explains the purpose and results of a program that creates metadata and manipulates images.
- It asks questions about the Grey Scale algorithm, scaling images, and how these topics relate to data compression.
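To make the lossy vs. lossless distinction concrete, here is a minimal sketch using only the Python standard library. The sample bytes, the pixel list, and the to_grey helper are made up for illustration, and averaging the three channels is just one simple grey scale scheme, not necessarily the one used in the lesson.

```python
import zlib

# Lossless: compressing and then decompressing returns the exact original bytes.
original = b"AAAAABBBBBCCCCC" * 100   # hypothetical, highly repetitive data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
print(len(original), "bytes before,", len(compressed), "bytes after")

# Lossy: averaging R, G, B into one grey value stores one number per pixel
# instead of three, but the original colour cannot be recovered afterwards.
def to_grey(pixel):
    r, g, b = pixel
    return (r + g + b) // 3

pixels = [(255, 0, 0), (0, 255, 0), (10, 20, 30)]   # hypothetical RGB pixels
grey_pixels = [to_grey(p) for p in pixels]
print(grey_pixels)
```

Pure red and pure green both map to the same grey value here, which is exactly the kind of information a lossy step throws away.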
2.3 notes
Data Cleaning:
- The dataset contains missing data points and invalid data.
- One data point is inaccurate, where “Junior” is entered instead of a number.
- It’s crucial to clean the data before performing any analysis, as garbage in will result in garbage out.
Extracting Info:
- We used Pandas’ DataFrame to extract information from the dataset.
- We extracted a single column, then two columns, and printed them without the index.
- This allows us to focus on the specific information we need for our analysis.
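The cleaning and extraction steps above could look roughly like this. The column names and values are hypothetical stand-ins for the lesson's dataset, and coercing bad entries to NaN before dropping them is one possible cleaning approach, assumed here for illustration.

```python
import pandas as pd

# Hypothetical sample data: one invalid entry ("Junior") and one missing GPA.
df = pd.DataFrame({
    "Student ID": ["123", "246", "578", "469", "324"],
    "Year in School": ["9", "12", "Junior", "11", "10"],
    "GPA": [3.57, 4.00, 2.78, None, 3.18],
})

# Turn non-numeric years into NaN, then drop every row with a missing value.
df["Year in School"] = pd.to_numeric(df["Year in School"], errors="coerce")
clean = df.dropna()

# Extract one column, then two columns, and print them without the index.
print(clean[["GPA"]].to_string(index=False))
print(clean[["Student ID", "GPA"]].to_string(index=False))
```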
DataFrame Sort:
- We used sort_values() to sort the DataFrame by the GPA column.
- We sorted the DataFrame in both ascending and descending order.
- Sorting helps to make sense of the data and allows us to draw meaningful insights.
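A short sketch of the two sorts, using a small made-up DataFrame in place of the lesson's data:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Student ID": ["123", "246", "324"],
    "GPA": [3.57, 4.00, 3.18],
})

# sort_values returns a new DataFrame; ascending order is the default.
ascending = df.sort_values(by="GPA")
descending = df.sort_values(by="GPA", ascending=False)

print(ascending)
print(descending)
```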
DataFrame Selection or Filter:
- We used conditional statements to select data from the DataFrame.
- We kept only the rows where GPA is greater than 3.00, dropping every row with a GPA of 3.00 or less.
- Filtering helps us to select the specific data we need for our analysis.
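The filter can be written as a boolean condition inside the square brackets. This is a minimal sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample data, including one GPA of exactly 3.00.
df = pd.DataFrame({
    "Student ID": ["123", "578", "901", "324"],
    "GPA": [3.57, 2.78, 3.00, 3.18],
})

# The condition produces a True/False value per row; only True rows are kept.
high = df[df["GPA"] > 3.00]
print(high)
```

Because the comparison is strictly greater than, the student with a GPA of exactly 3.00 is not included.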
DataFrame Selection Max and Min:
- We used conditional statements to select the rows with the maximum and minimum GPA in the DataFrame.
- The output showed the data of the students with the highest and lowest GPA.
- Selecting the maximum and minimum values helps us to identify the highest and lowest achievers.
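One way to express this selection is to compare the GPA column against its own max() and min(). The data below is hypothetical, and this is a sketch of the idea rather than the lesson's exact code.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Student ID": ["123", "246", "578", "324"],
    "GPA": [3.57, 4.00, 2.78, 3.18],
})

# Keep the rows whose GPA equals the column's maximum or minimum.
top = df[df["GPA"] == df["GPA"].max()]
bottom = df[df["GPA"] == df["GPA"].min()]

print(top)
print(bottom)
```

If several students tie for the highest or lowest GPA, this approach returns all of them.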
Creating Own DataFrame:
- We created our own DataFrame using a Python dictionary.
- We used pd.DataFrame(dict) to convert the dictionary to a DataFrame.
- Creating our own DataFrame allows us to work with data that we are interested in.
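As a small illustration, each dictionary key becomes a column name and each list becomes that column's values. The names and numbers here are invented, and the variable is called data rather than dict so that Python's built-in dict is not overwritten.

```python
import pandas as pd

# Hypothetical data: keys are column names, lists are the column values.
data = {
    "Name": ["Ada", "Grace", "Alan"],
    "Score": [95, 88, 91],
}

df = pd.DataFrame(data)
print(df)
```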