AP CSP Unit 4: Data & Simulations — Complete 2025 Study Guide
Unit 4 of AP Computer Science Principles focuses on data: how it’s collected, stored, visualized, interpreted, and used in simulations to make predictions and decisions. This unit connects computer science with science, business, medicine, and real-world problem solving. This guide covers:
- How data is collected and cleaned
- The difference between raw data and information
- How data visualizations can reveal patterns
- What “big data” is and why it matters
- How simulations work and what they’re used for
- The limitations and biases in data & models
- AP-style questions and explanations
📊 Data vs. Information
AP CSP makes a clear distinction between data and information:
- Data – raw, unprocessed facts (numbers, clicks, temperatures, survey answers)
- Information – data that has been processed, organized, or interpreted to be useful
For example, a table of temperatures collected every hour is data. A graph showing average temperature per day, and a conclusion about a heat wave, is information.
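The temperature example can be sketched in a few lines of Python. This is only an illustration: the readings, the `daily_averages` helper, and the 85°F "hot day" cutoff are made-up assumptions, not values from the AP course.

```python
from statistics import mean

# Hypothetical raw data: (day, hour, temperature in °F) readings.
readings = [
    ("Mon", 6, 70.0), ("Mon", 12, 84.0), ("Mon", 18, 80.0),
    ("Tue", 6, 78.0), ("Tue", 12, 96.0), ("Tue", 18, 93.0),
]

def daily_averages(rows):
    """Group readings by day and return {day: average temperature}."""
    by_day = {}
    for day, _hour, temp in rows:
        by_day.setdefault(day, []).append(temp)
    return {day: mean(temps) for day, temps in by_day.items()}

# Processing the raw data produces information we can reason about.
averages = daily_averages(readings)

HEAT_THRESHOLD = 85.0  # assumed cutoff for calling a day "hot"
hot_days = [day for day, avg in averages.items() if avg >= HEAT_THRESHOLD]

print(averages)
print("Hot days:", hot_days)
```

The list of readings is the data; the per-day averages and the list of hot days are the information derived from it.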
📥 Collecting & Cleaning Data
Data can be collected from many sources:
- Surveys & forms
- Sensors (GPS, temperature, accelerometers)
- Web clicks & user interactions
- APIs and external data sets
Data Cleaning
Real-world data is often messy. Cleaning data may include:
- Removing duplicate records
- Handling missing values
- Fixing obvious errors and checking outliers (an outlier may be a mistake or a genuine extreme value)
- Standardizing formats (dates, units, categories)
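Here is a minimal sketch of three of these cleaning steps (removing duplicates, handling missing values, and standardizing units). The records, field names, and the `to_cm` and `clean` helpers are hypothetical and chosen only to keep the example small.

```python
# Hypothetical messy survey rows.
raw_rows = [
    {"name": "Ana", "height": "170 cm"},
    {"name": "Ana", "height": "170 cm"},   # exact duplicate
    {"name": "Ben", "height": ""},         # missing value
    {"name": "Cy", "height": "1.75 m"},    # different unit
]

def to_cm(text):
    """Convert '170 cm' or '1.75 m' to centimeters; return None if missing."""
    text = text.strip()
    if not text:
        return None
    value, unit = text.split()
    return float(value) * 100 if unit == "m" else float(value)

def clean(rows):
    """Drop exact duplicates and store every height in centimeters."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["name"], row["height"])
        if key in seen:
            continue  # skip a record we have already kept
        seen.add(key)
        cleaned.append({"name": row["name"], "height_cm": to_cm(row["height"])})
    return cleaned

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Here a missing height is kept as `None` rather than guessed. Whether to drop, keep, or estimate missing values is a judgment call that depends on how the data will be used.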
📈 Visualizations: Seeing Patterns in Data
Visual representations of data — charts, graphs, maps — help us find patterns, trends, and outliers that might be hidden in raw tables.
Common visualization types:
- Line graph – changes over time
- Bar chart – comparing categories
- Pie chart – parts of a whole
- Scatter plot – relationships between two variables
- Heat maps – density/intensity across an area
Exam themes:
- Which visualization best supports a claim?
- What conclusion can be drawn from this graph?
- Is the data representation misleading?
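One common way a chart can mislead is an axis that does not start at zero. The sketch below draws simple text bar charts to show the effect. The `text_bars` helper and the two scores are invented for illustration.

```python
def text_bars(values, baseline=0, scale=1):
    """Return one line per category, one '#' per `scale` units above `baseline`."""
    lines = []
    for label, value in values.items():
        length = round((value - baseline) / scale)
        lines.append(f"{label:<8}{'#' * length} {value}")
    return lines

scores = {"Brand A": 50, "Brand B": 52}  # hypothetical ratings

# Axis starts at 0: the bars are almost the same length.
honest = text_bars(scores, baseline=0, scale=2)

# Axis starts at 48: the same data now looks like a 2x difference.
misleading = text_bars(scores, baseline=48, scale=1)

print("\n".join(honest))
print()
print("\n".join(misleading))
```

Both charts use the same numbers. Only the baseline changes, which is why checking the axis is a good first step when asked whether a representation is misleading.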
📦 Big Data & Large Datasets
Big data refers to datasets that are too large or complex to be processed on a single machine using traditional tools.
Examples:
- Millions of GPS locations from phones
- Click data from a major website
- Medical records from hospitals nationwide
- Social media posts over many years
What big data enables:
- Detecting trends and patterns
- Training machine learning models
- Personalized recommendations (music, shopping)
- Forecasting traffic, weather, or disease spread
🧪 What Is a Simulation?
A simulation is a computer model that imitates a real-world process or system. Instead of experimenting on the real thing (which might be expensive, dangerous, or impossible), we experiment on the model.
Examples of simulations:
- Weather forecasting
- Disease spread in a population
- Traffic flow in a city
- Stock market behavior
- Physics engines in games
Why use simulations?
- They are cheaper and safer than real-world experiments
- They allow many “what if” scenarios
- They can run faster than real time
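To make the idea concrete, here is a toy disease-spread simulation. It is a deliberately simplified sketch, not a real epidemiological model: the `simulate_outbreak` function, its parameters, and the rule that nobody recovers are all assumptions made for this example.

```python
import random

def simulate_outbreak(population, initial_infected, contact_rate, days, seed=0):
    """Return the infected count for each day of a very simple spread model.

    Each day, every healthy person becomes infected with probability
    contact_rate * (fraction of the population currently infected).
    Nobody recovers in this toy model.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    infected = initial_infected
    history = [infected]
    for _day in range(days):
        risk = min(1.0, contact_rate * infected / population)
        healthy = population - infected
        new_cases = sum(1 for _person in range(healthy) if rng.random() < risk)
        infected += new_cases
        history.append(infected)
    return history

# Two "what if" scenarios: normal contact vs. reduced contact.
baseline = simulate_outbreak(1000, 5, contact_rate=0.5, days=30)
distancing = simulate_outbreak(1000, 5, contact_rate=0.2, days=30)

print("Infected after 30 days:", baseline[-1], "vs", distancing[-1])
```

Changing `contact_rate` and rerunning is a "what if" experiment that takes a fraction of a second, while the same experiment on a real population would be impossible. The results are only as trustworthy as the simplifying assumptions built into the model.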
🧠 Limitations & Bias in Data and Models
Both data and simulations have limitations. The AP exam loves questions about what you can and cannot conclude from data.
Common limitations:
- Sampling bias – data is collected from an unrepresentative group
- Measurement error – tools or methods are inaccurate
- Outdated data – conditions have changed over time
- Overfitting – model matches past data too closely and fails on new data
Bias examples:
- A survey sent only to people who are online
- A dataset missing certain groups of people
- Algorithms that reflect historical inequalities
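Sampling bias is easy to see with a small made-up population. In the sketch below, the numbers and the "reachable by online survey" flag are hypothetical. The point is only that surveying an unrepresentative group shifts the result.

```python
from statistics import mean

# Hypothetical population: (hours online per day, reachable by online survey?)
population = [
    (6, True), (5, True), (7, True), (4, True),
    (1, False), (0, False), (2, False), (1, False),
]

# Average over everyone in the population.
all_hours = [hours for hours, _reachable in population]
true_average = mean(all_hours)

# Average over only the people an online survey can reach.
online_only = [hours for hours, reachable in population if reachable]
survey_average = mean(online_only)

print("True average:", true_average)
print("Online-survey average:", survey_average)
```

The online-only survey overstates how much time the whole population spends online, because the people it cannot reach are exactly the ones who spend the least time there.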
📚 Aggregation & Data Operations
Computers are great at performing repetitive operations on large datasets. Common operations include:
- Aggregation – computing summaries, such as sums, counts, averages, minimums, and maximums
- Filtering – keeping only data that meets a condition (e.g. “only students with GPA ≥ 3.5”)
- Clustering – grouping similar data points together
- Classifying – assigning data to categories or labels
These operations turn raw data into useful information, which is exactly what the AP exam focuses on.
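Here is a short sketch of aggregation, filtering, and classifying on a tiny dataset. The student records and the "honors"/"standard" labeling rule are assumptions made for the example. Clustering is left out because it needs more machinery than fits in a few lines.

```python
# Hypothetical student records.
students = [
    {"name": "Ana", "gpa": 4.0},
    {"name": "Ben", "gpa": 3.0},
    {"name": "Cy", "gpa": 3.5},
    {"name": "Dee", "gpa": 2.5},
]

# Aggregation: summarize the whole dataset with a few numbers.
gpas = [s["gpa"] for s in students]
summary = {
    "count": len(gpas),
    "min": min(gpas),
    "max": max(gpas),
    "average": sum(gpas) / len(gpas),
}

# Filtering: keep only students with GPA >= 3.5.
honor_roll = [s["name"] for s in students if s["gpa"] >= 3.5]

# Classifying: assign each student a label using an assumed rule.
labels = {s["name"]: "honors" if s["gpa"] >= 3.5 else "standard" for s in students}

print(summary)
print(honor_roll)
print(labels)
```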
📝 AP Exam-Style Practice Questions
Question 1
Which cybersecurity attack attempts to overwhelm a server to make it unavailable?
- A. Phishing
- B. Keylogging
- C. DDoS attack
- D. Malware infection
Correct Answer: C
A DDoS attack disrupts service availability by flooding a server with requests.
Question 2
What is the purpose of a digital certificate used by websites?
- A. To encrypt passwords locally on the user’s computer
- B. To verify the identity of a website
- C. To detect malware on the server
- D. To back up stored user data
Correct Answer: B
Digital certificates authenticate websites and help establish secure connections.
Question 3
Which of the following best protects the confidentiality of data being transmitted online?
- A. Using a long username
- B. Encrypting data with HTTPS/TLS
- C. Storing data on a local device instead of the cloud
- D. Closing unused browser tabs
Correct Answer: B
Encryption prevents attackers from reading intercepted data.
Question 4
Why can symmetric encryption be less secure than public-key encryption?
- A. It cannot encrypt large files.
- B. It requires both parties to share the same secret key.
- C. It is much slower than public-key encryption.
- D. It cannot be used over the Internet.
Correct Answer: B
Sharing a secret key introduces risk: if the key is intercepted while being exchanged, every message encrypted with it can be read.
Question 5
Which human-factor issue is a common cause of security breaches?
- A. Using encrypted network protocols
- B. Users selecting weak or reused passwords
- C. Servers performing regular backups
- D. Installing timely software updates
Correct Answer: B
Human error—especially weak passwords—is one of the biggest vulnerabilities.