Topic 2.2: Data Compression | AP CSP Big Idea 2 | APCSExamPrep.com

Big Idea 2 • Data (DAT)

Data Compression

~20 min • 4 MCQ practice questions • Learning objective DAT-1.D

After this lesson, you will be able to:

  • Distinguish between lossless and lossy compression and explain how each works
  • Explain why lossy compression typically achieves greater size reduction than lossless
  • Select the appropriate compression type for a given scenario based on priorities
  • Explain why fewer bits does not necessarily mean less information
📈 Exam weight: Topic 2.2 tests scenario judgment — choosing lossless or lossy for a given context. Expect 1–2 MCQs presenting real-world scenarios. Know the two rules cold.
💡 Think about this first

Spotify's standard high-quality stream is lossy audio at up to 320 kbps (in Ogg Vorbis, a lossy format in the same family as MP3). Apple Music's top lossless tier streams ALAC at up to 24-bit/192 kHz, which is 9,216 kbps of audio data before compression. That is a difference of roughly 30x in data rate. Most people can't hear the difference on normal headphones. Some audiophiles insist they can. This is the data compression trade-off made real: how much quality loss is acceptable in exchange for how much size reduction? The AP exam asks you to make this judgment for specific contexts.

Why Compress Data?

Data takes up space: on disk, in memory, and on the wire when transmitted. Raw, uncompressed 4K video at 30 frames per second with 24-bit color runs to roughly 2.7 TB per hour. An uncompressed, CD-quality recording of a three-minute song is about 30 MB. These sizes are impractical for storage and transmission. Data compression solves this.
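
The audio figure above can be checked with a little arithmetic. The sketch below assumes CD-quality settings (44,100 samples per second, 16 bits per sample, 2 channels); the function name and its defaults are illustrative choices, not part of the AP course.

```python
def uncompressed_audio_bytes(seconds, sample_rate_hz=44_100,
                             bits_per_sample=16, channels=2):
    """Return the size in bytes of raw (uncompressed) audio."""
    total_bits = seconds * sample_rate_hz * bits_per_sample * channels
    return total_bits // 8  # 8 bits per byte

# A three-minute song is 180 seconds long.
song_bytes = uncompressed_audio_bytes(seconds=180)
print(f"{song_bytes / 1_000_000:.1f} MB")  # 31.8 MB
```

Every extra sample, bit, or channel multiplies the total, which is why raw media grows so quickly.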

Data compression reduces the number of bits used to represent data. The key question: can you reduce bits without losing information? The answer depends on the type of compression.

🎯 Exam tip

The AP exam almost always presents compression as a scenario question: “A hospital stores patient X-rays. Which compression is most appropriate?” Know the rule: when quality or exact reconstruction is critical → lossless. When minimizing size is the priority and some quality loss is acceptable → lossy.

Lossless Compression

Lossless compression reduces the number of bits stored or transmitted while guaranteeing complete reconstruction of the original data. Not a single bit of information is permanently lost.

How does it work? By finding and exploiting patterns and redundancy in the data. A simple example: instead of storing “AAAAAABBBBCCCC” as 14 characters, a lossless algorithm might store it as “6A4B4C” — 6 characters. The original can be perfectly reconstructed from the compressed version.
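
The “6A4B4C” idea is usually called run-length encoding. Here is a minimal Python sketch of it, assuming the input text contains no digits (otherwise the counts would be ambiguous). The function names are made up for this example.

```python
from itertools import groupby

def rle_encode(text):
    """Replace each run of repeated characters with count + character."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))

def rle_decode(encoded):
    """Rebuild the original text from count + character pairs."""
    result = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch  # a count may have more than one digit
        else:
            result.append(ch * int(count))
            count = ""
    return "".join(result)

original = "AAAAAABBBBCCCC"
encoded = rle_encode(original)
print(encoded)                          # 6A4B4C
print(rle_decode(encoded) == original)  # True
```

Decoding gives back exactly the original string, which is what makes this lossless. Note that the trick only pays off when the data has runs: “ABC” would encode to “1A1B1C”, which is longer than the input.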

Real-world lossless formats: ZIP files, PNG images, FLAC audio, GIF images.

When to choose lossless:

  • Medical images (X-rays, MRIs) — pixel-perfect accuracy required for diagnosis
  • Legal and financial documents — exact reconstruction is legally required
  • Software and code files — a single altered bit can break a program
  • Scientific data — measurement precision must be preserved

Lossy Compression

Lossy compression can significantly reduce the number of bits stored or transmitted — but only allows reconstruction of an approximation of the original data. Some information is permanently removed.

Lossy algorithms are selective about what they remove: they discard data that humans are less likely to notice. MP3 audio drops sounds that listeners are unlikely to hear, such as very high frequencies and quiet sounds masked by louder ones. JPEG images reduce fine color detail that the eye struggles to perceive at normal viewing distances. The result can be dramatically smaller files with minimal perceived quality loss.
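
Real lossy formats such as MP3 and JPEG are far more sophisticated, but the core idea of throwing away fine detail can be sketched with simple rounding. The example below is a simplified illustration, not how any real codec works: it assumes samples in the range 0 to 255 and sorts them into buckets 16 values wide, so each sample needs only 4 bits instead of 8.

```python
def quantize(samples, step=16):
    """Lossy step: keep only which step-wide bucket each sample falls in."""
    return [s // step for s in samples]

def reconstruct(buckets, step=16):
    """Approximate each original sample by the middle of its bucket."""
    return [b * step + step // 2 for b in buckets]

original = [12, 13, 200, 201, 199, 57]
buckets = quantize(original)
approx = reconstruct(buckets)
print(buckets)  # [0, 0, 12, 12, 12, 3]
print(approx)   # [8, 8, 200, 200, 200, 56]
```

The reconstructed values are close to the originals but not identical. Because 12 and 13 both land in bucket 0, nothing in the compressed data says which one was there, so the exact original cannot be recovered.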

Real-world lossy formats: JPEG images, MP3 audio, MP4/H.264 video, AAC audio.

When to choose lossy:

  • Music streaming — slightly reduced audio quality is acceptable at much smaller file sizes
  • Video streaming — some visual quality traded for smooth playback over limited bandwidth
  • Social media photos — storage cost reduction outweighs minor quality loss
  • Video conferencing — minimizing transmission time is more important than perfect quality
⚠ Common exam trap

Students sometimes think lossy compression just means “lower quality.” The precise AP definition is: lossy compression allows only reconstruction of an approximation of the original data. The original data cannot be recovered once lossy compression is applied. This is why converting an MP3 to an uncompressed WAV file does not recover the original recording: the WAV is larger, but the discarded audio is still missing.

The Key Trade-off

Lossy compression can reduce file size more than lossless, but at the cost of permanent data loss. The right choice depends entirely on the use case. Neither is universally better — the AP exam tests your ability to match the compression type to the context.
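
The trade-off can be seen in a small experiment. This sketch uses Python's built-in zlib module for the lossless step and a made-up 8-bit “signal” (a sine wave plus a little noise) as the data. The lossy version here is plain rounding followed by zlib, an assumption chosen for simplicity rather than a real audio codec.

```python
import math
import random
import zlib

rng = random.Random(0)  # fixed seed so every run uses the same data

# Hypothetical signal: 10,000 samples, each a value from 22 to 234.
original = bytes(
    int(128 + 100 * math.sin(i / 50) + rng.randint(-6, 6))
    for i in range(10_000)
)

# Lossless: compress the exact bytes.
lossless = zlib.compress(original)

# Lossy: round every sample down to a multiple of 16, then compress.
coarse = bytes((b // 16) * 16 for b in original)
lossy = zlib.compress(coarse)

print(len(original), len(lossless), len(lossy))
print(zlib.decompress(lossless) == original)  # True
print(zlib.decompress(lossy) == original)     # False
```

The exact sizes depend on the data, but the lossy version should come out noticeably smaller because rounding removed the fine detail that was hard to compress. The price is visible in the last line: the lossy output no longer matches the original.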

Key Vocabulary

Term | AP Definition | Plain English
Data compression | The process of reducing the number of bits needed to represent data | Making files smaller
Lossless compression | Compression that guarantees complete reconstruction of the original data | Smaller file, no information lost (ZIP, PNG)
Lossy compression | Compression that allows only reconstruction of an approximation of the original data | Smaller file, some data permanently removed (JPEG, MP3)
Redundancy | Repeated or predictable patterns in data that compression algorithms exploit | The thing lossless compression finds and removes
Compression ratio | The ratio of the original file size to the compressed file size | How much smaller the file gets
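
To connect the last two terms, here is a short sketch that measures compression ratio with Python's built-in zlib module. The helper name compression_ratio is made up for this example; the ratio is simply original size divided by compressed size.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher = smaller output)."""
    return len(data) / len(zlib.compress(data))

repetitive = b"ABCD" * 1000       # 4,000 bytes with lots of redundancy
random_bytes = os.urandom(4000)   # 4,000 bytes with no patterns to exploit

print(f"repetitive: {compression_ratio(repetitive):.1f}")
print(f"random:     {compression_ratio(random_bytes):.2f}")
```

The repetitive data should shrink dramatically, while the random data should not shrink at all (its ratio lands at or just below 1). Lossless compression can only remove redundancy that is actually there.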
📋 Create Task connection

Big Idea 2 data concepts appear in the Create Task when you describe how your program processes or uses data. Understanding how to extract information and work with datasets will strengthen your written response. See the Create Task module →

📈
MCQ Practice
4 questions • AP exam difficulty • Instant feedback
Question 1 of 4
A radiologist needs to store thousands of patient X-ray images in a hospital database. The images must be retrievable in their exact original form for diagnostic accuracy. Which type of compression is MOST appropriate?
Incorrect. Lossy compression permanently removes data and cannot reconstruct the original exactly — this is unacceptable for diagnostic images where pixel accuracy can affect patient outcomes.
Correct. Lossless compression guarantees complete reconstruction of the original. For medical images where diagnostic accuracy is critical, exact reconstruction is required. This is the textbook use case for lossless compression.
Incorrect. Medical imaging quality loss is NOT imperceptible — small differences in pixel values can affect diagnosis. This is precisely why medical imaging uses lossless compression.
Incorrect. Not using compression is impractical given storage costs. Lossless compression reduces size without any data loss, making it the correct choice.
Question 2 of 4
A music streaming service compresses songs to reduce bandwidth requirements. Some audio frequencies are permanently removed during compression, but most listeners cannot detect the difference. Which type of compression BEST describes this process?
Incorrect. Lossless compression guarantees complete reconstruction with no permanent data removal. This scenario explicitly states frequencies are permanently removed.
Correct. Lossy compression permanently removes some data (here, audio frequencies outside typical hearing range) to achieve smaller file sizes. The result is an approximation of the original, not the original itself — exactly as described.
Incorrect. Run-length encoding is a lossless technique. It doesn't permanently remove information.
Incorrect. Lossless compression with redundancy removal still guarantees exact reconstruction. Permanently removing audio frequencies means this is lossy.
Question 3 of 4
Which of the following statements about data compression is accurate?

I. Fewer bits always means less information.
II. Lossless compression can guarantee complete reconstruction of the original data.
III. Lossy compression typically achieves greater size reduction than lossless compression.
Incorrect. Statement I is false. The AP CED explicitly states that fewer bits does NOT necessarily mean less information. Lossless compression reduces bits without removing information.
Incorrect. Statement II is correct, but Statement III is also correct and shouldn't be excluded.
Incorrect. Statement I is false (fewer bits does not necessarily mean less information).
Correct. Statement II is true (lossless guarantees exact reconstruction). Statement III is true (lossy can achieve greater size reduction because it permanently removes data). Statement I is false — the CED explicitly states fewer bits does not necessarily mean less information.
Question 4 of 4
A developer is choosing a compression format for images on a website. She wants the smallest possible file sizes to minimize page load times, and slight visual quality reduction is acceptable. Which format BEST meets her needs?
Incorrect. PNG is lossless — it produces larger files than lossy formats. When minimizing file size is the priority and quality loss is acceptable, lossy is more appropriate.
Incorrect. FLAC is an audio format, not an image format. It's also lossless.
Correct. JPEG uses lossy compression, which achieves significantly smaller file sizes than lossless formats. Since slight quality reduction is acceptable, JPEG matches the stated priorities: minimize file size, accept quality trade-off.
Incorrect. ZIP is lossless compression for files, not specifically optimized for images. It would produce larger files than JPEG for this use case.

Frequently Asked Questions

Choose lossless when quality or exact reconstruction is maximally important (medical images, legal documents, software). Choose lossy when minimizing data size or transmission time is maximally important and some quality loss is acceptable (streaming, social media). This distinction comes from the CED (DAT-1.D.7 and DAT-1.D.8).
Lossy compression cannot be reversed. Once it is applied, the removed data is permanently gone. An MP3 file cannot be converted back to the exact original audio. A JPEG cannot recover the original image data. This is the critical difference from lossless, which always allows perfect reconstruction.
Fewer bits does not necessarily mean less information, and this is a directly tested AP fact. The CED states it explicitly: lossless compression reduces bits without removing any information. This statement appears in the essential knowledge and shows up on the exam.
Lossless: PNG (images), FLAC (audio), GIF (images), ZIP (files). Lossy: JPEG (images), MP3 (audio), MP4/H.264 (video), AAC (audio). You don't need to memorize these for the exam, but understanding what type each format represents helps answer scenario questions.