The other day I was playing around in Matlab, and although I can’t remember what I set out to do I did end up making a small lossy audio compression/decompression system! It seemed like a good topic for a blog post.

The discrete cosine transformation

Before I show the code I’ll have to very briefly introduce the discrete cosine transform (DCT). We should be able to ignore the maths and implementation of the DCT and treat it as a magic box which comes with Matlab or octave. If your interested in the details (and they are interesting) this book is a great place to start if you want more depth than wikipedia offers.

An audio sample is a sequence real numbers $$X = \{x_1, \ldots x_N\}$$. The DCT of this audio sample is the sequence, $$DCT(X) = Y = \{y_1, \ldots, y_N \}$$ such that

$$x_n = \sum_{k=1}^n y_k w(k) cos\left( \frac{\pi(2n-1)(k-1)}{2N} \right)$$

where

$$w(k) =\cases{\frac{1}{\sqrt{N}}, & k=1 \cr \sqrt{\frac{2}{N}}, & \text{otherwise}}.$$

Don’t worry too much about that expression. We just need note that the DCT represents the original signal as a sum of cosines, and that the coefficients specify the amplitude of these cosines.

If we have the DCT coefficients we can transform them back to the original sequence with the inverse discrete cosine transform (IDCT). This could be calculated with the above expression but more efficient algorithms exist for both the DCT and IDCT (these algorithms are based on the fast Fourier transform, which is again an interesting topic that I won’t get into).