Representing images

Don Blaheta, Longwood University
Why are we working on this? When it comes to representing information, our goal is to be able to represent any kind of information as a collection of numbers. Just numbers. That may seem particularly challenging in the case of images, but here we will work through a few aspects of that representational task to see how it works.

Skills in this section: Subdivide images into grids of pixels; Represent colors as RGB triples

Concepts: Data representation

One of the standard types of data that a modern computer needs to be able to process is the image. In the early years of digital computers, control and interaction with computers was done through numeric switches and later text terminals, and memory and computation were too expensive to be able to do much with images. Indeed, processing images at all was something of a niche area of research. Now, of course, the computer in your phone or even your wristwatch can be expected to have a graphical interface and handle images and video automatically.

As with text and other kinds of data, we will see what representational work needs to be done in order to encode an image as a sequence of numbers, so that computation can be performed.

On this page, we'll see two important representational steps: first, to take something that the human eye could physically view as a single, complex, continuous image, and break it up into small discrete pieces that can each be separately encoded; and second, to take the myriad colors that human eyes can see and divide them into separate components that can each be represented by a single number.

Subdividing images

You're probably reading this on a screen right now. Take a moment and look away from the screen, at physical things near you. There are probably a variety of straight lines in your field of vision, oriented at different angles. There are probably also a number of things that are smoothly round, whether circular or oval or some more complex shape. On all of those things there are a mix of color and shading, some shiny spots, some shadows, many with a smooth or "gradient" transition as well as many with a sharper change between colors. In order to have any chance of representing all of that visual variety, we need to break it down into smaller representable chunks.

One standard way to do so is to overlay the image to be represented with a rectangular grid. Each cell in the grid is called a pixel, and within this kind of system, each pixel is displayed as having a single, uniform color. Unless the underlying image is itself laid out on a rectangular grid that is perfectly aligned with the pixel grid, that means the representation is imperfect, and each pixel may thus use a sort of "average" color to represent its section of the underlying image.
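
The averaging idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used to produce the figures on this page: each cell of a fine grid is assumed to hold a single brightness number from 0 (dark) to 255 (white) as a stand-in for a color, and the function names are made up for this example.

```python
def average(values):
    """Return the rounded average of a list of brightness values."""
    return round(sum(values) / len(values))


def pixelate(image, block):
    """Coarsen a grid (a list of rows of numbers) by replacing each
    block x block group of cells with one averaged value."""
    height, width = len(image), len(image[0])
    result = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            # Collect every fine cell that falls inside this coarse pixel.
            cell = [image[y][x]
                    for y in range(top, min(top + block, height))
                    for x in range(left, min(left + block, width))]
            row.append(average(cell))
        result.append(row)
    return result


# A hypothetical 4x4 image: a dark shape in the upper right on white.
fine = [
    [255,   0,   0,   0],
    [255, 255,   0,   0],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
coarse = pixelate(fine, 2)
print(coarse)  # [[191, 0], [255, 255]]
```

The upper-left coarse pixel comes out as 191, neither fully dark nor fully white, because its section of the underlying image contained some of each.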

[rotunda image with 24x24 pixelation]
Figure 1: Rotunda image in 24x24 grid

When you look at anything on a computer screen or television, or anything printed in the last several decades, this has already been done—although with printing and some modern displays, the grid may be very, very small. However, it's illustrative to think about what happens if we make the grid much bigger, much coarser, and see what happens in the resulting images. To the right in Figure 1 is the "rotunda" logo of Longwood University—a simple image that in its ideal form includes some straight lines at different angles as well as some curves—as viewed through a 24x24 grid of pixels. Even with such a coarse grid it is possible to make out some basic shapes, and if you squint enough you might be able to blur the edges into seeming smoothness. But it should also be clear that the image is nowhere near ideal. Although you can discern two "circles" around the outline, their outlines are blocky and jagged, and the space between them is not white but just lighter blue-grey in most places, because the grid squares were so big that the "average" color within the square included part of one or both (blue) circles as well as the white area between them. The theoretically-straight lines of the roof are also very blocky and stair-stepped, and the vertical and horizontal lines of the columns avoid the stair-stepping but also manage to not be all-white or all-dark-blue. But even with all those problems, this is probably the best that can be done for that image, in a 24x24 grid of pixels.

[rotunda image with 32x32 pixelation]
Figure 2: Rotunda image in 32x32 grid

Increasing the number of pixels in the grid, when representing the same underlying image, will improve the quality of the image. Figure 2 shows the same image with a 32x32 grid (1,024 pixels versus 576, nearly twice as many as Figure 1), and the image certainly looks cleaner. The gap between the outlining circles is white or nearly white throughout, the columns appear more evenly spaced, and the roof now at least looks like it was supposed to have straight lines. Less squinting is required to make the edges look smooth. But it should still be clear that this is not a perfect representation of the underlying image: the individual pixels do not line up with the horizontal and vertical lines of the image, and can't line up with the angles and curves. The visual effect when the pixel grid is coarse enough to make the grid itself visible is called pixelation. It is to some extent an unavoidable consequence of using a grid to represent images; as the grid gets ever finer, though, it becomes easier to trick the eye into seeing what appears to be a perfect curve or a perfect angle. Figure 3 shows the same image without the extra coarsening that was applied in Figures 1 and 2 to make the pixel grid visible. However, it is still pixelated. With the quality of screens today, it may not be possible to see the effect without extra magnification, but in theory, if you could get close enough to the screen (or had a magnifying glass), you would still see tiny little stair steps around the curve and on the angles: this is a fundamental property of any representation system based on a pixel grid.

[rotunda image at higher resolution]
Figure 3: Rotunda image: best resolution

Resolution

When we talk about the resolution of an image or of a device, we are quantifying just how well it will be able to approximate an underlying image with curves and angles and tightly-spaced lines. Formally, this depends on many aspects of the image capturing, processing, and displaying systems, but in general usage (and for our purposes), the term refers to how many pixels are in the grid. When talking about the resolution of individual images, this is often given as a set of dimensions (24x24 and 32x32 in the above images) or as a total number of pixels in the image, typically measured in millions of pixels or megapixels (MP). A camera reporting a resolution of 16MP, for instance, may be able to record images with dimensions of 4608x3456 (which multiplies out to just shy of 16 million).
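
The megapixel arithmetic is just a multiplication and a division by one million. A minimal sketch, where the function name is an assumption made for this example:

```python
def megapixels(width, height):
    """Total pixel count of a width x height grid, in millions of pixels."""
    return width * height / 1_000_000


# The camera example from the text: 4608 x 3456 pixels.
print(4608 * 3456)                       # 15925248
print(f"{megapixels(4608, 3456):.2f}")   # 15.93

# The coarse grids from Figures 1 and 2 are a tiny fraction of a megapixel.
print(24 * 24, 32 * 32)                  # 576 1024
```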

Another related use of the term "resolution" applies to display devices, such as screens or printers. Knowing that a screen can display 1920x1080 pixels (a standard resolution for high-definition TV, also known as "1080p") is useful, but not sufficient if you want to display something that is (say) two inches high. The screen with that resolution could be a 51-inch television; or it could be a 21-inch desktop monitor; or it could be a 5-inch smartphone. To specify the size of the pixels, a resolution can be measured in pixels per inch (ppi) or dots per inch (dpi) (the two are not quite synonymous, but close enough for now), which measure the number of pixels or dots in a single linear inch of display.
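
To see how the same pixel dimensions lead to very different pixel sizes, here is a rough sketch of the calculation. It assumes that the quoted screen size is the diagonal measurement (the usual convention) and that pixels are square; the function name is made up for this example.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_inches):
    """Estimate ppi from pixel dimensions and the diagonal size in inches.
    Assumes square pixels and a diagonal screen measurement."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_inches


for size in (51, 21, 5):
    ppi = pixels_per_inch(1920, 1080, size)
    # Number of pixel rows needed to draw something two inches high.
    rows = round(2 * ppi)
    print(f"{size}-inch screen: about {ppi:.0f} ppi, {rows} pixels for two inches")
```

Under these assumptions the television works out to roughly 43 ppi, the desktop monitor to roughly 105 ppi, and the smartphone to roughly 441 ppi, so an object two inches high needs about ten times as many pixels on the phone as on the television.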

Screen resolution varies a lot. In screens sold with early personal computers, a resolution of 72ppi was common, and as manufacturing has improved and technology has shifted, that number has crept upward. Even relatively cheap desktop monitors are now typically 90ppi, and higher-quality monitors are 100ppi or more. Smartphones, which are typically held much closer to the eye than a monitor would be, have pushed this much higher: the Retina display that Apple introduced with the iPhone 4, for example, has a resolution of 326 ppi.

Printer and scanner resolution is also reported in dpi. Since they work by physically moving a sheet of paper past the print/scan head, it makes no sense to quote total dimensions of the device, as with screens, but otherwise the principle is basically the same. Print resolution is traditionally much higher than screen resolution (though new screen technology is becoming competitive); even low-quality printers have long been capable of 150dpi. Better inkjet printers are typically 300 or 600dpi, and quality laser printers (and professional printing equipment) are capable of 1200dpi or higher.

[img on a screen] [img on page, very tiny]
a. 72 ppi screen, 600 dpi printer, printing one dot per pixel
[img on a screen] [img on page, pixelated]
b. 72 ppi screen, 600 dpi printer, printing image at same size as it appears on screen
[img on a screen, zoomed in] [img on page, clean]
c. Screen zoomed in to work with image at high print-ready resolution
Figure 4: Working with images that will be printed

The disparity between screen resolution and printer resolution can present some challenges when preparing and printing documents containing images. When the image is prepared, it will be represented and stored as a grid of color values. If the person developing the image works primarily on a computer screen and doesn't account for the higher resolution of the printer, the resulting image will not be satisfactory. Figure 4 illustrates ways this could play out: in part a, we see that an image prepared for the screen (at 72 pixels per inch or thereabouts) will appear very, very small if printed with a single dot per pixel (at 600 dots per inch). One way to compensate for the problem is, as shown in part b, to print many dots on the page for each pixel on the screen, which makes the image the correct size on the page, but pixelated and with jagged edges. The best solution is shown in part c: when working with an image that is meant to be printable on paper, make sure to store and work with it at a higher resolution than would be needed for an image used on the screen only, zooming in as necessary to perform various editing tasks. If the image will be needed for both on-screen and print purposes, it is always possible to create a lower-resolution "zoomed out" copy of the image for use on-screen.
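
The three cases in Figure 4 come down to simple arithmetic. The sketch below uses a hypothetical image that is 144 pixels wide, so that it is two inches wide on a 72 ppi screen; the function names are assumptions made for this example.

```python
def printed_inches(pixels, printer_dpi, dots_per_pixel=1):
    """Width on paper when each image pixel is printed as
    dots_per_pixel dots in a row."""
    return pixels * dots_per_pixel / printer_dpi


def pixels_needed(inches, printer_dpi):
    """Pixels required to fill the given width at one dot per pixel."""
    return round(inches * printer_dpi)


screen_ppi, printer_dpi = 72, 600
width_px = 144  # hypothetical image: two inches wide on the screen

# Part a: one dot per pixel, so the printed image is tiny.
print(printed_inches(width_px, printer_dpi))    # 0.24

# Part b: dots per pixel needed to match the on-screen size (pixelated).
print(round(printer_dpi / screen_ppi, 2))       # 8.33

# Part c: pixels to store for a clean two-inch-wide print.
print(pixels_needed(2, printer_dpi))            # 1200
```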

Grid alignment

Once we've decided to use a grid to subdivide our image, we've done the hard work, since now we can specify an entire image, one pixel after another. But if we want to be able to refer to an individual pixel within an entire image, we'll need a way to index that grid, so it's worth briefly diverting to the question of how that's usually done in the context of computer graphics.

[standard cartesian grid in algebra]
a: As used in most math classes
[standard cartesian grid on screen]
b: As used in computer graphics
Figure 5: Standard Cartesian grids

You've seen before, probably in a high school algebra class, the idea of using an X axis and a Y axis, perpendicular to each other, which together define a coordinate system for labelling points. The diagram in part a of Figure 5 should look familiar. The point (0,0) is right in the middle, where the axes cross; other points can be labelled, with positive or negative coordinate values that need not be nice round integers.

In most computer graphics contexts, though, we find it convenient to avoid fractions and negative numbers, so we put (0,0) in a corner and label each pixel separately. Furthermore, if we use the upper left corner for (0,0), with Y values increasing as we go down the image, it means that coordinates increase as we scan from top to bottom and left to right, just as if we were reading a page of text. In part b of Figure 5, you can see the origin in the upper left, and note that the other marked points have coordinates that are positive whole numbers, and indicate the number of pixels from the left or from the top of the image. It's possible to design software that doesn't work this way, of course, but nearly every image processing package you'll find that lets you refer to individual pixels will count them in this way.
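
One consequence of counting this way is that a pixel's coordinates convert easily to its position in a list of pixels stored "one pixel after another". The sketch below assumes the pixels are stored row by row, starting from the top row (a common layout, but an assumption here, not something every image format does); the function names are made up for this example.

```python
def pixel_index(x, y, width):
    """Position of pixel (x, y) in a list that stores the image
    row by row, with (0, 0) in the upper left."""
    return y * width + x


def pixel_coords(index, width):
    """Inverse of pixel_index: recover (x, y) from a list position."""
    y, x = divmod(index, width)
    return (x, y)


width = 24  # the 24x24 grid from Figure 1
print(pixel_index(0, 0, width))    # 0, the upper-left pixel
print(pixel_index(23, 0, width))   # 23, the end of the top row
print(pixel_index(0, 1, width))    # 24, the start of the second row
print(pixel_coords(575, width))    # (23, 23), the lower-right pixel
```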

Representing colors

In the last section, we showed how to break a larger image into smaller pieces, but our end goal is still to reduce everything to numbers. In order to do that, we have to think about how to represent colors, and in order to do that we need to learn a little bit about how color works.

[illustration of subtractive color]
Figure 6: Subtractive colors

Your past experience with color and color combination may have looked something like this: in preschool or elementary school, you learned that the primary colors were red, yellow, and blue, and you could combine them to make the others; perhaps in an art class later on you worked with combining different colors of watercolor or tempera paint. In general, if you are working with combining or layering physically colored things like pencils or crayons or paint, you are working with subtractive color. Figure 6 shows how this works; on the left of the illustration is a single white (full-spectrum) light source, but it is partially blocked by three different colored pieces of transparent plastic (called "gels" by theatre and film technicians). The yellow gel, for instance, absorbs part of the white light and lets through only the part that we see as yellow. In areas where the light passes through both the yellow and the aqua/cyan gels, the resulting color is green, which in this system is thus the combination of cyan and yellow. When all the colors are combined, as in the middle of the diagram, no light passes through, and the result is black.

But computer screens work a little differently, using a system called additive color. Instead of starting with white light and progressively removing different components, we start with separate colors of light and add them together.

[illustration of additive color]
Figure 7: Additive colors
[pink]
Figure 8: Pink: (255, 149, 255)
[orange]
Figure 9: Orange: (227, 168, 54)
[cyan]
Figure 10: Dark cyan: (0, 185, 185)

Additive color is a better model for understanding how colors combine when it is the light itself that is colored. In Figure 7, we have three different lights (red, green, and blue) shining on the same surface. Where two of those lights overlap, a secondary color appears; and when all three combine we get white. The fundamental mechanism for displaying color on a computer monitor is that each pixel is composed of three tiny lights, one each in red, green, and blue. Every color is then expressed as a combination of levels of those three primary colors, from "no light" to "completely on" or somewhere in between.

These three intensity levels could be specified as a fraction between zero and one—and sometimes they are!—but as a matter of convention they are usually specified as integer values, between 0 and 255, and unless otherwise specified, the order is usually red, then green, then blue. These specifications are thus often referred to as RGB triples. For example, the triple for pure, bright, intense blue, with no other components, would be (0, 0, 255). A triple for a medium grey, with all components equal but at middle intensity, would be (128, 128, 128).
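
Converting between the two conventions is a matter of scaling by 255. A minimal sketch, with function names that are assumptions made for this example:

```python
def to_fractions(triple):
    """Convert an RGB triple of 0-255 integers to fractions from 0 to 1."""
    return tuple(value / 255 for value in triple)


def to_integers(fractions):
    """Convert fractions from 0 to 1 to the nearest 0-255 integers."""
    return tuple(round(f * 255) for f in fractions)


print(to_fractions((0, 0, 255)))      # (0.0, 0.0, 1.0), pure blue
print(to_integers((0.5, 0.5, 0.5)))   # (128, 128, 128), medium grey
```

Note that exactly half intensity is 127.5 on the integer scale, which has to be rounded; that is why a medium grey is written with 128 (or sometimes 127) and not with a value that is precisely half of 255.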

To the right you will see three figures that illustrate some different combinations of the three primary colors. In Figure 8, the RGB triple is (255, 149, 255); both red and blue are at full intensity, while the green circle is at just over half intensity. Where all three circles overlap, you can see that this produces pink. (What would happen if you kept the red at full intensity but decreased both green and blue? Go to an online RGB color generator such as the one at RapidTables and find out!) In Figure 9, the triple represented is (227, 168, 54). None of the color components are at their maximum intensity, and the blue circle is so dim as to be barely visible. The resulting mix produces an orange-gold in the overlap area. Finally, in Figure 10, we see the triple (0, 185, 185). There is no red at all here, and the green and blue are mixed in equal proportion (though not at full intensity). The result is a slightly darker version of the color called "cyan".
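
The way the three circles combine in these figures can be imitated with a short sketch. It assumes a simplified model in which overlapping lights just add, channel by channel, up to the maximum of 255; real displays are more complicated, and the function names are made up for this example.

```python
RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)


def dim(light, level):
    """Scale a full-intensity primary down to the given level (0-255)."""
    return tuple(value * level // 255 for value in light)


def add_lights(*lights):
    """Combine lights channel by channel, capping each channel at 255."""
    return tuple(min(255, sum(light[c] for light in lights)) for c in range(3))


# Two primaries overlap to give a secondary color; all three give white.
print(add_lights(RED, GREEN))          # (255, 255, 0)
print(add_lights(RED, GREEN, BLUE))    # (255, 255, 255)

# Figure 8: full red and blue, green at level 149.
print(add_lights(dim(RED, 255), dim(GREEN, 149), dim(BLUE, 255)))  # (255, 149, 255)

# Figure 10: no red, green and blue both at level 185.
print(add_lights(dim(GREEN, 185), dim(BLUE, 185)))                 # (0, 185, 185)
```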

Exercises

  1. On a sheet of graph paper, draw or trace a curve (such as the outline of a soda can, or your hand). For every square that is entirely inside the curve, fill in the entire square. For every square that is partially inside the curve, fill it in entirely if at least half the square is inside the curve, and leave it blank otherwise. How accurately does this represent the original curve?
  2. Repeat the previous exercise, except for the squares that are partially inside the curve, shade the whole square a medium grey instead of a fully-dark square. (This is easiest if you're working in pencil, but you can do reasonably well with ink if you're careful.) How does this image compare to the previous one?
  3. Measure the resolution of your laptop or smartphone in ppi.
  4. The standard-definition television standard prevalent in North America through the 20th century had a resolution of 702x480. How many megapixels were there in each frame?
  5. One of the current HDTV (high-definition television) standards has a resolution of 1920x1080. How many megapixels does it have in each frame?
  6. Describe in English the colors represented by the following RGB triples:
    1. (255, 255, 0)
    2. (50, 50, 50)
    3. (0, 0, 128)
    4. (100, 0, 150)
    5. (0, 128, 50)
    6. (255, 150, 150)
    7. (150, 100, 50)
  7. Find RGB triples that match or approximate the following color swatches:
    1. cyan
    2. black
    3. dark red
    4. lavender
    5. sea green
    6. steel blue
    7. pale yellow

Credits and licensing

The Rotunda logo is property of Longwood University. Other images and all text are by Don Blaheta, licensed under a Creative Commons BY-SA 3.0 license.

Version 2017-Jan-13 23:00