Pornography Statistics Report - or 'The Pornograph'

Generated at Fri Aug 24 08:30:47 2007

This report is generated by a web spider that crawls freely available internet pornography. It is intended to answer the question "What camera gear is the professional pornographer using?" It also contains a bunch of data about the pornography out there. Despite the title, this page is entirely work safe - it is the output of a data processing program. The computer cares neither about the material nor your moral judgements.

A wag at work has dubbed this page 'the Pornograph'.


About the Spider

Chart of queue size against total documents processed.

This chart shows how the queue size changes over time. Roughly every 100 documents, the spider logs how many URLs are left in its queue of URLs to process. You may think you want this to be as large as possible, but that's a misconception - you're just using up valuable memory if you try that. What's the point of having 3,000,000 elements in your queue? You can only process one at a time, after all. The zigzag shape represents the queue getting new material when it falls below a certain level; the steady upward trend comes from a steady increase in the number of pages queued for later processing. I have tweaked the queuing algorithm a few times to ensure that it spiders a wide range of material.
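The refill behaviour described above can be sketched roughly as follows. This is not the spider's actual code, which is not shown in this report: the class name, the `low_water` and `batch` parameters, and the idea of a separate backlog are all assumptions made for illustration.

```python
from collections import deque


class RefillingQueue:
    """Small in-memory queue topped up from a backlog when it runs low."""

    def __init__(self, low_water=2, batch=3):
        self.active = deque()    # URLs held in memory, ready to fetch
        self.backlog = deque()   # URLs parked for later (could live on disk)
        self.low_water = low_water  # hypothetical refill threshold
        self.batch = batch          # hypothetical refill size

    def add(self, url):
        # Newly discovered URLs go to the backlog, not the active queue.
        self.backlog.append(url)

    def next_url(self):
        # Refill when the active queue falls below the low-water mark.
        if len(self.active) < self.low_water:
            for _ in range(min(self.batch, len(self.backlog))):
                self.active.append(self.backlog.popleft())
        return self.active.popleft() if self.active else None
```

Logging `len(queue.active)` every hundred or so calls to `next_url()` would give a sawtooth of the kind the chart shows, since the size drops one URL at a time and jumps up on each refill.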

About the Material

Pie chart of the type of materials processed. Pie chart of EXIF presence in images.

'Unknown' material includes videos and webpages the spider cannot categorise as either galleries or linkfarms. Errors cover 404 responses, pages that fail to parse, and so on. The second chart shows the proportion of images that have EXIF data in them. EXIF data is a set of metadata that tells you more about the image - what camera it was taken with, when it was taken, what exposure settings were used, and so on.
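As an illustration of what "has EXIF data" can mean in practice, here is a minimal sketch that walks the segment headers of a JPEG and looks for an APP1 segment starting with the `Exif` identifier. The function name is made up for this example, it only handles JPEG, and it is not necessarily how the spider makes this decision.

```python
import struct


def has_exif(data: bytes) -> bool:
    """Return True if a JPEG byte string contains an APP1 'Exif' segment."""
    if data[:2] != b"\xff\xd8":          # no SOI marker, so not a JPEG
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:            # lost sync with the marker stream
            return False
        marker = data[pos + 1]
        if marker == 0xFF:               # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):       # end of image / start of scan data
            return False
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xE1 and data[pos + 4:pos + 10] == b"Exif\x00\x00":
            return True
        pos += 2 + length
    return False
```

Only the headers before the image data are inspected, so the check stays cheap even for large files.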

Be careful when using this chart to determine what material is on the web generally - the spider's smart enough to prioritise images first (as that's what it's most interested in) with galleries listed second (as that's where we find images). That said, of course there are more images than galleries (because each gallery has around 16 images).
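A rough sketch of the "images first, galleries second" ordering, using a heap as a priority queue. The numeric priorities, the `push`/`pop` helpers and the treatment of everything else as lowest priority are assumptions for illustration, not the spider's real scheduler.

```python
import heapq
from itertools import count

# Assumed ordering: a lower number is fetched sooner.
PRIORITY = {"image": 0, "gallery": 1}
_tiebreak = count()  # preserves insertion order within one kind


def push(heap, kind, url):
    """Queue a URL; unknown kinds get the lowest priority (2)."""
    heapq.heappush(heap, (PRIORITY.get(kind, 2), next(_tiebreak), url))


def pop(heap):
    """Return the most urgent URL."""
    return heapq.heappop(heap)[2]
```

Because images always jump the queue, the mix of material processed says more about this ordering than about the web at large, which is the caveat given above.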

Pie chart of 2257 compliance advertising on galleries.

There's a section of US law (18 U.S.C. § 2257) that covers record-keeping for pornographic pictures. Basically, pornographers must keep records that show their models were all over 18 at the time they were photographed. You don't have to display a claim to this effect on the webpage, but many people do. The spider only takes images from pages that advertise their compliance, just to be on the safe side. I didn't want the spider to inadvertently process any child porn.
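One way to detect such a notice is a simple text match on the page source. The pattern below is a guess at what a workable heuristic might look like; the keywords, the 80-character window and the function name are all assumptions, and the spider's real test is not described here.

```python
import re

# "2257" near a word that suggests a legal notice, without crossing a tag.
NOTICE = re.compile(
    r"(?:u\.?s\.?c\.?|compliance|record)[^<>]{0,80}\b2257\b"
    r"|\b2257\b[^<>]{0,80}(?:compliance|record|statement)",
    re.IGNORECASE,
)


def advertises_2257(html: str) -> bool:
    """Return True if the page text appears to carry a 2257 notice."""
    return NOTICE.search(html) is not None
```

Requiring a nearby keyword avoids counting a bare "2257" that happens to appear in a gallery number or filename.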

Scatter chart of image dimensions. Pie chart of image orientation.

This is a scatter graph of image dimensions (x against y). The more intense the orange, the more images there are at that size. Galleries often contain images of exactly the same size, which is why there are definite clusters. There are roughly two trend lines because some images are in a portrait orientation and some in landscape.

The second chart shows that a portrait orientation is the most popular in pornographic photography. This is most likely because humans are taller than they are wide, and so fit best in this format.
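The orientation split can be derived directly from the pixel dimensions. A minimal sketch, with function names chosen for this example and a "square" bucket added as an assumption for images whose sides are equal:

```python
from collections import Counter


def orientation(width: int, height: int) -> str:
    """Classify an image by comparing its width and height in pixels."""
    if height > width:
        return "portrait"
    if width > height:
        return "landscape"
    return "square"


def orientation_counts(sizes):
    """Tally orientations for an iterable of (width, height) pairs."""
    return Counter(orientation(w, h) for w, h in sizes)
```

For example, `orientation_counts([(600, 800), (800, 600), (600, 800)])` counts two portrait images and one landscape image.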

Pie chart of cameras used. Pie chart of camera brands.

The first pie chart shows the cameras used. As expected, entry-level DSLRs make a strong showing. The Hasselblad is an interesting find - that's about £10,000 of camera. The second chart shows the manufacturers. Even if we stick Nikon Corporation in with Nikon, it looks like Canon are very much in the lead. Nikon also lose points for being unable to keep their EXIF tags consistent.
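Folding "Nikon Corporation" in with "Nikon" amounts to normalising the manufacturer string before counting. A small sketch of one way to do that; the suffix list and the function name are assumptions for this example, and the charts above were evidently produced without such a step.

```python
# Assumed list of company suffixes to strip from a manufacturer string.
SUFFIXES = {"CORPORATION", "CORP", "CO", "LTD", "INC"}


def normalise_make(make: str) -> str:
    """Reduce a manufacturer string to a bare brand name."""
    words = make.upper().replace(",", " ").replace(".", " ").split()
    # Drop trailing suffixes but always keep at least one word.
    while len(words) > 1 and words[-1] in SUFFIXES:
        words.pop()
    return " ".join(words).title()
```

With this, `normalise_make("NIKON CORPORATION")` and `normalise_make("NIKON")` both come out as `"Nikon"`, so the two slices would merge.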

Pie chart of exposure program used.

When taking a photo there are two factors that control the exposure (see note 1) - aperture (the size of the hole in the lens, controlling the amount of light entering the camera) and shutter speed (how long the film or sensor is exposed for). Cameras have various modes that give you differing levels of control - in manual mode the user must set both themselves. Aperture priority lets the user choose an aperture and the camera will pick the shutter speed, and vice versa for shutter priority. In program mode the camera chooses both.

The strong showing for manual mode suggests that many photographers are using external flash gear. The other modes rely on the camera being able to meter the exposure, which generally isn't possible when using studio flash gear. Program mode is also popular, possibly because (and this is total conjecture) amateur porn is a growing field and amateur pics are more likely to be unprocessed, and therefore more likely to have their EXIF info intact.

1 At least three, actually - the explanation leaves out ISO sensitivity, among other things. Those are irrelevant to this discussion, though.
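In the EXIF data the mode is stored as a small integer in the ExposureProgram tag. The mapping below follows the values defined in the EXIF specification; the helper names and the "Unknown" fallback are additions for this sketch.

```python
from collections import Counter

# ExposureProgram values as defined by the EXIF specification.
EXPOSURE_PROGRAM = {
    0: "Not defined",
    1: "Manual",
    2: "Normal program",
    3: "Aperture priority",
    4: "Shutter priority",
    5: "Creative program",
    6: "Action program",
    7: "Portrait mode",
    8: "Landscape mode",
}


def exposure_program_name(value: int) -> str:
    """Translate a raw tag value into a readable mode name."""
    return EXPOSURE_PROGRAM.get(value, "Unknown")


def tally_programs(values):
    """Count how often each exposure program appears."""
    return Counter(exposure_program_name(v) for v in values)
```

A tally such as `tally_programs([1, 1, 2, 3])` is the sort of input a pie chart like the one above would be drawn from.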

Bar chart of focal lengths used.

This chart indicates that most porn is shot using wide-angle lenses. I'd expect the strong 18mm showing to be a result of people using kit lenses with digital SLRs. Another advantage of a wide angle is that it lets you work in quite confined surroundings - but it also tends to introduce some distortion that can be unflattering in a portrait (big noses, in particular), so I was surprised to find this. Maybe you must have a teensy tiny nose to be a porn star.
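EXIF stores focal length as a rational number (numerator and denominator), so a bar chart like this needs the values rounded into bins first. A minimal sketch, assuming whole-millimetre bins and that entries with a zero denominator are skipped; the function name is made up for this example.

```python
from collections import Counter
from fractions import Fraction


def focal_length_histogram(rationals):
    """Count focal lengths, rounded to the nearest whole millimetre.

    `rationals` is an iterable of (numerator, denominator) pairs as
    stored in the EXIF FocalLength tag.
    """
    counts = Counter()
    for num, den in rationals:
        if den == 0:                 # malformed tag; ignore it
            continue
        counts[round(Fraction(num, den))] += 1
    return counts
```

Note that these are the focal lengths as recorded by the camera, not 35mm-equivalent values.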

Bar chart of date image taken.

This chart answers the question "How fresh is the porn?" It shows when each image was originally taken. The spider was run in August 2007, for your reference. Much of the porn is surprisingly contemporary, though some of it is quite out of date. And some tags are just full of nonsense (photographs taken in 2099, for instance).
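Nonsense dates like 2099 can be filtered by parsing the EXIF timestamp and rejecting anything later than the crawl. A small sketch, assuming the standard EXIF `YYYY:MM:DD HH:MM:SS` text format and using this report's generation time as the cut-off; the function names are made up, and the chart above deliberately leaves such values in.

```python
from datetime import datetime

# Generation time of this report, used as the latest believable date.
CRAWL_TIME = datetime(2007, 8, 24, 8, 30, 47)


def parse_exif_date(text: str):
    """Parse an EXIF timestamp, returning None if it is malformed."""
    try:
        return datetime.strptime(text.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        return None


def is_plausible(text: str, latest: datetime = CRAWL_TIME) -> bool:
    """True if the timestamp parses and is not in the future."""
    taken = parse_exif_date(text)
    return taken is not None and taken <= latest
```

A lower bound could be added in the same way, but choosing one would be a judgement call about when digital cameras with EXIF support became common.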