Camera Lucida

June 10, 2006

Misunderstanding metadata

Filed under: Law practice, Tech — CL @ 7:41 am

We keep hearing the same mistake: "metadata is data about data." That formulation is wrong.

In general, “data” describes that which is contained in a document. Data may include words, ideas, arguments, conclusions, opinions, analyses, numbers, or any other kind of information. The document is nothing more than a container in which the data is found.

Metadata is data about the document, not data about the data. The most commonly created metadata is identifying information, such as:

  • author
  • date and time
  • subject
  • keywords

This is all data about the document, and it is independent of the data the document contains. It is also usually innocuous and will not spring traps for the unwary, although there have been reported instances in which the inadvertent inclusion of this information tripped up a user.
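The post does not tie these fields to any file format. As one illustration, assuming a Word .docx file (the Office Open XML format), the identifying fields are typically stored in their own part, docProps/core.xml, apart from the body text. The minimal Python sketch below, standard library only, reads them from a hypothetical sample of that part; the sample values and the function name `read_core_properties` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical sample of a core.xml part; real files usually carry more fields.
SAMPLE_CORE_XML = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:creator>Jane Associate</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2006-06-09T14:30:00Z</dcterms:created>
  <dc:subject>Settlement proposal</dc:subject>
  <cp:keywords>settlement; draft</cp:keywords>
</cp:coreProperties>"""


def read_core_properties(xml_text: str) -> dict:
    """Return the identifying fields (data about the document) from a core-properties part."""
    root = ET.fromstring(xml_text)
    # Map the post's field names to the XML elements that hold them.
    fields = {
        "author": "dc:creator",
        "created": "dcterms:created",
        "subject": "dc:subject",
        "keywords": "cp:keywords",
    }
    result = {}
    for label, tag in fields.items():
        element = root.find(tag, NS)
        result[label] = element.text if element is not None else None
    return result


if __name__ == "__main__":
    for name, value in read_core_properties(SAMPLE_CORE_XML).items():
        print(f"{name}: {value}")
```

Each listed field maps to one element (author to `dc:creator`, date and time to `dcterms:created`, subject to `dc:subject`, keywords to `cp:keywords`), and none of them depends on the words in the document body.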

"Data about the data" would be such things as the total word count, or the number of times a certain word or phrase occurs. It can usually be calculated from the document's contents, but it is not usually recorded within the document.
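By contrast, "data about the data" can be derived from the contents on demand. A minimal Python sketch, assuming plain text and a simple definition of a word as a run of letters, digits, and apostrophes (word processors may count words differently); the function name `word_stats` and the sample sentence are hypothetical:

```python
import re
from collections import Counter


def word_stats(text: str) -> tuple[int, Counter]:
    """Compute 'data about the data': total word count and per-word occurrences."""
    # Lower-case first so "The" and "the" count as the same word.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return len(words), Counter(words)


body = "The parties agree to settle. The parties will file a joint motion."
total, counts = word_stats(body)
print(total)             # 12
print(counts["parties"])  # 2
```

Nothing computed here is written back into the document; the figures are recalculated whenever someone asks for them.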

The word "metadata" is also commonly used to refer to:

  • hidden information – text that is deliberately hidden but intended by the user to remain within the document, such as hidden text, hidden rows or columns in a spreadsheet, or hidden comments;
  • unintended remnants of information – text that is inadvertently left within the document after a deletion.

In this sense, metadata is neither data about the document nor data about the data. It is simply additional data, tucked away in a place that is not immediately visible or accessible.
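As one illustration of where such additional data can sit, assume again a Word .docx file, whose body text lives in word/document.xml. Text formatted as hidden carries a `w:vanish` run property, and deletions recorded by change tracking (one common source of leftover text) survive as `w:delText` elements. The minimal sketch below scans a hypothetical fragment for both; it checks only direct formatting, not hidden text applied through styles, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (Office Open XML).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hypothetical word/document.xml fragment: one visible run, one hidden run,
# and one deletion still recorded by change tracking.
SAMPLE_DOCUMENT_XML = f"""<w:document xmlns:w="{W}">
  <w:body>
    <w:p>
      <w:r><w:t>Draft terms attached.</w:t></w:r>
      <w:r><w:rPr><w:vanish/></w:rPr><w:t>Internal note: confirm with client.</w:t></w:r>
      <w:del w:id="1" w:author="Reviewer" w:date="2006-06-09T10:00:00Z">
        <w:r><w:delText>Earlier figure removed.</w:delText></w:r>
      </w:del>
    </w:p>
  </w:body>
</w:document>"""

# Values of w:val that switch a toggle property such as w:vanish off.
OFF_VALUES = {"0", "false", "off"}


def hidden_text(root: ET.Element) -> list[str]:
    """Text of runs directly formatted as hidden (w:vanish); style-based hiding is ignored."""
    found = []
    for run in root.iter(f"{{{W}}}r"):
        vanish = run.find("w:rPr/w:vanish", NS)
        if vanish is not None and vanish.get(f"{{{W}}}val", "true") not in OFF_VALUES:
            found.append("".join(t.text or "" for t in run.iter(f"{{{W}}}t")))
    return found


def deleted_remnants(root: ET.Element) -> list[str]:
    """Text of deletions that change tracking still keeps in the file (w:delText)."""
    return [t.text or "" for t in root.iter(f"{{{W}}}delText")]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE_DOCUMENT_XML)
    print("hidden:", hidden_text(root))
    print("deleted:", deleted_remnants(root))
```

Neither the hidden note nor the deleted sentence appears on screen in normal view, yet both remain in the file for anyone who looks.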
