Information About Metadata
Metadata is used to facilitate the understanding, use and management of data. The metadata required for effective data management varies with the type of data and the context of use. In a library, where the data is the content of the titles stocked, metadata about a title would typically include a description of the content, the author, the publication date and the physical location. In the context of a camera, where the data is the photographic image, metadata would typically include the date the photograph was taken and details of the camera settings. In the context of an information system, where the data is the content of the computer files, metadata about an individual data item would typically include the name of the field and its length. Metadata about a collection of data items, such as a computer file, might typically include the name of the file, the type of file and the name of the data administrator.
WHAT IS METADATA?
Any item of data is a description of something. Metadata is a type of data where the thing being described is itself data. Or, as it is often put, metadata is data about data. If we consider a particular place in the real world, this may be described by many items of data, for example:
To make sense of and use this data, it is necessary to have access to some form of description of the sort of data it is, or, in other words, have access to its metadata. So, for example, the metadata for the above three items of data might include:
An item of metadata is itself data and therefore may have its own metadata. This might (not particularly usefully) be referred to as meta-metadata. So, for example, “Post Code” might have the following metadata:
“27th June 2006” might have the following metadata:
The hierarchy of data, metadata, meta-metadata, etc. can go on forever. Fortunately, we have enough background knowledge that we can usually make sense of and use an item of data with access to very little, if any, formally defined metadata. So, for example, given the "Post Code" metadata "8 characters, starting with A – Z", background knowledge lets us recognize that this describes the format of a post code, without access to any defined metadata for "8 characters, starting with A – Z".
LEVELS
As indicated, there are hierarchies of data and metadata. However, any particular item of data may sit on different levels of a hierarchy depending on the context. For example, when considering the geography of London, "E83BJ" would be data and "Post Code" would be metadata. But when considering the data management of an automated system that manages geographical data, "Post Code" might be data, and then "data item name" and "8 characters, starting with A – Z" would be metadata. In any particular context, metadata must be at a higher level of abstraction than the data it describes. So, in relation to "E83BJ", the item of data "is in London" is a further description of the real-world place that has the post code "E83BJ", and it is at the same level of abstraction. Therefore, although it provides information about "E83BJ" (it tells us that this is the post code of a place in London), it would not normally be considered metadata, because it describes "E83BJ" qua place in the real world and not qua data.
DEFINITIONS
The term was introduced intuitively, without a formal definition, and as a result there are now various definitions. The most common one is the literal translation: "data about data".
Example: "12345" is data, and with no additional context it is meaningless. When "12345" is given the meaningful name (metadata) "ZIP Code", one can understand (at least in the United States, and further placing "ZIP code" within the context of a postal address) that "12345" refers to the General Electric plant in Schenectady, New York. Because, for most people, the difference between data and information is merely a philosophical one of no practical relevance, other definitions are:
There are more sophisticated definitions, such as:
These are used more rarely because they tend to concentrate on one purpose of metadata (finding "objects", "entities" or "resources") and ignore others, such as using metadata to optimize compression algorithms or to perform additional computations using the data. The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like. Different views of this "system metadata" are detailed below, but beyond that is the recognition that metadata can describe all aspects of systems: data, activities, the people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules. Fundamentally, then, metadata is "the data that describe the structure and workings of an organization's use of information, and which describe the systems it uses to manage that information". To model metadata is to build an "enterprise model" of the information technology industry itself (William R. Durrell, Data Administration: A Practical Guide to Data Administration, McGraw-Hill, 1985).
Hierarchies of metadata
When structured into a hierarchical arrangement, metadata is more properly called an ontology or schema. Both terms describe "what exists" for some purpose or to enable some action. For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to which subjects "exist" in the library's own ontology and how more specialized topics are related to or derived from the more general subject headings.
Metadata is frequently stored in a central location and used to help organizations standardize their data. This information is typically stored in a metadata registry.
Difference between data and metadata
Usually it is not possible to distinguish between (raw) data and metadata because:
These considerations apply no matter which of the above definitions is used.
USE
Metadata has many different applications; this section lists some of the most common.
Metadata is used to speed up and enrich searching for resources. In general, search queries using metadata can save users from performing more complex filter operations manually. It is now common for web browsers (with the notable exception of Mozilla Firefox), P2P applications and media management software to automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.
Metadata may also be associated with files manually. This is often the case with documents that are scanned into a document storage repository such as FileNet or Documentum. Once the documents have been converted into an electronic format, a user brings the image up in a viewer application, manually reads the document and keys values into an online application to be stored in a metadata repository.
Metadata provides additional information to users of the data it describes. This information may be descriptive ("These pictures were taken by children in the school's third grade class.") or algorithmic ("Checksum=139F").
Metadata helps to bridge the semantic gap. By telling a computer how data items are related and how these relations can be evaluated automatically, it becomes possible to process even more complex filter and search operations. For example, if a search engine understands that "Van Gogh" was a "Dutch painter", it can answer a search query on "Dutch painters" with a link to a web page about Vincent van Gogh, even though the exact words "Dutch painters" never occur on that page. This approach, called knowledge representation, is of special interest to the Semantic Web and artificial intelligence.
Certain metadata is designed to optimize lossy compression algorithms.
For example, if a video has metadata that allows a computer to tell foreground from background, the background can be compressed more aggressively to achieve a higher compression rate.
Some metadata is intended to enable variable content presentation. For example, if a picture has metadata that indicates the most important region (the one where there is a person), an image viewer on a small screen, such as a mobile phone's, can narrow the picture to that region and thus show the user the most interesting details. A similar kind of metadata is intended to allow blind people to access diagrams and pictures, either by converting them for special output devices or by reading their description aloud using text-to-speech software.
Other descriptive metadata can be used to automate workflows. For example, if a "smart" software tool knows the content and structure of the data, it can convert the data automatically and pass it to another "smart" tool as input. As a result, users are spared the many copy-and-paste operations required when analyzing data with "dumb" tools.
Metadata is becoming an increasingly important part of electronic discovery, since metadata derived from documents and files can be important evidence. The 2006 changes to the U.S. Federal Rules of Civil Procedure make metadata routinely discoverable as part of civil litigation. Parties to litigation are required to maintain and produce metadata as part of discovery, and spoliation of metadata can lead to sanctions.
Metadata has become important on the World Wide Web because of the need to find useful information among the mass of information available. Manually created metadata adds value because it ensures consistency: if a web page about a certain topic contains a word or phrase, then all web pages about that topic should contain that same word or phrase. Metadata also ensures variety, so that if a topic goes by two names, both will be used.
For example, an article about "sport utility vehicles" would also be tagged "4 wheel drives", "4WDs" and "four wheel drives", as this is how SUVs are known in some countries.
Sources of metadata for audio CDs include the MusicBrainz project and AMG's All Music Guide. Similarly, MP3 files carry metadata tags in a format called ID3.
TYPES OF METADATA
Metadata can be classified by:
IMPORTANT ISSUES
To successfully develop and use metadata, several important issues should be treated with care:
Metadata risks