eu.medsea.mimeutil.detector
Class MagicMimeMimeDetector

java.lang.Object
  extended by eu.medsea.mimeutil.detector.MimeDetector
      extended by eu.medsea.mimeutil.detector.MagicMimeMimeDetector

public class MagicMimeMimeDetector
extends MimeDetector

The magic mime rules files are loaded in the following way.

  1. From the JVM system property magic-mime, e.g. -Dmagic-mime=../my/magic/mime/rules
  2. From any file named magic.mime that can be found on the classpath
  3. From a file named .magic.mime in the user's home directory
  4. From the normal Unix locations /usr/share/file/magic.mime and /etc/magic.mime (in that order)
  5. From the internal magic.mime file eu.medsea.mimeutil.magic.mime if, and only if, no files are located in step 4 above.
Each rule file is appended to the end of the existing rules, so a rule defined earlier in this sequence takes precedence over rules loaded later.
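
The lookup order above can be sketched as follows. This is a minimal illustration, not the detector's actual code: the class RuleSourceOrderSketch, the method candidateSources and its parameters are hypothetical, and the strings for steps 2 and 5 are descriptive labels rather than real resource paths. Only the magic-mime system property and the file locations come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: this class and its names are not part of mime-util.
class RuleSourceOrderSketch {

    /**
     * Lists the rule sources in the order described above.
     *
     * @param magicMimeProperty value of the magic-mime system property, or null if unset
     * @param userHome          the user's home directory
     * @param unixFileFound     whether either Unix location from step 4 exists
     */
    static List<String> candidateSources(String magicMimeProperty, String userHome, boolean unixFileFound) {
        List<String> sources = new ArrayList<>();
        if (magicMimeProperty != null) {
            sources.add(magicMimeProperty);                          // step 1
        }
        sources.add("magic.mime (classpath)");                       // step 2 (label only)
        sources.add(userHome + "/.magic.mime");                      // step 3
        sources.add("/usr/share/file/magic.mime");                   // step 4, first location
        sources.add("/etc/magic.mime");                              // step 4, second location
        if (!unixFileFound) {
            sources.add("eu.medsea.mimeutil.magic.mime (internal)"); // step 5 (label only)
        }
        return sources;
    }

    public static void main(String[] args) {
        // Print the candidate sources for the current JVM, assuming no Unix file was found.
        List<String> sources = candidateSources(
                System.getProperty("magic-mime"), System.getProperty("user.home"), false);
        sources.forEach(System.out::println);
    }
}
```

Because rules are appended in this order, a rule supplied through the magic-mime property would be seen before a rule for the same format in /etc/magic.mime.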

You can add new mime mapping rules using the syntax defined for the Unix magic.mime file by placing these rules in any of the files or locations listed above. You can also change an existing mapping rule by redefining the existing rule in one of the files listed above. This is handy for some of the more sketchy rules defined in the existing Unix magic.mime files.

We extended the string type rule so that you can match a string whose exact offset in the file is not known. The extension expresses “what I am looking for will be ‘somewhere’ within the next n characters” from this location. This is an important improvement to the string matching rules, especially for text-based documents such as HTML and XML formats.

The reasoning for this was that the rules for matching SVG images defined in the original 'magic.mime' file hardly ever worked because of the fixed offset definitions within the magic rule format. XML documents generally have an XML declaration that can contain various optional attributes, so the length of this header often cannot be determined. We therefore cannot know that the DOCTYPE declaration for an SVG XML file starts at a particular location. All we can say is that, if this is an SVG XML file, it will have an SVG DOCTYPE somewhere near the beginning of the file, probably within the first 1024 characters. So we test for the XML declaration, then test for the DOCTYPE within a specified number of characters, and if it is found the rule matches.

This extension can be used to better identify ALL of the XML type mime mappings in the current 'magic.mime' file. Remember though, as we stated earlier, mime type matching using any of the supported mechanisms is not an exact science and should always be viewed as a 'best guess' and not as a 'definite match'.

An example of overriding the PNG and SVG rules can be found in our internal 'magic.mime' file located in the test_files directory (this file is NOT used when locating rules and is used for testing purposes only). The PNG rule overrides the original PNG rule defined in the 'magic.mime' file we took from the Internet, and the SVG rule overrides the SVG detection that is also defined in that original file.

 #PNG Image Format
 0              string          \211PNG\r\n\032\n               image/png

 #SVG Image Format
 #      We know it's an XML file so it should start with an XML declaration.
 0      string  \<?xml\ version=     text/xml
 #      As the XML declaration in an XML file can be short or extended we cannot know
 #      exactly where the declaration ends i.e. how long it is,
 #      also it could be terminated by a new line(s) or a space(s).
 #      So the next line states that somewhere after the 15th character position we should find the DOCTYPE declaration.
 #      This DOCTYPE declaration should be within 1024 characters from the 15th character
 >15 string>1024<      \<!DOCTYPE\ svg\ PUBLIC\ "-//W3C//DTD\ SVG      image/svg+xml
 

As you can see, the extension is defined using the syntax string>bufsize<. It can only be used on a string type and means "match this string within bufsize characters from the position defined at the beginning of the line". The rule shown is much more verbose than required, as we really only need to check for the presence of SVG. As we said earlier, this is a test case file and is not used by the utility under normal circumstances. The test mime-types.properties and magic.mime files we use can be found in the test_files directory of this distribution.
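
The idea behind string>bufsize< can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the detector's implementation: the class WindowedStringMatchSketch and the method matchesWithin are hypothetical, and the sketch assumes that the whole string must fit inside the window of bufsize bytes that starts at the given offset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: this class and its names are not part of mime-util.
class WindowedStringMatchSketch {

    /**
     * Returns true if needle occurs entirely inside the window
     * [offset, offset + bufSize) of data. Assumes offset >= 0.
     */
    static boolean matchesWithin(byte[] data, int offset, int bufSize, String needle) {
        byte[] pattern = needle.getBytes(StandardCharsets.ISO_8859_1);
        // End of the window (exclusive), clipped to the available data.
        int windowEnd = Math.min(data.length, offset + bufSize);
        for (int start = offset; start + pattern.length <= windowEnd; start++) {
            if (regionEquals(data, start, pattern)) {
                return true;
            }
        }
        return false;
    }

    /** Compares pattern against data beginning at index start. */
    private static boolean regionEquals(byte[] data, int start, byte[] pattern) {
        for (int i = 0; i < pattern.length; i++) {
            if (data[start + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] svg = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" \"svg11.dtd\">")
                .getBytes(StandardCharsets.ISO_8859_1);
        // Mirrors the rule ">15 string>1024<" from the example above.
        System.out.println(matchesWithin(svg, 15, 1024, "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG"));
    }
}
```

A fixed-offset rule would only succeed if the DOCTYPE began at exactly one byte position, whereas the windowed search tolerates XML declarations of different lengths.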

We use the application/directory mime type to identify directories. Even though this is not an official mime type it seems to be well accepted on the net as an unofficial mime type so we thought it was OK for us to use as well.

This class is loaded automatically by MimeUtil because it has an entry in the file called MimeDetectors. MimeUtil reads this file at startup and calls Class.forName() on each entry found. This means the MimeDetector must have a no-arg constructor.
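
The reason for the no-arg constructor requirement can be shown with a small reflection sketch. The names DetectorLoadingSketch, ExampleDetector, NeedsArgument and instantiate are hypothetical stand-ins and not part of this library, and the exact reflection calls MimeUtil makes after Class.forName() are an assumption here.

```java
// Hypothetical sketch: this class and its names are not part of mime-util.
class DetectorLoadingSketch {

    /** Stand-in for a detector that can be loaded by name. */
    public static class ExampleDetector {
        public ExampleDetector() {
        }
    }

    /** Stand-in for a detector that cannot be loaded: it has no no-arg constructor. */
    public static class NeedsArgument {
        public NeedsArgument(String config) {
        }
    }

    /**
     * Loads a class by name and creates an instance through its no-arg constructor.
     * Any reflection failure is rethrown as an unchecked IllegalStateException.
     */
    static Object instantiate(String className) {
        try {
            Class<?> type = Class.forName(className);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load detector " + className, e);
        }
    }

    public static void main(String[] args) {
        Object detector = instantiate(ExampleDetector.class.getName());
        System.out.println(detector.getClass().getName());
    }
}
```

Calling instantiate with the name of NeedsArgument fails, because a loader that only knows the class name has no arguments to pass to a constructor.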

Author:
Steven McArdle.

Field Summary
protected static String[] defaultLocations
           
 
Constructor Summary
MagicMimeMimeDetector()
           
 
Method Summary
 String getDescription()
          Abstract method to be implemented by concrete MimeDetector(s).
 Collection getMimeTypesByteArray(byte[] data)
          Get the mime types that may be contained in the data array.
 Collection getMimeTypesFile(File file)
          Defer this call to the InputStream method
 Collection getMimeTypesFileName(String fileName)
          Defer this call to the File method
 Collection getMimeTypesInputStream(InputStream in)
          Get the mime types of the data in the specified InputStream.
 Collection getMimeTypesURL(URL url)
          Defer this call to the InputStream method
 
Methods inherited from class eu.medsea.mimeutil.detector.MimeDetector
closeStream, delete, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getName, init
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultLocations

protected static String[] defaultLocations
Constructor Detail

MagicMimeMimeDetector

public MagicMimeMimeDetector()
Method Detail

getDescription

public String getDescription()
Description copied from class: MimeDetector
Abstract method to be implemented by concrete MimeDetector(s).

Specified by:
getDescription in class MimeDetector
Returns:
description of this MimeDetector

getMimeTypesByteArray

public Collection getMimeTypesByteArray(byte[] data)
                                 throws UnsupportedOperationException
Get the mime types that may be contained in the data array.

Specified by:
getMimeTypesByteArray in class MimeDetector
Parameters:
data - The byte array that contains the data we want to detect mime types from.
Returns:
the mime types.
Throws:
MimeException - if for instance we try to match beyond the end of the data.
UnsupportedOperationException

getMimeTypesInputStream

public Collection getMimeTypesInputStream(InputStream in)
                                   throws UnsupportedOperationException
Get the mime types of the data in the specified InputStream. The InputStream must support mark and reset (see InputStream.markSupported()). If it does not support mark and reset, a MimeException is thrown.

Specified by:
getMimeTypesInputStream in class MimeDetector
Parameters:
in - the stream from which to read the data.
Returns:
the mime types.
Throws:
MimeException - if the specified InputStream does not support mark and reset (see InputStream.markSupported()).
UnsupportedOperationException
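
A caller can guard against a stream that lacks mark and reset support before handing it to this method. The sketch below uses only standard java.io classes. The class MarkResetSketch and the methods ensureMarkSupported and peek are hypothetical helpers, not part of this library, and peek only illustrates why mark and reset matter: the leading bytes can be inspected and the stream then rewound for the next reader.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch: this class and its names are not part of mime-util.
class MarkResetSketch {

    /** Returns the stream itself if it supports mark/reset, otherwise a buffered wrapper. */
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Reads up to n leading bytes and then rewinds the stream to where it was. */
    static byte[] peek(InputStream in, int n) {
        try {
            in.mark(n);
            byte[] buffer = new byte[n];
            int total = 0;
            int read;
            while (total < n && (read = in.read(buffer, total, n - total)) != -1) {
                total += read;
            }
            in.reset();
            return Arrays.copyOf(buffer, total);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream in = ensureMarkSupported(new ByteArrayInputStream(new byte[] {1, 2, 3, 4}));
        // Peeking twice yields the same bytes because the stream is reset each time.
        System.out.println(Arrays.toString(peek(in, 2)));
        System.out.println(Arrays.toString(peek(in, 2)));
    }
}
```

The wrapped stream returned by ensureMarkSupported is the one that would then be passed to getMimeTypesInputStream.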

getMimeTypesFileName

public Collection getMimeTypesFileName(String fileName)
                                throws UnsupportedOperationException
Defer this call to the File method

Specified by:
getMimeTypesFileName in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException

getMimeTypesURL

public Collection getMimeTypesURL(URL url)
                           throws UnsupportedOperationException
Defer this call to the InputStream method

Specified by:
getMimeTypesURL in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException

getMimeTypesFile

public Collection getMimeTypesFile(File file)
                            throws UnsupportedOperationException
Defer this call to the InputStream method

Specified by:
getMimeTypesFile in class MimeDetector
Returns:
Collection of matched MimeType(s)
Throws:
UnsupportedOperationException


Copyright © 2007-2010 Medsea Business Solutions S.L.. All Rights Reserved.