java.lang.Object
  eu.medsea.mimeutil.detector.MimeDetector
    eu.medsea.mimeutil.detector.MagicMimeMimeDetector

public class MagicMimeMimeDetector extends MimeDetector
The magic mime rules files are loaded in the following order:
1. From the JVM system property magic-mime, i.e. -Dmagic-mime=../my/magic/mime/rules
2. From any file named magic.mime that can be found on the classpath.
3. From a file named .magic.mime in the user's home directory.
4. From the normal Unix locations /usr/share/file/magic.mime and /etc/magic.mime (in that order).
5. From the internal magic.mime file eu.medsea.mimeutil.magic.mime if, and only if, no files are located in step 4 above.

You can add new mime mapping rules, using the syntax defined for the Unix magic.mime file, by placing them in any of the files or locations listed above. You can also change an existing mapping rule by redefining it in one of those files. This is handy for overriding some of the less reliable rules defined in the existing Unix magic.mime files.
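The search order above can be sketched in plain Java. This is only an illustration of the documented order, not the library's own code: the class MagicMimeLocations and the method candidatePaths are hypothetical names, and the home directory file name .magic.mime is an assumption. The classpath lookup (step 2) and the internal fallback file (step 5) are left out because they are not plain file system paths.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of mime-util: lists the file system
// locations from the documented search order, highest priority first.
class MagicMimeLocations {

    static List<String> candidatePaths(String magicMimeProperty, String userHome) {
        List<String> paths = new ArrayList<>();
        // Step 1: the value of -Dmagic-mime=..., if it was supplied.
        if (magicMimeProperty != null && !magicMimeProperty.isEmpty()) {
            paths.add(magicMimeProperty);
        }
        // Step 3: a rules file in the user's home directory
        // (the file name .magic.mime is an assumption).
        if (userHome != null && !userHome.isEmpty()) {
            paths.add(userHome + "/.magic.mime");
        }
        // Step 4: the normal Unix locations, in that order.
        paths.add("/usr/share/file/magic.mime");
        paths.add("/etc/magic.mime");
        return paths;
    }

    public static void main(String[] args) {
        List<String> paths = candidatePaths(
                System.getProperty("magic-mime"),
                System.getProperty("user.home"));
        for (String path : paths) {
            System.out.println(path);
        }
    }
}
```

Running the sketch with -Dmagic-mime=../my/magic/mime/rules puts that path at the head of the printed list.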
We extended the string type rule so that you can match a string in a file when you do not know its exact offset. The extended rule says, in effect, “what I am looking for will be ‘somewhere’ within the next n characters” from this location. This is an important improvement to the string matching rules, especially for text-based documents such as HTML and XML formats.

The reason for the extension is that the rules for matching SVG images defined in the original 'magic.mime' file hardly ever worked, because the magic rule format only allows fixed offsets. XML documents generally begin with an XML declaration that can contain various optional attributes, so the length of this header often cannot be determined. We therefore cannot know that the DOCTYPE declaration for an SVG XML file starts at “this” location. All we can say is that, if this is an SVG XML file, it will have an SVG DOCTYPE somewhere near the beginning of the file, probably within the first 1024 characters. So we test for the XML declaration, then test for the DOCTYPE within a specified number of characters, and if it is found we match the rule.

This extension can be used to better identify ALL of the XML type mime mappings in the current 'magic.mime' file. Remember though that, as we stated earlier, mime type matching using any of the supported mechanisms is not an exact science and should always be viewed as a 'best guess' and not as a 'definite match'.
An example of overriding the PNG and SVG rules can be found in our internal 'magic.mime' file located in the test_files directory (this file is NOT used when locating rules and is used for testing purposes only). The PNG rule overrides the original PNG rule defined in the 'magic.mime' file we took from the Internet, and the SVG rule overrides the SVG detection also defined in that original file:
    #PNG Image Format
    0    string    \211PNG\r\n\032\n    image/png

    #SVG Image Format
    # We know its an XML file so it should start with an XML declaration.
    0    string    \<?xml\ version=    text/xml
    # As the XML declaration in an XML file can be short or extended we cannot know
    # exactly where the declaration ends i.e. how long it is,
    # also it could be terminated by a new line(s) or a space(s).
    # So the next line states that somewhere after the 15th character position we should find the DOCTYPE declaration.
    # This DOCTYPE declaration should be within 1024 characters from the 15th character
    >15    string>1024<    \<!DOCTYPE\ svg\ PUBLIC\ "-//W3C//DTD\ SVG    image/svg+xml
As you can see, the extension is defined using the syntax string>bufsize<. It can only be used on a string type and means: match this within bufsize characters of the position defined at the beginning of the line. This rule is much more verbose than required, as we really only need to check for the presence of SVG. As we said earlier, this is a test case file and is not used by the utility under normal circumstances. The test mime-types.properties and magic.mime files we use are located in the test_files directory of this distribution.
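The windowed match that string>bufsize< describes can be illustrated with a small standalone sketch. It is not the detector's actual implementation: the class WindowedStringMatch and the method matchesWithin are hypothetical names, and the sketch assumes that the match must start no more than bufsize bytes after the rule's offset and that one character equals one byte.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the string>bufsize< idea, NOT mime-util code.
class WindowedStringMatch {

    // True if needle starts at some position in [offset, offset + bufsize]
    // and lies completely inside data. A bufsize of 0 is a fixed-offset match.
    static boolean matchesWithin(byte[] data, int offset, int bufsize, byte[] needle) {
        if (offset < 0 || bufsize < 0 || needle.length == 0) {
            return false;
        }
        // Last start position allowed by both the window and the data length.
        long lastStart = Math.min((long) offset + bufsize,
                (long) data.length - needle.length);
        for (int start = offset; start <= lastStart; start++) {
            int i = 0;
            while (i < needle.length && data[start + i] == needle[i]) {
                i++;
            }
            if (i == needle.length) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] data = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\" "
                + "\"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">")
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] xmlDecl = "<?xml version=".getBytes(StandardCharsets.ISO_8859_1);
        byte[] doctype = "<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG"
                .getBytes(StandardCharsets.ISO_8859_1);

        // Mirrors the two rule lines: a fixed match at offset 0, then a
        // windowed match starting at offset 15 with a 1024 byte window.
        boolean isXml = matchesWithin(data, 0, 0, xmlDecl);
        boolean isSvg = isXml && matchesWithin(data, 15, 1024, doctype);
        System.out.println(isSvg ? "image/svg+xml" : isXml ? "text/xml" : "unknown");
    }
}
```

A fixed-offset rule is the special case bufsize = 0, which is why the XML declaration check and the DOCTYPE check can share one method here.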
We use the application/directory mime type to identify directories. Even though this is not an official mime type, it seems to be well accepted on the net as an unofficial mime type, so we thought it was OK for us to use as well.
This class is auto-loaded by MimeUtil because it has an entry in the file called MimeDetectors. MimeUtil reads this file at startup and calls Class.forName() on each entry found. This means the MimeDetector must have a no-arg constructor.
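The reason a no-arg constructor is required can be seen in a minimal reflection sketch. It does not reproduce MimeUtil's loading code: DetectorLoader, instantiate and ExampleDetector are hypothetical names, and the sketch assumes each class name found is instantiated through its no-arg constructor after Class.forName() resolves it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of name-based loading, NOT MimeUtil's actual code.
class DetectorLoader {

    // Stand-in for a detector class; a real one would extend MimeDetector.
    public static class ExampleDetector {
        public ExampleDetector() {
            // Reflection-based loading needs this no-arg constructor.
        }

        public String getDescription() {
            return "example detector";
        }
    }

    // Resolves each class name and creates one instance of it.
    // Names that cannot be loaded or instantiated are reported and skipped.
    static List<Object> instantiate(List<String> classNames) {
        List<Object> instances = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<?> type = Class.forName(name);
                instances.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                System.err.println("Could not load " + name + ": " + e);
            }
        }
        return instances;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add(ExampleDetector.class.getName());
        for (Object detector : instantiate(names)) {
            System.out.println(detector.getClass().getName());
        }
    }
}
```

A class that only offers constructors with parameters would fail at getDeclaredConstructor() with a NoSuchMethodException, which the sketch reports and skips.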
Field Summary | |
---|---|
protected static String[] | defaultLocations |
Constructor Summary |
---|
MagicMimeMimeDetector() |
Method Summary | |
---|---|
String | getDescription(): Abstract method to be implemented by concrete MimeDetector(s). |
Collection | getMimeTypesByteArray(byte[] data): Get the mime types that may be contained in the data array. |
Collection | getMimeTypesFile(File file): Defer this call to the InputStream method. |
Collection | getMimeTypesFileName(String fileName): Defer this call to the File method. |
Collection | getMimeTypesInputStream(InputStream in): Get the mime types of the data in the specified InputStream. |
Collection | getMimeTypesURL(URL url): Defer this call to the InputStream method. |
Methods inherited from class eu.medsea.mimeutil.detector.MimeDetector |
---|
closeStream, delete, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getMimeTypes, getName, init |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected static String[] defaultLocations
Constructor Detail |
---|
public MagicMimeMimeDetector()
Method Detail |
---|
public String getDescription()
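Description copied from class: MimeDetector
Abstract method to be implemented by concrete MimeDetector(s).
Specified by: getDescription in class MimeDetector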
public Collection getMimeTypesByteArray(byte[] data) throws UnsupportedOperationException
Get the mime types that may be contained in the data array.
Specified by: getMimeTypesByteArray in class MimeDetector
Parameters:
data - The byte array that contains the data we want to detect mime types from.
Throws:
MimeException - if, for instance, we try to match beyond the end of the data.
UnsupportedOperationException
public Collection getMimeTypesInputStream(InputStream in) throws UnsupportedOperationException
Get the mime types of the data in the specified InputStream. The InputStream must therefore support mark and reset (see InputStream.markSupported()). If it does not support mark and reset, a MimeException is thrown.
Specified by: getMimeTypesInputStream in class MimeDetector
Parameters:
in - the stream from which to read the data.
Throws:
MimeException - if the specified InputStream does not support mark and reset (see InputStream.markSupported()).
UnsupportedOperationException
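Because a stream without mark and reset support is rejected, a caller may want to wrap such a stream before passing it in. The sketch below uses only JDK classes. MarkableStreams, ensureMarkSupported and peek are hypothetical helper names, and peek only illustrates the read-then-rewind pattern that mark and reset make possible; it is not how the detector itself reads the stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helpers, NOT part of mime-util.
class MarkableStreams {

    // Returns the stream unchanged if it supports mark and reset,
    // otherwise wraps it in a BufferedInputStream, which does.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to limit bytes from the current position, then rewinds
    // so the next reader sees the same bytes again.
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark and reset");
        }
        in.mark(limit);
        try {
            byte[] buffer = new byte[limit];
            int total = 0;
            while (total < limit) {
                int n = in.read(buffer, total, limit - total);
                if (n < 0) {
                    break; // end of stream reached before limit
                }
                total += n;
            }
            return Arrays.copyOf(buffer, total);
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] png = {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'};
        InputStream in = ensureMarkSupported(new ByteArrayInputStream(png));
        byte[] header = peek(in, 4);
        System.out.println("peeked " + header.length + " bytes, stream rewound");
    }
}
```

Wrapping is cheap for streams that already support mark and reset, since ensureMarkSupported returns them unchanged.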
public Collection getMimeTypesFileName(String fileName) throws UnsupportedOperationException
Defer this call to the File method.
Specified by: getMimeTypesFileName in class MimeDetector
Throws:
UnsupportedOperationException
public Collection getMimeTypesURL(URL url) throws UnsupportedOperationException
Defer this call to the InputStream method.
Specified by: getMimeTypesURL in class MimeDetector
Throws:
UnsupportedOperationException
public Collection getMimeTypesFile(File file) throws UnsupportedOperationException
Defer this call to the InputStream method.
Specified by: getMimeTypesFile in class MimeDetector
Throws:
UnsupportedOperationException