Document Types


DWR processes all file types. For certain file extensions, DWR additionally extracts content, categorizes the document, or both.

Below is the list of special file extensions and how they are treated:

General Document Containers

Contents of these containers are always extracted.

7z
lzh
rar
zip
zipx
tar
gz
tgz

 

 

 

 

Mail Containers

 

Contents of these containers are always extracted, preserving family relationships.

dbx
eml
emlx
mbox
mbs
mbx
msg
p7m     // encrypted email format
pmm
pst
qim
sbd
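
To illustrate what preserving family relationships means for a mail item, here is a minimal sketch that parses a single message and records each attachment as a child of the parent email. It uses only Python's standard `email` package. The function name `family_records`, the record fields, and the `DOC0001` identifier scheme are hypothetical and chosen for illustration; this page does not describe how DWR represents families internally.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def family_records(raw: bytes, parent_id: str) -> list:
    """Return one record for the email and one per attachment (hypothetical layout)."""
    msg = message_from_bytes(raw, policy=policy.default)
    # The parent email has no parent of its own.
    records = [{"id": parent_id, "parent": None, "name": str(msg["Subject"])}]
    # Each attachment points back to the email it was extracted from.
    for index, part in enumerate(msg.iter_attachments(), start=1):
        records.append(
            {
                "id": f"{parent_id}.{index}",
                "parent": parent_id,
                "name": part.get_filename(),
            }
        )
    return records


# Build a small sample message in memory so the sketch is self-contained.
sample = EmailMessage()
sample["Subject"] = "Quarterly report"
sample.set_content("See attached.")
sample.add_attachment(
    b"col1,col2\n",
    maintype="application",
    subtype="octet-stream",
    filename="data.csv",
)

records = family_records(sample.as_bytes(), "DOC0001")
for record in records:
    print(record)
```

Keeping the parent identifier on every child record is what lets an email and its attachments be reviewed and produced together later.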

 

 

 

 

Office Documents

 

These files are searched for embedded documents.

docx
pptx
ppsx
xlsx
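
As a rough illustration of how embedded documents can be found in these formats: Office Open XML files are ZIP packages, and embedded objects are conventionally stored under an `embeddings` folder (for example `word/embeddings/`). The sketch below lists such entries with Python's standard `zipfile` module. The function name `embedded_parts` and the sample package are hypothetical, and this is an assumption about the file format rather than a description of DWR's own search logic.

```python
import io
import zipfile


def embedded_parts(path_or_file) -> list:
    """List package entries stored under an 'embeddings' folder."""
    with zipfile.ZipFile(path_or_file) as package:
        return [name for name in package.namelist() if "/embeddings/" in name]


# Build a tiny stand-in package in memory; a real .docx has many more parts.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/embeddings/Microsoft_Excel_Sheet1.xlsx", b"placeholder")

found = embedded_parts(buffer)
print(found)
```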

 

 

 

 

Other Known Extensions

 

These extensions are recognized, but their content is not extracted.

ace
arc
arj
bz      // Unix archive
bz2
bza
cpio
czip
dmg
gca
gza
gzip
hqx
lha
lzs
nsf     // Lotus Notes mail
ost     // Outlook/Exchange cache file
pak
pk3
rpm
sea
sit     // Mac StuffIt file
sitx    // Mac StuffIt extended file
snm     // Unix mail
taz
tbz
wad
war
x-stu

 

 

 

 

Load File Formats

 

These are recognized as standard production load files.

opt
lfp

 

 

 

 

Assumed Image File Types

 

bmp
dwg
gif
jpeg
jpg
pdf
png
tif
tiff

 

 

 

 

Assumed Binary File Types

 

bin
cmd
com
dll
exe
msp
ocx
reg
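
The categories on this page amount to a lookup from file extension to treatment. The sketch below expresses that lookup in Python. The extension sets are copied from the lists above; the category names, the `classify` function, the case-insensitive matching, and the `"other"` fallback are assumptions made for illustration and are not taken from DWR.

```python
from pathlib import PurePath

# Hypothetical category names; extension sets copied from the lists on this page.
TREATMENT = {
    "general_container": {"7z", "lzh", "rar", "zip", "zipx", "tar", "gz", "tgz"},
    "mail_container": {
        "dbx", "eml", "emlx", "mbox", "mbs", "mbx",
        "msg", "p7m", "pmm", "pst", "qim", "sbd",
    },
    "office_document": {"docx", "pptx", "ppsx", "xlsx"},
    "known_not_extracted": {
        "ace", "arc", "arj", "bz", "bz2", "bza", "cpio", "czip", "dmg", "gca",
        "gza", "gzip", "hqx", "lha", "lzs", "nsf", "ost", "pak", "pk3", "rpm",
        "sea", "sit", "sitx", "snm", "taz", "tbz", "wad", "war", "x-stu",
    },
    "load_file": {"opt", "lfp"},
    "image": {"bmp", "dwg", "gif", "jpeg", "jpg", "pdf", "png", "tif", "tiff"},
    "binary": {"bin", "cmd", "com", "dll", "exe", "msp", "ocx", "reg"},
}


def classify(filename: str) -> str:
    """Return the category for a file name, or 'other' if the extension is not listed."""
    # Assumption: matching uses only the final extension and ignores case.
    extension = PurePath(filename).suffix.lower().lstrip(".")
    for category, extensions in TREATMENT.items():
        if extension in extensions:
            return category
    return "other"


for name in ("Report.DOCX", "mail.pst", "backup.ost", "notes.txt"):
    print(name, "->", classify(name))
```

Files that fall through to `"other"` are still processed, since DWR processes all file types; they simply receive no extension-specific treatment.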