New XML-Based Files: Implications for Forensics

For more than 20 years, programs such as Microsoft Word have stored their documents in binary file formats.
That's changing as Microsoft, Sun Microsystems, and other developers migrate to new XML-based formats for document files.
Document files are of critical interest to forensic practitioners because of the data they contain; they're also a rich topic for forensic research.
Although most investigations concern themselves solely with a document's surface content, some examinations dive deeper, examining the metadata or deleted material that's still present in the file.
Investigators can, for instance, use metadata to identify individuals potentially responsible for unauthorized file modification, establish text plagiarism, or even indicate falsification of evidence.
Unfortunately, metadata can also be modified to implicate innocent people, and because these new files are so easy to modify, it's far easier to make malicious modifications that are difficult (if not impossible) to detect.
With so many aspects to consider, we present a forensic analysis of the two rival XML-based office document file formats: the Office Open XML (OOX) format that Microsoft adopted for its Office software suite, and the OpenDocument Format (ODF) used by Sun's OpenOffice software.
We detail how forensic tools can exploit features in these file formats and show how these formats could cause problems for forensic practitioners.
For additional information on the development and increased use of these two file formats, see the "Background" sidebar.
To begin our analysis, we created multiple ODF and OOX files using Microsoft Office 2007 for Windows, Microsoft Office 2008 for Macintosh, OpenOffice 2.3.1, and NeoOffice 2.2.2 (a version of OpenOffice that runs under Mac OS).
Overall, we found that ODF and OOX files tend to be smaller than equivalent legacy non-XML files, almost certainly a result of ZIP compression.
Although it's trivial to add parts to or remove parts from a ZIP archive after its creation, we found that in many cases doing so corrupted the file so that it couldn't be processed with Microsoft Office or OpenOffice.
The ZIP structure of these files is useful when performing data recovery or file carving.
(File carving is the process of recognizing files by their content rather than by file system metadata.
Carving is frequently used to recover files from devices that have hardware errors, have been formatted, or have been partially overwritten.)
Because each part of the archive includes a multibyte signature and a 32-bit cyclic redundancy check (CRC32) for validation, we can recover parts of a ZIP archive even when other parts of it are damaged, missing, or otherwise corrupted.
We can also use the CRC32 values and relative offsets within the archive to automatically reassemble fragmented ZIP files.
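The Python sketch below illustrates the carving idea. It scans raw bytes for the ZIP local file header signature (`PK\x03\x04`), decompresses each candidate part, and keeps only parts whose CRC32 matches the value in the header. This is a minimal sketch resting on several assumptions: it handles only stored and deflated parts whose sizes are recorded in the local header, and it does not attempt the fragment reassembly described above. The function name `carve_zip_parts` is hypothetical.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"  # ZIP local file header signature (0x04034b50, little-endian)

def carve_zip_parts(blob: bytes):
    """Yield (offset, name, data) for each ZIP part in `blob` whose CRC32 verifies.

    Sketch: handles stored (method 0) and deflated (method 8) parts whose sizes
    are in the local header; parts that defer sizes to a trailing data
    descriptor (flag bit 3) are skipped.
    """
    pos = blob.find(LOCAL_SIG)
    while pos != -1:
        if pos + 30 <= len(blob):
            (_sig, _ver, flags, method, _time, _date,
             crc, csize, _usize, nlen, xlen) = struct.unpack_from("<IHHHHHIIIHH", blob, pos)
            name = blob[pos + 30:pos + 30 + nlen].decode("utf-8", "replace")
            start = pos + 30 + nlen + xlen
            data = blob[start:start + csize]
            payload = None
            if not flags & 0x08 and len(data) == csize:
                try:
                    if method == 0:
                        payload = data
                    elif method == 8:
                        payload = zlib.decompress(data, -15)  # raw deflate, no zlib header
                except zlib.error:
                    payload = None
            # The CRC check rejects false signature hits inside compressed data.
            if payload is not None and zlib.crc32(payload) == crc:
                yield pos, name, payload
        pos = blob.find(LOCAL_SIG, pos + 1)
```

Because every part is validated independently, junk before the archive or a destroyed central directory doesn't prevent recovery of the intact parts.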
We can then manually process recovered parts or insert them into other OOX/ODF files to view the data.
ODF and OOX both contain a ZIP central directory as the last structure in the file.
We can examine this directory using standard tools, such as the Unix unzip command or Sun's jar tool.
ODF has a second directory, an XML data structure called META-INF/manifest.xml, that lists the document's parts.
OOX files store references to the additional document parts in the [Content_Types].xml and .rels parts, in addition to the document contents themselves.
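As a rough illustration, the two packaging conventions can be told apart by the bookkeeping parts just described. This Python sketch (the function `identify_package` is hypothetical) uses only the standard `zipfile` module. Treating the presence of `META-INF/manifest.xml` or `[Content_Types].xml` as decisive is a simplifying assumption and does not fully validate either standard.

```python
import zipfile

def identify_package(source) -> str:
    """Classify a ZIP-based office file by the bookkeeping parts it contains.

    `source` may be a path or a binary file object. Heuristic sketch only.
    """
    with zipfile.ZipFile(source) as z:
        names = set(z.namelist())
        if "META-INF/manifest.xml" in names:
            # ODF packages normally also carry an uncompressed 'mimetype' part.
            mimetype = z.read("mimetype").decode("ascii", "replace") if "mimetype" in names else "unknown"
            return f"ODF ({mimetype})"
        if "[Content_Types].xml" in names:
            return "OOX"
    return "other ZIP"
```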
Both file formats include a special XML file that contains the document's main flow.
In ODF, this file is called content.xml.
The primary contents of an OOX word processing document created with Microsoft Office 2007 or 2008 reside in the document.xml part, although the standard allows a different name to be specified in the [Content_Types].xml part.
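Here is a minimal sketch of locating the main-flow part, assuming a word processing document. For ODF it returns `content.xml`. For OOX it does not assume the name `document.xml`; instead it looks in `[Content_Types].xml` for an `Override` element whose content type is the WordprocessingML main-document type. Spreadsheets and presentations declare different content types, which this sketch doesn't handle. The name `main_content_part` is hypothetical.

```python
import xml.etree.ElementTree as ET
import zipfile

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"
WORD_MAIN = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"

def main_content_part(z: zipfile.ZipFile) -> str:
    """Return the name of the part holding the document's main flow."""
    names = set(z.namelist())
    if "META-INF/manifest.xml" in names:
        return "content.xml"                    # ODF: fixed part name
    types = ET.fromstring(z.read("[Content_Types].xml"))
    for override in types.iter(CT_NS + "Override"):
        if override.get("ContentType") == WORD_MAIN:
            return override.get("PartName").lstrip("/")   # PartName is stored as '/word/document.xml'
    raise ValueError("no WordprocessingML main part declared")
```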
Forensic tools should extract text from the content parts, but tool developers must understand that text can be present in other document parts as well.
For example, Microsoft Word allows other Word documents to be embedded within a Word document using the "Insert/Object..." menu command.
These documents are embedded as a named .docx file inside the ZIP archive, as Figure 1 shows.
In such cases, where files are embedded within other files, investigators should use a forensic tool that analyzes files recursively.
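Recursive analysis can be sketched in a few lines of Python. The helper extracts text from every XML part rather than only the main content part, and whenever a part is itself a ZIP archive, such as an embedded .docx, it descends into it. This is a hypothetical helper and doesn't reproduce the behavior of any particular forensic tool. It concatenates text runs without preserving paragraph breaks, and it identifies embedded packages only by testing whether their bytes form a ZIP archive.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def extract_text(package: bytes, prefix: str = "") -> dict:
    """Return {part path: text} for every parseable XML part, recursing into
    parts that are themselves ZIP packages (e.g., an embedded .docx)."""
    results = {}
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for name in z.namelist():
            data = z.read(name)
            path = f"{prefix}::{name}" if prefix else name
            if zipfile.is_zipfile(io.BytesIO(data)):
                results.update(extract_text(data, path))      # descend into the embedded package
            elif name.endswith(".xml"):
                try:
                    root = ET.fromstring(data)
                except ET.ParseError:
                    continue                                  # damaged part: skip, don't abort
                text = "".join(root.itertext())               # concatenates runs; no paragraph breaks
                if text.strip():
                    results[path] = text
    return results
```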
The most straightforward way for forensic practitioners to handle these new compound document formats is to save the file and then open it with a compliant program.
Although this approach works, it raises several potential problems. The compound document might contain active content that the forensic investigator doesn't wish to execute.
(Despite assurances from Microsoft and others that these file formats are safer, both ODF and OOX have provisions for storing active content³ and therefore can carry viruses.)
Opening a document that links to external Web sites can reveal to those sites that someone has captured the file and is analyzing it.
If parts of the file are overwritten or missing, applications such as Word or OpenOffice might be unable to open the files.
Desktop applications can overlook or ignore critical information of interest to the forensic investigator.
With these problems in mind, we tested both Guidance Software's EnCase 6.11 and AccessData's Forensic Toolkit (FTK) 1.8 and determined that they could display and search for text inside ODF files, OOX files, and OOX files embedded as objects inside other OOX files.
Both the compressed nature of ODF and OOX files and the multiple ways XML allows strings to be encoded represent a significant problem for forensic program developers.
Because all the text is compressed, it's no longer possible to find it by scanning for strings within raw disk or document images.
And because XML allows characters to be written as hexadecimal character references and allows strings to be interrupted by comments, any forensic tool that takes shortcuts in decoding the ZIP archive or in implementing the full XML specification could return false negatives when performing searches.
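The false-negative problem is easy to reproduce. In the hypothetical sketch below, the word "secret" is written in XML as a hexadecimal character reference followed by text broken up by a comment. A raw byte scan (`naive_hit`) misses it even before compression. Decompressing the part and searching the parsed character data (`parsed_hit`) finds it. Both function names are illustrative assumptions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

def naive_hit(raw: bytes, needle: str) -> bool:
    """What a raw string scan of a disk or document image does."""
    return needle.encode("utf-8") in raw

def parsed_hit(package: bytes, needle: str) -> bool:
    """Decompress each XML part and search its decoded character data."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for name in z.namelist():
            if name.endswith(".xml"):
                try:
                    root = ET.fromstring(z.read(name))
                except ET.ParseError:
                    continue
                # itertext() yields text with character references decoded and comments dropped.
                if needle in "".join(root.itertext()):
                    return True
    return False
```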
Document files are fundamentally container files: single files (a consecutive stream of bytes) that contain multiple data objects.
A typical Microsoft Word file might contain data streams associated with the summary information, the main text, tables, and embedded images.
The file also contains numerous forms of metadata, both for the document and for the container itself.
Sun Microsystems submitted the OpenOffice OpenDocument Format (ODF) to the Organization for the Advancement of Structured Information Standards (OASIS).
ODF was approved as an OASIS standard on 1 May 2005 and adopted as ISO/IEC 26300 the following year.
Because of the verbose nature of XML, ODF calls for the XML files to be compressed.
Parsing XML can also be time-consuming, so ODF represents a single document with multiple XML files bundled together into a single ZIP archive.
Images and other binary objects aren't coded as XML but are stored natively as binary sections in the ZIP archive.
Following the introduction of ODF, Microsoft introduced its own XML-based document file formats, called WordprocessingML, SpreadsheetML, and PresentationML.
Like an ODF file, an Office Open XML (OOX) file is a ZIP archive consisting of multiple XML document elements (unless the file is encrypted, in which case it's an OLE compound file).
Microsoft refers to the file as a package, with each file within the archive referred to as a part.
As with ODF, structured information is first encoded into XML and compressed; embedded images are stored as binary objects within their own parts.
Because Microsoft's XML languages are defined in terms of behaviors built into Microsoft Office, OOX files can't be readily translated into ODF, or vice versa.
Microsoft Office 2003 allowed these formats to be used as alternative document file formats; with Microsoft Office 2007, the XML-based document formats became the default file format.
Native support for Office Open XML is provided today in Microsoft Office 2007 for Windows and Office 2008 for Macintosh.
Additionally, several other programs can read or write Word 2007 files.
ZIP files consist of one or more file sections followed by a central directory.
Each file section consists of a local file header that includes metadata such as the file's directory and filename, time stamp, compression method used, a 32-bit CRC, and additional information. The header is followed by the actual file data and, optionally, a data descriptor that carries the CRC and sizes when they weren't known at the time the header was written.
The central directory record contains the names of all the files, their offsets within the file, and their time stamps.
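To make this layout concrete, the following Python sketch locates the end-of-central-directory record and then walks the central directory entries. For each part it reports the name, the offset of its local file header, and its time stamp decoded from the MS-DOS date/time fields. Field layouts follow the PKWARE ZIP application note. The sketch ignores ZIP64 archives and multidisk archives, and it could be misled by an archive comment that happens to contain the end-of-directory signature. `central_directory` is a hypothetical name.

```python
import struct

EOCD_SIG = b"PK\x05\x06"      # end of central directory record
CDH_SIG = 0x02014B50          # central directory file header

def dos_datetime(d: int, t: int) -> tuple:
    """Decode MS-DOS date/time fields (2-second resolution)."""
    return ((d >> 9) + 1980, (d >> 5) & 0x0F, d & 0x1F, t >> 11, (t >> 5) & 0x3F, (t & 0x1F) * 2)

def central_directory(blob: bytes) -> list:
    """Return (name, local header offset, timestamp) for each central directory entry.

    Sketch: ignores ZIP64 and multi-disk archives.
    """
    eocd = blob.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    (_sig, _disk, _cd_disk, _here, total, _cd_size,
     cd_offset, _comment_len) = struct.unpack_from("<IHHHHIIH", blob, eocd)
    entries, pos = [], cd_offset
    for _ in range(total):
        f = struct.unpack_from("<IHHHHHHIIIHHHHHII", blob, pos)
        if f[0] != CDH_SIG:
            raise ValueError(f"bad central directory header at offset {pos}")
        name_len, extra_len, comment_len = f[10], f[11], f[12]
        name = blob[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        entries.append((name, f[16], dos_datetime(f[6], f[5])))   # f[5]=time, f[6]=date
        pos += 46 + name_len + extra_len + comment_len
    return entries
```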
The new XML-based file formats have several advantages over binary file formats. Because they're compressed, files in the new format are typically smaller than files in the legacy format.
Programs that process document files need only extract the sections they're concerned with and can ignore the rest.
Only sections that could contain computer viruses need to be scanned.
Even if parts of the file are corrupted, complete ZIP sections can still be recovered.
Under some circumstances, this could allow embedded images or even text content to be recovered.
Existing tools for handling ZIP files and XML documents make it easier for developers to write programs that automatically process data stored in XML document files than to write programs that process legacy Word documents.
However, because these are ZIP files of XML documents, they're also far easier to modify.
With off-the-shelf tools, an attacker can open one of these files and selectively add or remove information.
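To illustrate how little effort such a modification takes, the following minimal Python sketch rebuilds a package with one part's contents replaced. It copies each part's name, order, time stamp, and compression method, which shows why ZIP-level time stamps alone don't authenticate a document. The function `replace_part` is hypothetical; an examiner might use something like it to create known test cases when validating tools. Whether Office or OpenOffice still opens the result depends on the replacement part being well formed.

```python
import io
import zipfile

def replace_part(package: bytes, part: str, new_data: bytes) -> bytes:
    """Rebuild `package` with one part's bytes swapped, keeping every part's
    name, order, time stamp, and compression method."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package)) as src, zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            data = new_data if info.filename == part else src.read(info)
            clone = zipfile.ZipInfo(info.filename, date_time=info.date_time)  # copy original time stamp
            clone.compress_type = info.compress_type
            dst.writestr(clone, data)
    return out.getvalue()
```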
Both ODF and OOX files are still relatively rare, but their numbers are increasing.
We performed Google searches by file type in March, July, and September 2008, as well as in January 2009, and saw the number of OOX files nearly triple during this study period.
The "Save preview picture" option under "Advanced Options" in the "Save" dialog box isn't checked by default in Word 2007 and Excel 2007, although it is in PowerPoint 2007.
Embedded thumbnails can be valuable in forensic practice.
If the thumbnail doesn't match the document, then someone modified the thumbnail or the document after the file's creation.
If the file is no longer intact, or is corrupted and can't be completely recovered, the thumbnail might still give the investigator some idea of the document's contents before the damage occurred.
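Finding the embedded thumbnail is the first step in such a comparison. This Python sketch (`extract_thumbnails` is a hypothetical name) collects parts whose names contain "thumbnail" and guesses each image's format from its leading magic bytes rather than from its extension. The name-matching rule is an assumption. Thumbnail part names and formats vary by producer (the text here mentions both .jpg and .pdf thumbnails, and Office 2007 files commonly use `docProps/thumbnail.jpeg`), so a careful tool would also consult the package's manifest or relationships.

```python
import io
import zipfile

# Leading bytes of the image formats mentioned in the text, plus PNG.
MAGIC = {b"\xff\xd8\xff": "jpeg", b"\x89PNG": "png", b"%PDF": "pdf"}

def extract_thumbnails(package: bytes) -> dict:
    """Return {part name: (kind, bytes)} for parts whose names mention 'thumbnail'."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for name in z.namelist():
            if "thumbnail" in name.lower():
                data = z.read(name)
                kind = next((k for sig, k in MAGIC.items() if data.startswith(sig)), "unknown")
                found[name] = (kind, data)
    return found
```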
For completeness, we also examined the thumbnail images for metadata.
The .jpg thumbnails created by Microsoft Office contained metadata only for the image size and resolution, whereas the .pdf thumbnails created by NeoOffice filled in the PDF's creator, producer, and creation date.
However, these values merely indicated the program that created the thumbnail, not the user who ran the program, as Figure 2 shows.
Unique identifiers stored within documents can play an important role in many forensic investigations.
Because unique identifiers remain the same even when the document is edited, we can use them to track the movement of documents through or between organizations.
By correlating unique identifiers found on multiple hard drives, it's possible to find previously unknown social networks.
We can use unique identifiers that survive copying and pasting to show plagiarism.
Unique identifiers can also raise privacy concerns.
We found many unique identifiers stored within the ODF and OOX files.
Some of them were "unique" in that they didn't occur elsewhere within a specific XML part or within the ZIP file; primarily, these were 32-bit numbers stored in hexadecimal.
Others were 128-bit numbers unique to a particular generation of a particular document.
We didn't find any identifiers that appeared to be unique to a specific machine.
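A first pass over a package for candidate identifiers can be as crude as a regular-expression scan. The Python sketch below counts quoted eight-digit hexadecimal attribute values (the shape of the 32-bit identifiers described above) and GUID-formatted strings. The GUID form is an assumed textual representation for 128-bit identifiers. Both patterns are heuristics that will also match unrelated values, and `candidate_ids` is a hypothetical name.

```python
import io
import re
import zipfile
from collections import Counter

HEX32 = re.compile(rb'"([0-9A-Fa-f]{8})"')     # quoted 8-hex-digit value, e.g. w:rsidR="00A1B2C3"
GUID = re.compile(rb"\{?([0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12})\}?")

def candidate_ids(package: bytes) -> Counter:
    """Count candidate identifiers across all XML and .rels parts (case-normalized)."""
    counts = Counter()
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        for name in z.namelist():
            if name.endswith((".xml", ".rels")):
                data = z.read(name)
                counts.update(m.decode("ascii").upper() for m in HEX32.findall(data))
                counts.update(m.decode("ascii").upper() for m in GUID.findall(data))
    return counts
```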
For example, OOX defines revision identifiers for paragraphs (rsidP and rsidR).
Microsoft Word uses these identifiers to record the editing session in which a user added a paragraph to the main document, to aid Word's "Compare Documents" feature.
According to the specification, each editing session should receive an rsidR value that is unique within the document, so instances with the same value within a single document indicate that the modifications occurred during the same editing session.
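The following Python sketch groups the paragraphs of a WordprocessingML main part by their `w:rsidR` value, so that paragraphs attributed to the same editing session appear together. It assumes the standard WordprocessingML namespace URI and reads text only from `w:t` run elements. `paragraphs_by_session` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def paragraphs_by_session(document_xml: bytes) -> dict:
    """Map each w:rsidR value to the text of the paragraphs that carry it, in document order."""
    sessions = defaultdict(list)
    for para in ET.fromstring(document_xml).iter(W + "p"):
        rsid = para.get(W + "rsidR")
        if rsid:
            text = "".join(t.text or "" for t in para.iter(W + "t"))
            sessions[rsid].append(text)
    return dict(sessions)
```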
The primary value of these identifiers to forensic examiners is document tracking.
Using these numbers, it's possible to show that one file probably resulted from editing another file (although there is, of course, a one in four billion chance that two of these 32-bit numbers will be the same).
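Here is a minimal Python sketch of that comparison: collect the revision-session identifiers from each document's main part and intersect the sets. The list of rsid attributes checked is an assumption drawn from common WordprocessingML attributes, and `shared_sessions` is a hypothetical name. The constant at the end restates the arithmetic above: two independent, uniformly random 32-bit values coincide with probability 2^-32, about one in 4.3 billion.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
# Assumed set of revision-session attributes to collect.
RSID_ATTRS = ("rsidR", "rsidRPr", "rsidP", "rsidRDefault", "rsidDel")

def rsid_set(document_xml: bytes) -> set:
    """Collect every revision-session ID on any element of a main document part."""
    ids = set()
    for element in ET.fromstring(document_xml).iter():
        for attr in RSID_ATTRS:
            value = element.get(W + attr)
            if value:
                ids.add(value.upper())   # hex case is not significant
    return ids

def shared_sessions(doc_a: bytes, doc_b: bytes) -> set:
    """IDs present in both documents: evidence, not proof, of a common editing history."""
    return rsid_set(doc_a) & rsid_set(doc_b)

# Probability that two independent, uniformly random 32-bit IDs are equal.
P_COLLISION = 2 ** -32   # about 2.3e-10, i.e. one in 4,294,967,296
```

That figure applies to a single pair of values. Comparing many identifiers across many documents raises the chance of some coincidental match, and identifiers can also be altered or copied deliberately.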
However, because the new XML-based formats make unique IDs easy to change, they also make it much easier to maliciously implicate an innocent computer user or create the appearance of a false correlation.
