Sanitized Data Set from
A Large-Scale Study of File-System Contents

John R. Douceur and William J. Bolosky
Microsoft Research
Redmond, WA 98052

1. Introduction

Between 1 September and 25 September 1998, we collected snapshots of data from 10,568 file systems of 4801 personal computers at Microsoft Corporation. The file systems contain 140 million files totaling 10.5 TB of data. We recorded the name, size, timestamps, and containing directory of each file and subdirectory on every file system.

For a complete description of our data collection methodology, along with a general analysis of the data, see the paper "A Large-Scale Study of File-System Contents" by John R. Douceur and William J. Bolosky, published in ACM SIGMETRICS '99.

The present document accompanies the data set we collected, which is distributed in a sanitized form to protect the privacy of the individuals whose file systems it describes.

2. File Format

The data is distributed as a set of six (6) CD-ROMs. The CDs contain 10,568 data files, each of which is a gzipped archive of one ASCII text file. Each text file is a listing of the files and directories for one file system. The file format is more complicated than it ideally should be, primarily due to the sanitizing process, which prevents us from providing the actual names of files and directories.

2.1. Line Types

Each line is terminated with a carriage-return/line-feed (\r\n) pair. There are four (4) different types of lines in each file, and the first character of each line indicates the line type.

2.1.1. System Information (%)

The first line in each file begins with a % sign. This line contains the following six (6) fields, in order:

2.1.1.1. Field 1 is a decimal value indicating the version of the file format. It is 0 for all files in this set.
2.1.1.2. Field 2 is a character string indicating the type of the file system. There are four possibilities: FAT, FAT32, NTFS, and WCEFS.
2.1.1.3. Field 3 is a 64-bit decimal value indicating the time at which the scan occurred, expressed as the number of 100-nanosecond intervals since January 1, 1601, in Coordinated Universal Time (UTC).
2.1.1.4. Field 4 is a 64-bit decimal value indicating the available space on the file system, expressed in bytes.
2.1.1.5. Field 5 is a 64-bit decimal value indicating the total (available plus occupied) space on the file system, expressed in bytes.
2.1.1.6. Field 6 is a decimal value indicating the job category of the user of the file system. The encoding is as follows:

    0 - other or unknown
    1 - administration
    2 - business
    3 - management
    4 - non-technical development
    5 - technical development
    6 - technical support

All remaining lines in the file will be of one of the following types:

2.1.2. File (#)

The line begins with a # sign. This line describes a file. It contains the following ten (10) fields, in order:

2.1.2.1. Fields 1 - 4 are four 32-bit hexadecimal values that are an encrypted representation of the file name. These values are generated with a keyed one-way hash of the lowercased file name. See section 2.2.4.
2.1.2.2. Field 5 is a 32-bit decimal value indicating the elapsed time since the last access to the file, expressed in 100-nanosecond intervals. This value is only valid for NTFS file systems. See section 2.2.1.
2.1.2.3. Field 6 is a 32-bit decimal value indicating the elapsed time since the last write to the file, expressed in 100-nanosecond intervals. See section 2.2.1.
2.1.2.4. Field 7 is a 32-bit decimal value indicating the elapsed time since the creation of the file, expressed in 100-nanosecond intervals. This value is only valid for NTFS file systems. See section 2.2.1.
2.1.2.5. Field 8 is a 32-bit hexadecimal value indicating the attribute flags of the file. If no attributes were recorded for the file, this value is 0xFFFFFFFF; otherwise, the bit fields are defined below in section 2.2.2.
2.1.2.6. Field 9 is a 64-bit decimal value indicating the size of the file, expressed in bytes.
2.1.2.7. Field 10 is a decimal value indicating the extension of the file name. If the extension is one of the 1000 most popular extensions, then this field will have a value between 0 and 999, indicating the corresponding extension in section 2.2.3. Otherwise, this field will have a value of -1.

2.1.3. Subdirectory ($)

The line begins with a $ sign. This line describes a subdirectory. It contains the following nine (9) fields, in order:

2.1.3.1. Fields 1 - 4 are four 32-bit hexadecimal values that are an encrypted representation of the subdirectory name. These values are generated with a keyed one-way hash of the lowercased subdirectory name. See section 2.2.4.
2.1.3.2. Field 5 is a 32-bit decimal value indicating the elapsed time since the last access to the subdirectory, expressed in 100-nanosecond intervals. This value is only valid for NTFS file systems. See section 2.2.1.
2.1.3.3. Field 6 is a 32-bit decimal value indicating the elapsed time since the last write to the subdirectory, expressed in 100-nanosecond intervals. See section 2.2.1.
2.1.3.4. Field 7 is a 32-bit decimal value indicating the elapsed time since the creation of the subdirectory, expressed in 100-nanosecond intervals. This value is only valid for NTFS file systems. See section 2.2.1.
2.1.3.5. Field 8 is a 32-bit hexadecimal value indicating the attribute flags of the subdirectory. If no attributes were recorded for the subdirectory, this value is 0xFFFFFFFF; otherwise, the bit fields are defined below in section 2.2.2.
2.1.3.6. Field 9 is a decimal value indicating a unique index for this case-insensitive subdirectory name. Every unique (within a file system) subdirectory name is assigned a unique index value. See section 2.1.4.

2.1.4. Directory (&)

The line begins with a & sign. This line indicates that all subsequent file (#) and subdirectory ($) lines describe files and subdirectories that are children of the directory described on this line. It contains a sequence of zero or more decimal values, each of which refers to an index number of a unique subdirectory name, as defined in paragraph 2.1.3.6. Each value indicates the name of one level of the directory path. For example, "windows\foo\bar\foo" might be represented as "7 16 22 16" if the index of "windows" is 7, the index of "foo" is 16, and the index of "bar" is 22. A line with zero values refers to the root directory, as in the example in section 2.3.2.

2.2. Field Values

2.2.1. Time Values

Fields 5, 6, and 7 of file (#) and subdirectory ($) lines indicate elapsed time since the last access, write, or creation of the file or subdirectory, expressed in 100-nanosecond intervals. This time is recorded relative to the time at which the scan was initiated (see section 2.1.1.3).

The values in fields 5 (time since last access) and 7 (time since creation) are recorded for all file system types, but the values are valid only for NTFS file systems. According to Microsoft documentation, FAT and FAT32 do not properly record these timestamps, and our own analysis of these values shows them to be highly suspect. Nevertheless, our data files do include the values that were read, since they might conceivably be of some interest.

If a timestamp is unavailable for a particular file, this value is recorded as -2147480000.

2.2.2. Attribute Flags

Field 8 of file (#) and subdirectory ($) lines indicates attribute flags of the file or subdirectory. If no attributes were recorded for the file or subdirectory, this value is 0xFFFFFFFF; otherwise, the bit fields are defined as follows:

2.2.2.1. 0x0001 READONLY - The file or directory is read-only. Applications can read the file but cannot write to it or delete it. In the case of a directory, applications cannot delete it.
2.2.2.2. 0x0002 HIDDEN - The file or directory is hidden. It is not included in an ordinary directory listing.
2.2.2.3. 0x0004 SYSTEM - The file or directory is part of the operating system or is used exclusively by the operating system.
2.2.2.4. 0x0010 DIRECTORY - The handle identifies a directory.
2.2.2.5. 0x0020 ARCHIVE - The file or directory is an archive file or directory. Applications use this attribute to mark files for backup or removal.
2.2.2.6. 0x0040 DEVICE
2.2.2.7. 0x0080 NORMAL - The file or directory has no other attributes set. This attribute is valid only if used alone.
2.2.2.8. 0x0100 TEMPORARY - The file is being used for temporary storage. File systems attempt to keep all of the data in memory for quicker access, rather than flushing it back to mass storage. A temporary file should be deleted by the application as soon as it is no longer needed.
2.2.2.9. 0x0200 SPARSE FILE - The file is a sparse file.
2.2.2.10. 0x0400 REPARSE POINT - The file has an associated reparse point.
2.2.2.11. 0x0800 COMPRESSED - The file or directory is compressed. For a file, this means that all of the data in the file is compressed. For a directory, this means that compression is the default for newly created files and subdirectories.
2.2.2.12. 0x1000 OFFLINE - The file data is not immediately available. Indicates that the file data has been physically moved to offline storage.
2.2.2.13. 0x2000 NOT CONTENT INDEXED
2.2.2.14. 0x4000 ENCRYPTED - The file or directory is encrypted. For a file, this means that all data streams are encrypted. For a directory, this means that encryption is the default for newly created files and subdirectories.

2.2.3. File-Name Extensions

Field 10 of file (#) lines is a decimal value indicating the extension of the file name. If the extension is one of the 1000 most popular extensions, then this field will have a value between 0 and 999, indicating the corresponding extension in the following table.
This table is also provided as a two-column, tab-delimited, ASCII text file on the CD-ROM, named exts.txt. (Note: the table in exts.txt has been updated to contain a correct copy of the following table as of 7/7/2008.)

0 .gif   200 .out   400 .lo_   600 .itc   800 .dl$
1 .h   201 .fts   401 .eml   601 .dp   801 .smp
2 .htm   202 .idb   402 .asx   602 .swa   802 .mvb
3 .dll   203 .usa   403 .cag   603 .vbg   803 .acg
4 -   204 .inx   404 .wrn   604 .0nd   804 .h_
5 .c   205 .old   405 .rh   605 .mas   805 .os2
6 .exe   206 .hti   406 .asf   606 .cif   806 .mh
7 .ini   207 .mp3   407 .z   607 .tx_   807 .gxp
8 .cpp   208 .sp_   408 .mdm   608 .sp   808 .rct
9 .inf   209 .vbw   409 .emf   609 .pid   809 .tdl
10 .obj   210 .mdp   410 .prg   610 .sms   810 .htw
11 .txt   211 .pif   411 .ntm   611 .z1u   811 .qt
12 .bmp   212 .tsc   412 .xsl   612 .drw   812 .prj
13 .lib   213 .sdm   413 .gau   613 .msm   813 .ex$
14 .jpg   214 .plg   414 .dic   614 .pak   814 .xdbg
15 .ico   215 .wiz   415 .ppm   615 .sh   815 .sb_
16 .hlp   216 .dsz   416 .evt   616 .rll   816 .fli
17 .lnk   217 .ado   417 .mpd   617 .dcx   817 .pbr
18 .html   218 .fae   418 .rat   618 .dbc   818 .xpdb
19 .wav   219 .al   419 .btr   619 .rts   819 .gui
20 .mfc   220 .prt   420 .lck   620 .twd   820 .asv
21 .log   221 .fmt   421 .vtm   621 .htp   821 .p_
22 .wmf   222 .cgm   422 .adf   622 .sln   822 .wat
23 .pdb   223 .vsd   423 .frt   623 .lgd   823 .ix
24 .tmp   224 .sct   424 .wpd   624 .saf   824 .rm_
25 .rc   225 .acf   425 .pseg   625 .gtt   825 .qry
26 .pnf   226 .pub   426 .dct   626 .upp   826 .0a
27 .dbg   227 .cdx   427 .ram   627 .dmp   827 .max
28 .cur   228 .xml   428 .tri   628 .l   828 .vtp
29 .doc   229 .lex   429 .txr   629 .pwl   829 .pre
30 .asp   230 .avb   430 .dsr   630 .hfm   830 .mrg
31 .dir   231 .vst   431 .tps   631 .toc   831 .aco
32 .class   232 .clw   432 .ht$   632 .hdi   832 .me
33 .dat   233 .rmi   433 .ldb   633 .suo   833 .hi
34 .cnt   234 .clt   434 .fav   634 .nmd   834 .sbk
35 .sys   235 .afm   435 .tsk   635 .rs_   835 .spf
36 .cxx   236 .nch   436 .mde   636 .cc   836 .win
37 .java   237 .mif   437 .---   637 .bsp   837 .lic
38 .url   238 .pal   438 .rq   638 .vcp   838 .sm_
39 .ttf   239 .fnt   439 .mpt   639 .ger   839 .ovf
40 .inc   240 .ins   440 .pdr   640 .vfb   840 .0y
41 .def   241 .ic_   441 .app   641 .mis   841 .ihc
42 .hxx   242 .str   442 .vsdir   642 .mso   842 .slp
43 .chm   243 .ver   443 .vjp   643 .cb   843 .pmc
44 .pfm   244 .idx   444 .wrl   644 .slk   844 .rob
45 .bat   245 .bhmm5   445 .vbz   645 .fd   845 .swt
46 .dl_   246 .nlb   446 .ic2   646 .ms_   846 .vfo
47 .jpeg   247 .swf   447 .hh   647 .pjt   847 .pkp
48 .idl   248 .bgl   448 .csf   648 .pps   848 .iss
49 .ocx   249 .msc   449 .stn   649 .dca   849 .ax_
50 .asm   250 .cmp   450 .mda   650 .asc   850 .ba_
51 .res   251 .tfm   451 .new   651 .hpw   851 .upd
52 .xls   252 .scx   452 .etf   652 .xll   852 .crs
53 .cab   253 .acp   453 .cer   653 .tl_   853 .ddb
54 .cdf   254 .jpe   454 .oca   654 .dif   854 .cf
55 .tif   255 .rsb   455 .t   655 .nfo   855 .ac_
56 .frm   256 .sgf   456 .rpd   656 .dtd   856 .ra
57 .sbr   257 .oak   457 .oc_   657 .tmt   857 .pcb
58 .mac   258 .aps   458 .hhk   658 .mx   858 .cdr
59 .chi   259 .isu   459 .br   659 .sta   859 .imd
60 .cmd   260 .ilk   460 .enu   660 .bdb   860 .vim
61 .pp_   261 .dos   461 .dmf   661 .srv   861 .rdt
62 .map   262 .cn_   462 .nif   662 .xbm   862 .bld
63 .sen   263 .scn   463 .tcl   663 .cm   863 .md2
64 .fon   264 .tex   464 .in   664 .swp   864 .cdp
65 .mak   265 .scf   465 .cap   665 .v   865 .qjf
66 .exp   266 .tag   466 .rs   666 .sw4   866 .cms
67 .vxd   267 .dsn   467 .ht_   667 .kdc   867 .quf
68 .pot   268 .tbl   468 .vsl   668 .mny   868 .six
69 .drv   269 .cu_   469 .tip   669 .pcf   869 .sdk
70 .syn   270 .rsp   470 .spd   670 .iso   870 .ph
71 .ifi   271 .poc   471 .fil   671 .aux   871 .ntf
72 .nls   272 .eps   472 .eot   672 .chw   872 .rsr
73 .sql   273 .b   473 .reb   673 .ntt   873 .flw
74 .bas   274 .nt   474 .edb   674 .sml   874 .mnx
75 .cfg   275 ._al   475 .pbk   675 .cpt   875 .chb
76 .css   276 .psd   476 .prm   676 .ocm   876 .rtrt
77 .tlb   277 .awx   477 .job   677 .h16   877 .0b
78 .sym   278 .bcp   478 .pix   678 .bpd   878 .ply
79 .ani   279 .tdf   479 .cla   679 .zgf   879 .shp
80 .frx   280 .0   480 .clb   680 .obd   880 .ilb
81 .dsp   281 .sav   481 .cpi   681 .dcr   881 .set
82 .dot   282 .tsp   482 .jdb   682 .apd   882 .auz
83 .cpl   283 .pcd   483 .sfl   683 .psy   883 .val
84 .ex_   284 .ivt   484 .fot   684 .ms   884 .fif
85 .zip   285 .idt   485 .mfa   685 .ie3   885 .rst
86 .mk   286 .rsc   486 .chmm   686 .pvk   886 .cli
87 .srg   287 .tst   487 .p   687 .hst   887 .fiv
88 .dlg   288 .awk   488 .imp   688 .chl   888 .bme
89 .in_   289 .rul   489 .pct   689 .ts_   889 .lisp
90 .vbp   290 .scp   490 .cln   690 .trm   890 .cus
91 .reg   291 .trn   491 .skl   691 .pyc   891 .del
92 .ppd   292 .sty   492 .qtc   692 .mss   892 .ofx
93 .s   293 .mod   493 .gpc   693 .shd   893 .mnt
94 .x   294 .ebx   494 .jcz   694 .gnd   894 .env
95 .scc   295 .r   495 .py   695 .pwz   895 .pt
96 .art   296 .mib   496 .wk4   696 .seq   896 .dtx
97 .com   297 .dun   497 .key   697 .sdo   897 .prl
98 .ufm   298 .spr   498 .sub   698 .srf   898 .plt
99 .bin   299 .pdl   499 .ds_   699 .icn   899 .mpc
100 .scr   300 .ctt   500 .nam   700 .vf   900 .cob
101 .cls   301 .mcd   501 .sc_   701 .zrs   901 .prn
102 .a   302 .pro   502 .alt   702 .pjx   902 .ip
103 .seg   303 .db   503 .dst   703 .tree   903 .obs
104 .rtf   304 .fpt   504 .co_   704 .pat   904 .mkf
105 .flt   305 .ind   505 .gi_   705 .tsb   905 .swg
106 .js   306 .osd   506 .wb2   706 .rm   906 .vsk
107 .ivi   307 .chk   507 .shw   707 .pmw   907 .cam
108 .gp_   308 .mbx   508 .cpe   708 .ode   908 .unr
109 .inl   309 .per   509 .cel   709 .blt   909 .tli
110 .sy_   310 .iqy   510 .ci   710 .smk   910 .csm
111 .ecf   311 .hiv   511 .mdw   711 .rpm   911 .vsz
112 .rgs   312 .sed   512 .spc   712 .sts   912 .ime
113 .dsw   313 .tga   513 .cor   713 .qrp   913 .iw
114 .gpd   314 .gz   514 .img   714 .wab   914 .fla
115 .bak   315 .nws   515 .wks   715 .req   915 .xpt
116 .htx   316 .w   516 .trg   716 .ovl   916 .vcw
117 .pch   317 .grp   517 .wll   717 .sch   917 .cgi
118 .mdz   318 .hpj   518 .dsm   718 .smf   918 .pef
119 .dep   319 .ch_   519 .xlm   719 .vhd   919 .flog
120 .ai   320 .000   520 .box   720 .vbd   920 .sif
121 .fh7   321 .m   521 .y   721 .inv   921 .tr_
122 .msg   322 .d   522 .evf   722 .bs   922 .asd
123 .elm   323 .org   523 .csc   723 .lid   923 .da0
124 .des   324 .cpx   524 .jar   724 .obt   924 .iff
125 .stf   325 .api   525 .uce   725 .glb   925 .ifa
126 .hpp   326 .isp   526 .thm   726 .ol_   926 .mse
127 .tab   327 .rpt   527 .cbk   727 .vce   927 .os_
128 .mid   328 .frq   528 .cbd   728 .sup   928 .wsp
129 .cod   329 .xlt   529 .prv   729 .fra   929 .xlb
130 .ppt   330 .bsc   530 .idq   730 .pll   930 .as_
131 .oft   331 .spl   531 .lui   731 .jav   931 .lbm
132 .bdr   332 .dr_   532 .vwp   732 .shg   932 .sq_
133 .elc   333 .pd_   533 .sid   733 .oem   933 .scd
134 .dib   334 .pc_   534 .anm   734 .0ud   934 .dsk
135 .olb   335 .pag   535 .mui   735 .ddx   935 .plb
136 .snt   336 .mpp   536 .utf8   736 .sgm   936 .wk
137 .ht   337 .blz   537 .mdl   737 .cps   937 .ste
138 .id   338 .idf   538 .stb   738 .var   938 .sy
139 .gid   339 .adm   539 .scm   739 .m4   939 .spk
140 .mpa   340 ._   540 .acv   740 .mov   940 .pwd
141 .aw   341 .ush   541 .hp_   741 .mcp   941 .co
142 .vbs   342 .theme   542 .ffl   742 .ids   942 .vub
143 .hl_   343 .bm_   543 .exd   743 .dlx   943 .wpx
144 .mix   344 .rbf   544 .dsc   744 .bi   944 .bvg
145 .mic   345 .act   545 .col   745 .do_   945 .pr
146 .db_   346 .au   546 .inp   746 .acs   946 .pci
147 .htt   347 .x32   547 .ptc   747 .aip   947 .hex
148 .lmp   348 .pic   548 .msi   748 .data   948 .lng
149 .odl   349 .rle   549 .crt   749 .aif   949 .bnk
150 .pm   350 .idc   550 .bln   750 .hrc   950 .hp
151 .png   351 .prc   551 .mat   751 .mdf   951 .pi
152 .lgc   352 .cp_   552 .wbk   752 .ftg   952 .ld
153 .pfb   353 .ddr   553 .o   753 .lwo   953 .psm
154 .tok   354 .mf   554 .fpl   754 .ian   954 .snm
155 .rcv   355 .ppa   555 .pod   755 .ita   955 .thd
156 .rdq   356 .er_   556 .sdb   756 .deu   956 .atm
157 .lst   357 .fgl   557 .mmm   757 .dsx   957 .rpc
158 .icm   358 .raw   558 .ibd   758 .fdf   958 .mch
159 .avi   359 .mof   559 .pp   759 .ref   959 .cbs
160 .rc2   360 .icw   560 .shl   760 .dvi   960 .fet
161 .mdb   361 .vct   561 .hpa   761 .st_   961 .phn
162 .cnv   362 .i   562 .pab   762 .tlh   962 .les
163 .tmf   363 .vcx   563 .dob   763 .drs   963 .ods
164 .slm   364 .mtl   564 .rwx   764 .cm_   964 .m3u
165 .wri   365 .hhc   565 .csp   765 .pdi   965 .ksh
166 .src   366 .tt_   566 .lrf   766 .loc   966 .hld
167 .sam   367 .sig   567 .stm   767 .aol   967 .geom
168 .hdr   368 .hdl   568 .base   768 .xv2   968 .rep
169 .xla   369 .mva   569 .tql   769 .lan   969 .cac
170 .mst   370 .v6   570 .pov   770 .pri   970 .atc
171 .wa_   371 .asa   571 .cp   771 .trace   971 .ttc
172 .pl   372 .ex   572 .diz   772 .vx_   972 .rgb
173 .mc   373 .rdf   573 .cf_   773 .enc   973 .do$
174 .ax   374 .da_   574 .cov   774 .xnf   974 .sc
175 .fo_   375 .time   575 .mi_   775 .sfw   975 .x86
176 .0u   376 .an_   576 .ost   776 .pip   976 .aut
177 .prf   377 .nl_   577 .msb   777 .ida   977 .hpf
178 .el   378 .stk   578 .er1   778 .kin   978 .mad
179 .acm   379 .pst   579 .ffx   779 .eng   979 .wml
180 .pkg   380 .r8   580 .ctx   780 .ino   980 .htc
181 .csv   381 .wbm   581 .gra   781 .hts   981 .sqz
182 .dbf   382 .wpc   582 .mdt   782 .smi   982 .cvp
183 .pcx   383 .gc   583 .hhp   783 .oab   983 .efp
184 .ncb   384 .gst   584 .go   784 .dh   984 .mmp
185 .ctl   385 .ddf   585 .vaf   785 .sve   985 .prp
186 .pdf   386 .bi_   586 .mtr   786 .px   986 .cfl
187 .htr   387 .acl   587 .ffa   787 .gen   987 .fsh
188 .cat   388 .kbd   588 .ffo   788 .rsrc   988 .aaf
189 .tpl   389 .mp2   589 .bib   789 .mpg   989 .dph
190 .int   390 .wpg   590 .rwz   790 .mcs   990 .os$
191 .err   391 .hm   591 .ged   791 .hsh   991 .chs
192 .vbx   392 .msk   592 .thk   792 .nld   992 .pgi
193 .cnf   393 .sep   593 .pas   793 .wps   993 .f
194 .vss   394 .rom   594 .bol   794 .htz   994 .mnn
195 .e   395 .tpf   595 .dls   795 .man   995 .for
196 .ps   396 .ast   596 .o9   796 .gsf   996 .lrc
197 .opt   397 .rec   597 .ldf   797 .esn   997 .dfm
198 .0n   398 .el_   598 .bsl   798 .u   998 .xpm
199 .hs   399 .mnu   599 .nb   799 .pk   999 .cmn

2.2.4. File and Subdirectory Names

Fields 1 - 4 of file (#) and subdirectory ($) lines are four 32-bit hexadecimal values that are an encrypted representation of the file or subdirectory name. These values are generated with a keyed one-way hash of the lowercased file name, in order to protect the privacy of the file system users.

Some file and directory names are well-known system names and thus not an infringement on user privacy. Following are the encrypted representations of several key file and subdirectory names:

2.2.4.1. 9402f645 795bbf66 deb620bd 49c88990 - "pagefile.sys"
2.2.4.2. 06111ffd 4f90ed4e 2ef9f321 2d211248 - "system"
2.2.4.3. 6e569441 0e02b53b c4eeb9a1 daaf740c - "system32"
2.2.4.4. 34a635e8 feb6999a b5b1364b 633376f4 - "temporary internet files"
2.2.4.5. 33dbe2e8 99e8d9f8 84d32877 d18a6dbc - "temp"
2.2.4.6. 09680ec0 e858a6b6 7dd5ce11 ed48b59f - "tmp"
2.2.4.7. c0b60a2b 3c47f6d5 23179dc4 62fa30b5 - "windows"
2.2.4.8. d9cc0d1a cfa98d43 14eeb238 994401bb - "winnt"

2.3. Example System and Data File

An example may help to clarify the formal description provided in section 2.1.

2.3.1. Example File System

The following tree depicts a file system. Uppercase is used to indicate a directory, and lowercase is used to depict a file. Actual names on Windows systems are case-insensitive. Indentation indicates containment; for example, the root directory contains files bar.html, bazzoo.tat, yinyang, and zing.buz. It also contains subdirectories foo, windows, and xena. Directory foo contains the file scanner.exe and the subdirectory wazoo. And so forth.

(ROOT)
    bar.html
    bazzoo.tat
    FOO
        scanner.exe
        WAZOO
            wazoo.c
            wazoo.dll
            wazoo.h
            wazoo.txt
    WINDOWS
        MEDIA
            GRAPHICS
                XENA
            SOUND
                WAV
                    music.wav
                    ocean.wav
        SYSTEM
            mapi.dll
            scanner.exe
            system.dll
            unicorn.dll
            wazoo.dll
        that.ini
        this.ini
    XENA
    yinyang
    zing.buz

2.3.2. Data File for Example System

The data file describing the contents of the above file system might look as follows. The right-justified text is explanatory annotation that is not included in the actual data file.

%0 NTFS 125503623744530000 12320374784 17174093824 5
&                                                                                   (root)
#dab52fba 76bfb03b 3c04b022 5b4f2e0b 4334457 30722020 30689407 20 488 18          bar.html
#9f6a11ce 5068caaf 88a81520 228c5f1b 33970224 33970224 33970236 20 64445 -1     bazzoo.tat
$6021f936 53e08fe8 531c9d65 c0aff9c1 1917621 1917621 34902456 10 0                     foo
$c0b60a2b 3c47f6d5 23179dc4 62fa30b5 12204 33970224 34902456 10 1                  windows
$bab17ab3 ce056101 5f9f6ebd d19956c6 1379 26579 34902456 10 2                         xena
#4fec8e34 f228b387 444d4ecb 7623763e 5284293 5284293 34902425 20 8154 4            yinyang
#142ac038 893a6304 e501892f b83e717d 1135379 1135379 34902425 20 6908 -1          zing.buz
& 2                                                                                   xena
& 1                                                                                windows
$421efc89 a08cd23a 762f43cc a35d9358 12204 34902456 34902456 10 3                    media
$06111ffd 4f90ed4e 2ef9f321 2d211248 12171 12204 34902456 10 4                      system
#d15069e3 b8fe8762 fde23626 b75adec6 30179 30179 34900980 20 8174 7               that.ini
#e478764a 53d35ce6 72a59531 6316f5be 1379 1379 34900980 20 7040 7                 this.ini
& 1 4                                                                       windows\system
#338b4a6c 8c8beccf 34e18baf 203a5de1 530608 530608 33970237 24 8170 3             mapi.dll
#514f469e 4023460b 5b892dec f5415522 12204 12204 33970237 20 205824 6          scanner.exe
#f9079833 02fe2a1e c7d77828 37579c62 1135400 1135400 34902426 24 1047224 3      system.dll
#d993e4b2 04a2d2a9 05904fbe 8a27eac3 617008 617008 34900980 20 817385 3        unicorn.dll
#7f1860cc cfa98dbe 24aa28f5 a8c71a70 12171 12171 34900980 20 51049 3             wazoo.dll
& 1 3                                                                        windows\media
$6f68199e ff49407f 4e0a6490 4649873a 1135382 1917622 34902456 10 5                graphics
$182f04a1 edf9ed6e 90af16a5 9a771cd2 1135400 34902456 34902456 10 6                  sound
& 1 3 6                                                                windows\media\sound
$7cb61a63 1f7c28b2 761028da 5aab3912 3009822 3009822 44655208 10 7                     wav
& 1 3 6 7                                                          windows\media\sound\wav
#f9211b54 414f24ad ae4bfe16 dcd323d6 74425 74425 339349 20 8388608 19            music.wav
#afa67e52 e05eb686 b2ae21bf b7f711c6 69107 69107 81747 20 6978273 19             ocean.wav
& 1 3 5                                                             windows\media\graphics
$bab17ab3 ce056101 5f9f6ebd d19956c6 3015394 44655208 44655208 10 2                   xena
& 1 3 5 2                                                      windows\media\graphics\xena
& 0                                                                                    foo
#514f469e 4023460b 5b892dec f5415522 69107 69107 81747 20 205824 6             scanner.exe
$2174716e a685b60e 7eb36a63 f464b119 1135400 34902456 34902456 10 8                  wazoo
& 0 8                                                                            foo\wazoo
#c7f9c3db 56759dce c737c27b 8b06e2a9 74564 1547472 1737525 20 4322 5               wazoo.c
#7f1860cc cfa98dbe 24aa28f5 a8c71a70 74564 1547472 1737525 20 223232 3           wazoo.dll
#a0cf82c1 0fe848b8 26abc821 4a015978 74564 1547472 1737525 20 497 1                wazoo.h
#d71d4283 53296c7f ed06cce4 d8e0115a 74564 1547472 1737525 20 2983 11            wazoo.txt
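
The line formats in section 2.1 and the field encodings in section 2.2 can be read with a short program. The following is a minimal Python sketch, not part of the original distribution. All function, constant, and dictionary-key names (parse_listing, scan_time_to_datetime, decode_attributes, SAMPLE, and so on) are hypothetical and chosen for illustration. It assumes unannotated input lines, as in the actual data files, and it assumes the attribute field may appear with or without a 0x prefix. The elapsed-time fields are left as raw integers, with the "unavailable" value from section 2.2.1 mapped to None.

```python
from datetime import datetime, timedelta, timezone

# Job categories (section 2.1.1.6) and attribute bits (section 2.2.2).
JOB_CATEGORIES = ["other or unknown", "administration", "business", "management",
                  "non-technical development", "technical development",
                  "technical support"]
ATTRIBUTE_FLAGS = {
    0x0001: "READONLY", 0x0002: "HIDDEN", 0x0004: "SYSTEM", 0x0010: "DIRECTORY",
    0x0020: "ARCHIVE", 0x0040: "DEVICE", 0x0080: "NORMAL", 0x0100: "TEMPORARY",
    0x0200: "SPARSE FILE", 0x0400: "REPARSE POINT", 0x0800: "COMPRESSED",
    0x1000: "OFFLINE", 0x2000: "NOT CONTENT INDEXED", 0x4000: "ENCRYPTED",
}
NO_ATTRIBUTES = 0xFFFFFFFF      # section 2.2.2: no attributes were recorded
TIME_UNAVAILABLE = -2147480000  # section 2.2.1: timestamp unavailable


def scan_time_to_datetime(ticks):
    """Convert 100-ns intervals since 1601-01-01 UTC (field 3 of the % line)."""
    return datetime(1601, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=ticks // 10)


def decode_attributes(value):
    """Return names of the set attribute bits, or None if none were recorded."""
    if value == NO_ATTRIBUTES:
        return None
    return [name for bit, name in ATTRIBUTE_FLAGS.items() if value & bit]


def _entry(fields, current_dir):
    """Build the part shared by file (#) and subdirectory ($) lines."""
    times = [None if int(f) == TIME_UNAVAILABLE else int(f) for f in fields[4:7]]
    return {
        "name_hash": " ".join(fields[0:4]),   # fields 1 - 4
        "since_access": times[0],             # field 5 (raw value)
        "since_write": times[1],              # field 6 (raw value)
        "since_create": times[2],             # field 7 (raw value)
        "attributes": decode_attributes(int(fields[7], 16)),  # field 8 (hex)
        "dir": current_dir,                   # index path from the latest & line
    }


def parse_listing(lines):
    """Parse one file-system listing into (header, files, subdirs)."""
    header, files, subdirs = None, [], []
    current_dir = ()  # empty tuple means the root directory
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        kind, fields = line[0], line[1:].split()
        if kind == "%":
            header = {
                "version": int(fields[0]),
                "fs_type": fields[1],
                "scan_time": scan_time_to_datetime(int(fields[2])),
                "free_bytes": int(fields[3]),
                "total_bytes": int(fields[4]),
                "job_category": JOB_CATEGORIES[int(fields[5])],
            }
        elif kind == "&":
            current_dir = tuple(int(f) for f in fields)
        elif kind == "#":
            entry = _entry(fields, current_dir)
            entry["size"] = int(fields[8])        # field 9, bytes
            entry["ext_index"] = int(fields[9])   # field 10, -1 if not in table
            files.append(entry)
        elif kind == "$":
            entry = _entry(fields, current_dir)
            entry["name_index"] = int(fields[8])  # field 9, used by & lines
            subdirs.append(entry)
    return header, files, subdirs


# A few lines taken from the example in section 2.3.2, without annotations.
SAMPLE = [
    "%0 NTFS 125503623744530000 12320374784 17174093824 5\r\n",
    "&\r\n",
    "#dab52fba 76bfb03b 3c04b022 5b4f2e0b 4334457 30722020 30689407 20 488 18\r\n",
    "$c0b60a2b 3c47f6d5 23179dc4 62fa30b5 12204 33970224 34902456 10 1\r\n",
    "& 1\r\n",
    "#d15069e3 b8fe8762 fde23626 b75adec6 30179 30179 34900980 20 8174 7\r\n",
]

if __name__ == "__main__":
    header, files, subdirs = parse_listing(SAMPLE)
    print(header["fs_type"], header["scan_time"].isoformat())
    for f in files:
        print(f["dir"], f["size"], f["ext_index"], f["attributes"])
```

To process a real data file, the lines could be supplied by Python's gzip module, for example by iterating over gzip.open(path, "rt", encoding="ascii"), where path is a placeholder for one of the data files on the CD-ROMs. Directory index paths are kept as tuples of integers, because the names themselves are available only as hashes. A subdirectory's name_index can be matched against its name_hash to recognize the well-known names listed in section 2.2.4.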