

Web Browser Forensics, Part 1 

Mar 30, 2005 02:00 AM

by Keith J. Jones, Rohyt Belani


Electronic evidence has often shaped the outcome of high-profile civil lawsuits and criminal investigations, in matters ranging from intellectual property theft and insider trading in violation of SEC regulations to employee misconduct that ends in termination under unfavorable circumstances. Critical electronic evidence is often found in a suspect's web browsing history, in the form of received email, sites visited, and attempted Internet searches. This two-part article presents the techniques and tools commonly used by computer forensics experts to uncover such evidence, through a fictitious investigation that closely mimics real-world scenarios.

While you read this article, you may follow along with the investigation and actually analyze case data. To actively participate in the investigation, you need to download the associated Internet activity data from the SecurityFocus archives [data].

Case notes

At 8:25 PM on March 18, 2005, a Senior Associate at a prestigious law firm had just finished a draft of a property-sale contract for his client but was unable to upload the document to the firm's centralized document storage server, hosted by Docustodian, Inc. Each upload attempt failed with the following error message: "You have reached the storage limit. Please call your system administrator". The Senior Associate did just that and called Joe Schmo, the firm's IT administrator. However, Joe's voicemail indicated that he was on vacation from March 7-21, 2005.

This was not an isolated occurrence. An internal review revealed that over 500 GB of MP3s, pirated software, and newly released movies were stored on the system under Joe Schmo's profile. Faced with either a violation of internal policy or an intrusion, the law firm quickly concluded that the investigation was beyond its core IT competency and brought in a professional security firm to lead it.

The investigation

An individual's web browsing activity often provides investigative leads. In this investigation, we begin our analysis by reconstructing the web browsing activity in order to help prove or disprove our suspicions about Joe Schmo, the law firm's system administrator. We will use a combination of commercial and open source tools that you can also use to analyze the data provided for this incident, and we will walk through their capabilities, how they are used, and what they reveal about the web browsing activity in this investigation.

Internet Activity File Formats

The two predominant web browsers we encounter during computer-related investigations are Microsoft's Internet Explorer (IE) and the Firefox/Mozilla/Netscape family. Each of these browsers saves web browsing activity (also known as web browsing history) in its own format. We will outline the file formats and the relevant file paths for both IE and Firefox/Mozilla/Netscape Internet activity files to enhance our investigative leads.

Microsoft's Internet Explorer (IE)

IE is typically installed by default on new Windows-based computers and is used by most private and business computer owners. IE stores the Internet activity for each user under their Windows profile. In Joe's case, since he was using a Microsoft Windows operating system newer than Windows 2000, his IE activity was stored in the following directory:


C:\Documents and Settings\jschmo\Local Settings\Temporary Internet Files\Content.IE5\

The directory listed above stores the cached pages and images Joe viewed on his computer. Inside the Content.IE5 directory are additional subdirectories with seemingly random names, which contain the cached web data Joe had viewed. IE stores this cached information so that Joe does not have to download the same data again when he revisits a web page.

Two additional IE activity directories may also be of interest. The first contains the Internet history activity without locally cached web content:


C:\Documents and Settings\jschmo\Local Settings\History\History.IE5\

Under this directory there will be additional subdirectories for the date ranges in which IE saved the history. The second directory stores IE's cookie files:


C:\Documents and Settings\jschmo\Cookies\

An investigator will typically check all three information stores for Internet activity data. Note that an individual can deliberately clear these files for many reasons, and several types of software routinely installed on computers purge them periodically. That does not mean the information is unavailable; in part 2, we'll discuss how to find these files when they do not immediately appear to be available. For now, we'll assume that the data and files exist.

Each of the directories above contains a file named Index.dat, which records the Internet activity for that information store. The internal file structure is identical in all three, but the Index.dat in the cached web pages directory is populated with more information than the others. To rebuild a web page a user had visited, IE must find the correct locally cached copy and the corresponding URL the user visited; this relationship is mapped in the Index.dat file. We will use the same technique when reconstructing Joe's Internet browsing activity. The Content.IE5 directory will be the most useful to us, because its cached copies let us view the same web pages Joe viewed in the past.
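As a minimal sketch of that first triage step, the Python below labels an Index.dat path by the information store it belongs to, using the three directory names given above. The function name, the `STORE_MARKERS` table, and the "unknown" fallback are our own illustrative choices, not part of any tool discussed in this article.

```python
from pathlib import PureWindowsPath

# Directory names of the three IE information stores described above.
STORE_MARKERS = {
    "content.ie5": "cache",    # Temporary Internet Files\Content.IE5
    "history.ie5": "history",  # Local Settings\History\History.IE5
    "cookies": "cookies",      # <profile>\Cookies
}


def classify_index_dat(path: str) -> str:
    """Return 'cache', 'history', 'cookies' or 'unknown' for an Index.dat path."""
    p = PureWindowsPath(path)
    if p.name.lower() != "index.dat":
        raise ValueError(f"not an Index.dat file: {path}")
    # Check parent directories from the innermost outward, so Index.dat files in
    # dated subfolders of History.IE5 are still attributed to the history store.
    for part in reversed(p.parts[:-1]):
        store = STORE_MARKERS.get(part.lower())
        if store is not None:
            return store
    return "unknown"
```

Because `PureWindowsPath` also accepts forward slashes, the same function works on paths from an evidence copy mounted on a Unix workstation.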

The Index.dat file is saved in a proprietary binary format that Microsoft has not publicly documented. However, the following whitepaper describes some of its internal data structures, which may be helpful if you try to reconstruct the file by hand.
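For a quick sanity check before handing a file to a parser, the sketch below reads the start of an Index.dat header. The signature string and the offsets used here (file size at 0x1C, hash-table offset at 0x20) come from third-party reverse-engineering of the format rather than from Microsoft, so treat them as assumptions to verify against your own samples.

```python
import struct

# Assumption: this header layout follows third-party research on the
# IE 5-6 cache format; Microsoft does not document it.
SIGNATURE_PREFIX = b"Client UrlCache MMF Ver "


def read_index_header(data: bytes) -> dict:
    """Parse the signature and two size/offset fields from an Index.dat header."""
    if not data.startswith(SIGNATURE_PREFIX):
        raise ValueError("signature mismatch: not an IE Index.dat file")
    # The signature is a NUL-terminated string in a 28-byte (0x1C) field.
    signature = data[:0x1C].split(b"\x00", 1)[0].decode("ascii")
    # Assumed: little-endian 32-bit file size at 0x1C, hash-table offset at 0x20.
    file_size, hash_table_offset = struct.unpack_from("<II", data, 0x1C)
    return {
        "version": signature[len(SIGNATURE_PREFIX):],
        "file_size": file_size,
        "hash_table_offset": hash_table_offset,
    }
```

In practice you would pass the first 0x24 bytes of the file, for example `read_index_header(open(path, "rb").read(0x24))`. A `file_size` that disagrees with the size of the file on disk is worth a closer look.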

Firefox/Mozilla/Netscape Based Web Browsers

Firefox/Mozilla/Netscape and other related browsers save Internet activity in a way similar to IE, in a file named history.dat. One significant difference between a history.dat file and an Index.dat file is that history.dat is saved in an ASCII format rather than a binary one, which makes it simpler to review than the corresponding IE file. The second difference is that history.dat does not link web site activity with cached web pages. Therefore, we cannot readily reassemble views of the web pages Joe visited in the same manner that we can with IE.
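Because history.dat is text, even a crude script can pull URLs out of it. The sketch below assumes the "Mork" format used by these browsers, where values appear as `(81=http://...)` and a literal `)` inside a value is escaped as `\)`. It is a triage heuristic, not a full Mork parser: `$XX` hex escapes are left undecoded, and URLs are not tied to their visit dates or counts.

```python
import re

# A Mork value such as (81=http://example.com/) ends at the first unescaped ')'.
# DOTALL lets a backslash-newline line continuation be matched as an escape pair.
URL_VALUE = re.compile(r"=((?:https?|ftp)://(?:\\.|[^)\\])*)\)", re.DOTALL)


def urls_from_history_dat(text: str) -> list[str]:
    """Return unique URLs found in history.dat text, in order of first appearance."""
    seen = []
    for match in URL_VALUE.finditer(text):
        value = re.sub(r"\\\r?\n", "", match.group(1))  # join continued lines
        value = re.sub(r"\\(.)", r"\1", value)           # undo '\)' and '\\' escapes
        if value not in seen:
            seen.append(value)
    return seen
```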

The Firefox history file is located at:


\Documents and Settings\<user name>\Application Data\Mozilla\Firefox\Profiles\<random text>\history.dat

Mozilla/Netscape history files are found in the following directory:


\Documents and Settings\<user name>\Application Data\Mozilla\Profiles\<profile name>\<random text>\history.dat
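Since the `<random text>` and `<profile name>` components differ on every machine, a glob with wildcards is one way to locate these files in an exported profile. This sketch assumes the user profile has been copied or mounted as a local directory; on a case-sensitive file system, the directory names must match the casing shown above. The function and pattern names are hypothetical.

```python
from pathlib import Path

# Wildcards stand in for the <random text> and <profile name> components above.
HISTORY_PATTERNS = [
    ("Firefox", "Application Data/Mozilla/Firefox/Profiles/*/history.dat"),
    ("Mozilla/Netscape", "Application Data/Mozilla/Profiles/*/*/history.dat"),
]


def find_history_dat(user_profile: Path) -> list:
    """Return (browser family, path) pairs for history.dat files under a user profile."""
    hits = []
    for family, pattern in HISTORY_PATTERNS:
        for path in sorted(user_profile.glob(pattern)):
            hits.append((family, path))
    return hits
```

For example, `find_history_dat(Path("/mnt/evidence/Documents and Settings/jschmo"))` would list every history.dat in Joe's profile, tagged with the browser family whose layout it matched.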

Reconstructing web activity manually can be quite tedious. Fortunately, several free and commercial tools streamline this process considerably. The following sections present some of these tools. Please follow along with the web activity data you downloaded in the introduction to this article, and use the tools mentioned here to reproduce the analysis.

Web browsing analysis - open source tools


Pasco

Pasco (the Latin word for "Browse") is a command line tool that runs on Unix or Windows and can reconstruct the internal structures of IE Index.dat files. Pasco accepts an Index.dat file, reconstructs the data, and outputs the information in a delimited text file format. This format is useful when you need to import the data into a spreadsheet such as Microsoft Excel. Figure 1 shows Pasco in action.


Figure 1. Pasco in Action.

Pasco shows that IE saves the following fields from a single web site visit in the Index.dat file:


  • The record type - Pasco indicates whether the activity is a URL that was browsed or a website that redirected the user's browser to another site.
  • The URL - The actual website that the user visited.
  • Modified Time - The last time the website was modified.
  • Access Time - The time the user browsed the website.
  • Filename - The name of the local file that contains a cached copy of the URL listed.
  • Directory - The local directory in which you can find the "Filename" above.
  • HTTP Headers - The HTTP headers the user received when he browsed the URL.
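The Modified Time and Access Time values in an Index.dat record are stored as 64-bit Windows FILETIME values: counts of 100-nanosecond intervals since January 1, 1601 (UTC). Tools such as Pasco convert them for you, but a minimal converter is handy when checking a timestamp by hand. The function name below is our own.

```python
from datetime import datetime, timedelta, timezone

# FILETIME epoch: 1601-01-01 00:00:00 UTC; one tick = 100 nanoseconds.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw: bytes) -> datetime:
    """Convert an 8-byte little-endian FILETIME to a timezone-aware UTC datetime."""
    if len(raw) != 8:
        raise ValueError("a FILETIME is exactly 8 bytes")
    ticks = int.from_bytes(raw, "little")
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
```

Different tools may display these values in UTC or in local time, so confirm which one you are looking at before building a timeline.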

For each row listed in the spreadsheet, you can retrieve the file listed in "Filename" in Joe's local directory named "Directory" to recreate what Joe saw on the web at the time listed in "Access Time."
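If you save Pasco's output, a few lines of Python can turn it back into records and rebuild the path to each cached copy. This sketch assumes tab-delimited output with a single header line and the column order listed above; check the header of your own Pasco output before relying on it. The `content_ie5` argument is wherever you exported Joe's Content.IE5 directory, and the function names are hypothetical.

```python
import csv
import io
from pathlib import PureWindowsPath
from typing import Optional

# Assumed column order, matching the field list above.
PASCO_FIELDS = ["type", "url", "modified", "accessed", "filename", "directory", "headers"]


def parse_pasco(text: str) -> list:
    """Parse tab-delimited Pasco-style output (one header line first) into dicts."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    return [dict(zip(PASCO_FIELDS, row)) for row in reader if row]


def cached_copy(record: dict, content_ie5: str) -> Optional[PureWindowsPath]:
    """Return the path of the record's cached file under Content.IE5, if it has one."""
    if not record.get("filename") or not record.get("directory"):
        return None  # record has no local cached copy
    return PureWindowsPath(content_ie5, record["directory"], record["filename"])
```

`csv.QUOTE_NONE` keeps quote characters inside the HTTP headers column from being treated as CSV quoting.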

Although Pasco works well with IE Internet activity files, it does not reconstruct web activity from other web browsers such as Firefox/Mozilla/Netscape. The output of Pasco as used for this article can be downloaded from the SecurityFocus archives [report].

Web Historian

Red Cliff's freeware tool, Web Historian, has the ability to crawl a directory structure and identify Internet activity files for all of the following web browsers:


  • Internet Explorer
  • Mozilla
  • Firefox
  • Netscape
  • Safari (Apple OS X)
  • Opera

This means that the investigator no longer has to memorize the Internet activity file paths for each web browser. Web Historian can also output the reconstructed data in the following formats:


  • Native Excel Spreadsheet
  • HTML
  • Delimited Text File

A screenshot of Web Historian in use is shown below in Figure 2.


Figure 2. Web Historian in action.


Analysis of the web history

Now that we have the output for Joe's IE Internet activity, we can begin reviewing the websites he visited. During this analysis, we will only present the activity that is relevant to the investigation since there are numerous instances of irrelevant web browsing events that can slow down an investigator. The output from Web Historian is shown below in Figure 3.


Figure 3. Web Historian output.

In the above output we see that Joe visited Hotmail.com. Web Historian shows that the visit to Hotmail created the file named 8R9KCL4N\HoTMaiL[1].htm in the cache directory. If we open this cached file, we will see the following web page in Figure 4.


Figure 4. Web Historian reconstruction of Hotmail activity.

At the top of the web page shown in Figure 4, we see that Joe's Hotmail account is JoeSchmo1980@hotmail.com. We also see that the account held no email of interest at the time he checked it.


Figure 5. Joe visiting Barnes and Noble.

We see above in Figure 5 that Joe visited Barnes and Noble. It appears that he was interested in books related to hacking and cracking. There are also other instances of Joe searching for similar material on hacking-related websites. Figures 6 and 7 show Joe accessing sites known to host hacking-related material and searching for cracks specific to Docustodian, the application that was overloaded with unauthorized material.


Figure 6. Joe searches for serial numbers.


Figure 7. Joe's Google searches.
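Search terms like those in Figure 7 can be recovered directly from history URLs, because Google puts the query in the `q` parameter of its search URLs. The sketch below assumes that URL shape; other search engines use different parameter names, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def google_search_terms(url: str) -> Optional[str]:
    """Return the decoded search query from a Google search URL, or None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    # Assumed URL shape: http(s)://www.google.<tld>/search?...&q=<terms>
    if "google." not in host or not parts.path.startswith("/search"):
        return None
    terms = parse_qs(parts.query).get("q")  # parse_qs also decodes '+' to a space
    return terms[0] if terms else None
```

Running every URL column from Pasco or Web Historian output through a function like this yields a plain list of what the account searched for.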

As shown above, we were able to establish that Joe, or someone using Joe's account, was interested in information that would allow him to crack the licensing for Docustodian. However, the visits to most of these websites were recorded at or near 5:50:58 PM on March 10, 2005. It's important to remember that Joe was on vacation from March 7, 2005 through March 21, 2005. It is highly unlikely that Joe visited these websites from a sunny beach in Florida, so we would have to look harder at Joe's computer to see how these websites were accessed.
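The timing argument amounts to a simple filter: which recorded visits fall inside the period Joe was away? This sketch assumes visits have already been extracted as (URL, access time) pairs with naive datetimes in the same time zone as the vacation dates. The function and constant names are hypothetical.

```python
from datetime import datetime


def visits_during_absence(visits, start: datetime, end: datetime) -> list:
    """Return (url, accessed) pairs with start <= accessed < end, oldest first.

    Assumes every datetime is naive and in the same time zone as start and end.
    """
    hits = [(url, accessed) for url, accessed in visits if start <= accessed < end]
    return sorted(hits, key=lambda pair: pair[1])


# Joe's vacation, March 7-21, 2005 inclusive; the end bound is exclusive.
VACATION_START = datetime(2005, 3, 7)
VACATION_END = datetime(2005, 3, 22)
```

Using an exclusive end bound of March 22 keeps visits late on March 21 inside the window without fiddling with 23:59:59.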

You can download the output of Web Historian, as previously mentioned in the article.

Internet activity analysis - commercial tools

Several commercial tools examine web-related activity much as the freely available tools presented above do. Although we already examined the interesting activity in the last section, in this section we use commercially available tools to point out some of their differences and to show other interesting websites Joe's account visited.

IE History

IE History was one of the first commercial tools developed for web activity reconstruction. It is a Windows application that opens several types of web browser history files, including those of IE and Firefox/Netscape/Mozilla. IE History is a lightweight tool that can easily export web browsing history to spreadsheets and delimited text files.

Within IE History, you can open an Index.dat file similar to the other tools, as shown below in Figure 8.


Figure 8. Using IE history.

Once IE History parses the information in the Index.dat file, it offers some functionality that simplifies your review. For example, for the activity records shown in Figure 9, you can right-click and select "Go to URL" to quickly open a web browser at the website Joe visited. Notice, however, that IE History does not link the web activity to the relevant cached files. When you review a URL through the right-click function, you are viewing a live copy of the website, which may differ from what Joe saw when he visited it in the past.


Figure 9. IE History results.

IE History is a lightweight, inexpensive tool that allows you to investigate most web based Internet activity.

Forensic Tool Kit (FTK)

FTK combines some of the functionality of all the tools we presented in this article. Among commercial tools, it receives our highest recommendation for its ease of use alone. With FTK, you can browse the cached web pages and see them in a web browser-like interface. For example, Figure 10 shows one of the cracking sites Joe's account visited.


Figure 10. FTK's web activity representation.

Figure 11 and Figure 12 show that Joe's account was used to look for hotels in the Sao Paulo area. Since we know that Joe is currently on vacation with his family in Florida, it is highly unlikely that he was responsible for this activity.


Figure 11. Joe's account does some travel planning.


Figure 12. Joe's account was searching for hotels.

FTK reconstructs the visited web pages very well. Its drawback is the way it presents the Index.dat file. When you select an Index.dat file within FTK, the data is presented in a format that is difficult to use: each instance of activity appears as a separate table, and none of the information is clickable. As a result, importing the data into a spreadsheet would be nearly impossible.
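Once per-record data has been extracted by any of these tools, flattening it into a single table is straightforward. The sketch below assumes each visit has already been captured as a Python dict whose keys become column headers (how you get the records out of a given tool is outside its scope); it writes CSV, which Excel and other spreadsheets import directly. The function name is hypothetical.

```python
import csv
import io


def records_to_csv(records: list) -> str:
    """Write per-visit dicts as one CSV table; columns are the union of all keys."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)  # keep first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

When saving the result to disk, open the file with `newline=""` so the CSV module's line endings are written unchanged.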

Concluding part one

Our conclusion at this point is that Joe was probably not the individual using his account when unauthorized activity was performed against Docustodian, because most of the potentially malicious activity occurred while Joe was on vacation with his family in Florida. The Internet searches for books related to hacking and for license cracks instead suggest that his machine may have been used by an unknown suspect. Who could it be? Who had access to Joe's machine while he was away on vacation? In the next article in this series, we will examine additional investigative leads by performing an in-depth review of Joe's hard drive and the web activity of all other browsers installed on the system.

About the authors

Keith J. Jones is Director of Computer Forensics and Incident Response at Red Cliff Consulting.

Rohyt Belani is Director of Proactive Security Services at Red Cliff Consulting.


This article originally appeared on SecurityFocus.com -- reproduction in whole or in part is not allowed without expressed written consent.



Jul 18, 2011 12:10 AM

We'll assume that you are using the commandline...

In DOS cmd prompts, spaces are seen as commandline delimiters, so to use the path, wrap it in quotes so it's seen as one entity..eg

dir "c:\documents and settings\harry houdini"

As of XP SP2 (I think) you can use the unix-like function of typing the first few characters, then tapping the TAB key to autocomplete the line.  If there are multiple options for the autocomplete, then it will cycle through them in alphabetical order with each press of the TAB key..

eg..at the cmd prompt

cd c:\doc[TAB]

will autofill to cd "c:\documents and settings" (note it has wrapped the path in quotes).

Keep using the TAB key to drill through to the index.dat file within the user profile directory.

Another tool worth looking at is IEHistoryView from the nirsoft.net site. It will automatically parse through the main "History.IE5" index.dat file, and recursively through the subdirectories. It's good on live systems (though not a forensically sound method) and on dead systems after you have carved the History folder from the evidence file.

Nirsoft also has some very cool tools for parsing USB device lists, protected password storage areas (when you click "remember me"?..guess where that goes) and a myriad of other useful utilities. Careful with some of the password tools: AV detects them as hacking tools and will delete/quarantine them for you.

Jul 09, 2011 11:01 PM

can not see the c:\documents and settings

error message is: is unavailable; access is denied

Please help

Apr 22, 2011 12:28 PM

Where can the Index.dat file be found after the parsing process?

I tried it but couldn't find it.

