How to solve ESE -566 Errors (JET_errDbTimeTooOld)

Hello all,

After having this issue at a customer, and finding the information on the internet to be a bit old (and haste) I wanted to post what occurred and how to solve it.

So over the weekend I had DR testing at a customer. They are running Exchange 2007 SP3RU8 on all servers, and for HA they have a two node Single Copy Cluster.

We did some maintenance on the cluster, and then attempted to fail over. After waiting about 3 minutes I decided to check (as a healthy cluster should fail over within 60-90 seconds). Afterwards, I saw three DB that refused to mount.

I reviewed the Event Viewer: System logs and saw this..

Event Type: Warning
Event Source: PartMgr
Event Category: None
Event ID: 59
Date:
Time:
User:
Computer: SCC-Node2
Description:
Disk 75 will not be used because it is a redundant path for disk 0.

Not really a warm fuzzy feeling, so I decided to fail back over to SCC-Node1. When that occurred I attempted to mount both of the databases that refused to mount (within Cluster Admin utility by bringing the IS Instance online) and failed. Once I looked within the Event Viewer: Application logs I saw a slew of these error messages:

Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 516
Date:
Time:
User: N/A
Computer: SCC-Node1
Description:
MSExchangeIS (6312) StorageGroup1: Database F:\database1\database1.edb: Page 231 (0x000000e7) failed verification due to a timestamp mismatch. The expected timestamp was 0xbedab929 but the actual timestamp on the page was 0xbedab403. Recovery/restore will fail with error -566. If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware “losing” one or more flushes on this page sometime in the past. Please contact your hardware vendor for further assistance diagnosing the problem.

and then a few of these:

Event Type: Error
Event Source: ESE
Event Category: Logging/Recovery
Event ID: 454
Date:
Time:
User: N/A
Computer: SCC-Node1
Description:
MSExchangeIS (4408) StorageGroup1: Database recovery/restore failed with unexpected error -566.

The first thing I checked was the health of the actual DB. To do this you should run the following:

  1. Open the Command Prompt
  2. Change directory to %programfiles%\Microsoft\Exchange Server\Bin
  3. Run the ESEUTIL command (eseutil /mh <DriveLetter>:\Path_To_DB)

This command dumps the DB headers, and will give you two pieces of critical information you are looking for:

  • State
  • Log Required

Here is the output from the command I ran:

Initiating FILE DUMP mode…
Database: D:\Database1\Database1.edb

File Type: Database
Format ulMagic: 0x89abcdef
Engine ulMagic: 0x89abcdef
Format ulVersion: 0×620,12
Engine ulVersion: 0×620,12
Created ulVersion: 0×620,12
DB Signature: Create time:09/15/2010 16:58:36 Rand:2592704934 Computer:
cbDbPage: 8192
dbtime: 3888704209 (0xe7c8ead1)
State: Dirty Shutdown
Log Required: 1829641-1829643 (0x1beb09-0x1beb0b)

At this point, I know I am going to need to do some type of recovery that is going to require the logs in the “Log Required” section.

Before I continue, lets discuss what actually happened here based off what I see from my troubleshooting..

What would cause this: 

Before fully understanding why this occurred  you should understand how dbTime works and how the Information Store works. There is a good article here about the Information Store process, but to sum it up there is a Log Buffer that is within the memory. To help reduce I/O on your disk subsystem, transactions are written to the Log Buffer. The Log Buffer (being 1MB in size, since transaction logs are that size) then commits to a log when one of two things happen:

  1. The DB is failing over, and needs to commit the transaction to memory. This could be due to node failure, turning the server off or HA testing
  2. A hard transaction hits the DB (send / receive of email, ect)

When the log buffer needs to write, it writes using the Log Writer process.

To summarize, transactions are not written to the DB immediately but instead to the log buffer and then from the log buffer written from the log buffer utilizing the log writer to the DB.

As for the dbTime, when a new log is generated a unique database time is generated based off the log generation time. The dbTime can be higher than the actual current log time, which is normal. The dbTime is utilized more as a counter than an actual time stamp.

Why does this occur? When we perform DR (hard / soft recoveries) on a DB, the dbTime is examined to see if the log has been replayed or not.

Now to the actual root cause..

What it looked like was that when SCC-Node1 failed over to SCC-Node2, the disk subsystem was overwhelmed along with the network backbone for the SAN, thus causing a mismatch in the time. If you look at Error -566 utilizing the Exchange Error Tool you would see that this theory does match up to the actual error:

JET_errDbTimeTooOld esent98.h
# /* dbtime on page smaller than dbtimeBefore in record */

The question now is how to fix it..

The Fix: 

Although the article above states you should run “ESEUTIL /P” I highly disagree with this. ESEUTIL /P should be ran only as a last resort. I was able to fix this quite quickly using ESEUTIL /R.

From our ESEUTIL /MH content above we know the DB is in a dirty shutdown state, and we wont be able to mount this until it is in a clean state. We see that he log required was 1829641-1829643 (0x1beb09-0x1beb0b). If you look at the log folder, you might be able to find these logs (they should exist – they were created, but the dbTime is “too old”). Look at the first three letters that start off on the log file. In my situation was E070. Remember that. 

Lets go back to our command prompt that is already in the /Exchange Server/BIN folder. From there we would run the following command:

ESEUTIL /R E07 /l<driveletter>:\Log_Location

The reason I am selecting to replay the logs starting with E07 is because the formatting of the command (ESEUTIL /R Enn – Enn = the base log that needs to be replayed). Because I know Log 0x1beb09 = E0700001beb09, I should replay the logs for E07.

Monitor the process as it runs..

Initiating RECOVERY mode…

Logfile base name: E07

Log files: L:\LogLocation

System files: <current directory>

Performing soft recovery…

Restore Status (% complete)

0    10   20   30   40   50   60   70   80   90  100

|—-|—-|—-|—-|—-|—-|—-|—-|—-|—-|

……………………………………………

Once this is completed you should have no errors. Check out the status of your DB utilizing ESEUTIL /MH. The db should show a clean shutdown state with 0×0 logs required (meaning there are no logs that are required for the db to replay):

Initiating FILE DUMP mode…

Database: F:\NAData21\NAMailboxDatabase21.edb

File Type: Database

Format ulMagic: 0x89abcdef

Engine ulMagic: 0x89abcdef

Format ulVersion: 0×620,12

Engine ulVersion: 0×620,12

Created ulVersion: 0×620,12

DB Signature: Create time:09/15/2010 16:58:36 Rand:2592704934 Computer:

cbDbPage: 8192

dbtime: 4104664568 (0xf4a835f8)

State: Clean Shutdown

Log Required: 0-0 (0×0-0×0)

Once you have also replayed the logs, the dbTime log sequencing is also updated so that logs will be stamped once the DB is mounted.

Attempt to mount your database again and you should be good.

Any questions leave them below in the comments section.

(and for the Redditors who read this)

TL;DR: Don’t run ESEUTIL /P as your first step! /P = Panic = :( data loss! Run ESEUTIL /MH to get the status of the DB and then run ESEUTIL /R if you are able to find the logs / restore the logs.

-Adam F

Troubleshooting Intra-Org SMTP Traffic Issues (and disabling Cisco ASA ESMTP Inspection)

I had a particular issue that is not well documented on the Technet site so I decided I would blog about it and share my experience.

Today I was doing an Exchange 2003 to 2010 upgrade for a customer. Their AD setup was quite typical, two Active Directory sites under a single forest interconnected by a Cisco ASA in each site doing IPSEC. They wanted an Active / Active configuration (one site will be doing SMTP and CAS traffic, which is NY and the FL site will be doing just CAS traffic) which was fine. Here comes the problems.

We tested mail flow from the NY site both external (inbound/outbound) to a user in the NY site along with the Exchange 2003 setup. All worked without a problem. When we tried to send an internal or external message to a user within the FL site, it would get stuck in the messaging queue with the status of “Retry”.

The first thing when troubleshooting intra org SMTP traffic is to review the Receive connectors. By default, you would have two…

  • Client <servername>
  • Default <servername>

The default receive connector is used for SMTP traffic, and is listening over TCP 25. The client receive connector is used for ESMTP, as it is listening over TCP 587.

You should look at several things:

1)      What are the subnets allowed within the Network’s tab on my Default <servername> connector? By default, you should be allowing all subnets to utilize this connector to send message to this server.

Image

2)      What is my authentication set to for this receive connector? I have seen this one a lot, and where people think it’s an awesome idea to remove all types of authentication. Its not, don’t do it. By default is what you see below, and should be kept this way. If you need to change this, I would recommend making a new SMTP connector with the values desired

Image

3)      Who is actually allowed to use this connector? The values for the defaults are below (Exchange 2010 SP3) but should be kept alone. If you want to select “Anonymous” I would recommend setting up a new Receive connector and setting up the network section to allow only the servers that would need to relay off of this server. This prevents your Exchange 2010 servers from being a generic SMTP relay out in the interwebs for anyone to use and abuse.

Image

Since we are troubleshooting Exchange 2010 intra-org SMTP connectivity, we want to make sure that we keep the Exchange servers checked within the Authentication and Permissions group tabs. If this is unchecked, check it off and restart the “Microsoft Exchange Transport Service” (anything changed in here should have a restart of this service).

If this is fine, we should move onto actually checking if port 25 is available between the two sites. You can check this by running the following from the command line..

telnet <Name of the HUB server in the other site) 25

The response should look similar to below

Image

If we are able to telnet to this without a problem, then check out the other ports listed here. You should have these also open and available to for intra org SMTP communications and also for the HUB Transport services to work properly.

The best troubleshooting in my opinion would be to turn up the IntraOrgProtocolLoggingLevel to Verbose. This should be done on all HUB transport servers when troubleshooting, and can be done like this:

      Get-TransportServer | Set-TransportServer –IntraOrgProtocolLoggingLevel Verbose

Once this is done, attempt to send another test message. Give it about 5 minutes and then check the logs from the sending HUB transport server. These logs can be located (by default) at %PROGRAMFILES%\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLogs\SmtpSend

From there review the errors. There are usually typically a handful that can be there, but the one distinct one I saw today was “cannot achieve Exchange server authentication – failed”.

When reviewing further I noticed the following error within the queue viewer:

“451 5.7.3 – Cannot achieve Exchange Server authentication”

The Cisco ASA does protocol level filtering, the 250 STARTTLS reply is not established and dropped. To get IntraOrg SMTP Traffic to work, we would need to turn off the ESMTP inspection which can be done as shown below (thank you Lazy Network Admin for these steps below)

      CiscoASA# config t 
      CiscoASA(config)# policy-map global_policy 
      CiscoASA(config-pmap)# class inspection_default 
      CiscoASA(config-pmap-c)# no inspect esmtp 
      CiscoASA(config-pmap-c)# exit 
      CiscoASA(config-pmap)# exit 
      CiscoASA(config)# exit 
      CiscoASA# wr me

Once this is done retest, and you should be receiving emails from the other site. If you did turn on the intra organization protocol logging, I would recommend turning it off.

       Get-TransportServer | Set-TransportServer –IntraOrgProtocolLoggingLevel None


Now, these are not the only troubleshooting steps in the world. If you continue to have issues with intra-organization mail delivery I would also check within the Exchange Toolbox the routing configuration, and run through the Mail Flow Troubleshooting wizard.

Hope this is helpful to all! Any questions please let me know below.

Adam F

Exchange Proxy and Redirection (Exchange 2007 and 2010) Explained

There has been a lot of discussion recently around proxy/redirection with Exchange 2007/2010 so I wanted to clear the air about what each one does, and how they work with CAS services.

What is the real definition of Proxying and Redirection?

To understand proxying and redirection, you should understand the meaning of both. Later in this article I will cover when/why proxy or redirection happens.

Proxying occurs when a CAS role sends traffic to another CAS role in particular situations (read below for more about that). For example here are two common situations:

  • CAS to CAS communication between two AD sites
  • CAS to CAS communication between Exchange 2010 and 2007 or 2003

Redirection occurs when there are two AD sites with Exchange 2010 / 2007 / 2003 and both are internet facing.

When do I proxy or redirect a CAS connection?

This is a little more complex. The following CAS protocols / services are proxy enabled:

  • Outlook Web App (OWA) and Exchange Control Panel (ECP)
  • Exchange ActiveSync (EAS)
  • Exchange Web Services (EWS) and the availability service (part of EWS)
  • POP3 / IMAP

And then the only three services that will actually redirect are OWA/ECP and EAS.

On to the fun part..

When do I actually proxy my connections from one AD site to another?

Simple, and we will use OWA as the example.

Adam’s mailbox is in the EU AD site, which has an ExternalURL set to $NULL (no value) for OWA but the InternalURL is set to euemail.exchangelaboratory.com. My internet facing site is the US AD Site, and we have an ExternalURL/InternalURL of naemail.exchangelaboratory.com.

Since OWA does have the capability of Proxy and Redirection, when the initial authentication request goes to the UAG or CAS (depending on your deployment) exchange is looking for three items from the GCS in a LDAP/RPC request, which is

  • My homeMDB (mailbox location)
  • My msExchangeVersion (the version of Exchange the user mailbox is on)
  • Authentication Token to fulfill my request

If it notices that my homeMDB is in another site, then it will query that CAS for the InternalURL and ExternalURL of the services in question (in this case, OWA). Since the ExternalURL is set to $NULL in the EU AD Site CAS, then the only option we have will be to proxy.

What occurs at this point is that the request gets proxied from the source server (cashub1.exchangelaboratory.com) to the target server (cashub2.exchangelaboratory.com), the CASHUB2 server will contact the Mailbox server for the appropriate data from the user’s mailbox and then proxy the connection back to the source server, which then delivers the content to the user. Beautiful thing.

Image

Now for redirection:

In this situation, we now have an “Active / Active” configuration as in both the US AD Site and the EU AD Site are internet enabled. My ExternalURL for OWA in the EU AD Site is euemail.exchangelaboratory.com, and my ExternalURL for OWA in the US AD Site is naemail.exchangelaboratory.com. We will use Adam again as our example mailbox.

When Adam attempts to authenticate to the URL naemail.exchangelaboratory.com the Exchange CAS (or UAG depending on your configuration) will perform the auth request along with the Mailbox Location and Version lookup. Exchange will realize that Adam’s mailbox is actually in the EU AD Site, and will send a HTTP 451 request back to the client to redirect to the proper internet facing URL (in this situation, euemail.exchangelaboratory.com). Once this is complete, it will either be a silent redirection or the user will have to authenticate again.

Image

ActiveSync and OWA Virtual Directory permissions to make redirection and proxying to work?

To make sure we can successfully redirect Exchange ActiveSync and OWA requests, we need to ensure that the proper permissions are setup on the IIS Virtual Directories.

Exchange 2003 (non-internet facing) and Exchange 2010:

In this situation, our endpoint for client access is Exchange 2010 as this is internet facing and the most up to date version of Exchange. The Exchange 2003 site is not internet facing, so the Exchange ActiveSync protocol would have to proxy from Exchange 2010 to 2003 for any users within the Exchange 2003 environment.

To allow this to work, you should enable IWA (Integrated Windows Auth) on the IIS Virtual Directory Microsoft-Server-ActiveSync on the Exchange 2003 server. This should not be done through the IIS Manager, as the System Attendant in Exchange 2003 will overwrite these changes and break proxying. You can do this through the steps available here.

Exchange 2007 (non-internet facing) to Exchange 2007

In this situation, we have two AD sites both running Exchange 2007. One site is internet facing and the other is not, thus if a user is within the non-internet facing site we have to proxy the connection from the internet facing CAS to the non-internet facing CAS.

To allow this to work we need to enable IWA on the non-internet facing Exchange 2007 CAS. To do this we will need to go through IIS Manager, and the steps can be found here.

Single Sign On for OWA – Exchange 2010 SP2?

A new feature the Exchange Team at Microsoft implemented in Exchange 2010 is called silent redirection. I would recommend setting this up, the reason for this is because you now eliminate the need for a double authentication when doing a redirection within OWA on Exchange 2010 SP2.

This does get a little lengthy, and there is an excellent blog from the Exchange Product Group located here.

 

Well, that’s all I have for now. Leave your questions below if you have any. Thanks for reading!

Adam F