KB Article #102088

Retry Queue

Summary:

EMF places mail that it temporarily can't deliver into the Retry Queue. Mail in the retry queue is usually there because of a problem with the next relay that EMF is trying to send to, not because of a problem with EMF.

QUICK CHECK:
Messages in retry are all inbound - check internal mail server(s) and network connectivity

Messages in retry are all outbound - check Firewall or next relay - verify DNS is working correctly. Check Internet connectivity.

Messages in retry are to one or just a few domains - problem is probably on the recipient's end. Try connecting manually.

TOPICS DISCUSSED:
* Mail Routing Rules
* DNS - MX and A records - load balancing and MX preferences
* Retry Interval Settings and DSNs
* Troubleshooting and Using Telnet to test a Relay

Details:

Retry discussion and Mail Routing Rules

There are a couple of reasons why mail might sit in the retry queue instead of being delivered. The retry queue is a location that EMF utilizes to store a message that can't be delivered to the next relay machine. For example, in a typical, simplified, enterprise environment mail is sent from an Internal mail server (Exchange, Notes, Sendmail, etc.) to EMF to the Internet:

EXCHANGE <==> EMF <==> INTERNET

If a message is sent from Exchange destined for a host on the Internet, it would follow the route above, from left to right: Exchange -> EMF -> Internet Host. EMF typically uses DNS to determine the IP address of the Internet Host that it is sending to. If the recipient host is temporarily not available, EMF will move the message to the Retry queue and then keep retrying at an interval determined by parameters that you can set and a "back off" algorithm.

When a message is placed in Retry, the following event is logged:

Event ID: 1007
No more relay hosts to try
If within the expiration time limit, the message will be placed in the Retry queue

(EMF 4.x logs the event: "Out of mail hosts to try, placing message in retry queue".)

Many domains have multiple mail hosts that will accept mail for their domain. DNS is responsible for returning the hosts in a particular order. If EMF tries connecting to each host and still can't send the message, it will place the message in retry, and log the retry event. Keep in mind that many domains have only one mail host.

The Setup-Relays page in the EMF Web Admin is where you configure how mail should be routed. The main component is the "Mail Routing Rules" section. Here you define your domains and how mail for those domains should be routed. Each domain has an "Outbound Mail Delivery" column that needs to be populated. Click Edit for the properties of the routing preference.

A "default" domain should be listed in the Mail Routing Rules table to match all domains that aren't explicitly listed. The options for Outbound Mail Delivery are "Same as Default Record", "DNS", or "Relays in the following priority order", meaning that EMF will try these hosts in the order that they are listed. You can also specify "MX For Domains" and list a domain rather than a host. In that case EMF will do a DNS lookup for the specified domain and try the MX records returned by DNS in the proper order.

Typically, Routing Rules are configured so that the default entry matches all outbound mail from your organization, and the other entries match mail inbound to your organization. For mail leaving your organization, you would configure the "Outbound Mail Delivery" for the default domain to be either a Relay server or DNS. You need to understand how EMF is configured to route mail in order to understand why mail might not get delivered.
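
The rule lookup described above can be sketched in a few lines of Python. This is only an illustration of the "explicit domain, otherwise default" behaviour. The table layout, the function name route_for, the domain example.com, and the relay address are all hypothetical and are not taken from EMF itself.

```python
# Hypothetical routing table modelled on the Mail Routing Rules page.
# The domain and relay address below are made up for illustration.
ROUTING_RULES = {
    # Inbound mail for our own domain goes to the internal mail server.
    "example.com": {"delivery": "relays", "hosts": ["10.10.10.2"]},
    # Everything else (outbound mail) matches the default record.
    "default": {"delivery": "dns", "hosts": []},
}


def route_for(recipient, rules=ROUTING_RULES):
    """Return the routing rule for a recipient, falling back to 'default'."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    rule = rules.get(domain, rules["default"])
    # "Same as Default Record" simply defers to the default entry.
    if rule["delivery"] == "same_as_default":
        rule = rules["default"]
    return rule
```

With this table, mail for user@example.com would be handed to the listed relay, while mail for any unlisted domain would fall through to the default record and be delivered using DNS.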

You might use a Relay for the following topology:

EXCHANGE <==> EMF <==> FIREWALL <==> INTERNET

In this situation EMF will hand off all mail that matches the default domain to the Firewall, which is in turn responsible for doing the DNS lookup on the receiving domain. If the Firewall stops accepting mail, EMF will place all outgoing mail in the retry queue.

You might use DNS for the following topology:

EXCHANGE <==> EMF <==> INTERNET

In this case, EMF is responsible for looking up the IP address of the host responsible for handling mail for a particular domain. DNS is responsible for returning the Mail hosts to EMF in a particular order.

DNS - MX and A records - load balancing and MX preferences

You can do the same thing that EMF does by using Nslookup from the Windows command prompt. For example, a lookup for Apple.com might return the following (simplified):

apple.com MX preference = 10, mail exchanger = mail-in.apple.com
apple.com MX preference = 30, mail exchanger = mail-in.euro.apple.com
mail-in.apple.com internet address = 17.254.0.57
mail-in.euro.apple.com internet address = 194.151.19.117

EMF will first look at the MX records for Apple.com, and then the A records for the hosts associated to the MX record.

In this example, EMF will first try to send a message for user@apple.com to the server mail-in.apple.com because it has the lowest numbered preference (10). The IP address associated with mail-in.apple.com is 17.254.0.57, so EMF will first try to connect to that IP. If that connection fails, EMF will then look at the next lowest numbered MX preference (in this case 30), determine the host name (mail-in.euro.apple.com), get the IP address for that host (194.151.19.117), and then try connecting to that IP address. Once EMF has failed to connect (or failed to send) to all of these machines and it has "No more relay hosts to try", EMF will put the message in the retry queue and try again later.
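
The ordering just described can be sketched as follows. This is a simplified model, not EMF's actual code: the function names delivery_order and deliver are hypothetical, and try_send stands in for whatever performs the real SMTP connection.

```python
def delivery_order(mx_records, a_records):
    """Return (host, ip) pairs in the order they would be tried.

    mx_records: list of (preference, host) tuples from the MX lookup.
    a_records:  dict mapping host name -> list of IP addresses.
    The lowest MX preference is tried first. sorted() is stable, so hosts
    with equal preference keep the order in which DNS returned them.
    """
    ordered = []
    for _preference, host in sorted(mx_records, key=lambda record: record[0]):
        for ip in a_records.get(host, []):
            ordered.append((host, ip))
    return ordered


def deliver(mx_records, a_records, try_send):
    """Try each address in turn; report 'delivered' or 'retry'."""
    for _host, ip in delivery_order(mx_records, a_records):
        if try_send(ip):
            return "delivered"
    # Every host failed: "No more relay hosts to try".
    return "retry"
```

Fed the Apple.com records from the example, delivery_order puts mail-in.apple.com (preference 10) ahead of mail-in.euro.apple.com (preference 30), and deliver returns "retry" only when every address has failed.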

A couple of notes about DNS:

In many cases you might see a list of MX preferences all with the same number. In this case it is the responsibility of DNS to return the hosts in random order. This is a common load balancing technique. For the example:

- microsoft.com MX preference = 10, mail exchanger = mail1.microsoft.com
- microsoft.com MX preference = 10, mail exchanger = mail2.microsoft.com
- microsoft.com MX preference = 10, mail exchanger = mail3.microsoft.com

DNS will return the hosts Mail1, Mail2, and Mail3 in random order for each lookup of this domain. This is one way Microsoft can maintain 3 mail servers each handling 1/3 of the total mail volume.
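
To see why random ordering spreads the load, the small simulation below shuffles the answer for each lookup and counts which host comes first. It is a sketch under the assumption that the order is uniformly random. The function name first_host_counts and the fixed seed are illustrative choices only.

```python
import random
from collections import Counter


def first_host_counts(hosts, lookups, seed=0):
    """Count how often each host is listed first over many simulated lookups."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter()
    for _ in range(lookups):
        answer = list(hosts)
        rng.shuffle(answer)    # stand-in for DNS returning a random order
        counts[answer[0]] += 1
    return counts
```

Calling first_host_counts(["mail1", "mail2", "mail3"], 3000) should give each host a count close to 1000, which is the "1/3 of the total mail volume" effect described above.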

Similarly, for one mail host (mail-in.apple.com) you can have multiple A records - or multiple IP addresses, associated with one host name. In this case also, DNS is responsible for returning the IP addresses for the host in random order.

- apple.com MX preference = 10, mail exchanger = mail-in.apple.com
- mail-in.apple.com internet address = 17.254.0.57
- mail-in.apple.com internet address = 17.254.0.58

In this example, when looking up the IP address for mail-in.apple.com, you will receive the .57 address first about half the time and the .58 address first the other half. This is another load balancing technique called round-robin.

Retry Interval Settings and DSNs

Once a mail message is placed in the retry queue, it will be retried based on the retry interval settings you define in the setup page for the Retry queue and a "back off" algorithm. The default settings are a minimum retry interval of 3 minutes, a maximum retry interval of 3 hours, and expiration (return to sender) after 3 days. This means that once the message is placed in retry, the first retry occurs after 3 minutes, the next after a further 6 minutes, then 12, 24, 48, and so on, with the interval doubling until it reaches the 3 hour maximum. From then on the message is retried every 3 hours until it is returned to the sender after 3 days.

EMF complies with the ESMTP DSN standard, or "Delivery Status Notifications" (see related article EMF delivery status notifications (DSN)). This means that EMF will send a DELAYED notification if the message is stuck in retry for a period of time. This period of time is configurable only to the extent that you can change your Minimum and Maximum retry intervals. The DELAYED notification is sent after EMF has attempted to send the message at the last increment before it reaches the Maximum retry interval. See timeline below. DELAYED notifications can be disabled if desired (see related article Disabling non-failure Delivery Status Notifications (DSNs)). EMF will also send a FAILURE DSN at the Retry Expiration time.

Assuming a minimum retry interval of 3 minutes, a maximum retry interval of 3 hours, and a message expiration of 1 day, the timeline of a message in Retry would look like this:

00:00 Initial Outbound Delivery Fails - message placed in Retry
00:03 (00:00 + 3 minutes) retry attempt
00:09 (00:03 + 6 minutes) retry attempt
00:21 (00:09 + 12 minutes) retry attempt
00:45 (00:21 + 24 minutes) retry attempt
01:33 (00:45 + 48 minutes) retry attempt
03:09 (01:33 + 96 minutes) retry attempt
03:09 DELAYED notification sent to sender
06:09 (03:09 + 3 hours) retry attempt
09:09 (06:09 + 3 hours) retry attempt
12:09 (09:09 + 3 hours) retry attempt
15:09 (12:09 + 3 hours) retry attempt
18:09 (15:09 + 3 hours) retry attempt
21:09 (18:09 + 3 hours) retry attempt
21:09 FAILURE notification sent to sender
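
The timeline above can be reproduced with a short calculation. The sketch below assumes that the interval doubles after each attempt, is capped at the maximum retry interval, and that no attempt is made once the next one would fall after the expiration time. These assumptions match the timeline shown but are not a statement of EMF's exact internal algorithm. The function names are hypothetical.

```python
def retry_schedule(min_minutes=3, max_minutes=180, expire_minutes=1440):
    """Return retry attempt times, in minutes after the initial failure."""
    times = []
    elapsed = 0
    interval = min_minutes
    # Stop as soon as the next attempt would land after the expiration time.
    while elapsed + interval <= expire_minutes:
        elapsed += interval
        times.append(elapsed)
        # "Back off": double the wait, but never beyond the maximum interval.
        interval = min(interval * 2, max_minutes)
    return times


def hhmm(minutes):
    """Format a minute count as HH:MM."""
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


if __name__ == "__main__":
    for t in retry_schedule():
        print(hhmm(t), "retry attempt")
```

Changing min_minutes, max_minutes, or expire_minutes shows how the Retry queue settings move the individual attempts, and with them the points at which the DELAYED and FAILURE notifications would be sent.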

Troubleshooting and Using Telnet to test a Relay

QUICK CHECK:
Messages in retry are all inbound - check internal mail server(s)

Messages in retry are all outbound - check Firewall or next relay - verify DNS is working correctly. Check Internet connectivity.

Messages in retry are to one or just a few domains - problem is probably on the recipient's end. Try connecting manually.

To check whether a relay is working, you can telnet to it. All relays on the Internet should listen on port 25. To test this, you can telnet to the IP address of the relay on port 25. Assuming you wanted to test your Exchange server's relay, you could type the following at the NT command prompt on your EMF server:

telnet 10.10.10.2 25 (assuming 10.10.10.2 is the IP address of your Exchange server)

The relay should respond with a 220 "relay ready" response if it is working. Any other response (other than 220), or no response, indicates that the relay isn't working or that your server can't "see" that server for some reason. If your server can't "see" the other server, you should use standard network troubleshooting techniques to determine the problem. If you can "see" the server but can't connect to it, this might mean that the relay service is not running. In either case it explains why EMF can't send mail to it. There is nothing magical about how EMF connects to another relay: if you can't telnet to the other relay, EMF won't be able to either.
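
If telnet is not available, the same check can be scripted. The Python sketch below opens a TCP connection and reads the first line the relay sends. It uses only the standard socket module. The function name read_banner and the 10 second timeout are illustrative choices, not part of EMF.

```python
import socket


def read_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return (code, text) for the first reply line.

    Returns (None, reason) if the connection fails, times out, or the
    relay sends nothing that starts with a three digit reply code.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            with conn.makefile("rb") as reader:
                raw = reader.readline()
    except OSError as exc:  # refused, unreachable, timed out, ...
        return None, str(exc)
    line = raw.decode("ascii", "replace").strip()
    if len(line) >= 3 and line[:3].isdigit():
        return int(line[:3]), line
    return None, line or "no response"
```

For example, read_banner("10.10.10.2") mirrors the telnet command above. A returned code of 220 means the relay answered as expected, and a code of None means the connection failed or no usable greeting was received.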

It is possible to get a message in retry even if EMF is able to contact the next server. During a normal, successful SMTP conversation, EMF will send SMTP commands (Helo, Mail From, Rcpt To, and Data) to the next relay and the relay should respond with some message preceded by a 200 series response. For example:

HELO Tumbleweed.com - should be followed by

250 OK, pleased to meet you Tumbleweed.com

The 250 (200 series response) indicates a successful response. (The 220 response is the greeting the relay sends when the connection is first opened.)

The server might also, for whatever reason, return a 400 series or 500 series response. A 400 series response indicates a temporary problem on the downstream relay's part, whereas a 500 series response is a permanent error. If we (or any other relay) receive a 400 series error, we will close the session and place the message in retry. If we receive a 500 series error, we will close the session and return the message to the sender. These error codes are defined in RFC 2821, the standard that all Internet mail clients must adhere to in order to be compatible with one another.
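
The handling of reply codes described above can be summarized in a small function. This is a sketch of the decision only. The function name action_for_reply and the returned labels are hypothetical, and 300 series replies (such as the 354 sent in response to DATA) are treated here as "continue" because they are intermediate, not error, replies.

```python
def action_for_reply(reply_line):
    """Map an SMTP reply line to the action a sending relay would take."""
    code = int(reply_line[:3])
    if 200 <= code < 400:
        return "continue"          # success or intermediate reply
    if 400 <= code < 500:
        return "retry"             # temporary failure: place message in retry
    if 500 <= code < 600:
        return "return_to_sender"  # permanent failure: bounce the message
    raise ValueError(f"not an SMTP reply code: {reply_line!r}")
```

When reading a trace of a failed delivery, the first digit of the last reply from the downstream relay is therefore usually enough to tell whether the message will be retried or returned.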

Additional Information:

For more information on using nslookup, see related article Using Nslookup to find a domains SMTP servers on the Internet.

For more information on using telnet, see related article Using Telnet to manually send an SMTP message to a relay.

In some rare cases, an SMTP session might get broken in the middle of the communication. It is often useful when troubleshooting delivery problems to set the SMTP Logging Level to Trace and trace the message using the event log (see related article Finding lost messages - tracing messages via the event log).