KB Article #102122
Outbound queue backing up or outbound mail delivery slow.
Summary:
The EMF Outbound queue is processed by every EMF relay instance (every running EMF relay service) that is set with the outbound function. "Outbound" here refers to mail flowing out of EMF, either to your internal mail server, or out to the Internet. Sometimes the Outbound queue will grow large and mail delivery will stop or slow down. This technote discusses troubleshooting techniques for this situation.
Symptom:
The Outbound queue is growing large (the Outbound queue can be viewed in EMF 5.5 and higher). Users report that mail is failing to reach them. Windows Performance Monitor shows outbound mail flow as very slow or stopped.
Resolution:
A number of EMF issues that may contribute to messages backing up in the EMF Outbound queue have been detected and fixed. Anyone experiencing similar issues should ensure that they are running EMF 5.5 Patch 2 HF10 or later. Updates to EMF are available through the Tumbleweed Update Service: go to the Tumbleweed program group (Start Menu > Programs > Tumbleweed Email Firewall) and select "Check for Updates" (see the related article on the right, Getting EMF Updates).
While it's important to ensure that your EMF SMTP Relay settings are properly configured, there are two very common problems that might cause mail to backlog in the outbound queue that need to be resolved externally from EMF.
DNS time outs
Events in the EMF Event log similar to:
Event ID: 1079
Event Description: No response from DNS host.
Event Details: No response from DNS host 10.1.9.9 within 60 seconds.
A number of different issues could cause sporadic DNS response failures. Tumbleweed has been able to reproduce this issue by installing DNS on a different network subnet than the EMF SMTP Relay. We've also confirmed that some customers who reported similar issues have DNS installed on a different subnet than EMF or at a remote ISP. Note that this is NOT the ONLY reason why DNS queries might go unanswered.
EMF 5.x has a potential problem where DNS requests are dropped by intervening firewalls because EMF 5.x improperly reuses DNS request IDs. (This particular problem was fixed in EMF 6.0.) The firewall log can be checked to see if DNS requests are being dropped.
Note that an EMF outbound connection is consumed while EMF is waiting for a DNS response. This reduces the number of outbound connections available, and may cause the Outbound queue to back up. As an example, one EMF customer was not performing current employee (SMTP recipient address) validation, causing their internal mail server to generate non-delivery reports back out to the Internet, often to bad addresses. When EMF tried to deliver these non-delivery reports, DNS was timing out, and many outbound connections were being consumed by this process, causing the Outbound queue to back up. The customer solved this problem by adding recipient address validation to their EMF server.
The overall DNS timeout is 10 minutes. EMF actually waits 30 seconds per DNS host, then 60 seconds per DNS host, and so on up to a total of 10 minutes. After the 10-minute timeout, the message is placed in the EMF Retry queue, and is subject to the configured Retry intervals.
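As a rough illustration of why stalled DNS lookups can starve the Outbound queue, the following Python sketch estimates how many outbound connections might be tied up at once. It assumes that each stalled lookup holds one connection for the full 10-minute timeout described above. The arrival rate and connection limit are hypothetical figures chosen for the example, not EMF defaults.

```python
# Rough estimate only: assumes every stalled DNS lookup holds one outbound
# connection for the whole 10-minute timeout described in this article.
DNS_TIMEOUT_SECONDS = 10 * 60


def connections_held(stalled_messages_per_minute, timeout_seconds=DNS_TIMEOUT_SECONDS):
    """Average number of connections occupied by stalled DNS lookups.

    Uses Little's law: items in system = arrival rate * time in system.
    """
    return stalled_messages_per_minute * (timeout_seconds / 60)


if __name__ == "__main__":
    max_outbound_connections = 50  # hypothetical relay setting
    stalled_rate = 3               # hypothetical: messages/minute to unresolvable domains
    held = connections_held(stalled_rate)
    print(f"about {held:.0f} of {max_outbound_connections} connections waiting on DNS")
```

Under these assumptions, even a small trickle of mail to domains that cannot be resolved can occupy a large share of the available connections, which is consistent with the customer example above.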
One tested and recommended solution to DNS timeouts is to install a caching-only DNS server local to the EMF SMTP Relay, and then configure EMF to use itself as its primary DNS server (found under webadmin Set Up > Relays). You should still leave your other DNS servers in the DNS configuration as secondary.
Installing a caching DNS server on the EMF box is simple. Information on installing caching DNS using Microsoft DNS can be found on Microsoft's Web site in Microsoft Knowledge Base Article 167234 (How to Create a Caching-Only Name Server with Microsoft DNS). You install DNS from Add/Remove Windows Components, then configure it.
Slow remote hosts
If EMF is trying to send a lot of mail to a remote SMTP server that is responding slowly, it is expected that mail will back up in the Outbound queue waiting to be sent to that host. You should ensure that you are on the latest release of EMF (EMF 5.5 Patch 2 HF10 or later), as a defect has been fixed that could cause EMF to fail to open ANY outbound connections if it had a lot of mail queued for a single host. You should also ensure that your relay settings (discussed below) are set appropriately.
The following event in the EMF Event log may be an indication that you are trying to send to a slow remote host:
Event ID: 1506
Event Description: Cannot open additional connections to host
Event Details: Connection cache for '1.2.3.4' is full.
Nothing you can do to EMF will make mail flow to a slow remote host any faster if that remote host is the bottleneck. That problem should be addressed separately from evaluating any EMF problems.
For more information on this issue, see related article on right Warning- Cannot open additional connections to host.
NOTE: A few customers have reported to us that Windows 2000 SP3 may cause latency issues when applied to an Exchange 2000 server, and that backing off of SP3 has resolved this issue for them. As far as Tumbleweed is aware, Microsoft has not confirmed this issue (as of 6/11/2003).
Basic outbound backup troubleshooting
1) Is the relay service running? Check the Status page to ensure that you have at least one instance of the "Email Firewall SMTP Relay (Outbound)" running. You must also have at least one instance of (Partition) running; otherwise, mail backlogged in the partition queue will display as being in the Outbound queue.
2) In EMF 5.5 and later, on the EMF webadmin Set Up > Relays page, is Stop All Outbound Mail checked?
3) Ping your firewall and internal mail server by IP address. If ping fails, either the servers are down or there is a network problem.
4) Open a CMD window on the EMF relay box, and try:
CMD> nslookup
> server s
> set q=mx
> tumbleweed.com.
where s = the fully qualified domain name of the DNS server your EMF server is configured to use (this will be indicated at the top of the EMF webadmin relay setup page). Please be sure to specify the period at the right end of tumbleweed.com.
You should get back the MX record(s) for Tumbleweed, indicating the tumbleweed mail server name(s).
Then issue the command:
CMD> telnet tumbleweed-mail-server-name 25
If you do not get a "220" relay response, a connection could not be established with Tumbleweed. If this is the case, repeat the procedure with another Internet domain, like yahoo.com. If you cannot establish connections with external mail servers, there may be a network problem or a firewall problem.
5) Repeat the telnet 25, but this time to your internal mail server(s). Same analysis.
6) Are the Outbound connections downstream saturated? Check the inbound queue on your internal mail server and any outgoing relays (or firewall log).
7) Are there any relevant errors in the Windows system or application logs?
8) Are there any relevant errors in the EMF event log? Be sure to check for errors that may be occurring while trying to access the EMF database (Event IDs in the range 17xx).
If there are 17xx errors, check the SQL logs, especially for SQL Error 9002: The log file for database 'tempdb' is full. We have seen problems with the tempdb database cause Outbound delivery problems. Please see related article SQL Error 9002: The log file for database 'tempdb' is full on the right for more information.
9) Are there any errors or warnings in the EMF event log indicating SMTP connectivity issues? 1006 or 1506 events would indicate this and should be investigated.
10) Make sure that your EMFMail database Properties has the Autoshrink option unchecked, and that you do not have any custom SQL jobs that perform Autoshrink on the EMFMail database.
11) Check available disk space, especially on the SQL partition.
12) If the Outbound queue is visible (EMF 5.5 and later), is most of the mail stuck in Outbound destined for the Internet, your internal mail server, or both? Check the leading messages in Outbound for number of recipients. We have seen cases where a high number of recipients (e.g., 40) caused Outbound to back up.
13) When did the problem start? Were there network or other configuration changes made around that time (changes on EMF or the internal mail server or firewall)? The entire problem may be that your mail volume is increasing, either spiking several times during the day, or slowly increasing over time. In either case, you may need to scale your internal mail servers and your network hardware. Your ISP's mail servers may even become involved at some point.
14) Try a reboot.
15) See if you can analyze the events pertaining to a particular message (perhaps a message that was reported as not delivered) using related article on right finding lost messages- tracing messages via the event log.
16) Check with your ISP for router, DNS, or other problems.
17) Have you installed any software on the EMF relay machine(s) that blocks port 25? For example, McAfee Antivirus 8.0 blocks several ports by default when installed, including port 25.
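The telnet checks in steps 4 and 5 can also be scripted when several hosts need to be tested repeatedly. The following Python sketch is one possible approach, not a Tumbleweed tool: it opens a TCP connection to port 25, reads the first line of the greeting, and reports whether it starts with "220". The function names are illustrative, and host names are supplied on the command line.

```python
import socket
import sys


def smtp_banner(host, port=25, timeout=10.0):
    """Connect to host:port and return the first line of the server greeting."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        data = b""
        # Keep reading until a full line has arrived or the server closes.
        while b"\n" not in data:
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
    first_line = data.split(b"\n", 1)[0]
    return first_line.decode("ascii", errors="replace").strip()


def is_ready(banner):
    """True if the greeting starts with the SMTP 220 reply code."""
    return banner.startswith("220")


if __name__ == "__main__":
    # Hosts to test are passed as arguments, for example an MX host
    # returned by nslookup and your internal mail server.
    for host in sys.argv[1:]:
        try:
            banner = smtp_banner(host)
            status = "OK" if is_ready(banner) else "unexpected greeting"
            print(f"{host}: {status}: {banner}")
        except OSError as exc:
            print(f"{host}: connection failed: {exc}")
```

A connection failure or timeout here should be read the same way as a failed telnet session: it points to a network, firewall, or port-blocking problem rather than to EMF itself.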
Other Configuration Considerations
1) In the EMF relay setup, increase the number of "Maximum Outbound connections" -- this is usually a safe thing to do, as long as the network/machines downstream (like the internal mail server) are not overloaded. Increasing the max connections should not be much of an additional strain on the EMF server itself.
2) In the EMF relay setup, increase the number of "Maximum Outbound connections per host". This is the maximum number of connections that EMF will open to any one host or domain. If all of your mail is backlogged going to one destination, this number may not be high enough. This value should never be set higher than 1/2 of your configured "Maximum Outbound connections" (item 1).
3) Try reducing the Delay Closing Outbound Connection setting on the relay setup page. This is the amount of time (seconds) an outbound TCP connection is held open by EMF while there are no other messages queued to that host, so that successive messages can be sent over the same connection without incurring the connection setup overhead. This setting is a performance parameter only, and can be safely reduced, to zero if necessary. The default is 30 seconds. We recommend reducing it in decrements of 10 seconds.
Note that if the EMF server is using DNS to send mail to the Internet, a lower value for this is better; even 0 is good. If EMF is talking to an SMTP firewall or sendmail box and not directly to the Internet, a higher setting may be appropriate, for example, 30 seconds.
4) Check the 'Max Message Backlog' parameter in the EMF database.
If you have EMF installed on a single host or you only have one Outbound SMTP Relay connection to the EMFMail database:
- Open SQL Enterprise Manager
- Navigate to the EMFMail database > Tables
- Right click the RelayConfigValues table > Open Table > Return all rows
- Increase the 'Max Message Backlog' parameter from 10 to 20
- Click on any other field to commit the change to the database
- Restart the SMTP Relay Service
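The guidance in items 1 to 3 above can be written down as a small sanity check before changing the relay settings. This Python sketch is illustrative only: the function names and the sample values are assumptions, and the script does not read or write any EMF configuration.

```python
def check_connection_limits(max_outbound, max_per_host):
    """Return warnings for the two relay connection settings (empty list if fine)."""
    warnings = []
    # Item 2: the per-host limit should not exceed half of the overall limit.
    if max_per_host * 2 > max_outbound:
        warnings.append(
            f"Per-host limit {max_per_host} exceeds half of the overall limit {max_outbound}."
        )
    return warnings


def delay_closing_steps(current=30, step=10):
    """Values to try for Delay Closing Outbound Connection, ending at zero."""
    steps = list(range(current - step, -1, -step))
    if not steps or steps[-1] != 0:
        steps.append(0)
    return steps


if __name__ == "__main__":
    # Hypothetical values, not EMF defaults.
    for message in check_connection_limits(max_outbound=100, max_per_host=60):
        print("WARNING:", message)
    print("Delay values to try, in order:", delay_closing_steps())
```

Starting from the default of 30 seconds, the second function lists 20, 10, and 0, matching the recommendation to reduce the delay in decrements of 10 seconds.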
Next Steps
If the source of the problem is still not determined, capture an EMF SMTP trace:
- temporarily set the EMF relay event logging to Trace level (level 3)
- note the time
- after a few minutes, reset it to Normal logging (level 2)
- note the time
- run the EMFSave utility on your SQL box, from Start > Programs > Tumbleweed EMF
- select the Event Log option only, and specify the appropriate time range
- browse to a temp location, and choose a name for the save zip file
- press the Copy button, and wait for completion
Additional Info:
Please also see these related articles on the right:
Using Nslookup to find a domains SMTP servers on the Internet
Using Telnet to manually send an SMTP message to a relay
Maximum number of outbound connections exceeded -- MMS 4.x
Troubleshooting SQL Server in EMF