X-Itools: Email/Web Log Search Engine Wiki

Strong Email & Apache Log Analysis with Active Security Features

Brought to you by: hahnn

(ARCH) ISP type 1 architecture

Attachments

isp1_b_arch.png (317729 bytes)

ELSE: some examples of SMTP architectures

Edited by Nicolas HAHN < hahnn@x-itools.com > / < hahnn@erios.org >

ISP type 1 architecture with Microsoft Exchange server 2010/2013 farms per customer

Context

The context is that of an Internet Service Provider (ISP) providing robust messaging services to its customers.

This is a first example of an ISP messaging architecture, called Type 1.

First of all, you can see that the drawing below, even in its basic version, is becoming quite complex.
The drawing contains a box representing Datacenter 1. Although no box is drawn for Datacenter 2, you can duplicate the Datacenter 1 box for each additional datacenter.
Since this is the architecture of an ISP, we assume at least 2 datacenters.

This means the SMTP relay infrastructure runs in Active-Active mode: the email flows and the load are distributed equally between all datacenters. For instance, if all the Incoming Postfix servers are down in datacenter 1, emails can still come in via the Incoming Postfix servers of datacenter 2.
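
One common way to obtain this behaviour is to publish the Incoming servers of every datacenter as MX records with the same preference. This page does not say how it is done here, so the Python sketch below is only an illustration under that assumption; the hostnames and the `pick_mx` helper are hypothetical.

```python
import random

# Hypothetical MX records: (preference, host). Equal preference means that
# sending servers spread their connections over both datacenters.
MX_RECORDS = [
    (10, "mx1.dc1.example.net"),
    (10, "mx2.dc1.example.net"),
    (10, "mx1.dc2.example.net"),
    (10, "mx2.dc2.example.net"),
]

def pick_mx(records, is_up, rng=random):
    """Return the host a sending server would end up delivering to.

    Hosts are tried by ascending preference, in random order within the same
    preference; unreachable hosts are skipped.
    """
    by_pref = {}
    for pref, host in records:
        by_pref.setdefault(pref, []).append(host)
    for pref in sorted(by_pref):
        hosts = by_pref[pref][:]
        rng.shuffle(hosts)
        for host in hosts:
            if is_up(host):
                return host
    return None  # nothing reachable: the sender queues the mail and retries

# Simulate the loss of every Incoming server in datacenter 1.
def dc1_down(host):
    return ".dc1." not in host

print(pick_mx(MX_RECORDS, dc1_down))  # one of the datacenter 2 hosts
```

With all datacenter 1 hosts unreachable, the selection always ends on a datacenter 2 host, which is the failover described above.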

We find again the architecture of the previous example, with some additional layers to enhance the service and its quality:

We now have Postfix Mailing List servers, where Mailman is running.
We add Postfix Sender Routers, which route emails according to the sender email address rather than the recipient email address. Those servers are of particular interest for application servers that need to send a lot of email notifications, for example.
We add Postfix NDR servers, which receive all the NDRs that might be generated and send them to the Internet from their own public IPs.
We add a dedicated security layer for anti-spam and anti-virus, which can run on a farm of several servers (content inspection needs a lot of resources). This can be outsourced: you can install a Symantec Blackbox to play this role, for instance.
We add a dedicated GreyLSE server, which is a Postfix Policy Server.
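
To illustrate what the Sender Routers do, here is a minimal Python sketch of a sender-based next-hop lookup. In Postfix this kind of routing is usually configured with the `sender_dependent_relayhost_maps` parameter; whether the Sender Routers here rely on it is not stated on this page. The addresses, relay names and the `relay_for_sender` helper are hypothetical.

```python
# Hypothetical sender-to-relay table, in the spirit of a Postfix lookup table.
SENDER_ROUTES = {
    "notifications@app.customer-a.example": "[notif-out.dc1.example.net]:25",
    "@customer-b.example": "[out-b.dc1.example.net]:25",
}
DEFAULT_RELAY = "[outgoing.dc1.example.net]:25"  # hypothetical default

def relay_for_sender(sender):
    """Pick the next hop from the envelope sender, ignoring the recipient."""
    sender = sender.lower()
    if sender in SENDER_ROUTES:              # exact address first
        return SENDER_ROUTES[sender]
    domain_key = "@" + sender.rpartition("@")[2]
    return SENDER_ROUTES.get(domain_key, DEFAULT_RELAY)  # then the domain

print(relay_for_sender("notifications@app.customer-a.example"))
print(relay_for_sender("alice@customer-b.example"))
print(relay_for_sender("bob@elsewhere.example"))
```

The recipient never enters the decision, which is what lets a notification-heavy application be sent out through its own relay.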

This architecture is a shared one: the same role-based servers handle the email flows of every customer. The flows become segregated only when emails reach the Microsoft Exchange Server farm dedicated to each customer.
Even though the infrastructure is shared, each customer can still own dedicated public IPs for its messaging domains. Incoming and Outgoing Postfix server farms can then be configured with several tens of different public IPs.
This architecture also lets customers get truly dedicated Incoming and Outgoing Postfix servers if they want. In the end, it is just a matter of how emails are routed between all the servers in this architecture.

In terms of routing, internal routing is established between all Outgoing and Incoming Postfix servers. If a user of customer A wants to send an email to a user of customer B and the ISP hosts both customers, the email never leaves the datacenter: the Outgoing server used by customer A reroutes it directly to the Incoming servers used by customer B.
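
The internal routing decision can be sketched as follows in Python. In Postfix such a decision is typically expressed with a `transport_maps` table; the sketch only mimics the logic, and the domain names, pool names and `next_hop` helper are hypothetical.

```python
# Hypothetical map of hosted domains to the Incoming pool serving them.
HOSTED_DOMAINS = {
    "customer-a.example": "incoming-a.dc1.example.net",
    "customer-b.example": "incoming-b.dc1.example.net",
}

def next_hop(recipient):
    """Decide where an Outgoing server hands a message over."""
    domain = recipient.rpartition("@")[2].lower()
    internal = HOSTED_DOMAINS.get(domain)
    if internal is not None:
        return ("internal", internal)  # the email stays inside the datacenter
    return ("internet", domain)        # normal MX delivery for that domain

print(next_hop("bob@customer-b.example"))
print(next_hop("carol@elsewhere.example"))
```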

Postfix servers don't send back Non Delivery Reports (NDRs) themselves; they relay them to the dedicated pool of Postfix NDR servers. We do that to help prevent backscatter attacks and, ultimately, to keep the public IPs of the Outgoing Postfix servers, for example, from being blacklisted by the many DNSRBL services on the Internet. If the public IPs of the NDR servers get blacklisted, that does not prevent us from continuing to provide the service and to send emails via the Outgoing Postfix servers.
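
As a rough illustration of that separation: bounce messages are sent with the null envelope sender (`<>`), so they can be told apart from regular mail and sent out through a different pool. The Python sketch below assumes this is the criterion used, which this page does not confirm; the pool names and helpers are hypothetical.

```python
NDR_POOL = "ndr.dc1.example.net"            # hypothetical NDR server pool
OUTGOING_POOL = "outgoing.dc1.example.net"  # hypothetical Outgoing pool

def is_ndr(envelope_sender):
    """Bounces carry the null reverse-path, written '<>' or left empty."""
    return envelope_sender.strip() in ("", "<>")

def egress_pool(envelope_sender):
    """NDRs leave through their own public IPs, regular mail through the others."""
    return NDR_POOL if is_ndr(envelope_sender) else OUTGOING_POOL

print(egress_pool("<>"))
print(egress_pool("alice@customer-a.example"))
```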

Below all of that are the individual Exchange Server 2010/2013 farms, one dedicated to each hosted customer. A farm can sit in any one of the datacenters, or its Exchange Servers can be distributed across all datacenters, provided the bandwidth between them is very high (Exchange Server needs a lot because of the permanent database replication!).

We have a single ELSE server per datacenter. That's enough to process all the logs from everything, provided the I/O issue is carefully considered.

The difference between the ISP Type 1 architecture and the ISP Type 2 architecture is that for Type 1, the ELSE server in every datacenter is standalone. That means every messaging server in every datacenter sends all its logs to both ELSE servers (in datacenter 1 and datacenter 2). Both ELSE servers are then supposed to hold the same data, but they are completely independent from each other. Both ELSE servers also monitor every component of the overall infrastructure in both datacenters, so every device or component is monitored twice.
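
The fan-out of logs to every ELSE server can be pictured with the short Python sketch below. This page does not say how logs are actually shipped (in practice the syslog daemon would typically forward to several destinations); here a `ListHandler` stands in for a network handler so the example stays self-contained. The server names, logger name and sample log line are hypothetical.

```python
import logging

class ListHandler(logging.Handler):
    """Stand-in for a network handler: keeps formatted records in a list."""

    def __init__(self, server):
        super().__init__()
        self.server = server
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))

def build_logger(else_servers):
    """Attach one handler per ELSE server, so each one gets every record."""
    logger = logging.getLogger("mail.fanout")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    handlers = [ListHandler(server) for server in else_servers]
    for handler in handlers:
        logger.addHandler(handler)
    return logger, handlers

logger, sinks = build_logger(["else.dc1.example.net", "else.dc2.example.net"])
logger.info("postfix/smtp[2201]: 4F3A21234: status=sent")
for sink in sinks:
    print(sink.server, len(sink.lines))
```

Each record is emitted once per ELSE server, which is why both servers end up with the same data while staying independent.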

As in the previous architecture example, we of course never expose Microsoft Exchange Servers directly to the Internet. The exposed products are UNIX based and are well known to withstand attacks much better than Microsoft products running on Microsoft operating systems. Postfix is extremely robust and is designed to withstand heavy attacks.

This kind of architecture allows the management of extremely large mail flows and is extremely robust because, for example, the infrastructure is distributed across several datacenters.

Here, the ELSE receives Microsoft Exchange Server 2010/2013 logs in near real time and is able to correlate them with the Postfix logs. The ELSE also has an interface that lets you query the Load Balancer, if it is an F5 BIG-IP, to find out which incoming request was sent to which Microsoft Exchange node. That's useful when you track an email and want to know which Exchange server initially received the request.
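
How the ELSE correlates the two log sources is not described on this page. A plausible approach, sketched below in Python, is to join on the Message-ID, which Postfix logs in its `cleanup` lines and which Exchange also records in its message tracking logs. The Exchange events are shown as simplified dictionaries whose keys (`message_id`, `server`) are hypothetical, as are the sample values and the `correlate` helper.

```python
import re
from collections import defaultdict

# Matches Postfix cleanup lines such as:
#   postfix/cleanup[1234]: 4F3A21234: message-id=<abc@example.com>
POSTFIX_MSGID = re.compile(
    r"postfix/cleanup\[\d+\]: (?P<qid>\w+): message-id=(?P<mid><[^>]*>)"
)

def correlate(postfix_lines, exchange_events):
    """Group Postfix queue IDs and Exchange servers by Message-ID."""
    trail = defaultdict(
        lambda: {"postfix_queue_ids": [], "exchange_servers": []}
    )
    for line in postfix_lines:
        match = POSTFIX_MSGID.search(line)
        if match:
            trail[match.group("mid")]["postfix_queue_ids"].append(
                match.group("qid")
            )
    for event in exchange_events:
        trail[event["message_id"]]["exchange_servers"].append(event["server"])
    return dict(trail)

postfix_lines = [
    "Jan 10 12:00:01 mx1 postfix/cleanup[1234]: 4F3A21234: "
    "message-id=<abc123@customer-a.example>",
]
exchange_events = [
    {"message_id": "<abc123@customer-a.example>", "server": "exch-a-02"},
]
result = correlate(postfix_lines, exchange_events)
print(result["<abc123@customer-a.example>"])
```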

Of course, the ELSE runs on its own dedicated servers, so that you can investigate Postfix and Exchange Server logs without introducing too many changes or disturbances directly on the SMTP servers.

Please also note that all Postfix servers interact with the GreyLSE Policy Server, which is part of the ELSE solution. For each incoming email, this policy server tells whether it can pass through or must be rejected. GreyLSE provides an implementation of greylisting, as well as whitelisting, blacklisting and more.
The ELSE is able to detect attacks and identify their source and targets. The automated, near real time version of these features is handled by the RTAAM module of the ELSE, which was introduced in ELSE version 0.9.18.
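
To give an idea of what a greylisting policy server such as the one mentioned above does, here is a minimal Python sketch of the decision step. It follows the Postfix policy delegation format (a request made of `name=value` lines ended by an empty line, answered with an `action=` line), but it is not GreyLSE's actual implementation: the delay value, the in-memory store and the helper names are assumptions.

```python
import time

GREYLIST_DELAY = 300  # seconds a new triplet must wait; hypothetical value

def parse_request(text):
    """Parse one policy request: 'name=value' lines ended by a blank line."""
    attrs = {}
    for line in text.splitlines():
        if not line:
            break
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

def decide(attrs, first_seen, now=None):
    """Return the policy reply for one request, updating first_seen in place."""
    now = time.time() if now is None else now
    triplet = (
        attrs.get("client_address"),
        attrs.get("sender"),
        attrs.get("recipient"),
    )
    seen = first_seen.setdefault(triplet, now)  # remember the first attempt
    if now - seen >= GREYLIST_DELAY:
        return "action=DUNNO\n\n"  # waited long enough: let other checks decide
    return "action=DEFER_IF_PERMIT Greylisted, please retry later\n\n"

request = (
    "request=smtpd_access_policy\n"
    "client_address=192.0.2.10\n"
    "sender=alice@elsewhere.example\n"
    "recipient=bob@customer-a.example\n"
    "\n"
)
store = {}
attrs = parse_request(request)
print(decide(attrs, store, now=1000), end="")  # first attempt is deferred
print(decide(attrs, store, now=1301), end="")  # retry after the delay passes
```

A real policy server would also apply whitelists and blacklists before the greylisting step and keep the triplets in persistent storage.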

Different customers can then use the ELSE to search their own emails without being able to investigate the emails of the other customers on the same ELSE server. The ELSE is designed for that.
However, the messaging administrators of the ISP can use the ELSE to investigate any email of any customer.
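
The visibility rule above amounts to a per-customer filter with an exception for ISP administrators. The Python sketch below shows only that idea; the record layout, the role names and the `visible_records` helper are hypothetical and do not describe the ELSE's internal access model.

```python
def visible_records(records, user):
    """Return the log records a given user is allowed to search."""
    if user["role"] == "isp_admin":
        return list(records)  # ISP messaging administrators see everything
    return [r for r in records if r["customer"] == user["customer"]]

records = [
    {"customer": "customer-a", "message_id": "<1@customer-a.example>"},
    {"customer": "customer-b", "message_id": "<2@customer-b.example>"},
]
customer_user = {"role": "customer", "customer": "customer-a"}
admin_user = {"role": "isp_admin", "customer": None}

print(len(visible_records(records, customer_user)))
print(len(visible_records(records, admin_user)))
```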

Drawings

Intermediate

ISP architecture 1, basic view

Pros & Cons

This architecture is really designed for ISPs and for huge volumes. It combines high availability with the ability to absorb heavy loads, all distributed over several datacenters. It's designed to process several tens of millions of emails a day. You can manage several hundred thousand email accounts with it (provided that you drastically increase the number of Microsoft Exchange Servers in the farms). If you need more, just add servers to the various farms.

This architecture is designed to manage totally different customers, each with its own domains and constraints, while reusing most parts of the infrastructure. The ELSE is also designed to show each customer only what it has the right to see.

Microsoft Exchange Server logs and Postfix server logs are all put together and correlated by the ELSE. In those conditions, it's easy to trace any email and to find and resolve messaging-related issues, even issues impacting several customers.

The ELSE has dedicated, standalone servers in each datacenter, which introduces only minimal changes to the Postfix and Exchange servers. They are not impacted by the resource consumption of the ELSE server, and the ELSE server can be down for one reason or another without impacting your production Postfix servers. Also, if an ELSE server is down, you still have the ones in the other datacenters, which hold the same data.

But having standalone ELSE servers is also a drawback, because every messaging server has to send its logs and data twice, or more if your configuration is based on 3 datacenters, for instance. Each time, this also means double storage, double bandwidth, and so on. Because of that, you might prefer to implement the ISP Type 2 architecture.

Extreme care must be taken when implementing the ELSE server, because an architecture made to handle extremely large email volumes generates an extremely large number of I/Os on the ELSE database, and I/O congestion is a very well-known issue for any kind of database.

We also benefit from the separate management of all NDRs, as well as from the features offered by the Sender Routers.