SIEM disaster recovery guidelines and best practices
Technical Articles ID:   KB90674
Last Modified:  10/18/2019

Environment

McAfee SIEM Enterprise Log Manager (ELM) 11.x.x, 10.x.x
McAfee SIEM Enterprise Security Manager (ESM) 11.x.x, 10.x.x
McAfee SIEM Event Receiver (Receiver) 11.x.x, 10.x.x

Summary

When a SIEM device fails, the right path to recovery is not always obvious. This document covers the most common failures, the best way to restore service, and how to preserve existing data.

Common disasters can include:
  • Loss of ESM because of hardware or file system failure
  • Loss of Receiver because of hardware or file system failure
  • Loss of ELM because of hardware or file system failure
  • Other disk or filesystem corruption not involving a full box replacement

Problem

The best practice for disaster recovery is to take regular backups and to store one copy off-site every month.

SIEM backup and export solutions:
  • ESM database backup
  • ELM database backup
  • Policy editor export including custom rules
  • Alarm export
  • Report and dashboard exports
  • Watchlist export
  • Receiver data source export

Devices that have no backup of their own, because the ESM retains their configuration:
  • Event Receiver
  • Application Data Monitor (ADM)
  • Database Event Monitor (DBM)
  • Advanced Correlation Engine (ACE)

Solution

Recovery of an ESM without backups
The ESM is one of the most critical devices to back up. All data source, dashboard, and report information is stored on the ESM. If an ESM is lost and a backup is not available, a partial recovery of data might be possible by using a Receiver sync.
  1. When the ESM is lost, immediately SSH to each Receiver that was connected to it and back up /etc/NitroGuard/thirdparty.conf.
    NOTE: If the customer data source names in thirdparty.conf contain a dash ( - ) in the name, replace it with another character. Doing so prevents an issue where the data source names are truncated after syncing. 
  2. Rack and install the replacement ESM. Make sure it runs the same SIEM version as the previous device.
  3. Follow KB74464 to reset the SSH keys of all other SIEM devices (for example, the Receiver, ACE, ADM, DBM, and ELM) to factory default.
  4. Add the devices back to the ESM using their existing IP addresses. For example, if the Receiver IP address before the crash was 192.168.100.103, add it to the new ESM with the same IP address. Do not write out any data source settings. It is important that the Receiver keeps its original configuration until the sync is done.
  5. For each Receiver that has been re-added, go to Properties, Receiver Configuration, Sync Device. If you receive an error message stating that the Receiver must have no data sources, make sure that no data sources have been manually added back. Data sources include any ePolicy Orchestrator (ePO) devices that are associated with a Receiver. For instance, Sync Device does not work if the Receiver has no data sources but an ePO device on the ESM is associated with that Receiver.
  6. Use the Receiver thirdparty.conf file to pull its data source configuration back to the ESM and automatically re-add the devices. Using the Receiver thirdparty.conf file preserves ipsids and Device IDs, and makes it possible to quickly recover event collection by allowing access to the existing ELM data. It takes a few minutes to sync the configuration from the Receiver back to the ESM.
  7. When the sync is complete, edit the Receiver data sources under the Data Source tab. Write out the data source settings and roll policy. 
    The ESM now begins collecting events from the Receiver and the existing list of data sources is recovered. 
  8. Open any data source on the Receiver and click Logging in the edit view. This action prompts the user to associate an ELM with the Receiver. Answer Yes and allow the action to complete. 
  9. Repeat step 8 for all Receivers on the ESM. This step makes sure that ELM Archive and Enhanced ELM Search work later. 
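
Step 1 and its note can be sketched as a small shell helper. This is only an illustration, not McAfee tooling: the function names backup_name, backup_receiver_conf, and lines_with_dash are hypothetical, and the sketch assumes root SSH access to the Receiver and that scp is available on the host where the backup is kept. Because the layout of thirdparty.conf is not described here, the dash check only lists candidate lines for manual review and does not edit anything.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not McAfee tooling.

# Build a timestamped local file name for one Receiver's backup.
# $1 = Receiver IP address, $2 = timestamp string
backup_name() {
    printf 'thirdparty_%s_%s.conf\n' "$1" "$2"
}

# Copy thirdparty.conf from a Receiver into a local directory.
# Assumes root SSH access to the Receiver and scp on the local host.
# $1 = Receiver IP address, $2 = local backup directory
backup_receiver_conf() {
    stamp=$(date +%Y%m%d%H%M%S)
    mkdir -p "$2" || return 1
    scp "root@$1:/etc/NitroGuard/thirdparty.conf" \
        "$2/$(backup_name "$1" "$stamp")"
}

# Print every line of a saved copy that contains a dash, with line numbers,
# so data source names can be reviewed by hand before the sync.
# Dashes that are not part of a name are listed as well.
# $1 = path to the saved thirdparty.conf copy
lines_with_dash() {
    grep -n -e '-' "$1"
}
```

For the Receiver used as the example in step 4, a call might look like backup_receiver_conf 192.168.100.103 ./receiver_backups, followed by lines_with_dash on the saved file. Keeping an untouched copy before changing any names makes it possible to compare against the original later.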

Recovery of a Receiver after replacement or crash
The Receiver configuration is stored on the ESM. To replace a Receiver:
  1. Rack and install the replacement Receiver, or re-ISO the failed Receiver if an RMA is not needed. 
  2. Provision the new Receiver with the same IP address as the old Receiver, and make sure it is on the same version as the previous device. 
  3. Key the device from the graphical user interface by going to Receiver Properties, Key Management.
  4. Under Receiver Properties, Connection tab, click Check status. Continue to the next step when the status is pulled back and does not generate an SSH error. 
  5. Under Receiver Properties, Data Source tab, modify any data source to enable the Write button, and then click Write to write out the data source settings and roll policy.
  6. Data collection on the Receiver resumes after the settings are written and the policy is rolled.

Recovery of an ELM device without an ELM database backup
The ELM is one of the most critical SIEM devices for compliance, and backups must be created regularly. If the ELM hardware is replaced or needs reimaging, it might be possible to recover it. The database location and ELM logs must be on a CIFS, NFS, SAN, or iSCSI device for recovery to be possible. 
  1. Rack and install the new ELM using the same IP address as the previous unit. Make sure that the ELM is on the same SIEM version as the original. 
  2. Reinstall any missing SAN or iSCSI volumes under ELM Properties, Data Storage tab. If NFS or CIFS was used, skip this step. 
  3. Rekey the ELM under Properties, Key Management.
  4. Go to ELM Properties, Storage Pool.
  5. In the top window under devices, re-add the previously used NFS, CIFS, SAN, or iSCSI device. For NFS and CIFS devices, make sure that you use the same share name and path that were previously used. If the previous share name and path are not known, use the network path where the mgtdb directory is stored. The goal is to create a storage pool device with access to the ELM's mgtdb directory on the network.
  6. SSH to the ELM and confirm that the network share is accessible by running df -h.
  7. Locate the mgtdb directory on the network share path from the ELM command line. For example, if the NFS share is 10.10.10.10 and the mount point is /elm_storage/nfs_1, you would use cd /elm_storage/nfs_1 and ls -al to find an mgtdb directory. If all else fails, find / -name 'mgtdb' shows all locations. You are trying to find the original mgtdb location on the network. 
  8. After the original mgtdb location is found, examine the symbolic links at /usr/local/elm/mgtdb and /elm_allocations/MGTDBxxx and make sure that they ultimately resolve to the NFS share under /elm_storage/xxx. For instance, if the mgtdb directory was found in /elm_storage/nfs_1/mgtdb, create a symlink in /elm_allocations/xxx that points to /elm_storage/nfs_1/mgtdb. Then create a symlink at /usr/local/elm/mgtdb that points to the symlink in /elm_allocations/xxx, which in turn points to the NFS mount in /elm_storage/xxx. By way of example, /usr/local/elm/mgtdb is a symlink to /elm_allocations/MGTDB_Alloc123, and /elm_allocations/MGTDB_Alloc123 is a symlink to /elm_storage/nfs_1/mgtdb. In summary:
    • /usr/local/elm/mgtdb is a symbolic link that points to /elm_allocations/MGTDB_xxx
    • /elm_allocations/MGTDB_xxx is a symbolic link that points to /elm_storage/name_of_NFS_mount/mgtdb
    • /elm_storage/name_of_NFS_mount is where the NFS share is mounted on the SIEM device, and its mgtdb subdirectory contains the database
  9. Run the command vi /etc/NitroGuard/mgtdbloc.conf and make sure that the path there matches the location that the symlinks in step 8 resolve to. For example, /elm_storage/nfs_1/mgtdb.
  10. Run ELMStop and then ELMStart so that the changes take effect.
  11. After the ELM database starts, the ELM begins working, but the storage.conf and alloc.conf files need to be manually recreated. 

    It is possible to connect to the ELM database and query it to find the names of the storage pools and their locations. For example, nquery -d rec -i --long --noblob opens the database (the ELM database is still called rec). Running select * from rg then returns the names of the storage devices. The name of each shid and allocation can be found by examining tables such as rg2sh or sh.
     
  12. SSH to the newly commissioned ELM.
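
Steps 7 through 9 above can be sketched as shell functions. This is a minimal illustration under stated assumptions, not a McAfee-supplied script: the function names are hypothetical, the sketch assumes GNU ln and readlink are present on the ELM, and it assumes that mgtdbloc.conf contains the mgtdb path as literal text. The paths in the comments are the example values from step 8.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not McAfee tooling.

# Step 7: list directories named mgtdb below a starting directory.
# $1 = directory to search, for example /elm_storage
find_mgtdb() {
    find "$1" -type d -name 'mgtdb' 2>/dev/null
}

# Step 8: create the two-level symlink chain.
# $1 = real mgtdb directory on the share, e.g. /elm_storage/nfs_1/mgtdb
# $2 = allocation link, e.g. /elm_allocations/MGTDB_Alloc123
# $3 = top-level link, e.g. /usr/local/elm/mgtdb
link_mgtdb_chain() {
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    # Refuse to replace anything that exists and is not already a symlink.
    for link in "$2" "$3"; do
        if [ -e "$link" ] && [ ! -L "$link" ]; then
            echo "refusing to replace non-symlink: $link" >&2
            return 1
        fi
    done
    ln -sfn "$1" "$2" && ln -sfn "$2" "$3"
}

# Step 8: succeed only if a link fully resolves to the expected directory.
# $1 = link to check, $2 = expected directory
chain_resolves_to() {
    [ "$(readlink -f "$1")" = "$(readlink -f "$2")" ]
}

# Step 9: succeed only if the path appears literally in the config file.
# $1 = config file, $2 = expected path
conf_mentions_path() {
    grep -qF -- "$2" "$1"
}
```

With the example values from step 8, link_mgtdb_chain /elm_storage/nfs_1/mgtdb /elm_allocations/MGTDB_Alloc123 /usr/local/elm/mgtdb would build the chain, and chain_resolves_to /usr/local/elm/mgtdb /elm_storage/nfs_1/mgtdb would confirm it before ELMStop and ELMStart are run. The function refuses to overwrite a real file or directory so that an existing local database directory is not replaced by accident.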
