A powerful, easily deployable network traffic analysis tool suite
This document outlines how to install Malcolm and Hedgehog Linux using the project’s installer ISOs. These instructions apply to installing this software either on a “bare metal” system or in a virtual machine environment using VMware, VirtualBox, QEMU/KVM, etc.
The Malcolm and Hedgehog Linux installers as described in these instructions are intended to be used to replace the existing operating system, if any, of the respective systems onto which they are installed, and, as such, are designed to require as little user input as possible. For this reason, there are NO user prompts and confirmations about partitioning and reformatting hard disks for use by the operating system. The installer assumes that all non-removable storage media (e.g., SSD, HDD, NVMe, etc.) are available for use and ⛔🆘😭💀 will partition and format them without warning 💀😭🆘⛔.
In contrast to using the ISO installer, Malcolm can also be installed “natively” on any x86_64 platform that can run Docker. See the installation example using Ubuntu 22.04 LTS for that method of installation and configuration, or Windows host system configuration and macOS host system configuration for those platforms.
Malcolm can be packaged into an installer ISO based on the current stable release of Debian. This customized Debian installation is preconfigured with the bare minimum software needed to run Malcolm.
Similar instructions exist for generating the installer ISO for Hedgehog Linux, Malcolm’s dedicated network sensor appliance OS.
While official downloads of the Malcolm installer ISO are not provided, an unofficial build of the ISO installer for the latest stable release is available for download here. If downloading the unofficial builds, be sure to verify the integrity of ISO files against the SHA256 sums provided on the download page.
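The integrity check described above can be sketched in Python. This is a minimal sketch, not part of Malcolm: the function names are hypothetical, and the published sum is whatever value you copy from the download page.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte ISO need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sum(path: Path, published_hex: str) -> bool:
    """Compare a file's digest with a published sum, ignoring case and whitespace."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Call matches_published_sum with the path to the downloaded ISO and the sum from the download page; a result of False means the file should not be used.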
Various methods can be used to write the contents of an installer ISO image to a USB flash drive. One simple free and open source application for doing so is Etcher, which can be used on Windows, macOS and Linux platforms.
Alternatively, specific instructions may be provided by your operating system (e.g., Arch Linux, Debian Linux, Ubuntu Linux).
Using one of these methods, write the Malcolm and Hedgehog Linux installer ISOs to two 8GB or larger USB flash drives, respectively.
Alternatively, the ISO images could be burned to writable optical media (e.g., DVD±R). For the Malcolm installer you’ll likely have to use DVD±R DL (“dual layer” or “double layer”) DVD media as the installer ISO exceeds the 4.7 GB storage provided by standard DVDs.
Using Etcher on macOS
Using dd on Linux
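Writing an image to a drive amounts to a block-by-block copy followed by a sync, which is essentially what dd does. The following is a minimal Python sketch of that idea, not a replacement for Etcher or dd. The function name write_image and its parameters are hypothetical, and the target you pass is overwritten without confirmation, so double-check that it refers to the USB flash drive and not a system disk.

```python
import os
from pathlib import Path


def write_image(image: Path, target: Path, block_size: int = 4 * 1024 * 1024) -> int:
    """Copy image to target block by block and return the number of bytes written."""
    written = 0
    with image.open("rb") as src, open(target, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            written += len(block)
        # Flush Python's buffer, then ask the OS to commit the data to the device.
        dst.flush()
        os.fsync(dst.fileno())
    return written
```

Writing to a raw device normally requires elevated privileges, and the device path differs from system to system.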
The ISO media boot on systems that support EFI-mode and legacy (BIOS) booting. Configuring your system’s firmware to allow booting from USB or optical media will vary from manufacturer to manufacturer. Usually manufacturers will provide a one-time boot options menu upon a specific keypress (e.g., F12 for Dell, F9 for HP, etc.). If needed, consult the documentation provided by the hardware manufacturer on how to access the boot options menu and boot from your newly-burned USB flash media or DVD±R.
An example of an EFI boot manager in QEMU
An example of a BIOS boot options menu in QEMU
Upon booting the Malcolm installation ISO, you’re presented with the following boot menu. Use the arrow keys to select Install Malcolm, and press Enter.
The first screen of the installer
The next screen of the installer presents the following options relevant to installation:
The Install Malcolm menu
After making your selection for the type of Malcolm install to perform, the installer will ask for several pieces of information prior to installing the Malcolm base operating system:
… sudo group
After the passwords have been entered, the installer will proceed to format the system drive and install Malcolm.
At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions:
Following these prompts, the installer will reboot and the Malcolm base operating system will boot.
The Malcolm installer does not require an internet connection to complete successfully. If the installer prompts you to configure network connectivity, you may choose “do not configure the network at this time.”
The Malcolm base operating system is a hardened Linux installation based on the current stable release of Debian running the XFCE desktop environment. It has been preloaded with all of the components that make up Malcolm.
NetworkManager can be used to configure networking for Malcolm. NetworkManager can be configured by clicking the 🖧 (networked computers) icon in the system tray in the upper-right corner of the screen, or right-clicking the icon and selecting Edit Connections… to modify the properties of a given connection.
Display resolution should be detected and adjusted automatically. If you need to make changes to display properties, click the Applications menu and select Settings → Display.
The panel bordering the top of the Malcolm desktop is home to a number of useful shortcuts:
The first time the Malcolm base operating system boots, the Malcolm Configuration wizard will start automatically. This same configuration script can be run again later by running ./scripts/configure from the Malcolm installation directory, or by clicking the Configure Malcolm 🔳 icon in the top panel.
The configuration and tuning wizard’s questions proceed as follows. Note that you may not necessarily see every question listed here depending on how you answered earlier questions. Usually the default selection is what you’ll want to select unless otherwise indicated below. The configuration values resulting from these questions are stored in environment variable files in the ./config directory.
… root user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The PUID (process user ID) and PGID (process group ID) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding user account on the host.
… logstash container. The default is calculated based on the number of logical CPUs the system has. See Tuning and Profiling Logstash Performance, logstash.yml and Multiple Pipelines.
… freq. You probably want to answer Y to this question.
… json and raw; you probably want to choose json.
… message.
… miscbeat.
… message, to match the field name specified above.
… _malcolm_beats, which is used by Malcolm to recognize and parse metrics sent from Hedgehog Linux.
none: no file extraction
interesting: extraction of files with mime types of common attack vectors
mapped: extraction of files with recognized mime types
known: extraction of files for which any mime type can be determined
all: extract all files
quarantined: preserve only flagged files in ./zeek-logs/extract_files/quarantine
all: preserve flagged files in ./zeek-logs/extract_files/quarantine and all other extracted files in ./zeek-logs/extract_files/preserved
none: preserve no extracted files
… https://<Malcolm host or IP address>/extracted-files/. Beware that Zeek-extracted files may contain malware.
… openssl enc-compatible format (e.g., openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe).
… tcpdump and netsniff-ng.
… enp0s25 or enp10s0,enp11s0).
… not port 5044 and not port 5045 and not port 8005 and not port 9200.
… ethtool to disable NIC hardware offloading features and adjust ring buffer sizes for capture interface(s); this should be enabled if the interface(s) are being used for capture only, otherwise answer N. If you’re unsure, you should probably answer N.
If you wish to change Malcolm’s hostname or configure system time synchronization, open a terminal (the icon immediately to the right of the Applications menu icon at the top of the Malcolm desktop) and run sudo configure-interfaces.py then enter your password. If you get an error about your user not belonging to the sudo group, run su -c configure-interfaces.py and use the root password instead.
Here you can configure Malcolm to keep its time synchronized with an NTP server (using the NTP protocol), another Malcolm aggregator, or another HTTP/HTTPS server. On the next dialog, choose the time synchronization method you wish to configure.
If htpdate is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for another Malcolm instance, port 9200 may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server.
If ntpdate is selected, you will be prompted to enter the IP address or hostname of the NTP server.
Upon configuring time synchronization, a “Time synchronization configured successfully!” message will be displayed, after which you will be returned to the welcome screen. Select Cancel.
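Tools such as htpdate derive the time from the Date header that an HTTP server includes in its responses. The following is a minimal Python sketch of that idea, not htpdate’s actual implementation: the function name clock_offset_seconds is hypothetical, and the header value is a fixed example rather than one fetched from a live server.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_offset_seconds(date_header: str, local_now: datetime) -> float:
    """Return server time minus local time, in seconds, from an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)
    return (server_time - local_now).total_seconds()


# Fixed example values so the result is reproducible.
header = "Tue, 15 Nov 1994 08:12:31 GMT"
local = datetime(1994, 11, 15, 8, 12, 1, tzinfo=timezone.utc)
print(clock_offset_seconds(header, local))  # 30.0
```

Because the Date header has one-second resolution, an offset computed this way is coarse compared to NTP.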
Once the configuration questions have been completed as described above, you can click the circular yellow Malcolm icon in the panel at the top of the desktop to start Malcolm. As you have not yet configured authentication, you will be prompted to do so. This authentication setup can be run again later by running ./scripts/auth_setup from the Malcolm installation directory.
The Configure Authentication dialog
As this is the first time setting up authentication, ensure the all option is selected and press OK.
You will be prompted to do the following:
More detailed instructions for configuring Hedgehog Linux can be found in that section of the documentation.
The Hedgehog Linux installation ISO follows the same process as the Malcolm installation above.
The installer will ask for a few pieces of information prior to installing Hedgehog Linux:
… sensor account under which the various sensor capture and forwarding services run
At the end of the installation process, you will be prompted with a few self-explanatory yes/no questions:
Following these prompts, the installer will reboot and Hedgehog Linux will boot into kiosk mode.
Kiosk mode can be exited by connecting an external USB keyboard and pressing Alt+F4, upon which the sensor user’s desktop is shown.
The Hedgehog Linux base operating system is a hardened Linux installation based on the current stable release of Debian running the XFCE desktop environment.
Display resolution should be detected and adjusted automatically. If you need to make changes to display properties, click the Applications menu and select Settings → Display.
The panel bordering the top of the Hedgehog Linux desktop is home to a number of useful shortcuts:
The Hedgehog Linux desktop
The first step of sensor configuration is to configure the network interfaces and sensor hostname. Clicking the Configure Interfaces and Hostname toolbar icon (or, if you are at a command line prompt, running configure-interfaces) will prompt you for the root password you created during installation, after which the configuration welcome screen is shown. Select Continue to proceed.
You may next select whether to configure the network interfaces, hostname, or time synchronization.
Selecting Hostname, you will be presented with a summary of the current sensor identification information, after which you may specify a new sensor hostname. This name will be used to tag all events forwarded from this sensor in the events’ host.name field.
Returning to the configuration mode selection, choose Interface. You will be prompted if you would like help identifying network interfaces. If you select Yes, you will be prompted to select a network interface, after which that interface’s link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select No.
You will be presented with a list of interfaces to configure as the sensor management interface. This is the interface the sensor itself will use to communicate with the network in order to, for example, forward captured logs to an aggregate server. In order to do so, the management interface must be assigned an IP address. This is generally not the interface used for capturing data. Select the interface to which you wish to assign an IP address. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a -1 will be displayed instead of the interface speed.
Depending on the configuration of your network, you may now specify how the management interface will be assigned an IP address. In order to communicate with an event aggregator over the management interface, either static or dhcp must be selected.
If you select static, you will be prompted to enter the IP address, netmask, and gateway to assign to the management interface.
In either case, upon selecting OK the network interface will be brought down, configured, and brought back up, and the result of the operation will be displayed. You may choose Quit upon returning to the configuration tool’s welcome screen.
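Before entering static values, it can help to check that the address, netmask, and gateway are consistent with one another. The following Python sketch uses the standard ipaddress module to do so. It is an illustration only, not the check performed by the configuration tool; the function name and the example addresses are hypothetical.

```python
import ipaddress


def validate_static_config(address: str, netmask: str, gateway: str) -> ipaddress.IPv4Interface:
    """Return the interface (address plus network) or raise ValueError if inconsistent."""
    interface = ipaddress.IPv4Interface(f"{address}/{netmask}")
    gateway_ip = ipaddress.IPv4Address(gateway)
    # The gateway must be reachable directly, so it has to sit in the same network.
    if gateway_ip not in interface.network:
        raise ValueError(f"gateway {gateway_ip} is outside {interface.network}")
    if gateway_ip == interface.ip:
        raise ValueError("gateway and interface address must differ")
    return interface


print(validate_static_config("192.168.122.10", "255.255.255.0", "192.168.122.1"))
# 192.168.122.10/24
```

Malformed addresses or netmasks also raise ValueError, since the ipaddress exceptions derive from it.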
Returning to the configuration mode selection, choose Time Sync. Here you can configure the sensor to keep its time synchronized with an NTP server (using the NTP protocol), a local Malcolm aggregator, or another HTTP/HTTPS server. On the next dialog, choose the time synchronization method you wish to configure.
If htpdate is selected, you will be prompted to enter the IP address or hostname and port of an HTTP/HTTPS server (for a Malcolm instance, port 9200 may be used) and the time synchronization check frequency in minutes. A test connection will be made to determine if the time can be retrieved from the server.
If ntpdate is selected, you will be prompted to enter the IP address or hostname of the NTP server.
Upon configuring time synchronization, a “Time synchronization configured successfully!” message will be displayed, after which you will be returned to the welcome screen. Select Cancel.
Clicking the Configure Capture and Forwarding toolbar icon (or, if you are at a command prompt, running configure-capture) will launch the configuration tool for capture and forwarding. The root password is not required as it was for the interface and hostname configuration, as sensor services are run under the non-privileged sensor account. Select Continue to proceed. You may select from a list of configuration options.
Choose Configure Capture to configure parameters related to traffic capture and local analysis. You will be prompted if you would like help identifying network interfaces. If you select Yes, you will be prompted to select a network interface, after which that interface’s link LED will blink for 10 seconds to help you in its identification. This network interface identification aid will continue to prompt you to identify further network interfaces until you select No.
You will be presented with a list of network interfaces and prompted to select one or more capture interfaces. An interface used to capture traffic is generally a different interface than the one selected previously as the management interface, and each capture interface should be connected to a network tap or span port for traffic monitoring. Capture interfaces are usually not assigned an IP address as they are only used to passively “listen” to the traffic on the wire. The interfaces are listed by name and MAC address and the associated link speed is also displayed if it can be determined. For interfaces without a connected network cable, generally a -1 will be displayed instead of the interface speed.
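On Linux, one place a link speed can be read from is the sysfs file /sys/class/net/<interface>/speed, which holds the speed in Mb/s and reports -1 when the speed is unknown. The following Python sketch reads that file. It is an assumption that this resembles what the configuration tool does internally, and the function name read_link_speed is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_link_speed(interface: str) -> Optional[int]:
    """Return the link speed in Mb/s from sysfs, or None if it cannot be determined."""
    try:
        raw = Path(f"/sys/class/net/{interface}/speed").read_text().strip()
        speed = int(raw)
    except (OSError, ValueError):
        # The file may be missing, or unreadable while the interface is down.
        return None
    # The kernel reports -1 when the speed is unknown (for example, no cable).
    return speed if speed > 0 else None
```

For example, read_link_speed("enp1s0") returns None rather than -1 for an interface with no cable attached, leaving the choice of how to display that case to the caller.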
Upon choosing the capture interfaces and selecting OK, you may optionally provide a capture filter. This filter will be used to limit what traffic the PCAP service (netsniff-ng or tcpdump) and the traffic analysis services (zeek and suricata) will see. Capture filters are specified using Berkeley Packet Filter (BPF) syntax. For example, to indicate that Hedgehog should ignore the ports it uses to communicate with Malcolm, you could specify not port 5044 and not port 5045 and not port 8005 and not port 9200. Clicking OK will attempt to validate the capture filter, if specified, and will present a warning if the filter is invalid.
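A capture filter that excludes a set of ports follows a regular pattern, so it can be generated rather than typed by hand. The following Python sketch builds such a filter from a list of port numbers. The function name exclude_ports_filter is hypothetical, and the sketch only assembles the string; it does not validate the result against a BPF compiler.

```python
from typing import Iterable


def exclude_ports_filter(ports: Iterable[int]) -> str:
    """Build a BPF expression that excludes traffic on each of the given ports."""
    clauses = []
    for port in ports:
        if not 0 < port < 65536:
            raise ValueError(f"invalid port: {port}")
        clauses.append(f"not port {port}")
    return " and ".join(clauses)


print(exclude_ports_filter([5044, 5045, 8005, 9200]))
# not port 5044 and not port 5045 and not port 8005 and not port 9200
```

The output matches the example filter given in the text, and can be pasted into the capture filter dialog.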
Next you must specify the paths where captured PCAP files and logs will be stored locally on the sensor. If the installation worked as expected, these paths should be prepopulated to reflect paths on the volumes formatted at install time for the purpose of storing these artifacts. Usually these paths will exist on separate storage volumes. Enabling the PCAP and log pruning autostart services (see the section on autostart services below) will enable monitoring of these paths to ensure that their contents do not consume more than 90% of their respective volumes’ space. Choose OK to continue.
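To see how close a storage volume is to the 90% mark mentioned above, you can compare its used space to its total size. The following Python sketch does this with the standard shutil module. It illustrates the threshold check only and is not the sensor’s pruning implementation; the function names are hypothetical.

```python
import shutil


def usage_fraction(path: str) -> float:
    """Return the fraction (0.0 to 1.0) of the volume holding path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def exceeds_threshold(fraction_used: float, threshold: float = 0.90) -> bool:
    """Report whether usage is above the threshold (90% per the text above)."""
    return fraction_used > threshold
```

For example, exceeds_threshold(usage_fraction(path)) with path set to the PCAP or log directory chosen in this dialog indicates whether that volume is already past the limit.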
Hedgehog Linux can leverage Zeek’s knowledge of network protocols to automatically detect file transfers and extract those files from network traffic as Zeek sees them.
To specify which files should be extracted, specify the Zeek file carving mode:
If you’re not sure what to choose, either of mapped (except common plain text files) (if you want to carve and scan almost all files) or interesting (if you only want to carve and scan files with mime types of common attack vectors) is probably a good choice.
Next, specify which carved files to preserve (saved on the sensor under /capture/bro/capture/extract_files/quarantine by default). In order to not consume all of the sensor’s available storage space, the oldest preserved files will be pruned along with the oldest Zeek logs as described below with AUTOSTART_PRUNE_ZEEK in the autostart services section.
You’ll be prompted to specify which engine(s) to use to analyze extracted files. Extracted files can be examined through any of three methods:
… /opt/sensor/sensor_ctl/control_vars.conf and specify your VirusTotal API key in VTOT_API2_KEY
Files which are flagged as potentially malicious will be logged as Zeek signatures.log entries, and can be viewed in the Signatures dashboard in OpenSearch Dashboards when forwarded to Malcolm.
Finally, you will be presented with the list of configuration variables that will be used for capture, including the values which you have configured up to this point in this section. Upon choosing OK these values will be written back out to the sensor configuration file located at /opt/sensor/sensor_ctl/control_vars.conf. It is not recommended that you edit this file manually. After confirming these values, you will be presented with a confirmation that these settings have been written to the configuration file, and you will be returned to the welcome screen.
Select Configure Forwarding to set up forwarding logs and statistics from the sensor to an aggregator server, such as Malcolm.
There are three forwarder services used on the sensor, each for forwarding a different type of log or sensor metric.
arkime-capture is used not only to capture PCAP files, but also to parse raw traffic into sessions and forward this session metadata to an OpenSearch database so that it can be viewed in Arkime viewer, whether standalone or as part of a Malcolm instance. If you’re using Hedgehog Linux with Malcolm, please read Correlating Zeek logs and Arkime sessions in the Malcolm documentation for more information.
First, select the OpenSearch connection transport protocol, either HTTPS or HTTP. If the metrics are being forwarded to Malcolm, select HTTPS to encrypt messages from the sensor to the aggregator using TLS v1.2 using ECDHE-RSA-AES128-GCM-SHA256. If HTTPS is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during Malcolm’s configuration), choose None.
Next, enter the OpenSearch host IP address (i.e., the IP address of the aggregator) and port. These metrics are written to an OpenSearch database using a RESTful API, usually using port 9200. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator.
You will be asked to enter authentication credentials for the sensor’s connections to the aggregator’s OpenSearch API. After you’ve entered the username and the password, the sensor will attempt a test connection to OpenSearch using the connection information provided. If the Malcolm services have not yet been started, you may receive a Connection refused error. You may select Ignore Error for the credentials to be accepted anyway.
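If the test connection fails, it can be useful to tell a refused connection (nothing listening yet, as when the Malcolm services have not been started) from a timeout (often a firewall silently dropping packets). The following Python sketch makes that distinction at the TCP level. It is not the sensor’s own connection test, it does not attempt authentication or TLS, and the function name probe_tcp is hypothetical.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', 'timeout', or 'error' for a TCP connection attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no service is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, which often indicates filtering.
        return "timeout"
    except OSError:
        # Anything else, such as an unreachable network or a failed name lookup.
        return "error"
```

For example, probe_tcp with the aggregator’s IP address and port 9200 returning "refused" is consistent with the Connection refused error described above.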
You will be shown a dialog for a list of IP addresses used to populate an access control list (ACL) for hosts allowed to connect back to the sensor for retrieving session payloads from its PCAP files for display in Arkime viewer. The list will be prepopulated with the IP address entered a few screens prior to this one.
Arkime supports compression for the PCAP files it creates. Select none (at the cost of requiring more storage for PCAP files saved on the sensor) or zstd (at the cost of higher CPU load when writing and reading PCAP files). If you choose zstd, you’ll also be prompted for the compression level (something like 3 is probably a good choice).
Finally, you’ll be given the opportunity to review all of the Arkime capture options you’ve specified. Selecting OK will cause the parameters to be saved and you will be returned to the configuration tool’s welcome screen.
As described above in the Malcolm configuration under Setting up Authentication, in order for a Hedgehog Linux sensor to securely communicate with Malcolm, it needs the client certificates generated when you answered Y to “(Re)generate self-signed certificates for a remote log forwarder” during that setup. Malcolm can facilitate the secure transfer of these to a sensor running Hedgehog.
Select ssl-client-receive on Hedgehog
Select ssl-client-receive from the Configuration Mode options on the Hedgehog, then press OK when prompted “Run auth_setup on Malcolm ‘Transfer self-signed client certificates…’.” Return to the Malcolm instance where auth_setup is running (or re-run it if needed) and press OK. You’ll see a message with the title ssl-client-transmit that looks like this:
Run auth_setup and select ssl-client-transmit on Malcolm
Note Malcolm’s IP address (192.168.122.5 in the screenshot above) and the single-use code phrase (8736-janet-kilo-tonight in the screenshot above) and enter them on the Hedgehog:
Enter Malcolm IP address and single-use code phrase on Hedgehog
After a few seconds (hopefully) a progress bar will update and show the files have been 100% transferred. They are automatically saved into the /opt/sensor/sensor_ctl/logstash-client-certificates directory on the sensor.
Press OK on the Malcolm instance. If Malcolm’s auth_setup process was being run during Malcolm’s first run, Malcolm will continue to start up.
Filebeat is used to forward Zeek and Suricata logs to a remote Logstash instance for further enrichment prior to insertion into an OpenSearch database.
To configure filebeat, first provide the log path (the same path previously configured for log file generation).
You must also provide the IP address of the Logstash instance to which the logs are to be forwarded, and the port on which Logstash is listening. These logs are forwarded using the Beats protocol, generally over port 5044. Depending on your network configuration, you may need to open this port in your firewall to allow this connection from the sensor to the aggregator.
Next you are asked whether the connection used for log forwarding should be done unencrypted or over SSL. Unencrypted communication requires less processing overhead and is simpler to configure, but the contents of the logs may be visible to anyone who is able to intercept that traffic.
If SSL is chosen, you must choose whether to enable SSL certificate verification. If you are using a self-signed certificate (such as the one automatically created during Malcolm’s configuration), choose None.
The last step for SSL-encrypted log forwarding is to specify the SSL certificate authority, certificate, and key files. These files must match those used by the Logstash instance receiving the logs on the aggregator. The steps above under ssl-client-receive: Receive client SSL files for filebeat from Malcolm should have taken care of the transfer of these files between Malcolm and Hedgehog. Otherwise, manually copy (“sneakernet”) the files from the filebeat/certs/ subdirectory of the Malcolm installation to /opt/sensor/sensor_ctl/logstash-client-certificates on Hedgehog.
Once you have specified all of the filebeat parameters, you will be presented with a summary of the settings related to the forwarding of these logs. Selecting OK will cause the parameters to be written to filebeat’s configuration keystore under /opt/sensor/sensor_ctl/logstash-client-certificates and you will be returned to the configuration tool’s welcome screen. If the Malcolm services have not yet been started, you may receive a could not connect error. You may select Ignore Error for the settings to be accepted anyway.
The sensor uses Fluent Bit to gather miscellaneous system resource metrics (CPU, network I/O, disk I/O, memory utilization, temperature, etc.) and the Beats protocol to forward these metrics to a remote Logstash instance for further enrichment prior to insertion into an OpenSearch database. Metrics categories can be enabled/disabled as described in the autostart services section of this document.
This forwarder’s configuration is almost identical to that of filebeat in the previous section. Select miscbeat from the forwarding configuration mode options and follow the same steps outlined above to set up this forwarder.
Once the forwarders have been configured, the final step is to Configure Autostart Services. Choose this option from the configuration mode menu after the welcome screen of the sensor configuration tool.
Despite configuring capture and/or forwarder services as described in previous sections, only services enabled in the autostart configuration will run when the sensor starts up. The available autostart processes are as follows (recommended services are in bold text):
Note that only one packet capture engine (capture, netsniff-ng, or tcpdump) can be used.
Once you have selected the autostart services, you will be prompted to confirm your selections. Doing so will cause these values to be written back out to the /opt/sensor/sensor_ctl/control_vars.conf configuration file.
After you have completed configuring the sensor it is recommended that you reboot Hedgehog to ensure all new settings take effect. If rebooting is not an option, you may click the Restart Sensor Services menu icon in the top menu bar, or open a terminal and run:
/opt/sensor/sensor_ctl/shutdown && sleep 10 && /opt/sensor/sensor_ctl/supervisor.sh
This will cause the sensor services controller to stop, wait a few seconds, and restart. You can check the status of the sensor’s processes by choosing Sensor Status from the sensor’s kiosk mode, clicking the Sensor Service Status toolbar icon, or running /opt/sensor/sensor_ctl/status from the command line:
$ /opt/sensor/sensor_ctl/status
arkime:arkime-capture RUNNING pid 6455, uptime 0:03:17
arkime:arkime-viewer RUNNING pid 6456, uptime 0:03:17
beats:filebeat RUNNING pid 6457, uptime 0:03:17
beats:miscbeat RUNNING pid 6458, uptime 0:03:17
clamav:clamav-service RUNNING pid 6459, uptime 0:03:17
clamav:clamav-updates RUNNING pid 6461, uptime 0:03:17
fluentbit-auditlog RUNNING pid 6463, uptime 0:03:17
fluentbit-kmsg STOPPED Not started
fluentbit-metrics:cpu RUNNING pid 6466, uptime 0:03:17
fluentbit-metrics:df RUNNING pid 6471, uptime 0:03:17
fluentbit-metrics:disk RUNNING pid 6468, uptime 0:03:17
fluentbit-metrics:mem RUNNING pid 6472, uptime 0:03:17
fluentbit-metrics:mem_p RUNNING pid 6473, uptime 0:03:17
fluentbit-metrics:netif RUNNING pid 6474, uptime 0:03:17
fluentbit-syslog RUNNING pid 6478, uptime 0:03:17
fluentbit-thermal RUNNING pid 6480, uptime 0:03:17
netsniff:netsniff-enp1s0 STOPPED Not started
prune:prune-pcap RUNNING pid 6484, uptime 0:03:17
prune:prune-zeek RUNNING pid 6486, uptime 0:03:17
supercronic RUNNING pid 6490, uptime 0:03:17
suricata RUNNING pid 6501, uptime 0:03:17
tcpdump:tcpdump-enp1s0 STOPPED Not started
zeek:capa RUNNING pid 6553, uptime 0:03:17
zeek:clamav RUNNING pid 6512, uptime 0:03:17
zeek:logger RUNNING pid 6554, uptime 0:03:17
zeek:virustotal STOPPED Not started
zeek:watcher RUNNING pid 6510, uptime 0:03:17
zeek:yara RUNNING pid 6548, uptime 0:03:17
zeek:zeekctl RUNNING pid 6502, uptime 0:03:17
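Status output in this form is easy to scan programmatically, since each line begins with a process name followed by its state. The following Python sketch parses text like the listing above and reports the processes that are not running. The function names are hypothetical, and a STOPPED state is not necessarily a problem: it may simply reflect a service that was not enabled in the autostart configuration.

```python
from typing import Dict, List


def parse_status(output: str) -> Dict[str, str]:
    """Map each process name to its state (the first two whitespace-separated fields)."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the echoed command line that starts with "$".
        if len(fields) < 2 or fields[0] == "$":
            continue
        states[fields[0]] = fields[1]
    return states


def not_running(states: Dict[str, str]) -> List[str]:
    """Return the names of processes whose state is anything other than RUNNING."""
    return sorted(name for name, state in states.items() if state != "RUNNING")


sample = """\
$ /opt/sensor/sensor_ctl/status
suricata RUNNING pid 6501, uptime 0:03:17
zeek:virustotal STOPPED Not started
zeek:zeekctl RUNNING pid 6502, uptime 0:03:17
"""
print(not_running(parse_status(sample)))  # ['zeek:virustotal']
```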
The easiest way to verify that network traffic is being captured by the sensor and forwarded to Malcolm is through Malcolm’s Arkime Sessions interface.
If you are logged into the Malcolm desktop environment, click the Arkime icon (🦉) in the top panel. If you’re connecting from another browser, connect to https://<Malcolm host or IP address>.
As Malcolm is using self-signed TLS certificates, you will likely have to confirm an exception in your browser to allow the self-signed certificates to proceed. Enter the credentials you specified when you configured authentication.
Arkime’s sessions view will be displayed. To view records from a specific Hedgehog Linux sensor, you can filter on the node field. In the search bar, enter node == hedgehoghostname (replacing hedgehoghostname with the hostname you configured for Hedgehog). See the Search Queries in Arkime and OpenSearch cheat sheet for more search syntax hints.
Arkime’s sessions view with a filter on node
Arkime’s views button (indicated by the eyeball 👁 icon) allows overlaying additional previously-specified filters onto the current sessions filters. For convenience, Malcolm provides several preconfigured Arkime views, including views that filter on the event.provider and event.dataset fields. This can be combined with the node filter described above to verify that different network log types (e.g., Arkime sessions, Zeek logs, Suricata alerts, etc.) are all being captured and forwarded correctly.
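The same combination can also be typed directly into the search bar, since Arkime expressions can be joined with &&. The following Python sketch assembles such an expression for a given sensor. The function name sensor_filter is hypothetical, and the provider value zeek is an assumption about how the event.provider field is populated; check the values shown in your own sessions view.

```python
def sensor_filter(hostname: str, provider: str = "") -> str:
    """Build an Arkime search expression for one sensor, optionally one provider."""
    expression = f"node == {hostname}"
    if provider:
        # "&&" joins two conditions that must both match.
        expression += f" && event.provider == {provider}"
    return expression


print(sensor_filter("hedgehoghostname"))
# node == hedgehoghostname
print(sensor_filter("hedgehoghostname", "zeek"))
# node == hedgehoghostname && event.provider == zeek
```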