A powerful, easily deployable network traffic analysis tool suite
If you already have Docker and Docker Compose installed, the install.py script can still help you tune system configuration and docker-compose.yml parameters for Malcolm. To run it in “configuration only” mode, bypassing the steps to install Docker and Docker Compose, run it like this:
./scripts/install.py --configure
Although install.py will attempt to automate many of the following configuration and tuning parameters, they are nonetheless listed in the following sections for reference:
docker-compose.yml parameters
Edit docker-compose.yml and search for the OPENSEARCH_JAVA_OPTS key. Edit the -Xms4g -Xmx4g values, replacing 4g with a number that is half of your total system memory, or just under 32 gigabytes, whichever is less. So, for example, if I had 64 gigabytes of memory I would edit those values to be -Xms31g -Xmx31g. This indicates how much memory can be allocated to the OpenSearch heaps. For a pleasant experience, I would suggest not using a value under 10 gigabytes. Similar values can be modified for Logstash with LS_JAVA_OPTS, where using 3 or 4 gigabytes is recommended.
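The sizing rule above (half of total memory, capped just under 32 gigabytes) can be sketched in a few lines of Python. This is only an illustration: the function name java_heap_opts is hypothetical, 31 is assumed as the "just under 32" cap because it matches the 64-gigabyte example, and rounding down to whole gigabytes is also an assumption.

```python
def java_heap_opts(total_memory_gb: int, cap_gb: int = 31) -> str:
    """Build -Xms/-Xmx flags: half of system memory, capped just under 32 GB.

    Hypothetical helper for illustration; it is not part of Malcolm or install.py.
    """
    if total_memory_gb < 2:
        raise ValueError("need at least 2 GB of memory to halve into whole gigabytes")
    # Halve the total (rounding down), then apply the cap.
    heap_gb = min(total_memory_gb // 2, cap_gb)
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


print(java_heap_opts(64))  # -Xms31g -Xmx31g
print(java_heap_opts(16))  # -Xms8g -Xmx8g
```

The resulting string is what would be placed in the OPENSEARCH_JAVA_OPTS value; a result under 10 gigabytes still works but falls below the suggested minimum.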
Various other environment variables inside of docker-compose.yml can be tweaked to control aspects of how Malcolm behaves, particularly with regard to processing PCAP files and Zeek logs. The environment variables of particular interest are located near the top of that file under Commonly tweaked configuration options, which include:
ARKIME_ANALYZE_PCAP_THREADS – the number of threads available to Arkime for analyzing PCAP files (default 1)
AUTO_TAG – if set to true, Malcolm will automatically tag Arkime sessions and Zeek logs based on the filename, as described in Tagging (default true)
BEATS_SSL – if set to true, Logstash will require encrypted communications for any external Beats-based forwarders from which it will accept logs (default true)
CONNECTION_SECONDS_SEVERITY_THRESHOLD – when severity scoring is enabled, this variable indicates the duration threshold (in seconds) for assigning severity to long connections (default 3600)
DASHBOARDS_DARKMODE – if set to true, OpenSearch Dashboards will be set to dark mode upon initialization (default true)
EXTRACTED_FILE_CAPA_VERBOSE – if set to true, all Capa rule hits will be logged; otherwise (false) only MITRE ATT&CK® technique classifications will be logged
EXTRACTED_FILE_ENABLE_CAPA – if set to true, Zeek-extracted files that are determined to be PE (portable executable) files will be scanned with Capa
EXTRACTED_FILE_ENABLE_CLAMAV – if set to true, Zeek-extracted files will be scanned with ClamAV
EXTRACTED_FILE_ENABLE_YARA – if set to true, Zeek-extracted files will be scanned with Yara
EXTRACTED_FILE_HTTP_SERVER_ENABLE – if set to true, the directory containing Zeek-extracted files will be served over HTTP at ./extracted-files/ (e.g., https://localhost/extracted-files/ if you are connecting locally)
EXTRACTED_FILE_HTTP_SERVER_ENCRYPT – if set to true, those Zeek-extracted files will be AES-256-CBC-encrypted in an openssl enc-compatible format (e.g., openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe)
EXTRACTED_FILE_HTTP_SERVER_KEY – specifies the AES-256-CBC decryption password for encrypted Zeek-extracted files; used in conjunction with EXTRACTED_FILE_HTTP_SERVER_ENCRYPT
EXTRACTED_FILE_IGNORE_EXISTING – if set to true, files already present in the ./zeek-logs/extract_files/ directory will be ignored on startup rather than scanned
EXTRACTED_FILE_PRESERVATION – determines behavior for preservation of Zeek-extracted files
EXTRACTED_FILE_UPDATE_RULES – if set to true, file scanner engines (e.g., ClamAV, Capa, Yara) will periodically update their rule definitions (default false)
EXTRACTED_FILE_YARA_CUSTOM_ONLY – if set to true, Malcolm will bypass the default Yara rulesets (Neo23x0/signature-base and bartblaze/Yara-rules) and use only user-defined rules in ./yara/rules
FREQ_LOOKUP – if set to true, domain names (from DNS queries and SSL server names) will be assigned entropy scores as calculated by freq (default false)
FREQ_SEVERITY_THRESHOLD – when severity scoring is enabled, this variable indicates the entropy threshold for assigning severity to events with entropy scores calculated by freq; a lower value will assign severity scores only to fewer domain names with higher entropy (e.g., 2.0 for NQZHTFHRMYMTVBQJE.COM), while a higher value will assign severity scores to more domain names with lower entropy (e.g., 7.5 for naturallanguagedomain.example.org) (default 2.0)
LOGSTASH_OUI_LOOKUP – if set to true, Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs (default true)
LOGSTASH_REVERSE_DNS – if set to true, Logstash will perform a reverse DNS lookup for all external source and destination IP address values when analyzing Zeek logs (default false)
LOGSTASH_SEVERITY_SCORING – if set to true, Logstash will perform severity scoring when analyzing Zeek logs (default true)
LOGSTASH_NETWORK_MAP_ENRICHMENT – if set to true, Logstash will enrich network traffic metadata directly from net-map.json (should be the opposite of LOGSTASH_NETBOX_ENRICHMENT)
LOGSTASH_NETBOX_ENRICHMENT – if set to true, Logstash will enrich network traffic metadata via NetBox API calls (should be the opposite of LOGSTASH_NETWORK_MAP_ENRICHMENT)
MANAGE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will be marked as available for deletion by Arkime if available storage space becomes too low (default false)
MAXMIND_GEOIP_DB_LICENSE_KEY – Malcolm uses MaxMind’s free GeoLite2 databases for GeoIP lookups. As of December 30, 2019, these databases are no longer available for download via a public URL. Instead, they must be downloaded using a MaxMind license key (available without charge from MaxMind). The license key can be specified here for GeoIP database downloads during build- and run-time.
OPENSEARCH_LOCAL – if set to true, Malcolm will use its own internal OpenSearch instance (default true)
OPENSEARCH_URL – when using Malcolm’s internal OpenSearch instance (i.e., OPENSEARCH_LOCAL is true) this should be http://opensearch:9200, otherwise this value specifies the primary remote instance URL in the format protocol://host:port (default http://opensearch:9200)
OPENSEARCH_SSL_CERTIFICATE_VERIFICATION – if set to true, connections to the primary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default false)
OPENSEARCH_SECONDARY – if set to true, Malcolm will forward logs to a secondary remote OpenSearch instance in addition to the primary (local or remote) OpenSearch instance (default false)
OPENSEARCH_SECONDARY_URL – when forwarding to a secondary remote OpenSearch instance (i.e., OPENSEARCH_SECONDARY is true) this value specifies the secondary remote instance URL in the format protocol://host:port
OPENSEARCH_SECONDARY_SSL_CERTIFICATE_VERIFICATION – if set to true, connections to the secondary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default false)
NETBOX_DISABLED – if set to true, Malcolm will not start and manage a NetBox instance (default true)
NGINX_BASIC_AUTH – if set to true, use TLS-encrypted HTTP basic authentication (default); if set to false, use Lightweight Directory Access Protocol (LDAP) authentication
NGINX_LOG_ACCESS_AND_ERRORS – if set to true, all access to Malcolm via its web interfaces will be logged to OpenSearch (default false)
NGINX_SSL – if set to true, require HTTPS connections to Malcolm’s nginx-proxy container (default); if set to false, use unencrypted HTTP connections (using unsecured HTTP connections is NOT recommended unless you are running Malcolm behind another reverse proxy like Traefik, Caddy, etc.)
PCAP_ENABLE_NETSNIFF – if set to true, Malcolm will capture network traffic on the local network interface(s) indicated in PCAP_IFACE using netsniff-ng
PCAP_ENABLE_TCPDUMP – if set to true, Malcolm will capture network traffic on the local network interface(s) indicated in PCAP_IFACE using tcpdump; there is no reason to enable both PCAP_ENABLE_NETSNIFF and PCAP_ENABLE_TCPDUMP
PCAP_FILTER – specifies a tcpdump-style filter expression for local packet capture; leave blank to capture all traffic
PCAP_IFACE – used to specify the network interface(s) for local packet capture if PCAP_ENABLE_NETSNIFF, PCAP_ENABLE_TCPDUMP, ZEEK_LIVE_CAPTURE or SURICATA_LIVE_CAPTURE are enabled; for multiple interfaces, separate the interface names with a comma (e.g., 'enp0s25' or 'enp10s0,enp11s0')
PCAP_IFACE_TWEAK – if set to true, Malcolm will use ethtool to disable NIC hardware offloading features and adjust ring buffer sizes for capture interface(s); this should be true if the interface(s) are being used for capture only, false if they are being used for management/communication
PCAP_ROTATE_MEGABYTES – used to specify how large a locally-captured PCAP file can become (in megabytes) before it is closed for processing and a new PCAP file created
PCAP_ROTATE_MINUTES – used to specify a time interval (in minutes) after which a locally-captured PCAP file will be closed for processing and a new PCAP file created
pipeline.workers, pipeline.batch.size and pipeline.batch.delay – these settings are used to tune the performance and resource utilization of the logstash container; see Tuning and Profiling Logstash Performance, logstash.yml and Multiple Pipelines
PUID and PGID – Docker runs all of its containers as the privileged root user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The PUID (process user ID) and PGID (process group ID) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding user account on the host. Note that a few containers (including the logstash and netbox containers) may take a few extra minutes during startup if PUID and PGID are set to values other than the default 1000. This is expected and should not affect operation after the initial startup.
SENSITIVE_COUNTRY_CODES – when severity scoring is enabled, this variable defines a comma-separated list of sensitive countries (using ISO 3166-1 alpha-2 codes) (default 'AM,AZ,BY,CN,CU,DZ,GE,HK,IL,IN,IQ,IR,KG,KP,KZ,LY,MD,MO,PK,RU,SD,SS,SY,TJ,TM,TW,UA,UZ', taken from the U.S. Department of Energy Sensitive Country List)
SURICATA_AUTO_ANALYZE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will automatically be analyzed by Suricata, and the resulting logs will also be imported (default false)
SURICATA_AUTO_ANALYZE_PCAP_THREADS – the number of threads available to Malcolm for analyzing Suricata logs (default 1)
SURICATA_CUSTOM_RULES_ONLY – if set to true, Malcolm will bypass the default Suricata ruleset and use only user-defined rules (./suricata/rules/*.rules)
SURICATA_UPDATE_RULES – if set to true, Suricata signatures will periodically be updated (default false)
SURICATA_LIVE_CAPTURE – if set to true, Suricata will monitor live traffic on the local interface(s) defined by PCAP_IFACE
SURICATA_ROTATED_PCAP – if set to true, Suricata can analyze PCAP files captured by netsniff-ng or tcpdump (see PCAP_ENABLE_NETSNIFF and PCAP_ENABLE_TCPDUMP, as well as SURICATA_AUTO_ANALYZE_PCAP_FILES); if SURICATA_LIVE_CAPTURE is true, this should be false, otherwise Suricata will see duplicate traffic
SURICATA_… – the suricata container entrypoint script can use many more environment variables to tweak suricata.yaml; in that script, DEFAULT_VARS defines those variables (albeit without the SURICATA_ prefix you must add to each for use)
TOTAL_MEGABYTES_SEVERITY_THRESHOLD – when severity scoring is enabled, this variable indicates the size threshold (in megabytes) for assigning severity to large connections or file transfers (default 1000)
VTOT_API2_KEY – used to specify a VirusTotal Public API v2.0 key, which, if specified, will be used to submit hashes of Zeek-extracted files to VirusTotal
ZEEK_AUTO_ANALYZE_PCAP_FILES – if set to true, all PCAP files imported into Malcolm will automatically be analyzed by Zeek, and the resulting logs will also be imported (default false)
ZEEK_AUTO_ANALYZE_PCAP_THREADS – the number of threads available to Malcolm for analyzing Zeek logs (default 1)
ZEEK_DISABLE_… – if set to any non-blank value, each of these variables can be used to disable a certain Zeek function when it analyzes PCAP files (for example, setting ZEEK_DISABLE_LOG_PASSWORDS to true to disable logging of cleartext passwords)
ZEEK_DISABLE_BEST_GUESS_ICS – see “Best Guess” Fingerprinting for ICS Protocols
ZEEK_EXTRACTOR_MODE – determines the file extraction behavior for file transfers detected by Zeek; see Automatic file extraction and scanning for more details
ZEEK_INTEL_FEED_SINCE – when querying a TAXII or MISP feed, only process threat indicators that have been created or modified since the time represented by this value; it may be either a fixed date/time (01/01/2021) or relative interval (30 days ago)
ZEEK_INTEL_ITEM_EXPIRATION – specifies the value for Zeek’s Intel::item_expiration timeout as used by the Zeek Intelligence Framework (default -1min, which disables item expiration)
ZEEK_INTEL_REFRESH_CRON_EXPRESSION – specifies a cron expression indicating the refresh interval for generating the Zeek Intelligence Framework files (defaults to empty, which disables automatic refresh)
ZEEK_LIVE_CAPTURE – if set to true, Zeek will monitor live traffic on the local interface(s) defined by PCAP_IFACE
ZEEK_ROTATED_PCAP – if set to true, Zeek can analyze PCAP files captured by netsniff-ng or tcpdump (see PCAP_ENABLE_NETSNIFF and PCAP_ENABLE_TCPDUMP, as well as ZEEK_AUTO_ANALYZE_PCAP_FILES); if ZEEK_LIVE_CAPTURE is true, this should be false, otherwise Zeek will see duplicate traffic
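Several of the options above are described as pairs that should not both be enabled. As a rough sketch, a small Python check over a dictionary of settings could flag those combinations before starting Malcolm. The helper names (is_true, check_settings) and the idea of passing the settings as a plain dictionary are assumptions made for illustration; Malcolm itself does not ship this check, and how the values are read from docker-compose.yml is left out.

```python
# Pairs that the option descriptions say should not both be true.
# Note: this only catches "both true"; it does not check that the two
# enrichment options are exact opposites when both are false.
EXCLUSIVE_PAIRS = [
    ("PCAP_ENABLE_NETSNIFF", "PCAP_ENABLE_TCPDUMP"),
    ("LOGSTASH_NETWORK_MAP_ENRICHMENT", "LOGSTASH_NETBOX_ENRICHMENT"),
    ("ZEEK_LIVE_CAPTURE", "ZEEK_ROTATED_PCAP"),
    ("SURICATA_LIVE_CAPTURE", "SURICATA_ROTATED_PCAP"),
]


def is_true(env: dict, name: str) -> bool:
    """Treat a setting as enabled only if its value is 'true' (quotes and case ignored)."""
    value = str(env.get(name, "")).strip().strip("'\"")
    return value.lower() == "true"


def check_settings(env: dict) -> list:
    """Return one warning string per conflicting pair found in env."""
    warnings = []
    for first, second in EXCLUSIVE_PAIRS:
        if is_true(env, first) and is_true(env, second):
            warnings.append(f"{first} and {second} are both true")
    return warnings


# Hypothetical settings, as they might appear in an environment section.
settings = {
    "ZEEK_LIVE_CAPTURE": "true",
    "ZEEK_ROTATED_PCAP": "true",
    "PCAP_ENABLE_NETSNIFF": "true",
    "PCAP_ENABLE_TCPDUMP": "false",
}
for warning in check_settings(settings):
    print(warning)
```

With the sample settings shown, only the Zeek pair is reported, since ZEEK_LIVE_CAPTURE and ZEEK_ROTATED_PCAP together would make Zeek see duplicate traffic.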