Documentation
Getting the Source
To check out the code for the current version of the standalone program, you will need an SVN
client for your system. The command-line client is sufficient on Linux/Unix/Mac OS X, while
TortoiseSVN is a common choice for Windows users who prefer not to use a command-line client.
If you are using the command-line client, issue the command:
svn co https://scrub-netflows.svn.sourceforge.net/svnroot/scrub-netflows scrub-netflows
This will pull the latest version of the source from the repository. Alternatively, if you are using
TortoiseSVN, you can check out a working copy into a directory using the same URL as above.
Using the tool
To invoke the tool, issue the following command:
scrub-netflows <input> <output> [options]
The following two options are required:
-r filename | file to be read, pcap format |
-w filename | file to be output, pcap format |
The following options are optional:
-i device | pcap-style device name from which to capture packets |
-o "anonymization string" | string of anonymization options, as explained below |
-k permutation_key | permutation key, only necessary if using keyed randomization |
The source file needs to be a file in a standard NetFlows format, and the output will be in the same NetFlows format.
WARNING: The tool will silently overwrite any existing file with the
same name as the output file that you specify.
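Because the tool overwrites the output file without warning, it can help to check for an existing file before invoking it. The following is a minimal Python sketch of such a guard, not part of scrub-netflows itself. The helper name `build_command` is hypothetical, and the sketch assumes the -r, -w and -o options are passed exactly as listed in the options above.

```python
import os


def build_command(input_path, output_path, anonymization):
    """Return an argument list for scrub-netflows, refusing to clobber output.

    Hypothetical helper: assumes the -r, -w and -o options are passed
    exactly as listed in the options table.
    """
    if os.path.exists(output_path):
        raise FileExistsError("refusing to overwrite %r" % output_path)
    return ["scrub-netflows", "-r", input_path, "-w", output_path,
            "-o", anonymization]
```

The returned list could then be handed to `subprocess.run`, assuming the tool is installed and on the PATH.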
The anonymization options string is composed of a number of options, in any order, selected from
the tables below. Its format follows the pattern of <field function> pairs, where the field is one
of the designators from the left-hand column of Table 2 below and the function used to anonymize
that field is one of the specifiers from Table 3 below. An explanation of what each function does
is given with Table 3. There will always be an even number of options in this string: in each pair,
the first option names the field to be anonymized and the second names the function to use in the
anonymization process. For example, to anonymize the source IP address with black marker and
all the ports with bilateral classification, the anonymization options string should look like:
"srcip bm tcpsrcport bi tcpdstport bi udpsrcport bi udpdstport bi"
and so on. It is up to the user to ensure that each combination of field and function makes sense
(e.g. that you do not attempt to use bilateral classification on an IP address or prefix-preserving
pseudonymization on a timestamp). The pairings that the system supports are listed in Table 1.
Invalid pairings are silently accepted by the system, with undefined and probably harmful effects.
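The pair structure described above can be sketched in Python as follows. This is an illustration rather than the tool's own parser: the function name `parse_anonymization_string` is hypothetical, and it assumes the options are separated by whitespace.

```python
def parse_anonymization_string(options):
    """Split an anonymization options string into (field, function) pairs."""
    # Assumption: options are separated by whitespace.
    tokens = options.split()
    if len(tokens) % 2 != 0:
        raise ValueError(
            "expected an even number of options, got %d" % len(tokens))
    # Tokens alternate: field, function, field, function, ...
    return list(zip(tokens[0::2], tokens[1::2]))


pairs = parse_anonymization_string("srcip bm dstip prefixp")
print(pairs)  # [('srcip', 'bm'), ('dstip', 'prefixp')]
```

Checking for an even token count before running the tool catches one class of mistake that the tool itself would accept silently.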
The table below lists all of the fields that scrub-netflows anonymizes and the methods that can
be used to anonymize each field. The tables after it describe how to specify those options to
the tool on the command line.
Table 1: Fields and Corresponding SCRUB-NetFlows Anonymization Options |
Field | Anonymizing Options |
IP address — source and destination |
|
Time Stamp — beginning and ending timestamp |
|
Milliseconds — special millisecond field for some netflows |
|
Host ID |
|
Number of bytes — sent and received |
|
Number of packets — sent and received |
|
Duration — for formats without an endtime |
|
Protocol |
|
The following table lists the fields and the designators used to name them in the scrub-netflows
anonymization string:
Table 2: Packet Field Entities and Corresponding SCRUB-NetFlows Parameter Strings |
Packet Field Entities | SCRUB-NetFlows Parameter String Specifier |
Source port | srcport |
Destination port | dstport |
Source IP address | srcip |
Destination IP address | dstip |
Host ID | AssignedHostID |
Start timestamp | StartTime |
Last timestamp | LastTimeStamp |
Duration | Duration |
Protocol | Protocol |
Total bytes sent | BytesSent |
Total bytes received | BytesRec |
Total packets sent | PktSent |
Total packets received | PktRec |
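Since the tool does not report mistakes in the anonymization string, the field designators can be checked against Table 2 before the tool is run. The sketch below is a hypothetical pre-flight check, not part of scrub-netflows. The name `unknown_fields` is made up for illustration, and the sketch assumes that designators are whitespace-separated and matched case-sensitively, which the tool itself may or may not do.

```python
# Field designators copied from Table 2.
KNOWN_FIELDS = {
    "srcport", "dstport", "srcip", "dstip", "AssignedHostID",
    "StartTime", "LastTimeStamp", "Duration", "Protocol",
    "BytesSent", "BytesRec", "PktSent", "PktRec",
}


def unknown_fields(options):
    """Return the field designators in options that are not in Table 2."""
    tokens = options.split()
    # Fields sit at the even positions; functions at the odd positions.
    # Assumption: matching is case-sensitive.
    return [field for field in tokens[0::2] if field not in KNOWN_FIELDS]


print(unknown_fields("srcip bm sourceport bi"))  # ['sourceport']
```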
The following table lists the names of the anonymization methods and their specifiers for the
anonymization string, with a brief description of what each method does.
Table 3: SCRUB-NetFlows Anonymization Options with Parameters and Descriptions |
Anonymization Options | Parameter String Specifier | Description |
Bilateral classification | bi | This is used on TCP/UDP port fields and will classify all low ports to port 1 and all high ports to port 65535. It preserves some information from the original input, namely whether the packet was bound for a low port (below 1024, and therefore one of the 'well-known' ports) or a high port (above 1024, and thus an 'unspecified' port). |
Black marker | bm | This sets the specified field to all 0's (or 255 in some cases) and effectively results in a complete loss of the data that is in that field. |
Enumeration | en | This discards the original spacing between the values in the field but maintains the original order of items, assigning each successive packet the next available value for that field. For example, when applied to the timestamp fields, it will order the packets in its output according to their original timestamps, but they will start counting from 0, with the second packet as having been sent at 1 second, the next packet at 2 seconds, etc. |
Grouping | grN | Grouping takes all possible values for a field, partitions them into a number of mutually exclusive, exhaustive partitions, and then selects a representative member of each partition to represent all occurrences of the members of that partition. For instance, if a field can take all values from 1-100, a possible partitioning scheme would be: [1-5], [6-15], [16-50], [51-100] with representative members {1, 6, 16, 51}. Then, all values in the specified field which are in the range [1-5] would be replaced by {1} while all numbers in [6-15] are replaced by {6} and so on. |
Keyed randomization / Random Permutation | rp | This will take a user-specified key and use that as the basis for a randomized permutation of the field. The randomization is reproducible if the same key is specified again. |
Prefix-preserving pseudonymization | prefixp | This method will separately anonymize the subnet portion and the host portion of an IP address, using a modified version of the CryptoPAN algorithm, thus ensuring that all hosts in the same subnet in the source file are also mapped to the same subnet in the resulting output, although it will be a different subnet than in the original input. |
Pure randomization | rand | This will map all input to a random value on output. However, a given value X, which is randomly mapped to Y at its first encounter, will likewise be mapped to Y for the remainder of the file. This is identical to random permutation/keyed randomization above, but it is not reproducible by the use of a key. |
Random timeshift | rshiftN,M | This will randomly assign the specified values of the field to a uniformly distributed random value over the window [N:M]. |
Regex catchall | re | This will eliminate all hostnames and URLs from the payload of any TCP and UDP packets. |
Truncation | trN | The N in trN is actually a number, and it specifies the number of characters to be truncated (by being set to 0) from all elements in this field. |
Unit annihilation | uaU1,U2,...,UN | Annihilates specific portions of the field, numbered from the right side (U1) and incrementing toward the far left side (UN). Multiple portions can be selected by separating the choices with commas. |
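To make three of the descriptions in Table 3 concrete, here is a minimal Python sketch of bilateral classification, grouping and enumeration. It is an illustration under stated assumptions, not the code scrub-netflows uses. All function names are hypothetical. Port 1024 itself is assumed to count as a high port, each partition's lower bound is assumed to be its representative (as in the [1-5] to {1} example), and ties in enumeration are assumed to keep their input order.

```python
import bisect


def bilateral(port):
    """Classify a port as low (1) or high (65535)."""
    # Assumption: port 1024 itself is treated as a high port.
    return 1 if port < 1024 else 65535


def group(value, lower_bounds):
    """Replace value with the lower bound of the partition containing it.

    lower_bounds must be sorted; each entry starts a partition and also
    serves as that partition's representative member.
    """
    index = bisect.bisect_right(lower_bounds, value) - 1
    if index < 0:
        raise ValueError("value lies below the first partition")
    return lower_bounds[index]


def enumerate_values(values):
    """Replace values with 0, 1, 2, ... while preserving their order."""
    # Assumption: equal values keep their input order (sorted() is stable).
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, position in enumerate(order):
        result[position] = rank
    return result


print(bilateral(80), bilateral(8080))       # 1 65535
print(group(12, [1, 6, 16, 51]))            # 6
print(enumerate_values([10.5, 3.2, 99.0]))  # [1, 0, 2]
```

Note that `group` does not enforce an upper limit on the last partition; a real implementation would presumably reject values outside the field's range.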