Network management tool is developed for making networks run smoothly. It can monitor conditions which may incur error or irregular actively and may provide suggestions to solve the problems. The traffic monitor must retrieve the network traffic continuously. It must decide whether each packet should be sent to upper layer process by user setting. Therefore, the packet filter is the key to the performance of a network monitoring tool.
In this paper, we discuss the characteristics, merits and demerits of the existing packet filter mechanisms in literatures at first. Then, we analyze these mechanisms and utilize useful rules to implement the packet filter mechanism with a packet driver on the NDIS driver. We also measure and discuss the difference between with and without performance improvement. At last, we discuss some possible improvements of the packet filter mechanism in the future.
A significant portion of network traffic monitor is the packet filter. We show the basic diagram of packet filter in Figure 1. The packet filter mechanism is used to demultiplex received packets, and it will affect the performance of traffic monitor significantly. In other words, improving performance of packet filter means decreasing demerits of software-based traffic monitor. Packet filter can be implemented in kernel-resident, user-level or hybrid of both.
Kernel-resident packet filter is in the kernel the same as device driver and protocol stack. Opposite to the kernel-resident, user-level packet filter crosses the boundary between user-level and kernel-level and user-level process needs more context switch than kernel-level process. The more details of difference among them can be found in literatures [3, 4].
The demerits of kernel-resident packet filter are as follows: hard to develop, debug, maintain, crash-avoidance and availability of source code. Opposite to the poor flexibility of kernel-resident packet filter, user-level packet filter need more system resource processes and may incur performance degradation.
In this paper, we will observe and discuss the existing packet filter mechanisms, and find out what portions can be used in an efficient packet filter in our developed environment.
The remainder of this paper is organized as follows. Section 2 discusses the characteristics, merits and demerits of exiting packet filter mechanisms. Section 3 describes the implementation. Section 4 describes the performance evaluation. Section 5 describes the conclusion and future consideration.
Figure 1 : Basic diagram of packet filter
As far as we know, the packet filter appears in 1976, in Xerox's PARC . It is also the first user-level packet filter/demultiplexer. The first Unix-based packet filter was implemented in 1980. Up to now, we have found several existing packet filters/demultiplexers in literature in past twenty years. Some are implemented in hardware. Most of them are associated with user-level and kernel-resident. Some are faster than previous existing ones in some particular conditions.
We describe six major existing packet filter schemes: CSPF , BPF , MPF , PathFinder , GBF , and DPF .
Other suggestions which are suitable for our environments are as follows:
We use the Intel Pentium based platform and Microsoft Windows95-based operation system in the CSMA/CD (10Mb/s Ethernet). We hope that a compact network monitoring tool can be used in the personal computer, which is the most popular platform today.
Our packet filter mechanism needs a network packet driver supporting NDIS (Network driver interface specification)  to turn device driver into the promiscuous mode in order to get all the network traffic passed through the network adapter card. We use the improved version of packet driver called 'vpacket driver'  that is modified by Christopher Chlap. We will describe the details of these above in the next section.
Under normal condition, the network traffic passed through the network adapter card is not only for it but also for all the nodes in the same network segment. The network interface device driver in the operation system rejects the traffic, which is not for it. Inversely, a network monitor needs to get all the network traffic no matter whether is destined for it. Therefore, it needs to enter the promiscuous mode to get all the network traffic by a packet driver. A packet driver which supports NDIS layered above the network interface device driver. NDIS, the Microsoft's standard for network driver design, encapsulates the network driver so it is protocol-independent . Protocol-independence is helpful for our developing. With the characteristic of protocol-independence, we can use another device driver above the NDIS.VxD to set the promiscuous mode.
We use the 'vpacket driver' , which is modified from Microsoft's packet driver example by Christopher Chlap. The 'vpacket driver', a virtual device driver, uses dynamic loading technology and can perform dynamic binding. We can use the Win32 SDK function 'CreateFile' to load the VxD and 'CloseHandle' function to unload it. In addition, other operations (like dynamic binding, setting object id (OID) to enter promiscuous mode, and receiving packets can be performed by the 'DeviceIoControl' function. The 'GetOverlappedResult' function can disable the blocking condition. In other words, we can use this function to perform the asynchronous I/O and multi-thread reading and writing. Basically, we can use these four functions to perform the action of sending all the network packets to the packet filter.
CapForm is a child window and can be created by main window. It is used to handle the whole monitoring and filtering operations. It includes loading the vpacket driver, unloading the vpacket driver, binding to NDIS driver, and most operations, which communicate with 'vpacket.vxd'. FilterForm is a dialog window of setting filter. It handles all the setting filter operations, including collecting the data user setting and converting to the type of efficient used by filter. CapThread object is a thread object and is created by the CapForm. It converts the data from setting filter-FilterForm-to efficiently be handled by filter before receiving process started. This step trades the resource and time in real-time process consumption in filtering operation to the filter-setting operations.
In the FilterForm, we convert incontinuously patterns into pointer to character array after user inputs the filter data and values. The characteristic (efficiency passing, addressing of needed offset and length) of pointer to string will help us process for filter more efficiently.
In the CapThread object, we convert the setting filter data to best fit the real time receiving packets. Although the steps of conversion are performed after thread object created, we can suspend the thread operation before we finish the conversion. After optimized conversion, we continue to the thread execution. The steps of conversion are reading once by variable pointers to varisized type, converting filter setting to mask for bit operator, and low/high byte as well as 10/16 base digit conversions.
The first conversion step is reading once by variable pointers to varisized type. When we receive packet by 'DeviceIoControl' function, it passes one byte by one byte. However, sometimes our filter needs to compare with two bytes, like TCP port. Sometimes, it needs to compare with four bytes just reading ones. Therefore, we need this step by casting to help atoms coalescing. We assign three different length of type to outBuffer in order to fit different conditions in filtering once.
The second conversion step converts filter setting to masking for bit operator. A filter may be represented a range but a value (e.g. source IP address is 140.129.*.* instead of 188.8.131.52.). Accordingly, we use 0 to represent any value (* in the above example) in FilterForm. Then, we convert it into masking value (e.g. 140.129.*.* convert to 0xffff0000) and prepare for the filter comparisons. The second casting is for atoms coalescing.
The third conversion step is the low/high byte and 10/16 base digits conversions. In the Intel x86 personal computer based machine, if it read two bytes once, it read high byte after low byte. This is different from reading with one byte. It means that some of them must be converted. For lower the negative of conversion, we choose to convert the filter patterns. For example, 0x0800 is IP type in the type field of ethernet header, and we convert to 0x0008 for filter comparison (Frame format can be referenced from many web sites, , , RFC documents, RFC 792 for IP, RFC 816 for TCP, and so on,).
In the filter comparison of CapThread object, the improving approaches are atoms coalescing, comparison using exclusive OR and NOT bit operators, and masking using AND bit operators. Atoms coalescing appeared in the DPF . We transfer the conception to our platform. For example, we use it in comparing of IP source and destination addresses. We coalesce the pair of address to 64 bits and plus the bits masking to satisfy all filtering conditions. The second approach uses exclusive OR and NOT bit operators for comparison. Using bit operators can act like assembly language to efficient comparison in real time process. The code filters TCP packet by comparing IP header using '!' and '^' bit operator. The 23 means offset from ethernet packet.The results of the running program are illustrated as Figure 3 and 4.
About atoms coalescing, no matter it is in DPF or in this program, can only be used in adjacent atoms, like IP source and destination addresses, as well as TCP source and destination ports. If the filter patterns are far from each other, we must perform twice comparisons. If we copied the two patterns into another place for enforcing atoms coalescing. We will lose the merit of 'in place'. It means that we need additional process to perform the copying operation.
Variable length header handling is introduced in one more packet filter mechanisms in literatures. It is often used in IP and TCP headers, because their headers have inessential field-'option'. It can be easily implemented in our program by adding offset variable and assigning the value of TCP header length. Because the 'option' in IP and TCP header is seldom used in packets, we remove this step for gaining more performance.
The network monitor and packet filter should be independent to each other, like NDIS independent to protocols. It means that packet filter should not be implemented for specified network monitor. However, integrating them may acquire more optimization.
In this section, we discuss the performance evaluation of packet filter we implemented. We must provide an identical environment for fair evaluation. However, most of the packet filters in the literatures were implemented in the UNIX-based operation system and on the platform of workstation. This is very different from our Windows 95 in Intel Pentium-based PC. According to the characteristic of independence between packet filter and platform, we should port the existing packet filter to Windows 95 on PC. However, for the following two reasons, we shift the attention to the optimization of characteristics.
The first reason is that some packet filter were implemented in special environments. For example, DPF was implemented with vcode  and restricted in some aspects especially in platforms (vcode only works in mips and alpha). GBF was implemented using LR parsing. MPF was implemented in Mach 3.0. The Mach 3.0 operation system can dynamically reorder the often-used packet filters to optimize the performance.
The second reason is that some performance improvement mainly depends on improving of operation system and platform. For example, BPF is 1.5 to 20 times faster than CSPF on the same hardware . However, BPF is implemented in the hardware, but CSPF is ported to the hardware. Therefore, the major improvement of BPF is due to the register-based, the newer hardware and its large address spaces. CSPF is restricted in the incoordination between stack-based filter and register-based hardware. Another example is the PathFinder, which is pattern-based filter. The characteristic of a pattern-based filter, which is prepared for hardware implementation, is very different from other packet filters except GBF.
Therefore, we divert the attention to the difference of the optimized and non-optimized characteristics. Our platform is Windows 95 B, running on Intel Pentium 100 processor, 48 mega-bytes main memory, 512 kilo-bytes level 2 cache, D-Link DE-220ECT ISA-bus ethernet adapter and UTP line to the hub. We use the Win32 function of the high resolution performance counter-QueryPerformanceCounter()-to measure the process time. The accuracy of this counter depends on the processor, and the sequential number of the counter can be got by the Win32 function-QueryPerformanceFrequency()-in our testbed. The frequency unit of measuring function is
It means that one counting is about 0.84£gs, which also expresses the accuracy of this function.
At first, we observe the relationship between packet length and filtering time. We found that the filtering time is independent to packet length and the average filtering time is about 0.105ms. Then, we observe the relationship between the after-optimized case and the before-optimized case. Assuming that the mechanism is set to six filters. No matter each receiving packet is accepted or rejected, the range of comparison times from one to six. Therefore, we divide the measurement into two parts: the accepted general case and the rejected worse case. In the accepted general case, we measure the accepted packets no matter the comparison times are. In the rejected worse case, we measure only the rejected packets, and the comparison times of these packets, which equal to the numbers of the setting filters.
In Figure 4, we compare the relationship between the number of filters and processing time in the before- and after-optimized conditions for accepted general case. We can found that after-optimized is about 2.5£gs faster than before-optimized in average. The before-optimized case still has the merits of atom coalescing in 32 bytes and bytes-reordered (without problems of low/high bytes conversion). For actual processing time, the after-optimized is 20 percentages faster than before-optimized in average.
Figure 5 shows the relationship between the number of filters and processing time in the before- and after-optimized conditions for rejected worse case. We found that the after-optimized case is about 1.05£gs faster than the before-optimized case in average. In actual processing time, the after-optimized case is 15 percentages faster than the before-optimized case in average.
In this paper, our focus is on the packet filter mechanism of the network monitoring tool for a popular platform. At first, we try to find out the characteristics, merits and demerits of the packet filter in related literatures. We find out the approach, method or knack that can help us to improve the performance of our specific packet filter. We implement the packet filter of the network monitoring tool by a packet driver for NDIS driver. We also measure the difference between the before- and after-improving by these methods, and show the relationship between packet length and filtering time.
We can find the different denotation between the filter mechanisms in literatures- filter, demultiplexer, classifier and decompositer. They seem similar in some characteristics but some parts of them are different. Maybe we can say that the fields they can be applied to is a bit different (for network monitor, transport protocol server, or events filtering ). Because of some characteristics are similar. A specific interface, or pseudo-machine, maybe can be implemented with both the merits of optimized for the filtering characteristic and portability at the same time. However, for two incomplete successful examples, GBF and DPF. GBF uses LR parsing to gain the generalization. Although it optimizes for performance improving, it still only better than BPF. DPF implements on vode successfully. However, it is restricted by vcode. Vcode can only work on mips and alpha, no fragmentation support and few number of operators. Therefore, there is no good interface or pseudo-machine for event or packet filtering up to now. Maybe developing this interface or pseudo-machine for the filtering is the final optimized way.
In the future, packet filter mechanism should be improved to fit higher performance network architecture and more rigorous requirements.
 W. Richard Stevens. "TCP/IP illustrated, Volume 1," Addison-wesley, page27, 40, 178, and 286.
 D. C. Schmidt, "Scalable High-Performance Event Filtering for Dynamic Multi-point Applications," International Workshop on High Performance Protocol Architectures, held in Sophia Antipolis, France, December 15-16, 1994.
 J.C. Mogul, R.F. Rashid, and M.J. Accetta. "The packet filter: An efficient mechanism for user-level network code," Proceedings of the Eleventh ACM Symposium on Operating Systems Principles, pages 39--51, November 1987.
 M. Jayaram, R. K. Cytron, D. C. Schimidt, and G. Varghese, "Efficient Demultiplexing of Network Packets by Automatic Parsing," ACM SIGPLAN'95 Conference on Programming Language Design and Implementation, ACM, 1994.
 D. Boggs and E. Taft, Private communication, 1987.
 S. McCanne and V. Jacobson. "The BSD packet filter: A new architecture for user-level packet capture," USENIX Technical Conference Proceedings, page 259-269, San Diego, CA, Winter 1993. USENIX.
 M. Yahara, B. Bershad, C. Maeda, and E. Moss. "Efficient packet demultiplexing for multiple endpoints and large messages," Proceedings of the Winter 1994 USENIX Conference, 1994.
 M. L. Bailey, B. Gopal, M. A. Pagels, L. L. Peterson, and P. Sarkar. "PathFinder: A pattern-based packet classifier," Proceedings of the First Symposium on Operating Systems Design and Implementation, pages 115-123, November 1994.
 D. Engler and M. F. Kaashoek. "DPF: Fast, flexible message demultiplexing using dynamic code generation," Proceeding of SIGCOMM '96 Symp., pages 53--59, Aug. 1996.
 Braden R. T. "A pseudo-machine for packet monitoring and statistics," Proceedings of SIGCOMM '88, ACM.
 C. Maeda and B. Bershan, "Protocol service decomposition for high-performance networking," Proceedings of the 14th Symposium on Operating System Principles, (Asheville, North Carolina), ACM, December 1993.
 C. A. Thekkath, H. M. Levy. "Limits to low-latency communication on high-speed networks," ACM Transactions on Computer Systems, 11(2):179-203, May 1993.
 N. C. Hutchinson and L. L. Peterson. "The x-kernel: an architecture for implementing network protocols," IEEE Transactions on Software engineering, 17(1), January 1991.
 D. R. Engler. "VCODE: a retargetable, extensible, very dynamic code generation system," Proceedings of the SIGPLAN '96 Conference on Programming Language Design and Implementation, Philadelphia, PA, May 1996. http://www.pdos.lcs.mit.edu/~engler/vcode.html.
 ftp://ftp.canberra.edu.au/pub/ise/vpacket/ *.
 Microsoft. "Microsoft Developer Network Library Edition - July 1997".