Privacy has always been a serious issue in network traffic analysis, and the gravity of the situation only increases at NAPs. Regardless of whether a NAP is a layer 2 entity or a broadcast medium, Internet traffic is a private matter among Internet clients. Most NAP providers have agreements with their customers not to reveal information about individual customer traffic. Collecting and using more than basic aggregate traffic counts will require precise agreements with customers regarding what may be collected and how it will be used. Behaving out of line with customers' expectations or ethical standards, even for the most noble of research intentions, does not bode well for the longevity of a service provider.
An extreme position is not to look at network header data at all because doing so violates privacy. (Header data, incidentally, is very different from user data, which we do not propose examining.) Analogous to unplugging one's machine from the Ethernet in order to make it secure, this approach is effective in the short term but has some undesirable side effects. We need to find ways to minimize exposure rather than surrender the ability to understand network behavior. It seems that no one has determined an `optimal' operating point in terms of what to collect, along any of the dimensions we discuss. Indeed, the optimal choice often depends on the service provider, and changes with time and new technologies.
We acknowledge how difficult it is for the NAPs, or any Internet service provider, to deal with statistics collection during an already very turbulent period of Internet evolution. However, it is at just such a time, marked ironically by the cessation of the NSFNET statistics, that a baseline architectural model for statistics collection is most critical. Customers need to be able to trust the performance and integrity of the services they procure from their network service providers, and service providers must not tie their own hands when it comes to preserving the robustness, or forestalling the demise, of their own clouds.
The cost-benefit tradeoff: accurately characterize the workload on the network so that NAPs and NSPs can optimize (read: maintain) their networks, but at the least cost to the privacy ethics we hold so dear.