Web site monitoring and management perspectives:
A readiness-evaluation methodology
Joseph Gulla: gulla@us.ibm.com
IBM Global Services Web Hosting Delivery
Research Triangle Park, North Carolina, USA
John Hankins: hankinjo@us.ibm.com
IBM Global Services Web Offering Development
Schaumburg, Illinois, USA
Abstract
In this paper, we define a framework that can be used to evaluate the
quality and completeness of the monitoring and management of a web site.
This approach is scalable and extensible as it can be used for a single
web site, a collection of sites, or an entire hosting center. This framework
can also address the support of the launch of a new site, an upgrade to
an existing site, or a one-time quality and completeness assessment. The
approach is supported by a methodology that is based on a series of "perspectives"
which incorporate a comprehensive view of tools, processes, organizational
structure, and staff skills. The perspectives discussed include system,
support, and end-user. The system perspective has as its focus the monitoring
of essential infrastructure, application, and business-system components.
The support perspective focuses on team processes and related tools. Key
processes include change, problem, performance, and security. The end-user
perspective is centered on measurement and improvement of the end user
experience. The framework, and its supporting perspectives, provides the
opportunity to take a comprehensive view of the management of a site.
In support of the framework, we have developed a systematic methodology
that uses a series of data tables to drive and support the analysis. These
tables are used to clearly identify and document the monitoring and management
components, processes, and tools that are the focus of the activity.
Contents
1 Introduction
2 Method
3 System perspective
3.1 Creating the infrastructure table
3.2 Creating the application table
3.3 Creating the business table
4 Creating the monitoring evaluation table
5 Support perspective
5.1 Change management
5.2 Problem management
5.3 Performance management
5.4 Security management
6 End user perspective
7 Review report or presentation to complete the review
8 Summary
About the authors
Footnotes
References
1 Introduction
In many enterprises, web sites have become a mission critical component
of the organization as more and more businesses have come to rely on the
Web for commerce as well as internal and external communication. In 1997,
the global impact of the Internet trade in services accounted for well
over $40 billion of U.S. exports (A Framework, 1997). By 2003, the U.S.
Commerce Department estimates that business to consumer e-commerce will
likely be in the range of $75 to $144 billion. Business to business e-commerce
could reach between $634 billion and $3.9 trillion (Leadership for the
new millennium, 2001). Because of this, many IT organizations are putting
new focus on establishing and maintaining 7x24 site availability along
with high quality performance and response time. The consequences of Web
site failures can be costly. E*TRADE has felt the pain of costly failures.
From February 3, 1999 through March 3, 1999 E*TRADE experienced four outages
of at least five hours. The direct cost of these failures is not known
but the company's stock price declined twenty-two percent on February
5 -- just two days after the initial failure (Frick, 2000). Web management
tools are plentiful but alone they are not enough to do the job.
Many organizations use multiple, ad hoc methods, tools, and services in
the quest for five 9s reliability and sub 3 second response time. This
collection of resources makes determining the effectiveness of Web site
management tools a challenge for many IT organizations. This is true
whether you use a framework-based management solution or a collection
of point products. Many organizations spend a great deal of time and money
evaluating the capabilities of a given tool but do very little to understand
how the full range of tools will perform in the context of the organization's
staff, skills, and processes. The main challenge of web site monitoring
and management is to be able to detect (and correct) a variety of problems
quickly. Often these problems go undetected because the monitoring and
management implementation is ineffective and does not fully leverage the
capabilities of the toolset. Many tools are powerful and specific to the
problems that occur with these sites yet fail to perform at the expected
level. Tivoli has dozens of products that plug into its framework that
can be used to manage sites and their application components (Tivoli product
index, 2000). BMC Patrol has a wide variety of Knowledge Modules (Patrol,
2000) and detailed white papers to guide the technologist. Event management
is a good example of this phenomenon (Event management and notification,
2000). Considering the complexity of this implementation challenge, what
analysis and planning is required to implement effective Web site-specific
monitoring and management? How do you go about anticipating problems before
they happen, and when they happen, how do you correct them in a timely
manner? This paper provides a framework for the analysis and planning
that is necessary to identify and evaluate the important monitoring and
system management components for a Web site. This approach can be used
by the enterprise to plan monitoring and management if the site is hosted
in-house, or to evaluate the quality of monitoring and management if the
site is outsourced to a web hosting organization.
Our approach is based on perspectives that incorporate tools, processes,
organizational structure, and staff skills. The notion of perspectives
is used by Strum & Bumpus (1999) to give a simple name to what others
called disciplines, domains, functions, processes, and services. In our
use, a perspective is a point-of-view or focus. Our perspectives include
the system, support, and end user. The system perspective has a focus
on hardware and software grouped by infrastructure, application, and business
components. The support perspective is centered on four processes -- change,
problem, performance, and security with a concentration on team, process
and tools. The end user perspective is focused on measurement and improvement
of availability, performance, consistency, and reliability. The approach
is implemented using a three-part method.
2 Method
We have developed a systematic methodology supported by a series of data
tables. This approach is used by IT consultants delivering services to
customers like those supporting Information Technology Infrastructure
Library (ITIL) Services (Introduction to ITIL, 2001) and Information Systems
Management Architecture processes (Harikian, Blust, Campbell, Cooke, Foley,
Gulla, Gayo, Howlette, Mosher, & O'Mara, 1996). These tables clearly
identify and document the monitoring and management components, processes,
and tools that are the focus of the framework. The method consists of
three steps: preplanning, analysis, and review. The focus
of the preplanning step is to gather materials and to perform early planning
activities. The materials collected include web site design diagrams and
configurations. Planning activities include listing site components and
examining items used in production like restart scripts, monitors, and
other tools. Preliminary system, support, and end-user perspective tables
are built during this step.
An important activity of the analysis step is the completion of
the system, support, and end-user perspective tables. These completed
tables are used as the basis for a presentation or report. With partially
completed tables in hand, a meeting should be held to review what has
been found with the owners/developers of the web site. This meeting will
offer the opportunity to validate the requirements that have been discovered,
as well as gather any additional information that will have an impact
on the assessment. During the review step, a presentation or report
that contains findings and recommendations is presented. Final versions
of the system, support, and end-user perspective tables are also provided
and explained. The report is a major tool that should be used to drive
the activities to put the systems management solution in place to support
the production web site. An appendix to the presentation or report contains
a high-level implementation plan, which should be used as a guide to
implement the action items from the study. The system, support, and end
user perspectives are now discussed in detail.
3 System perspective
The system perspective has three areas of focus -- infrastructure, applications,
and business functions. Each area is different; taken as a whole, they
cover the system aspects of the Web site. The infrastructure focus concentrates
on the operating system, server and network hardware, and other devices
such as firewalls. The practical aspects of infrastructure support have
been discussed in detail elsewhere. A good example is Welter's work (1999).
The application focus places specific attention on the database, middleware,
and the application itself. Business functions focus on the comprehensive
management of a collection of applications. Comprehensive means business
views, monitors, and command and control.
3.1 Creating the infrastructure table
For the site's infrastructure focus, create a table with as many specific
components as you determine to be key to the health of the infrastructure.
A good starting point should include the operating system, server hardware,
network hardware, and other devices like firewalls and load-balancing
servers. Derive the list of infrastructure components from the documentation
for the web site. For each specific component, identify a set of detailed
components. For the operating system, this should include detailed components
like CPU utilization, file systems, paging space, memory utilization,
etc. These detailed components will become the focus of the monitors that
will be used to ensure the availability of the infrastructure. Table
1 contains examples of infrastructure specific components and component
details.1
Table 1. Infrastructure identification table
| Specific Component | Component Details |
|---|---|
| Operating System | CPU utilization |
| | File systems |
| | Paging space |
| | Memory utilization |
| | OS processes daemon (service) monitoring (Virus alert, log service, etc.) |
| | Interface status |
| | Network utilization |
| | Packet loss |
| | External access |
| | Network collisions |
| | Network processes daemon (service) monitoring |
| Server Hardware | RAID array disk failure |
| | CPU failure |
| | Disk drive failure |
| Network Hardware | Switch status |
| | Router status |
| Other Devices | Load balancing device status |
| | Firewall status |
| | Caching server status |
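To make the idea of a monitor on a detailed component concrete, the following is a minimal sketch of a threshold check for the "File systems" entry in Table 1. It is not taken from any of the products discussed in this paper; the function names and the 80 and 90 percent thresholds are assumptions chosen for illustration, and a real site would set thresholds per file system.

```python
import shutil


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a percent-used figure to a status (thresholds are illustrative assumptions)."""
    if used_pct >= crit_pct:
        return "critical"
    if used_pct >= warn_pct:
        return "warning"
    return "ok"


def file_system_status(path, warn_pct=80.0, crit_pct=90.0):
    """Classify the file system that holds `path` by the percentage of space in use."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (in bytes)
    if usage.total == 0:
        return "unknown"  # pseudo file systems can report a size of zero
    return classify(100.0 * usage.used / usage.total, warn_pct, crit_pct)


if __name__ == "__main__":
    print(file_system_status("."))
```

Each of the other detailed components in the table would need its own check of this kind, which is why the list is worth writing down before tools are selected.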
3.2 Creating the application table
For the Web site's application focus, the concentration is on the database,
middleware, and the application itself. Table 2 contains examples of application
specific components and component details.
Table 2. Application identification table
| Specific Component | Component Details |
|---|---|
| Application | Application processes daemon (service) monitoring |
| Database | Database processes daemon (service) monitoring |
| | Communication support monitoring |
| | Backup success monitoring |
| Middleware | Middleware processes daemon (service) monitoring |
| | Queue monitoring |
| | Channel monitoring |
3.3 Creating the business table
For the business aspect of the system perspective, the focus is on relating
the applications as business systems. We first observed this idea from
Tivoli's Global Enterprise Manager product (Tivoli global enterprise manager
instrumentation guide, 1998). However, our approach in this paper is product
independent. To relate applications as business systems, components are
grouped and taken as a whole. Views are used to visually manage the business
systems. Business system monitoring is more inclusive than regular monitoring.
For example, a business system could have a monitor called
"all business system interfaces." This monitor could check the end-points
of all application interfaces, check the availability of queues between
applications, and run a test transaction, all as part of the same monitor
and all related to one business system. Business system command and control
offers functions that are more inclusive than a regular command. For example,
a command could shut down and restart selected or all daemons
(services) of an application on all servers in the business system. Table
3 contains examples of business specific components and component details.
Table 3. Business identification table
| Specific Component | Component Details |
|---|---|
| Business system view(s) | A view or views that contain related applications and components |
| Business system monitor(s) | Checks all application interfaces, checks availability of queues between applications, and runs a test transaction (health check) |
| Business system command(s) | Shutdown all (or selected) daemons (services) of a business system |
| | Startup all (or selected) daemons (services) of a business system |
| | Restart all (or selected) daemons (services) of a business system |
| | Display all (or selected) events (traps) of a business system |
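The business system monitor in Table 3 combines several checks into one result. A minimal sketch of that composition is shown below. The check names mirror the table, but the functions and the stand-in lambdas are hypothetical; in practice each check would call whatever tool or script the site actually uses.

```python
from typing import Callable, Dict

Check = Callable[[], bool]


def run_business_monitor(checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every named check and record pass or fail for each one.

    A check that raises is recorded as failed, so one broken probe
    does not hide the results of the others.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def business_system_healthy(results: Dict[str, bool]) -> bool:
    """The business system is healthy only if every check passed."""
    return all(results.values())


# Hypothetical stand-ins for real probes of one business system.
checks = {
    "application interfaces": lambda: True,
    "queues between applications": lambda: True,
    "test transaction": lambda: False,
}
results = run_business_monitor(checks)
print(results, business_system_healthy(results))
```

Reporting the individual results alongside the overall verdict keeps the single business-level alert traceable to the component that caused it.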
After the infrastructure, application, and business components are identified
and documented, important data should be transferred to the monitoring
evaluation table.
4 Creating the monitoring evaluation table
The monitoring evaluation table serves as a tool to identify monitors
to be used (or developed) to address the management needs of the web site.
The table also contains information about scripts, commands, and views
that are needed for the management of the Web site. Table 4 is an example
of a completed monitoring evaluation table.
Table 4. Monitoring evaluation table
| Specific component | Monitored today? | Current Tool (or proposed) | Evaluation |
|---|---|---|---|
| CPU utilization | Yes | Tivoli DM2 | Current monitor is effective |
| File systems | Yes | Tivoli DM | Current monitor is effective |
| Paging space | Yes | Tivoli DM | Current monitor is effective |
| Memory utilization | No | (Tivoli DM) | Proposed monitor will be effective |
| OS Processes Daemon (services) | No | (Tivoli DM) | Proposed monitor will be effective |
| Interface status | No | (NetView)2 | Proposed tool will be effective |
| Network utilization | No | (NetView) | Proposed tool will be effective |
| Packet loss | Yes | NetView | Current tool is effective |
| External access | Yes | ESM2 | Current tool is effective |
| Network collisions | No | (Trend)2 | Proposed tool will be effective |
| Network processes daemon (service) monitoring | No | (Tivoli DM) | Proposed monitor will be effective |
| RAID array disk failures | No | (Researching) | Site has had problems that have gone undetected |
| CPU failure | No | (Researching) | Site has had problems that have gone undetected |
| Disk drive failure | No | (Researching) | Site has had problems that have gone undetected |
| Switch status | Yes | NetView | Current tool is effective |
| Router status | Yes | NetView | Current tool is effective |
| Load balancing device status | No | (Tivoli DM) | Proposed monitor will be effective |
| Firewall status | No | (Custom script) | Proposed script will be effective |
| Caching server status | No | (Tivoli DM) | Proposed monitor will be effective |
| Application processes daemon monitoring | No | (Tivoli DM) | Proposed monitor will be effective |
| Database processes daemon (service) monitoring | No | (Tivoli Oracle Module) | Proposed monitor will be effective |
| Communication support monitoring | No | (Tivoli Oracle Monitor) | Proposed monitor will be effective |
| Backup success monitoring | Yes | Tivoli DM | At times, unsuccessful backups have gone undetected |
| Middleware processes daemon (service) monitoring | No | (Tivoli MQ Module) | Proposed monitor will be effective |
| Queue monitoring | No | (Tivoli MQ Module) | Proposed monitor will be effective |
| Channel monitoring | No | (Tivoli MQ Module) | Proposed monitor will be effective |
| Business views | Yes | NetView | Current views will need to be enhanced |
| Business monitors | No | (Custom monitors using Tivoli DM) | Proposed monitors will be effective |
| Business commands | No | (Custom scripts) | Proposed commands will be effective |
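One way to work with a table like Table 4 is to hold its rows as records and derive a follow-up list from them. The sketch below uses four rows copied from the table. The `MonitorRow` type and the `needs_action` rule (not monitored today, or an evaluation that mentions undetected problems) are assumptions made for illustration and are not part of the methodology itself.

```python
from dataclasses import dataclass


@dataclass
class MonitorRow:
    component: str
    monitored_today: bool
    tool: str          # parentheses mark a proposed tool, as in Table 4
    evaluation: str


rows = [
    MonitorRow("CPU utilization", True, "Tivoli DM",
               "Current monitor is effective"),
    MonitorRow("Memory utilization", False, "(Tivoli DM)",
               "Proposed monitor will be effective"),
    MonitorRow("RAID array disk failures", False, "(Researching)",
               "Site has had problems that have gone undetected"),
    MonitorRow("Backup success monitoring", True, "Tivoli DM",
               "At times, unsuccessful backups have gone undetected"),
]


def needs_action(row: MonitorRow) -> bool:
    """Assumed rule: flag rows with no monitor today, or with undetected problems."""
    return (not row.monitored_today) or ("undetected" in row.evaluation.lower())


action_list = [row.component for row in rows if needs_action(row)]
print(action_list)
```

Note that the backup row is flagged even though it is monitored today, because the evaluation column records that the current monitor has missed failures.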
5 Support perspective
The key components of the support perspective are team, process and tools.
These components are explored in the context of four disciplines -- change,
problem, performance, and security. Change, problem, performance, and
security management are widely practiced disciplines in the industry.
They are basic to IBM systems management (Mangold & Brandner, 1993)
and many other organizations focused on the management of systems like
ITIL (Introduction to ITIL, 2001). The scope of the support perspective
is both broad and narrow. The broad scope has to do with the readiness
of change, problem, performance, and security teams to handle the Web
site's needs. The narrow scope is how monitoring, command, and control
interact with the specific functional perspectives. For example, during
a change window, how is monitoring handled to avoid a flood of false alerts?
5.1 Change management
Change management is a process whose goal is to provide defect-free implementation
of changes to the system environment. This process includes planning and
documentation of the change, real time management of the change, verification
of completion or, in the case of failure, verification of restoration
back to the original state, and follow-up analysis and reporting. Changes,
including both installations and modifications, should be accomplished
in a logical and orderly fashion that achieves the expected result without
undesired service disruption. From a system-monitoring viewpoint, we see
three important issues:
- How to shut down the monitoring system during a change activity in order
to avoid a flood of false positive alerts
- How to reactivate the monitoring system following a change activity to verify
that all systems and services are functioning normally
- How to convert problem management activities into change activities
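The first two issues above can be sketched as a change window that holds alerts for the affected hosts and then verifies them when the window closes. The `AlertRouter` class, the `change_window` helper, and the host names are all hypothetical; real monitoring products expose their own mechanisms for this.

```python
from contextlib import contextmanager


class AlertRouter:
    """Toy alert router: holds alerts for suppressed hosts, sends the rest."""

    def __init__(self):
        self.suppressed_hosts = set()
        self.sent = []
        self.held = []

    def alert(self, host, message):
        if host in self.suppressed_hosts:
            self.held.append((host, message))
        else:
            self.sent.append((host, message))


@contextmanager
def change_window(router, hosts, verify):
    """Suppress alerts for `hosts` during a change, then re-enable and verify."""
    hosts = set(hosts)
    router.suppressed_hosts |= hosts
    try:
        yield
    finally:
        # Reactivate monitoring first so verification failures are not held.
        router.suppressed_hosts -= hosts
        for host in sorted(hosts):
            if not verify(host):
                router.alert(host, "post-change verification failed")


router = AlertRouter()
with change_window(router, ["web01", "web02"], verify=lambda h: h != "web02"):
    router.alert("web01", "httpd down")  # part of the change: held
    router.alert("db01", "disk full")    # outside the change: sent
print(router.sent)
print(router.held)
```

Scoping the suppression to the hosts named in the change record, rather than switching off monitoring for the whole site, keeps unrelated problems visible during the window.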
From an overall management viewpoint, the change evaluation table has a series
of questions that are designed to discover how prepared the change management
team, tools, and processes are to handle the Web site being evaluated.
Table 5 is an example of a completed change evaluation table.
Table 5. Change evaluation table
| Specific component | Component details | Evaluation |
|---|---|---|
| Team | Who are the team members? | Change team is in place with three members |
| | What skills, experience, training does the team possess? | Change team is experienced and well trained in the change-management discipline |
| | What coverage is provided by the team? Is this adequate? | Team works normal business hours. Recent problems during after-hours change windows may require change-management team coverage to expedite decision-making. |
| | Are there any change measurements that can be used to evaluate the effectiveness of the team? If so, what data? | Few measurements exist at this time. Change success rate cannot be determined. Both a method and a tool are needed to measure change success rate. |
| | What are the strengths and weaknesses of the team? | Team understands its well-defined process. Team does not have organizational support. Significant changes that should be done in the weekly window are done without change team awareness. |
| | What other teams are key to the success of this team? Are there any issues? | Problem team is related to change team as some problems result in changes being scheduled in change windows. There is no tool linkage between change and problem. Two different software tools are used and they do not work together. |
| | Based on the above evaluation, what are the primary issues? What is the action plan to resolve? | Action items include -- Recommendation: change team support for change window; Recommendation: measurements; Recommendation: more changes need to be planned and scheduled; Recommendation: interface is needed between problem and change |
| Tools | What is the primary tool for tracking and managing change activities? | Information management tool with change-management panels |
| | What are the strengths and weaknesses of this tool? | Well known and understood legacy tool; access is only through mainframe (TSO) |
| | What other tools are used to manage change activities? | Some databases are used to store detailed plans and other documents |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Web access to change records; Recommendation: Use Web to integrate change records and supporting plans |
| Process | Is there a document that defines the organization's change management policies and procedures? | Yes, well defined |
| | Is this consistent with actual practice? If not, where are the gaps? | No, significant changes appear to take place outside change window |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Look for a root cause of unmanaged change |
| Overall | What reports are available to review change activity performance? Are they adequate? | Few reports |
| | What is the leading cause of failed changes? What is being done to address this? | Not tracked |
| | What is an acceptable level of failed changes? Is this measure being met? | No plan in place |
| | What is the level of customer satisfaction with change management? If customer satisfaction is low what are the reasons why? | Unknown |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Initiate change management reporting; Recommendation: Document acceptable level of failed changes; Recommendation: Initiate measurement of customer satisfaction with change management |
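Table 5 notes that change success rate cannot currently be determined. As one possible starting point, the sketch below computes it from a list of change records. The record layout and the status values ("successful", "failed", "backed out", "scheduled") are assumptions; the actual fields would come from whatever change-tracking tool is in use.

```python
def change_success_rate(changes):
    """Return the fraction of completed changes that succeeded, or None if none completed."""
    completed = [c for c in changes
                 if c["status"] in ("successful", "failed", "backed out")]
    if not completed:
        return None
    successes = sum(1 for c in completed if c["status"] == "successful")
    return successes / len(completed)


# Hypothetical change records for one reporting period.
changes = [
    {"id": "C1", "status": "successful"},
    {"id": "C2", "status": "failed"},
    {"id": "C3", "status": "successful"},
    {"id": "C4", "status": "backed out"},
    {"id": "C5", "status": "scheduled"},  # not yet completed, so not counted
]
rate = change_success_rate(changes)
print(rate)
```

Counting a backed-out change as unsuccessful is a policy choice; an organization could equally report it as a separate category.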
5.2 Problem management
Problem management is the successful awareness of and response to all
monitoring tool alerts and other manually reported or detected problems
and the resolution of any events, conditions, failures, etc. indicated
by this information. The entire set of activities is focused on ensuring
that the site is available and functioning in the manner in which it was
designed. The essential issues include:
- Configuring automated alert tools with the appropriate level of sensitivity.
A tool that queries too often can impact the functioning of the site;
if it queries too infrequently, it will let problems remain undetected
for an unacceptable length of time. Tools that are too sensitive can
also generate too many alerts or tickets, creating a flood of false
positives that obscure the actual functional state of the system.
Tools that are too insensitive miss problems and have little value.
Getting the balance correct can be a challenge.
- The 7x24 nature of e-business web sites and the number of systems used
in the typical web site indicate that some degree of automation is required
in order to scale the problem management system and control costs.
Automation can ensure a rapid response to simple problems regardless
of when they occur.
- How to achieve timely resolution of problems. This is possible, but you
need the right people, tools, and processes.
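The sensitivity trade-off in the first item can be illustrated with a common damping technique: raise an alert only after a number of consecutive failed polls. This is a sketch under stated assumptions, not a description of how any particular product works; the class name, the default of three failures, and the 60-second interval in the usage lines are illustrative.

```python
class ConsecutiveFailureAlert:
    """Raise one alert after N consecutive failed polls, then stay quiet until recovery."""

    def __init__(self, failures_to_alert=3):
        self.failures_to_alert = failures_to_alert
        self.streak = 0
        self.alerting = False

    def observe(self, check_passed):
        """Record one poll result; return True only when a new alert should be raised."""
        if check_passed:
            self.streak = 0
            self.alerting = False
            return False
        self.streak += 1
        if self.streak >= self.failures_to_alert and not self.alerting:
            self.alerting = True
            return True
        return False


def worst_case_detection_seconds(poll_interval_seconds, failures_to_alert):
    """Upper bound on the time from the start of a persistent failure to the alert."""
    return poll_interval_seconds * failures_to_alert


detector = ConsecutiveFailureAlert(failures_to_alert=3)
polls = [False, True, False, False, False, False, True, False]
alerts = [detector.observe(p) for p in polls]
print(alerts)
print(worst_case_detection_seconds(60, 3))
```

A single failed poll followed by a success produces no alert, which removes one source of false positives, at the cost of a longer worst-case detection time. Both the poll interval and the failure count therefore need to be chosen against the detection time the site can tolerate.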
From an overall management viewpoint, the problem evaluation table has a series
of questions that are designed to discover how prepared the problem management
team, tools, and processes are to handle the Web site being evaluated.
Table 6 is an example of a completed problem evaluation table.
Table 6. Problem evaluation table
| Specific component | Component details | Evaluation |
|---|---|---|
| Team | Who are the team members? | There is no problem team as such -- problems are handled by helpdesk personnel. Tough problems are sent over to the administrators of the Web site. |
| | What skills, experience, training does the team possess? | Problem handlers have basic skills in problem determination; administrators know the application but could benefit from documented problem-determination procedures. |
| | What coverage is provided by the team? Is this adequate? | Problem handling is done 24 X 7; administrators are available by pager after normal business hours. |
| | Are there any problem measurements that can be used to evaluate the effectiveness of the team? If so, what data? | Administrators analyze problem records monthly. Root cause analysis is done on all serious problems. |
| | What are the strengths and weaknesses of the team? | There is no problem team but relationship between problem catchers and administrators works well. |
| | What other teams are key to the success of this team? Are there any issues? | Change team should be linked with problem solvers -- some problems result in changes (both scheduled and non-scheduled) |
| | Based on the above evaluation, what are the primary issues? What is the action plan to resolve? | Action items include -- Recommendation: document and refine problem determination procedure; Recommendation: open communication channels between problem solving group and change team to improve number of changes that go through a controlled process |
| Tools | What is the primary tool for tracking and managing problem activities? | Information management tool with problem-management panels |
| | What are the strengths and weaknesses of this tool? | Well known and understood legacy tool; access is only through mainframe (TSO) |
| | What other tools are used to manage problem activities? | Some databases are used to store root-cause analysis and other documents |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Web access to problem records; Recommendation: Use Web to integrate problem records and related documents like root-cause analysis |
| Process | Is there a document that defines the organization's problem management policies and procedures? | Yes, but high-level |
| | Is this consistent with actual practice? If not, where are the gaps? | Not enough detail to really determine |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: linked to an earlier recommendation -- document and refine problem determination procedure |
| Overall | What reports are available to review problem activity performance? Are they adequate? | Problem reports are detailed |
| | What is the leading cause of problems? What is being done to address this? | Application instability; application is being migrated to more robust software implementation |
| | What is an acceptable level of problems? Is this measure being met? | Team is looking for 99.5% availability of application |
| | What is the level of customer satisfaction with problem management? If customer satisfaction is low what are the reasons why? | Not measured but not believed to be a major problem |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: document acceptable level of problems; Recommendation: Initiate measurement of customer satisfaction with problem management |
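An availability target such as the 99.5% in Table 6 becomes easier to discuss when it is converted into a downtime budget. The short calculation below does that conversion. The 30-day month and 365-day year are assumptions made for the arithmetic, and the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct, period_hours):
    """Minutes of downtime permitted in a period at the given availability target."""
    return period_hours * 60 * (1 - availability_pct / 100)


# 99.5% over an assumed 30-day month: about 216 minutes (3.6 hours).
monthly_budget = allowed_downtime_minutes(99.5, 30 * 24)

# "Five 9s" (99.999%) over an assumed 365-day year: a little over 5 minutes.
yearly_five_nines = allowed_downtime_minutes(99.999, 365 * 24)

print(round(monthly_budget, 1), round(yearly_five_nines, 3))
```

Expressed this way, the target can be compared directly with the outage durations recorded in problem tickets.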
5.3 Performance management
Performance management is focused on the measurement and reporting of
the use of system resources by the application and its users. Performance
management can be used to report problems in real time but is generally used
to determine performance trends and to plan for necessary resource upgrades
or modifications.
From an overall management viewpoint, the performance evaluation table
has a series of questions that are designed to discover how prepared the
performance management team, tools, and processes are to handle the Web
site being evaluated. Table 7 is an example of a completed performance
evaluation table.
Table 7. Performance evaluation table
| Specific component | Component details | Evaluation |
|---|---|---|
| Team | Who are the team members? | Performance team is in place with four members |
| | What skills, experience, training does the team possess? | Performance team is experienced and well trained with performance tools |
| | What coverage is provided by the team? Is this adequate? | Team works normal business hours. Most performance management is done after the fact -- most tools are not real-time |
| | Are there any performance measurements that can be used to evaluate the effectiveness of the team? If so, what data? | Few measurements exist at this time. Team consults with Web administrators and shares performance information |
| | What are the strengths and weaknesses of the team? | Team understands tools and does a good job in its consulting role. Team is not equipped to deal with "emergency" performance problems. |
| | What other teams are key to the success of this team? Are there any issues? | Performance team works with Web administrators and teams working problems |
| | Based on the above evaluation, what are the primary issues? What is the action plan to resolve? | Action items include -- Recommendation: performance team needs a methodology to work emergency performance problems |
| Tools | What is the primary tool for investigating performance problems? | Team uses utilities that are part of OS and records statistics to log files |
| | What are the strengths and weaknesses of this tool? | Tools are well-known and easy to use and interpret. Tools require systems administration level skills. |
| | What other tools are used to manage performance? | Just scripts and basic reporting tools like SAS |
| | What specific performance metrics are collected? | CPU utilization, page space, memory, and disk utilization |
| | What is the timeframe for the collection of performance data? | Data is stored for the past 6 months |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Real-time tool is needed to support emergency performance problems |
| Process | Is there a document that defines the organization's performance management policies and procedures? | No, performance management is not a core discipline |
| | Is this consistent with actual practice? If not, where are the gaps? | Team just provides consulting-level assistance |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: It is unclear if performance-management focus needs to be more formal as performance of the site is handled carefully by the administration and performance community |
| Overall | What reports are available to review site performance? Are they adequate? | No real reports, just ad-hoc reporting |
| | What is the leading cause of performance problems? What is being done to address this? | Not tracked |
| | What is an acceptable level of Web site performance? Is this measure being met? | No plan in place |
| | What is the level of customer satisfaction with the performance of the site? If customer satisfaction is low what are the reasons why? | Unknown. Current measurements are manual and not shared with the site owners and administrators. |
| | Evaluate and develop action plan to address deficiencies. | Action items include -- Recommendation: Initiate performance reporting; Recommendation: Document acceptable level of performance; Recommendation: Initiate measurement of customer satisfaction with site performance |
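Since performance data is mostly used for trending and capacity planning, a simple illustration is to fit a straight line to stored samples and project when a threshold will be reached. The sketch below uses an ordinary least-squares fit over equally spaced samples. The six monthly CPU figures and the 80 percent threshold are invented for the example, and a straight line is only one possible trend model.

```python
def linear_trend(samples):
    """Least-squares slope and intercept for equally spaced samples (x = 0, 1, 2, ...)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(samples, threshold):
    """Periods after the last sample until the fitted line reaches `threshold`.

    Returns None when the trend is flat or falling.
    """
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - (len(samples) - 1))


# Illustrative data: six monthly averages of CPU utilization, in percent.
monthly_cpu_pct = [40, 45, 50, 55, 60, 65]
months_left = periods_until(monthly_cpu_pct, 80)
print(months_left)
```

With these invented figures the fitted line rises five points per month, so the 80 percent threshold is reached three months after the last sample.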
5.4 Security management
Security management has the goal of maintaining the integrity of the controls
that govern who has access to which areas of the system and what they can view
or change. Security management takes a variety of forms including
perimeter security (firewalls and site hardening), authentication/authorization
(passwords and associated permissions), intrusion detection, and policy
oversight. From an overall management viewpoint, the security management
evaluation table has a series of questions that are designed to discover
how prepared the security management team, tools, and processes are to
handle the Web site being evaluated. Table 8 is an example of a completed
security evaluation table.
Table 8. Security evaluation table

| Specific component | Component details | Evaluation |
|---|---|---|
| Team | Who are the team members? | Security team is in place with two members |
| | What skills, experience, and training does the team possess? | Security team is experienced and well trained in security products |
| | What coverage is provided by the team? Is this adequate? | Team works normal business hours. Team members are frequently paged for security alerts and emergency changes. |
| | Are there any security measurements that can be used to evaluate the effectiveness of the team? If so, what data? | Few measurements exist at this time. Security information is kept confidential and incidents are not disclosed to the public or customers. |
| | What are the strengths and weaknesses of the team? | Team understands products. Team is not always aware of demands placed upon them by new technology choices. |
| | What other teams are key to the success of this team? Are there any issues? | Team works well with other teams as needed. |
| | Based on the above evaluation, what are the primary issues? What is the action plan to resolve them? | Action items include: (1) Recommendation: Consider a change in hours worked by the team. Two 12-hour shifts might better align with Web site demands. (2) Recommendation: Improve communication so the security team gets advance warning of the security needs of new sites. |
| Tools | What is the primary tool for tracking and managing security activities? | Electronic mail to security team leader |
| | What are the strengths and weaknesses of this tool? | At times, due to high volumes, some requests are missed |
| | What other tools are used to manage security activities? | Team uses Web tools to get the latest security updates and to perform searches |
| | Evaluate and develop action plan to address deficiencies. | Action items include: (1) Recommendation: Consider using a workflow tool instead of electronic mail so work requests are less likely to be overlooked. |
| Process | Is there a document that defines the organization's security management policies and procedures? | Yes, well defined |
| | Is this consistent with actual practice? If not, where are the gaps? | Yes, security team handles all security-related activities for the site |
| | Evaluate and develop action plan to address deficiencies. | Action items include: no process recommendations |
| Overall | What reports are available to review security activity performance? Are they adequate? | Some reports are available but are kept in confidence |
| | What is the leading cause of security breaches? What is being done to address this? | Some hacking has been targeted at the site, but the site has no serious exposures at this time |
| | What is an acceptable level of security breaches? Is this measure being met? | No exposures are tolerated as the site contains financial data. It is believed that no financial loss has happened with the site due to hacking. |
| | What is the level of customer satisfaction with security management? If customer satisfaction is low, what are the reasons why? | The team works with auditors to supply the required logs and reports. Customer satisfaction is high. |
| | Evaluate and develop action plan to address deficiencies. | Action items include: no overall recommendations |
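Because the completed tables are later attached to the review report, it can help to keep them as structured data instead of free text. The following is a minimal sketch of one way to do that, assuming a hypothetical `EvaluationRow` record and a hypothetical `recommendations_by_component` helper; neither is defined by the methodology, and attaching each recommendation directly to a finding is a simplification of the example table, which collects action items in a separate row per component.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EvaluationRow:
    """One row of an evaluation table (hypothetical structure)."""
    component: str  # "Team", "Tools", "Process", or "Overall"
    question: str
    finding: str
    recommendations: List[str] = field(default_factory=list)


def recommendations_by_component(rows: List[EvaluationRow]) -> Dict[str, List[str]]:
    """Group recommendations by component, skipping components with none."""
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        if row.recommendations:
            grouped.setdefault(row.component, []).extend(row.recommendations)
    return grouped


# Rows abridged from the security evaluation example.
security_rows = [
    EvaluationRow(
        "Team",
        "What coverage is provided by the team? Is this adequate?",
        "Team works normal business hours.",
        ["Consider a change in hours worked by the team."],
    ),
    EvaluationRow(
        "Tools",
        "What is the primary tool for tracking and managing security activities?",
        "Electronic mail to security team leader",
        ["Consider using a workflow tool instead of electronic mail."],
    ),
    EvaluationRow(
        "Process",
        "Is there a document that defines the security management policies?",
        "Yes, well defined",
    ),
]

security_recommendations = recommendations_by_component(security_rows)
```

With the rows held this way, the recommendations section of a report can be generated from the same data that fills the appendix tables, so the two cannot drift apart.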
6 End user perspective
The end user perspective focuses on the measurement, evaluation, and
improvement of the quality of the end user experience. From an overall
management viewpoint, the end user evaluation
table has a series of questions that are designed to discover how well
availability, performance, consistency, and reliability are being addressed.
Table 9 is an example of a completed end user evaluation table.
Table 9. End user evaluation table

| Specific component | Measurement | Evaluation | Improvement | Tool |
|---|---|---|---|---|
| Availability | Is the site reachable? | Based on problem records, it is available 95% of the time | Could benefit from an approach that proactively tests the site | Silk scripts created for deployment stress tests (see footnote 2) |
| | Are the site's functions operational? | Site is too feature-rich to test every application function | Use a sampling approach to proactively test site functionality | Silk scripts created for deployment stress tests are usually a good sample |
| Performance | What is the overall response time of the site? | No measurement in place | At minimum, should have a performance dataset | Keynote Perspective could be used to establish basic performance measures (see footnote 2) |
| | Do any functions exceed a maximum time to complete? | No measurement in place; the service level objective covers availability, not performance | Should establish a performance SLA | Keynote Perspective could be used to measure performance |
| Consistency | Are the content and values that are returned by the site consistent from moment to moment and consistent with the site design and configuration? | No measurement in place | Create a Silk script that tests key consistency measures | Use Silk scripts that interact with the application, exercising key consistency measures |
| Reliability | Is the content returned correctly? Are all links functional? Are all values returned correctly? | No measurement in place | Create a Silk script that tests key reliability measures | Use Silk scripts that interact with the application, exercising key reliability measures |
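The availability figure in the example table is derived from problem records. As a rough illustration of that calculation, the sketch below computes availability for a reporting period from a list of outage intervals. The function name and the representation of a problem record as a `(start, end)` pair are assumptions; a real problem-management tool would supply its own record format.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) taken from a problem record


def availability_percent(outages: Iterable[Outage],
                         period_start: datetime,
                         period_end: datetime) -> float:
    """Percentage of the period during which no outage was recorded."""
    period = period_end - period_start
    if period <= timedelta(0):
        raise ValueError("period_end must be after period_start")
    # Clip each outage to the reporting period, then process in start order.
    clipped = sorted(
        (max(start, period_start), min(end, period_end))
        for start, end in outages
    )
    downtime = timedelta(0)
    cursor = period_start  # end of the downtime already counted
    for start, end in clipped:
        start = max(start, cursor)  # do not count overlapping time twice
        if end > start:
            downtime += end - start
            cursor = end
    return 100.0 * ((period - downtime) / period)
```

For a 30-day period, 36 hours of recorded downtime corresponds to the 95% shown in the table. Note that a figure computed this way only reflects outages that someone reported, which is one reason the table suggests proactive testing as an improvement.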
7 Review report or presentation to complete the review
When the analysis is complete, a report or presentation should be created
to share the findings and recommendations. We have found that some teams
respond to a detailed written report whereas others require a presentation.
The report or presentation should contain the following parts:
- Executive summary
- Detailed discussion and findings, including a discussion of exposures and opportunities
- Recommendations
The report should also include two appendixes. Appendix A should contain the
completed tables; this detail supports the findings
and recommendations in the body of the report or presentation. Appendix
B should contain a high-level implementation plan, which links the review
to the next phase of the project.
8 Summary
This paper described a framework and supporting method to evaluate management
tools and methods for a new or existing Web site. Three perspectives form
the basis of the framework: system, support, and end-user. The method
can be used for proactive planning or in response to problem-solving activities.
It involves three steps: preplanning, analysis, and review.
A number of tables support the analysis associated with the
method. Findings and recommendations are presented in a report or presentation,
and Appendix B of the report contains a high-level implementation plan.
About the authors
Joseph Gulla is a Senior Consulting Information Technology Manager for
IBM Global Services. Mr. Gulla works with customers of the IBM Web Hosting
facility in Research Triangle Park, NC. Mr. Gulla has written a number
of IBM internal publications and co-authored the IBM Redbook Distributed
systems management design guidelines: The smart way to design (1996).
Mr. Gulla has spoken at a number of technology forums including Planet
Tivoli, ESM Sharenet, IBM Security Seminars, and the NorthEast Information
Systems User Group where he served on the Advisory Board. He has held
a number of technical and management positions over the last 22 years.
John Hankins is a Senior Architecture and Delivery Specialist on the IBM
Global Services Web Offerings team. He has worked in the field of Internet
services development and management for the past 10 years. The past 5
years have focused on the specific area of web hosting services. Mr. Hankins
has presented at numerous conferences on Internet services and has served
as a co-principal investigator on three National Science Foundation grants.
He has held management positions in a variety of technology organizations
over the past 15 years.
Footnotes
1. About the example tables: These tables do not contain information about
a specific IBM web-hosting customer. The tables are also not representative
of IBM's change, problem, performance, and security management policies.
The data in the tables has been made up simply to illustrate the main
points of the paper and the power of the methodology.
2. Copyright information: Tivoli and NetView are trademarks of Tivoli
Systems, Inc; ESM is an abbreviation for Enterprise Security Manager which
is a trademark of AXENT Technologies, Inc; Trend is a trademark of Desktalk,
Inc; Silk is a trademark of Trinagy, Inc; Keynote Perspective is a product
of Keynote Systems; All other products and company names are either trademarks
or registered trademarks of their respective companies.
References
Berthold, M. & Brandner, R. (1993). Systems and network management
in distributed environments. Research Triangle Park: International Business
Machines.
Event management and notification - White paper. (2000). http://www.bmc.com/rs-bin/RightSite/getco.
Accessed August 15, 2000.
Frick, Vaughn. (2000). Transforming the enterprise to embrace e-business.
Presentation delivered on March 7, 2000 at IBM Managers meeting in
RTP, NC.
Harikian, V., Blust, B., Campbell, M., Cooke, S., Foley, R., Gulla, J.,
Gayo, F., Howlette, M., Mosher, L., & O'Mara, M. (1996). Distributed
systems management design guidelines: The smart way to design. Research
Triangle Park: International Business Machines.
Introduction to ITIL Books. (2001). http://www.pinkelephant.com.
Accessed April 11, 2001.
Leadership for the new millennium, delivering on digital progress and
prosperity. The third annual report of the Electronic Commerce Working Group.
(2001). http://www.ecommerce.gov/.
Accessed April 11, 2001.
Patrol 2000 by BMC software. (2000). http://www.bmc.com/patrol.
Accessed July 26, 2000.
Sturm, R. & Bumpus, W. (1999). Foundations of application management.
New York: John Wiley & Sons.
Tivoli global enterprise manager instrumentation guide. (1998). Raleigh,
NC: Tivoli Systems.
Tivoli product index. (2000). http://www.tivoli.com/products/index/.
Accessed August 31, 2000.
U.S. White House. (1997). A framework for global electronic commerce.
http://www.ecommerce.gov/framework.htm.
Updated July 1, 1997. Accessed May 22, 2000.
Welter, P. (1999). Web server monitoring white paper. http://www.summitonline.com/apps-databases/papers/fhesh-man.html.
Accessed February 9, 1999. Author's email: pete@freshtech.com.