kses 0.2.2 README  [kses strips evil scripts!]
=================


* INTRODUCTION *


Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed the HTML input you give
it is. It also runs several checks on attribute values. kses can be used to
avoid Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service
attacks, among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means, before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *


Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes, to avoid
producing HTML that doesn't conform to the W3C standards
(<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles many types of malformed HTML by interpreting the existing code
as best it can and then building new code from that interpretation. That's a
better approach than trying to patch the existing code in place, as you're
bound to forget about some weird special case somewhere. It handles problems
like unterminated quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
minimum/maximum value, to protect against Buffer Overflows and Denial of
Service attacks against WWW clients and various servers. You can stop
<iframe src= width= height=> from having too high values for width and height,
for instance.

* It has a system for whitelisting URL protocols. You can say that
attribute values may only start with http:, https:, ftp: and gopher:, but no
other URL protocols (javascript:, java:, about:, telnet:..). The functions that
do this work handle whitespace, upper/lower case, HTML entities
("jav&#97;script:") and repeated entries ("javascript:javascript:alert(57)").
It also normalizes HTML entities as a nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is a procedural version and two object-oriented versions (for PHP 4
  and PHP 5) of kses.
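
To make the protocol whitelisting idea above more concrete, here is a minimal
sketch of such a check. It is not the code kses uses: the function name
url_protocol_is_allowed() is made up for this example, and it only reports
whether a URL passes, where kses rewrites the value. It decodes HTML entities,
drops whitespace and control characters, and compares the protocol without
regard to case. It does not cover every trick that kses handles.

```php
<?php

/**
 * Hypothetical helper, not part of kses: returns true if $url is relative
 * or starts with one of the protocols listed in $allowed.
 */
function url_protocol_is_allowed(string $url, array $allowed): bool
{
    // Decode entities first, so "jav&#97;script:" is seen as "javascript:".
    $decoded = html_entity_decode($url, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Drop NULL bytes, control characters and whitespace.
    $decoded = preg_replace('~[\x00-\x20]+~', '', $decoded) ?? '';

    // No colon before the first "/", "?" or "#" means a relative URL.
    if (!preg_match('~^([^:/?#]*):~', $decoded, $match)) {
        return true;
    }

    // Compare the protocol name case-insensitively against the whitelist.
    $protocol = strtolower($match[1]);
    return in_array($protocol, array_map('strtolower', $allowed), true);
}
```

With array('http', 'https') as the whitelist, this accepts
"HTTPS://example.com/" and relative URLs such as "/docs/index.html", and
rejects "jav&#97;script:alert(57)" and "javascript:javascript:alert(57)".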


* USE IT *


It's very easy to use kses in your own PHP web application! Basic usage looks
like this:


<?php

include 'kses.php';

$allowed = array('b' => array(),
                 'i' => array(),
                 'a' => array('href' => 1, 'title' => 1),
                 'p' => array('align' => 1),
                 'br' => array());

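$val = isset($_POST['val']) ? $_POST['val'] : '';

# You must strip slashes from magic quotes, or kses will get confused.
# get_magic_quotes_gpc() was removed in PHP 8, hence the function_exists()
# check.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
  $val = stripslashes($val);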

$val = kses($val, $allowed); # The filtering takes place here.

# Do something with $val.

?>


This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses both lower and upper case.
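
The $allowed array can also carry the attribute value checks mentioned under
FEATURES, and kses() takes an optional third argument with the allowed URL
protocols. The fragment below is a sketch that assumes the check names maxlen
and maxval and the parameter order described in the docs/ directory, so verify
them against your copy of kses before relying on it. The limits themselves are
arbitrary example values.

```php
<?php

# Assumed check names: 'maxlen' (longest allowed value) and 'maxval'
# (largest allowed number). The limits are arbitrary examples.
$allowed = array('a'      => array('href'   => array('maxlen' => 200),
                                   'title'  => 1),
                 'iframe' => array('src'    => 1,
                                   'width'  => array('maxval' => 800),
                                   'height' => array('maxval' => 600)));

# Only these URL protocols may start an attribute value.
$protocols = array('http', 'https', 'ftp');

# Assumed call, with kses.php included as in the example above:
# $val = kses($val, $allowed, $protocols);
```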

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include, but are not limited to: 1) style, and 2) all intrinsic event
attributes (onMouseOver and so on, really anything starting with on). I'll
write more about this in the documentation that will be distributed with
future versions of kses.
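
One way to catch such mistakes early is to check the $allowed array before
handing it to kses. The sketch below does that for the two cases named above.
The helper find_risky_attributes() is hypothetical and not part of kses, and
since the list of dangerous attributes is not limited to these two cases, an
empty result does not prove that an array is safe.

```php
<?php

/**
 * Hypothetical helper, not part of kses: lists "element.attribute" pairs in
 * an $allowed array that match the risky attributes named above.
 */
function find_risky_attributes(array $allowed): array
{
    $risky = array();
    foreach ($allowed as $element => $attributes) {
        foreach (array_keys($attributes) as $attribute) {
            $name = strtolower((string) $attribute);
            // "style" and every intrinsic event attribute (on*).
            if ($name === 'style' || strncmp($name, 'on', 2) === 0) {
                $risky[] = strtolower((string) $element) . '.' . $name;
            }
        }
    }
    return $risky;
}

$suspect = array('b'    => array(),
                 'a'    => array('href' => 1, 'title' => 1),
                 'body' => array('style' => 1, 'onLoad' => 1));

$found = find_risky_attributes($suspect);
```

Here $found holds 'body.style' and 'body.onload', while the elements b and a
pass without complaint.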

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with
a simple addslashes() call.
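
The order of operations matters here: strip the slashes, filter, and only then
add slashes again. A minimal sketch of that round trip follows. The wrapper
filter_slashed_input() is made up for this example, and strip_tags() stands in
for kses so that the sketch runs on its own.

```php
<?php

/**
 * Hypothetical wrapper, not part of kses: strips slashes, runs $filter on
 * the clean string, then re-adds slashes for code that expects them.
 */
function filter_slashed_input(string $slashed, callable $filter): string
{
    $clean    = stripslashes($slashed);  // undo the magic quotes escaping
    $filtered = $filter($clean);         // the filter sees unescaped HTML
    return addslashes($filtered);        // restore slashes for later code
}

// The input is the string <b>O\'Reilly</b>, as magic quotes would deliver it.
$result = filter_slashed_input('<b>O\\\'Reilly</b>', 'strip_tags');
```

With kses in place of the stand-in, the second argument could be a small
function that calls kses() with your $allowed array.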

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory, to get more information on how to use
kses. The object-oriented versions of kses are also worth checking out, and
they're included in the oop/ directory.


* UPGRADING TO 0.2.2 *


kses 0.2.2 is backwards compatible with all previous releases, so upgrading
should just be a matter of using a new version of kses.php instead of an old
one.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *


If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, we refer you to its
homepage at  http://sourceforge.net/projects/kses . New developers and beta
testers are more than welcome!

If you have any bug reports, suggestions for improvement or simply want to tell
us that you use kses for some project, feel free to post to the kses-general
mailing list. If you have found any security problems (particularly XSS,
naturally) in kses, please contact Ulf privately at  metaur at users dot
sourceforge dot net  so he can correct it before you or someone else tells the
public about it.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, security-wise.)


* OTHER HTML FILTERS *


Here are the other stand-alone, open source HTML filters that we currently know
of:

* Htmlfilter for PHP - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://linux.duke.edu/projects/mini/htmlfilter/

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

* SafeHtmlChecker [is this really open source?]
  PHP
  Simon Willison
  http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *


kses 0.2.2 is dedicated to Audrey Tautou and Jean-Pierre Jeunet.


* MISC *


The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
for the open-source project Gnuheter
( http://savannah.nongnu.org/projects/gnuheter ). Gnuheter is a fork from
PHP-Nuke. The HTML filter has been improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses development group, February 2005