Diffstat (limited to 'mod/htmlawed/vendors/htmLawed/htmLawed_README.htm')
-rw-r--r-- | mod/htmlawed/vendors/htmLawed/htmLawed_README.htm | 105 |
1 files changed, 96 insertions, 9 deletions
diff --git a/mod/htmlawed/vendors/htmLawed/htmLawed_README.htm b/mod/htmlawed/vendors/htmLawed/htmLawed_README.htm index 131838ade..7138ee9c0 100644 --- a/mod/htmlawed/vendors/htmLawed/htmLawed_README.htm +++ b/mod/htmlawed/vendors/htmLawed/htmLawed_README.htm @@ -64,7 +64,7 @@ span.totop a, span.totop a:visited {color: #6699cc;}   <span class="toc-item"><a href="#s2.6"><span class="item-no">2.6</span>  Use without modifying old <span class="term">kses()</span> code</a></span><br />   <span class="toc-item"><a href="#s2.7"><span class="item-no">2.7</span>  Tolerance for ill-written HTML</a></span><br />   <span class="toc-item"><a href="#s2.8"><span class="item-no">2.8</span>  Limitations & work-arounds</a></span><br /> -  <span class="toc-item"><a href="#s2.9"><span class="item-no">2.9</span>  Examples</a></span><br /> +  <span class="toc-item"><a href="#s2.9"><span class="item-no">2.9</span>  Examples of usage</a></span><br /> <span class="toc-item"><a href="#s3"><span class="item-no">3</span>  Details</a></span><br />   <span class="toc-item"><a href="#s3.1"><span class="item-no">3.1</span>  Invalid/dangerous characters</a></span><br />   <span class="toc-item"><a href="#s3.2"><span class="item-no">3.2</span>  Character references/entities</a></span><br /> @@ -110,8 +110,8 @@ span.totop a, span.totop a:visited {color: #6699cc;} <div id="body"> <br /> -<div class="comment">htmLawed_README.txt, 23 April 2009<br /> -htmLawed 1.1.8, 23 April 2009<br /> +<div class="comment">htmLawed_README.txt, 22 December 2009<br /> +htmLawed 1.1.9, 22 December 2009<br /> Copyright Santosh Patnaik<br /> GPL v3 license<br /> A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed">http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed</a> </div> @@ -180,7 +180,7 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl <br />   *  remove <strong>null</strong> characters  *<br />   *  
neutralize potentially dangerous proprietary Netscape <strong>Javascript entities</strong>  *<br /> -  *  replace potentially dangerous <strong>soft-hyphen</strong> character in attribute values with spaces  *<br /> +  *  replace potentially dangerous <strong>soft-hyphen</strong> character in URL-accepting attribute values with spaces  *<br /> <br />   *  remove common <strong>invalid characters</strong> not allowed in HTML or XML  ^`<br />   *  replace <strong>characters from Microsoft applications</strong> like <span class="term">Word</span> that are discouraged in HTML or XML  ^~`<br /> @@ -726,9 +726,92 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl </div> <div class="sub-section"><h3> -<a name="s2.9" id="s2.9"></a><span class="item-no">2.9</span>  Examples +<a name="s2.9" id="s2.9"></a><span class="item-no">2.9</span>  Examples of usage </h3><span class="totop"><a href="#peak">(to top)</a></span><br style="clear: both;" /> <br /> +  Safest, allowing only <em>safe</em> HTML markup --<br /> +<br /> + +<code class="code">    $config = array('safe'=>1);</code> +<br /> + +<code class="code">    $out = htmLawed($in);</code> +<br /> +<br /> +  Simplest, allowing all valid HTML markup except <span class="term">javascript:</span> --<br /> +<br /> + +<code class="code">    $out = htmLawed($in);</code> +<br /> +<br /> +  Allowing all valid HTML markup including <span class="term">javascript:</span> --<br /> +<br /> + +<code class="code">    $config = array('schemes'=>'*:*');</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Allowing only <span class="term">safe</span> HTML and the elements <span class="term">a</span>, <span class="term">em</span>, and <span class="term">strong</span> --<br /> +<br /> + +<code class="code">    $config = array('safe'=>1, 'elements'=>'a, em, strong');</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Not allowing 
elements <span class="term">script</span> and <span class="term">object</span> --<br /> +<br /> + +<code class="code">    $config = array('elements'=>'* -script -object');</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Not allowing attributes <span class="term">id</span> and <span class="term">style</span> --<br /> +<br /> + +<code class="code">    $config = array('deny_attribute'=>'id, style');</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Permitting only attributes <span class="term">title</span> and <span class="term">href</span> --<br /> +<br /> + +<code class="code">    $config = array('deny_attribute'=>'* -title -href');</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Remove bad/disallowed tags altogether instead of converting them to entities --<br /> +<br /> + +<code class="code">    $config = array('keep_bad'=>0);</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config);</code> +<br /> +<br /> +  Allowing attribute <span class="term">title</span> only in <span class="term">a</span> and not allowing attributes <span class="term">id</span>, <span class="term">style</span>, or scriptable <em>on*</em> attributes like <span class="term">onclick</span> --<br /> +<br /> + +<code class="code">    $config = array('deny_attribute'=>'title, id, style, on*');</code> +<br /> + +<code class="code">    $spec = 'a=title';</code> +<br /> + +<code class="code">    $out = htmLawed($in, $config, $spec);</code> +<br /> +<br /> +  Some case-studies are presented below.<br /> +<br />   <strong>1.</strong> A blog administrator wants to allow only <span class="term">a</span>, <span class="term">em</span>, <span class="term">strike</span>, <span class="term">strong</span> and <span class="term">u</span> in comments, but needs <span class="term">strike</span> and <span class="term">u</span> transformed to <span 
class="term">span</span> for better XHTML 1-strict compliance, and, he wants the <span class="term">a</span> links to be to <span class="term">http</span> or <span class="term">https</span> resources:<br /> <br /> @@ -772,14 +855,14 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl <br />   The character values are replaced with entities/characters and not character values referred to by the entities/characters to keep this task independent of the character-encoding of input text.<br /> <br /> -  The <span class="term">$config["clean_ms_char"]</span> parameter need not be used if authors do not copy-paste Microsoft-created text or if the input text is not believed to use the <span class="term">Windows 1252</span> or a similar encoding. Further, the input form and the web-pages displaying it or its content should have the character encoding appropriately marked-up.<br /> +  The <span class="term">$config["clean_ms_char"]</span> parameter should not be used if authors do not copy-paste Microsoft-created text, or if the input text is not believed to use the <span class="term">Windows 1252</span> (<span class="term">Cp-1252</span>) or a similar encoding like <span class="term">Cp-1251</span>. 
Further, the input form and the web-pages displaying it or its content should have the character encoding appropriately marked-up.<br /> </div> <div class="sub-section"><h3> <a name="s3.2" id="s3.2"></a><span class="item-no">3.2</span>  Character references/entities </h3><span class="totop"><a href="#peak">(to top)</a></span><br style="clear: both;" /> <br /> -  Valid character entities take the form <span class="term">&*;</span> where <span class="term">*</span> is <span class="term">#x</span> followed by a hexadecimal number (hexadecimal numeric entity; like <span class="term">&#xA0;</span> for non-breaking space), or alphanumeric like <span class="term">gt</span> (external or named entity; like <span class="term">&nbsp;</span> for non-breaking space), or <span class="term">#</span> followed by a number (decimal numeric entity; like <span class="term">&#160;</span> for non-breaking space). Character entities referring to the soft-hyphen character (the <span class="term">&shy;</span> or <span class="term">\xad</span> character; hexadecimal code-point <span class="term">ad</span> [decimal <span class="term">173</span>]) in attribute values are always replaced with spaces; soft-hyphens in attribute values introduce vulnerabilities in some older versions of the Opera and Mozilla [Firefox] browsers.<br /> +  Valid character entities take the form <span class="term">&*;</span> where <span class="term">*</span> is <span class="term">#x</span> followed by a hexadecimal number (hexadecimal numeric entity; like <span class="term">&#xA0;</span> for non-breaking space), or alphanumeric like <span class="term">gt</span> (external or named entity; like <span class="term">&nbsp;</span> for non-breaking space), or <span class="term">#</span> followed by a number (decimal numeric entity; like <span class="term">&#160;</span> for non-breaking space). 
Character entities referring to the soft-hyphen character (the <span class="term">&shy;</span> or <span class="term">\xad</span> character; hexadecimal code-point <span class="term">ad</span> [decimal <span class="term">173</span>]) in URL-accepting attribute values are always replaced with spaces; soft-hyphens in attribute values introduce vulnerabilities in some older versions of the Opera and Mozilla [Firefox] browsers.<br /> <br />   htmLawed (function <span class="term">hl_ent()</span>):<br /> <br /> @@ -1605,6 +1688,10 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl <br />   <em>Version number - Release date. Notes</em><br /> <br /> +  1.1.9 - 22 December 2009. Soft-hyphens are now removed only from URL-accepting attribute values<br /> +<br /> +  1.1.8.1 - 16 July 2009. Minor code-change to fix a PHP error notice<br /> +<br />   1.1.8 - 23 April 2009. Parameter <span class="term">deny_attribute</span> now accepts the wild-card <span class="term">*</span>, making it simpler to specify its value when all but a few attributes are being denied; fixed a bug in interpreting <span class="term">$spec</span><br /> <br />   1.1.7 - 11-12 March 2009. Attributes globally denied through <span class="term">deny_attribute</span> can be allowed element-specifically through <span class="term">$spec</span>; <span class="term">$config["style_pass"]</span> allowing letting through any <span class="term">style</span> value introduced; altered logic to catch certain types of dynamic crafted CSS expressions<br /> @@ -1658,7 +1745,7 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl <a name="s4.6" id="s4.6"></a><span class="item-no">4.6</span>  Comparison with <span class="term">HTMLPurifier</span> </h3><span class="totop"><a href="#peak">(to top)</a></span><br style="clear: both;" /> <br /> -  The HTMLPurifier PHP library by Edward Yang is a very good HTML filtering script that uses object oriented PHP code. 
Compared to htmLawed, it:<br /> +  The HTMLPurifier PHP library by Edward Yang is a very good HTML filtering script that uses object oriented PHP code. Compared to htmLawed, it (as of mid-2009):<br /> <br />   *  does not support PHP versions older than 5.0 (HTMLPurifier dropped PHP 4 support after version 2)<br /> <br /> @@ -1970,7 +2057,7 @@ A PHP Labware internal utility - <a href="http://www.bioinformatics.org/phpl </div> </div> <br /> -<hr /><br /><br /><span class="subtle"><small>HTM version of <em><a href="htmLawed_README.txt">htmLawed_README.txt</a></em> generated on 23 Apr, 2009 using <a href="http://www.bioinformatics.org/phplabware/internal_utilities">rTxt2htm</a> from PHP Labware</small></span> +<hr /><br /><br /><span class="subtle"><small>HTM version of <em><a href="htmLawed_README.txt">htmLawed_README.txt</a></em> generated on 22 Dec, 2009 using <a href="http://www.bioinformatics.org/phplabware/internal_utilities">rTxt2htm</a> from PHP Labware</small></span> </div><!-- ended div body --> </div><!-- ended div top --> </body> |
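
The usage examples added to section 2.9 all follow one pattern: build a `$config` array, optionally a `$spec` string, and pass them as the second and third arguments of `htmLawed()`. A setting such as `'safe'=>1` only takes effect when `$config` is actually passed in the call; `htmLawed($in)` alone runs with the default settings. The following is a minimal sketch of that pattern, not code from the commit. The helper names `safest_config()` and `filter_html()` are hypothetical, and the sketch assumes the caller has already loaded `htmLawed.php`.

```php
<?php
// Hypothetical helper (not part of htmLawed): returns the $config used by
// the "safest" usage example, which allows only safe HTML markup.
function safest_config()
{
    return array('safe' => 1);
}

// Hypothetical wrapper (not part of htmLawed): always forwards $config,
// and an optional $spec, so that no setting is silently dropped.
function filter_html($in, $config, $spec = array())
{
    // Assumption: htmLawed.php was loaded by the caller beforehand.
    if (!function_exists('htmLawed')) {
        throw new RuntimeException('htmLawed.php is not loaded');
    }
    return htmLawed($in, $config, $spec);
}
```

With the library loaded, a call would look like `$out = filter_html($in, safest_config());`, and the `a=title` example from the same section would pass `'a=title'` as the third argument.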