=head1 NAME

Html2Wml -- Program that can convert HTML pages to WML pages

=head1 SYNOPSIS

Html2Wml can be used as either a shell command:

    $ html2wml file.html

or as a CGI:

    /cgi-bin/html2wml.cgi?url=/index.html

In both cases, the file can be either a local file or a URL.

=head1 DESCRIPTION

Html2Wml converts HTML pages to WML decks, suitable for viewing on a WAP
device. The program can be launched from a shell to statically convert a set
of pages, or as a CGI to convert a particular (potentially dynamic) HTML
resource.

Although the result is not guaranteed to be valid WML, it should be valid for
most pages. Good HTML pages will most probably produce valid WML decks. To
check and correct your pages, you can use W3C's tools: the I<HTML Validator>,
available online at http://validator.w3.org and I<HTML Tidy>, written by
Dave Raggett.

Html2Wml provides the following features:

=over 4

=item *

translation of the links

=item *

limitation of the card size by splitting the result into several cards

=item *

inclusion of files (similar to SSI)

=item *

compilation of the result (using the WML Tools, see L<"LINKS">)

=item *

a debug mode to check the result using validation functions

=back

=head1 OPTIONS

Please note that most of these options are also available when calling
Html2Wml as a CGI. In this case, boolean options are given the value "1" or
"0", and other options simply receive the value they expect. For example,
C<--ascii> becomes C<ascii=1> or C<ascii=0>. See the file F for an example on
how to call Html2Wml as a CGI.

=head2 Conversion Options

=over 4

=item -a, --ascii

When this option is on, named HTML entities are converted to US-ASCII
characters using the same 7-bit approximations as Lynx. For example,
C<&copy;> is translated to "(c)", and C<&szlig;> is translated to "ss".
This option is off by default.

=item --collapse, --nocollapse

This option tells Html2Wml to collapse redundant whitespace, tabs, carriage
returns, line feeds and empty paragraphs.
The aim is to reduce the size of the WML document as much as possible.
Collapsing empty paragraphs is necessary for two reasons. First, this avoids
empty screens (and on a device with only 4 lines of display, an empty screen
can be quite annoying). Second, Html2Wml creates many empty paragraphs when
converting, because of the way the syntax reconstructor is programmed.
Deleting these empty paragraphs is as necessary as cleaning the kitchen :-)

If this really bothers you, you can disable this behaviour with the
B<--nocollapse> option.

=item -c, --compile

Setting this option tells Html2Wml to use the compiler from WML Tools to
compile the WML deck. If you want to create a real WAP site, you should
seriously consider using this option in order to reduce the size of the WML
decks. Remember that WAP devices have very little memory. If this is not
enough, use the splitting options.

=item --ignore-images

This option tells Html2Wml to completely ignore all image links.

=item --img-alt-text, --noimg-alt-text

This option tells Html2Wml to replace the image tags with their
corresponding alternative text (as a text mode web browser does). This
option is on by default.

=item --linearize, --nolinearize

This option is on by default. It makes Html2Wml flatten the HTML tables
(they are linearized), as Lynx does. I think this is better than trying to
use the native WML tables. First, they have extremely limited features and
possibilities compared to HTML tables. In particular, they can't be nested.
This is to be expected: WAP devices are not supposed to have a big CPU
running at some zillion hertz, and the calculations needed to render tables
are the most complicated and CPU-hungry part of HTML. Second, as they can't
be nested, and as typical HTML pages rely heavily on nested tables to create
their layout, it's impossible to decide which one should be kept. So the
best thing is to keep none of them.
B<[Note]> Although you can disable this behaviour, and although there is
internal support for tables, the unlinearized mode has not been heavily
tested with nested tables, and it may produce unexpected results.

=item -n, --numeric-non-ascii

This option tells Html2Wml to convert all non-ASCII characters to numeric
entities. For example, C<©> becomes C<&#169;>, and C<ß> becomes C<&#223;>.
By default, this option is off.

=item -p, --nopre

This option tells Html2Wml not to use the C<E<lt>preE<gt>> tag. This option
was added because the compiler from WML Tools 0.0.4 doesn't support this
tag.

=item -o, --output

Use this option (in shell mode) to specify an output file. By default,
Html2Wml prints the result to standard output.

=back

=head2 Link Reconstruction Options

=over 4

=item --hreftmpl=I