
Removing all but a couple of HTML tags from an HTML file


http://www.unix.com – I tried to find an elegant (or at least simple) way to remove all but a couple of HTML tags from an HTML file, but all the examples I found dealt with removing every tag. The logic of the script would be:

- if there is <li> or <ul> on the line, do nothing (write the same line to output)
- if there is font class="titleA", substitute it with <h2>
- otherwise, if there is an HTML tag, remove it (write the line to output without tags, just the content)

Could someone please tell me how to approach this problem? I know some Perl, but my skills are rusty (it has been years since I last used it).
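One possible way to approach the three rules, sketched in Python rather than Perl (the regular expressions carry over to Perl almost unchanged). This is only a rough line-by-line filter built on several assumptions that the post does not state: no tag spans more than one line, the title tag is written exactly as <font class="titleA">, the first </font> on such a line closes it and should become </h2>, and any other tags on a title line are stripped. The names filter_line and filter_lines are made up for illustration. Regular expressions are not a real HTML parser, so messy input may need a proper parsing library instead.

```python
import re

# Rule 1: lines with an <li> or <ul> tag (opening or closing) pass through untouched.
KEEP_LINE = re.compile(r"</?(?:li|ul)\b[^>]*>", re.IGNORECASE)
# Rule 2: the title tag, assumed to appear exactly in this form.
TITLE_OPEN = re.compile(r'<font\s+class="titleA"\s*>')
FONT_CLOSE = re.compile(r"</font\s*>", re.IGNORECASE)
# Rule 3: any tag at all.
ANY_TAG = re.compile(r"<[^>]+>")
# Any tag except the <h2> and </h2> produced by rule 2.
OTHER_TAG = re.compile(r"<(?!/?h2>)[^>]+>")


def filter_line(line: str) -> str:
    """Apply the three rules to a single line (hypothetical helper)."""
    if KEEP_LINE.search(line):
        return line
    if TITLE_OPEN.search(line):
        line = TITLE_OPEN.sub("<h2>", line)
        # Assumption: the first </font> on the line closes the title.
        line = FONT_CLOSE.sub("</h2>", line, count=1)
        return OTHER_TAG.sub("", line)
    return ANY_TAG.sub("", line)


def filter_lines(lines):
    """Yield the filtered version of each input line."""
    for line in lines:
        yield filter_line(line)
```

To use it as a filter, something like sys.stdout.writelines(filter_lines(sys.stdin)) after an import sys would read the HTML on standard input and write the result to standard output.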