
The java.util.regex package has the regex classes. I've never used regex in Java, but I doubt the syntax is much different. Perl and PHP are the best for that.
As far as I know, this expression will remove the HTML tags:
s/<(.*?)>//gi
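It matches a <, then any characters zero or more times (as few as possible, because of the ?), and then a >. The g flag replaces every match, and the i flag ignores case, though i makes no difference here since the pattern contains no letters.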
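It's probably not the best way, because it will also grab anything else that sits between a < and a >, such as comparisons in plain text. Also, . doesn't match newlines by default, so a tag split across lines is missed.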
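
For what it's worth, here is a rough sketch of how the same substitution might look in Java with java.util.regex. I haven't run it against real pages, and the class and method names (StripTags, stripTags) are just made up for the example. The second call in main shows the "grabs other stuff" problem on plain text.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this example.
class StripTags {
    // Same pattern as the Perl one-liner: '<', as few characters as possible, then '>'.
    // The capture group and the i flag are dropped because neither changes the result.
    private static final Pattern TAG = Pattern.compile("<.*?>");

    // replaceAll replaces every match, which is what the g flag does in Perl.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Tags are removed and the text between them is kept.
        System.out.println(stripTags("<p>Hello <b>world</b></p>"));
        // No tags here, but "< b and c >" is removed anyway.
        System.out.println(stripTags("if a < b and c > d"));
    }
}
```

If tags can span several lines, passing Pattern.DOTALL as a second argument to Pattern.compile lets . match newlines as well. For anything beyond quick cleanup, a real HTML parser is likely the safer choice.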