Non-optimal grabbing of author names from webpages

Hello!

I just tried to grab some references from a webpage, and it worked fine apart from one thing:

The author names in this list all ended with a superscript number (a digit raised above the line, the way you write "a power of two" in mathematics). These numbers are also grabbed as part of the author names, which is not good. I guess a general rule like "remove any number from author names" would work for this?
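To make the suggested rule concrete, here is a minimal sketch in Python. It is only an illustration of "remove any number from author names", not how the grabbing code actually works; the function name `strip_digits_from_author` and the choice of which characters count as digits (ASCII digits plus the Unicode superscript digits) are assumptions made for this example.

```python
import re

# Assumption: "numbers" means ASCII digits 0-9 plus the Unicode
# superscript digits (¹ ² ³ and U+2070, U+2074..U+2079).
_DIGITS = re.compile("[0-9\u00b9\u00b2\u00b3\u2070\u2074-\u2079]")


def strip_digits_from_author(name: str) -> str:
    """Remove every digit from an author name and tidy what is left.

    Hypothetical helper written for this post; not part of any real API.
    """
    cleaned = _DIGITS.sub("", name)
    # Removing digits can leave doubled spaces behind, so collapse them.
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Markers like "1,2" leave a dangling comma at the end; trim it.
    return cleaned.strip(" ,")


print(strip_digits_from_author("Jane Smith1"))   # Jane Smith
print(strip_digits_from_author("John Doe\u00b2"))  # John Doe
print(strip_digits_from_author("A. Lee1,2"))     # A. Lee
```

One caveat with such a blanket rule: it would also strip digits that legitimately belong to a name, so it may be safer to remove only digits at the end of the name.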

Have a very nice day!

//FTC
