An effort has been made to keep the regex as language-agnostic as possible; it can be modified as needed.
The RegEx is as follows:
(([a-zA-Z0-9_-]+)([\.\+]?[a-zA-Z0-9_-]+)*)\@([a-zA-Z0-9]+[\.\-])+([a-zA-Z]{2,5})
Explanation
While the RegEx looks overly complicated and verbose, it can be broken down into simpler units as follows:
- ([a-zA-Z0-9_-]+) : A word of one or more characters, each of which can be alphanumeric, an ‘underscore’ or a ‘hyphen’.
- ([\.\+]?[a-zA-Z0-9_-]+)* : A repeat of the first part, optionally prefixed with a ‘dot’ or a ‘plus’ sign. This block can be repeated any number of times, including zero.
- \@ : The ‘at’ symbol.
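- ([a-zA-Z0-9]+[\.\-])+ : The domain name: one or more alphanumeric labels, each followed by a single ‘dot’ or ‘hyphen’, so two separators can never appear consecutively.
- ([a-zA-Z]{2,5}) : The TLD, two to five letters. Change the length range as per requirement.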
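The breakdown above can be tried out with a minimal Python sketch that rebuilds the RegEx from its units. Python is only one possible host language here, and the constant names and the is_email helper are illustrative assumptions, not part of the original pattern. Whole-string matching via fullmatch is also an assumption about how the pattern is meant to be applied.

```python
import re

# The units from the explanation above; the names are illustrative only.
LOCAL_WORD = r"([a-zA-Z0-9_-]+)"          # first word of the local part
LOCAL_REST = r"([\.\+]?[a-zA-Z0-9_-]+)*"  # further words, optionally dot/plus-prefixed
AT_SIGN = r"\@"                           # the 'at' symbol
DOMAIN = r"([a-zA-Z0-9]+[\.\-])+"         # labels, each followed by a dot or hyphen
TLD = r"([a-zA-Z]{2,5})"                  # two to five letters

# Reassemble the full expression; the outer group wraps the whole local part.
EMAIL_PATTERN = re.compile(
    "(" + LOCAL_WORD + LOCAL_REST + ")" + AT_SIGN + DOMAIN + TLD
)


def is_email(text):
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return EMAIL_PATTERN.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "user.name+tag@example.com",
        "a@my-site.co.uk",
        "john..doe@example.com",   # two consecutive dots in the local part
        "john@example",            # no TLD
        "user@example.museum",     # TLD longer than five letters
    ]
    for sample in samples:
        print(sample, is_email(sample))
```

With the TLD unit as written, an address ending in a six-letter TLD is rejected, which is the kind of case where the {2,5} range would need to be changed.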
References
The Wikipedia entry on email addresses gives us the acceptable formats.