April, 2010
Converting HTML Anchor and Image Tags to ASP.NET Controls
I was recently tasked with converting a large web site from pure XHTML to ASP.NET.
For the most part, the conversion was straightforward, but I discovered
pretty quickly that converting all of the anchor and image tags to their corresponding
ASP.NET equivalents was going to be a time-consuming, mundane, and downright excruciating
task. An argument could be made that no conversion is necessary because
HTML anchor and image tags work just fine in ASP.NET. However, I wanted to use
the "~" symbol, which ASP.NET server controls resolve to the application root, so that
the site would work in an ASP.NET Development Server environment. To do so, the
ASP.NET controls are required.
This article contains a C# code example of how to:
- convert anchor tags to asp:HyperLink
- convert image tags to asp:Image
- convert input tags to asp:TextBox
- use regular expressions to parse and replace text
Before I continue, a little disclaimer: I'm not going to spend any time talking
about how I loaded my content because that would be an irrelevant rabbit trail for
most readers. It's sufficient to say that I loaded it into a string variable called
content.
To begin, we need a method that will make the conversion from an HTML anchor tag
to an ASP.NET HyperLink control. This function takes a Match
object as an argument and returns a string so that it can be passed to the
Regex.Replace function as a MatchEvaluator (more about that later). Here's the function:
// Requires: using System; using System.Text.RegularExpressions;
static int _a = 1;
static RegexOptions _options = RegexOptions.IgnoreCase | RegexOptions.Multiline
    | RegexOptions.Singleline;

static string ReplaceAnchorMatch(Match m)
{
    string aspAnchor =
        @"<asp:HyperLink ID=""{0}"" runat=""server"" "
        + @"NavigateUrl=""{1}"" Text=""{2}""{3}></asp:HyperLink>";
    string id = "hl" + (_a++).ToString();      // default ID when the tag has none
    string href = string.Empty;
    string target = @" Target=""_blank""";
    string text = string.Empty;
    if (m.Groups[1].Success)
    {
        string attributes = m.Groups[1].Value; // everything between "<a " and ">"
        // [^""'] captures up to the closing quote, so parentheses in a URL are kept.
        Regex rHref = new Regex(@"href=[""']([^""']*)[""']", _options);
        href = rHref.Match(attributes).Groups[1].Value;
        if (href.StartsWith("/"))
            href = href.Insert(0, "~");        // root-relative path becomes app-relative
        Regex rId = new Regex(@"id=[""']([^""']*)[""']", _options);
        Match idMatch = rId.Match(attributes);
        if (idMatch.Success)
            id = idMatch.Groups[1].Value;
        if (!attributes.ToLower().Contains("target="))
            target = string.Empty;             // no target attribute: emit none
    }
    text = m.Groups[2].Value.Replace(Environment.NewLine, string.Empty);
    return string.Format(aspAnchor, id, href, text, target);
}
A Match object, presumably a match for the
<a> tag, comes into the function. The method then parses out the
href and id attributes and prefixes a root-relative href with "~". If no id
attribute is found, the default (driven by the _a variable)
is used. If no target attribute is discovered, the default
target variable is emptied; if one is present, the output always uses
_blank, whatever the original value was. I realize that there are many
other options for target, but _blank
handled 100% of my cases. You may need a more comprehensive solution.
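If you do need to preserve other target values, one possible approach is to capture the attribute's actual value instead of assuming _blank. The following is only a minimal sketch: the class name TargetSketch and the helper TargetAttribute are hypothetical and not part of the original code, and it assumes the target value is quoted like the other attributes.

```csharp
using System;
using System.Text.RegularExpressions;

static class TargetSketch
{
    // Hypothetical helper (not in the original code): builds the Target
    // attribute text for asp:HyperLink from the raw attributes of an <a> tag.
    public static string TargetAttribute(string attributes)
    {
        Match m = Regex.Match(attributes, @"\btarget=[""']([^""']*)[""']",
            RegexOptions.IgnoreCase);
        // No target attribute on the anchor: emit nothing.
        if (!m.Success)
            return string.Empty;
        // Reuse whatever value the anchor carried (_blank, _top, a frame name...).
        return " Target=\"" + m.Groups[1].Value + "\"";
    }

    static void Main()
    {
        Console.WriteLine(TargetAttribute("href=\"/help.aspx\" target=\"_top\""));
    }
}
```

In ReplaceAnchorMatch, the result of such a helper could take the place of the target variable in the final string.Format call.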
Following the same pattern, the function for <img>
tags is:
static int _i = 1;

static string ReplaceImageMatch(Match m)
{
    string aspImage = @"<asp:Image ID=""{0}"" runat=""server"" "
        + @"ImageUrl=""{1}"" AlternateText=""{2}"" />";
    string id = "img" + (_i++).ToString();     // default ID when the tag has none
    string url = string.Empty;
    string altText = string.Empty;
    if (m.Groups[1].Success)
    {
        string attributes = m.Groups[1].Value; // everything between "<img " and ">"
        Regex rId = new Regex(@"id=[""']([^""']*)[""']", _options);
        Match idMatch = rId.Match(attributes);
        if (idMatch.Success)
            id = idMatch.Groups[1].Value;
        Regex rSrc = new Regex(@"src=[""']([^""']*)[""']", _options);
        Match srcMatch = rSrc.Match(attributes);
        if (srcMatch.Success)
            url = srcMatch.Groups[1].Value;
        if (url.StartsWith("/"))
            url = url.Insert(0, "~");          // root-relative path becomes app-relative
        Regex rAlt = new Regex(@"alt=[""']([^""']*)[""']", _options);
        Match altMatch = rAlt.Match(attributes);
        if (altMatch.Success)
            altText = altMatch.Groups[1].Value;
    }
    return string.Format(aspImage, id, url, altText);
}
And the function for <input> tags is:
static int _t = 1;

static string ReplaceTextboxMatch(Match m)
{
    // CssClass and Columns are fixed values here; adjust them for your site.
    string aspTextbox =
        @"<asp:TextBox ID=""{0}"" runat=""server"" "
        + @"CssClass=""formtxt"" Text=""{1}"" Columns=""25""></asp:TextBox>";
    string id = "txt" + (_t++).ToString();     // IDs are always generated: txt1, txt2, ...
    string text = string.Empty;
    if (m.Groups[1].Success)
    {
        // Carry an existing value attribute over to the Text property.
        Regex rValue = new Regex(@"value=[""']([^""']*)[""']", _options);
        Match valueMatch = rValue.Match(m.Groups[1].Value);
        if (valueMatch.Success)
            text = valueMatch.Groups[1].Value;
    }
    return string.Format(aspTextbox, id, text);
}
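Note that this function treats every <input> tag as a text box. If your pages also contain buttons, checkboxes, or hidden fields, one option is to have the evaluator return the original text for anything that is not a text input. This is a sketch under that assumption only: InputFilterSketch, IsTextInput, ReplaceTextInputOnly, and ConvertTextInputs are hypothetical names, and the asp:TextBox output is a placeholder for the real conversion.

```csharp
using System;
using System.Text.RegularExpressions;

static class InputFilterSketch
{
    // Hypothetical check: true when the attribute string describes a text input.
    // An <input> with no type attribute is a text field by default in HTML.
    public static bool IsTextInput(string attributes)
    {
        Match m = Regex.Match(attributes, @"\btype=[""']([^""']*)[""']",
            RegexOptions.IgnoreCase);
        return !m.Success
            || string.Equals(m.Groups[1].Value, "text", StringComparison.OrdinalIgnoreCase);
    }

    // Returning m.Value leaves the matched tag exactly as it was in the source.
    static string ReplaceTextInputOnly(Match m)
    {
        if (!IsTextInput(m.Groups[1].Value))
            return m.Value;
        return "<asp:TextBox runat=\"server\" />"; // placeholder for the real conversion
    }

    public static string ConvertTextInputs(string html)
    {
        return Regex.Replace(html, "<input ([^>]*)/?>", ReplaceTextInputOnly,
            RegexOptions.IgnoreCase);
    }
}
```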
Finally, three simple functions will allow us to identify and convert the tags we
find:
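static string ConvertImages(string content)
{
    // Matches <img ...>, <img ... /> and <img ...></img>; group 1 holds the attributes.
    Regex r = new Regex("<img ([^>]*)/?>(</img>)?", _options);
    return r.Replace(content.Trim(), ReplaceImageMatch);
}

static string ConvertAnchors(string content)
{
    // Group 1 holds the attributes, group 2 the link text. Anchors that wrap
    // other tags (for example an <img>) do not match and are left unchanged.
    Regex r = new Regex("<a ([^>]*)>([^<]*)</a>", _options);
    return r.Replace(content.Trim(), ReplaceAnchorMatch);
}

static string ConvertTextBoxes(string content)
{
    // Matches every <input> tag, whatever its type; group 1 holds the attributes.
    Regex r = new Regex("<input ([^>]*)/?>(</input>)?", _options);
    return r.Replace(content.Trim(), ReplaceTextboxMatch);
}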
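The Replace calls above work because Regex.Replace has overloads that accept a MatchEvaluator delegate: the method is called once per match, and whatever string it returns is substituted for the matched text. A method that takes a Match and returns a string can be passed by name. The stripped-down sketch below shows just that mechanism; EvaluatorSketch, PrefixTilde, and Run are hypothetical names used for illustration, and the pattern handles only double-quoted href values.

```csharp
using System;
using System.Text.RegularExpressions;

static class EvaluatorSketch
{
    // Called once per match; the returned string replaces the matched text.
    static string PrefixTilde(Match m)
    {
        string url = m.Groups[1].Value;
        if (url.StartsWith("/"))
            url = "~" + url;
        return "href=\"" + url + "\"";
    }

    public static string Run(string html)
    {
        // The method group PrefixTilde converts implicitly to a MatchEvaluator.
        return Regex.Replace(html, "href=\"([^\"]*)\"", PrefixTilde);
    }

    static void Main()
    {
        // Prints: <a href="~/about.aspx">About</a>
        Console.WriteLine(Run("<a href=\"/about.aspx\">About</a>"));
    }
}
```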
I trust that you're able to construct a control program that makes use of these
three functions, but if not, you can
download an example HTML to ASP.NET converter C# solution for Visual Studio 2008. Simply pass in your HTML
content and get back a string that has the corresponding ASP.NET controls in place
of the anchor, image, and input tags. One thing to note: the functions copy only the
attributes they look for explicitly, so the style attribute (and any other attribute
not handled above) will be lost in the conversion. Hopefully, you won't have any trouble
enhancing this code to accommodate your specific needs.
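For readers who would rather not download the solution, a control program can be little more than a loop that applies each conversion in turn. The sketch below is an assumption about how such a program might look, not the downloadable example: ControlProgramSketch and ConvertAll are hypothetical names, the command-line arguments are an assumed convention, and a trivial Trim step stands in for the three Convert functions so that the block compiles on its own.

```csharp
using System;
using System.IO;

static class ControlProgramSketch
{
    // Applies each conversion step to the content, in the order given.
    public static string ConvertAll(string content, params Func<string, string>[] steps)
    {
        foreach (Func<string, string> step in steps)
            content = step(content);
        return content;
    }

    static void Main(string[] args)
    {
        // Assumed usage: <program> input.html output.aspx
        string content = File.ReadAllText(args[0]);

        // In a real program, pass ConvertAnchors, ConvertImages and
        // ConvertTextBoxes here; Trim is only a stand-in step.
        string converted = ConvertAll(content, s => s.Trim());

        File.WriteAllText(args[1], converted);
    }
}
```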
In summary, this little code snippet saved me a couple hundred man-hours this year;
maybe it'll be a help to you.