We recently used CodeKicker.BBCode in a work project, and part of the licence agreement is a post on an employee’s personal blog – voilà!
Except that’s not much fun, so let’s detail some of the modifications we made to it, which we’ve released on GitHub.
First things first
There are loads of BBCode parsers out there that suffer from a variety of problems. The CodeKicker implementation seemed the closest to a ‘works out of the box’ solution, but we still needed to fix a few bugs and behaviours we weren’t happy with.
[code] tags keep parsing their contents
Say you want to post a code snippet of a for-loop to a forum:
[code]
for (int i = 0; i < arr.Length; i++)
{
    arr_copy[i] = arr[i];
}
[/code]
We’d expect the for-loop to render verbatim, perhaps inside <pre> tags:
for (int i = 0; i < arr.Length; i++)
{
    arr_copy[i] = arr[i];
}
Instead, the parser sees the [i] in those two array accesses and interprets it as an italic tag, mangling the output:
for (int i = 0; i < arr.Length; i++)
{
arr_copy = arr[i]; }
So we want a [code] tag to make the parser stop processing BBCode until it finds the matching [/code] tag.
Can’t have spaces in tag attributes
If you’re quoting a member of a forum where everyone has a single-word username, the following syntax would do the trick:
[quote=pablissimo]Hi there![/quote]
If, however, the username has a space in it, or your forum uses real names, then you’re in trouble:
[quote=Paul O’Neill]Hi there![/quote]
In this instance, the CodeKicker parser sees the quote tag as having a single default attribute with value ‘Paul’, and disregards the ‘O’Neill’ part. Bummer.
Whitespace handling requires the user to know too many implementation details to format a post
When you’re mixing tags in a longer forum post, you might expect whitespace to be largely ignored if it’s just for the purposes of laying out your BBCode in an understandable fashion. For example, given the following:
[list]
[*]Here's the first item
[*]Here's the second item
[/list]
And here's the code snippet:
[code]
var frob = MakeFrob();
[/code]
It's that easy!
We might expect that:
- The first list item isn’t preceded by a newline just because it’s on a different line to the opening [list] declaration
- There’s no extra newline after the last item in the list
- There’s no extra newline after the closing [/code] tag
Out of the box this isn’t the case, so getting sensibly formatted output means writing some pretty horrible-looking BBCode.
The fixes
Stopping [code] tags from parsing BBCode contained within
First, we added a ‘StopProcessing’ property to the BBTag class, defaulting to false. We turn it on for [code] tags, and it also gives us the flexibility to introduce a [noparse] tag if we want one.
The parser is stack-based, so implementing the change only requires that it never matches an opening tag while the node on top of the stack is marked StopProcessing. Everything between [code] and [/code] is then parsed as plain text, ignoring anything that happens to look like an [i], a [u] or any other tag.
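To make that rule concrete, here is a minimal C# sketch of a stack-based parser that refuses to match opening tags while the node on top of the stack has StopProcessing set. TagDef, Node and MiniParser are hypothetical stand-ins, not CodeKicker’s real classes, and the sketch skips attributes, HTML encoding and error handling.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, cut-down stand-in for BBTag: just a name and the new flag.
public sealed class TagDef
{
    public string Name { get; }
    public bool StopProcessing { get; }

    public TagDef(string name, bool stopProcessing = false)
    {
        Name = name;
        StopProcessing = stopProcessing;
    }
}

// Parse-tree node: a tag node when Tag is set, otherwise a text node.
public sealed class Node
{
    public TagDef Tag;
    public string Text;
    public readonly List<Node> Children = new List<Node>();

    public string ToHtml()
    {
        if (Tag == null) return Text;
        var inner = new StringBuilder();
        foreach (var child in Children) inner.Append(child.ToHtml());
        return "<" + Tag.Name + ">" + inner + "</" + Tag.Name + ">";
    }
}

public static class MiniParser
{
    public static Node Parse(string input, Dictionary<string, TagDef> tags)
    {
        var root = new Node { Tag = new TagDef("root") };
        var stack = new Stack<Node>();
        stack.Push(root);
        var text = new StringBuilder();
        int pos = 0;

        while (pos < input.Length)
        {
            Node top = stack.Peek();
            int close = input[pos] == '[' ? input.IndexOf(']', pos) : -1;
            if (close > pos)
            {
                string inner = input.Substring(pos + 1, close - pos - 1);
                bool isEnd = inner.Length > 0 && inner[0] == '/';
                string name = isEnd ? inner.Substring(1) : inner;

                // The key rule: never match an opening tag while the node on
                // top of the stack is marked StopProcessing.
                bool matchStart = !isEnd && !top.Tag.StopProcessing && tags.ContainsKey(name);
                // Simplification: only the innermost open tag can be closed.
                bool matchEnd = isEnd && stack.Count > 1 && top.Tag.Name == name;

                if (matchStart || matchEnd)
                {
                    FlushText(text, top);
                    if (matchStart)
                    {
                        var node = new Node { Tag = tags[name] };
                        top.Children.Add(node);
                        stack.Push(node);
                    }
                    else
                    {
                        stack.Pop();
                    }
                    pos = close + 1;
                    continue;
                }
            }

            // Everything else, including "[i]" inside [code], is literal text.
            text.Append(input[pos]);
            pos++;
        }

        FlushText(text, stack.Peek());
        return root;
    }

    // Move any buffered text into a text node under the given parent.
    static void FlushText(StringBuilder text, Node parent)
    {
        if (text.Length == 0) return;
        parent.Children.Add(new Node { Text = text.ToString() });
        text.Clear();
    }
}
```

With [code] flagged this way, parsing "[code]arr_copy[i] = arr[i];[/code] [i]hi[/i]" keeps both [i] indexers as literal text inside the code node, while the [i] outside the code block still becomes an italic tag.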
Allowing spaces within tag attributes
We added another new property to BBTag, ‘GreedyAttributeProcessing’, which again defaults to false. When it’s true, the parser assumes that the tag will only ever have zero or one attributes, and that the entire text up to the closing square bracket of the opening tag is the value of that attribute. In our quote example above, we go from the old behaviour, where only ‘Paul’ is parsed as the attribute value:
[quote=Paul O’Neill]Hi there![/quote]
to something more sensible, on an opt-in basis, where the whole of ‘Paul O’Neill’ is parsed as the attribute value:
[quote=Paul O’Neill]Hi there![/quote]
We extend the signature of the BBCodeParser.ParseAttributeValue method with a boolean parameter, defaulting to false, that tells it to consume all remaining text in the opening tag as the attribute value. Then, in ParseTagStart, we pass the current tag’s GreedyAttributeProcessing value into ParseAttributeValue.
Finally, we modify ParseAttributeValue so that, when that parameter is true, the set of characters it considers to end an attribute value shrinks from {<space>, opening square bracket, closing square bracket} to just {opening square bracket, closing square bracket}. Spaces are now up for grabs as part of attribute values!
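A rough sketch of the greedy switch, assuming a simplified ParseAttributeValue(string input, ref int pos, bool greedyProcessing) that starts reading just after the ‘=’ sign. The AttributeParsing class and this exact signature are hypothetical; the real method’s signature, and its handling of details such as quoted values, may differ.

```csharp
using System;

public static class AttributeParsing
{
    // pos points at the first character of the value (just after '=');
    // on return it points at the character that ended the value.
    public static string ParseAttributeValue(string input, ref int pos, bool greedyProcessing = false)
    {
        // Non-greedy: a space, '[' or ']' ends the value.
        // Greedy: only '[' or ']' does, so spaces become part of the value.
        char[] terminators = greedyProcessing
            ? new[] { '[', ']' }
            : new[] { ' ', '[', ']' };

        int end = input.IndexOfAny(terminators, pos);
        if (end < 0) end = input.Length;

        string value = input.Substring(pos, end - pos);
        pos = end;
        return value;
    }
}
```

For "[quote=Paul O'Neill]Hi there![/quote]" starting at index 7, the default call returns "Paul" and stops at the space, while the greedy call returns "Paul O'Neill" and stops at the closing bracket.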
Fixing whitespace handling
We’ll tackle this in two parts. First, we want to suppress the whitespace (including the newline) that follows an opening tag: when the [list] tag and its first [*] item appear on different lines, for example, we probably don’t want an extra newline in there.
We’ll add an extra call to ParseWhitespace near the end of ParseTagStart – this will consume all whitespace characters before we return to the parsing loop so that they’re not interpreted as text nodes.
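The opening-tag half of the fix amounts to one extra call. In this sketch ParseWhitespace mirrors what the paragraph describes, while OpeningTagWhitespace and its cut-down ParseTagStart (which only understands a bare [name] with no attributes) are hypothetical.

```csharp
using System;

public static class OpeningTagWhitespace
{
    // Consume every whitespace character from pos onwards;
    // returns true if anything was consumed.
    public static bool ParseWhitespace(string input, ref int pos)
    {
        int end = pos;
        while (end < input.Length && char.IsWhiteSpace(input[end]))
            end++;
        bool found = end != pos;
        pos = end;
        return found;
    }

    // Hypothetical, heavily simplified ParseTagStart: reads "[name]" at pos
    // and returns the name, or null if there is no opening tag here.
    public static string ParseTagStart(string input, ref int pos)
    {
        if (pos >= input.Length || input[pos] != '[') return null;
        int close = input.IndexOf(']', pos);
        if (close < 0) return null;

        string name = input.Substring(pos + 1, close - pos - 1);
        pos = close + 1;

        // The new call: swallow whitespace straight after the opening tag
        // so that it never becomes a text node.
        ParseWhitespace(input, ref pos);
        return name;
    }
}
```

With "[list]\r\n[*]Here is the first item", the first call returns "list" and leaves pos on the [*], so no newline text node is created between the two tags.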
Next, we tackle closing tags that we naturally follow with a newline while composing a message, to keep it readable during editing (again [/list], but also [/code]), but where that newline shouldn’t show up in the rendered output.
We’ll add another new property to BBTag, ‘SuppressFirstNewlineAfter’, which does what it says on the tin. We’ll set it to false by default, and to true on ‘block-type’ tags, [list] and [code] being two clear examples.
We’ll then modify the parser again: in ParseTagEnd, we check whether the tag we’re closing has its SuppressFirstNewlineAfter property set and, if so, consume all whitespace up to and including the first newline after the closing tag so that it doesn’t end up in the output. We’ll do that with a new method, similar to ParseWhitespace, called ParseLimitedWhitespace, which takes a parameter for the maximum number of newlines to consume:
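// Consumes whitespace from pos onwards, stopping once maxNewlinesToConsume
// newlines (\r\n, \r or \n) have been consumed or a non-whitespace character
// is reached. Returns true if anything was consumed.
static bool ParseLimitedWhitespace(string input, ref int pos, int maxNewlinesToConsume)
{
    int end = pos;
    int consumedNewlines = 0;
    while (end < input.Length && consumedNewlines < maxNewlinesToConsume)
    {
        char thisChar = input[end];
        if (thisChar == '\r')
        {
            end++;
            consumedNewlines++;
            if (end < input.Length && input[end] == '\n')
            {
                // Windows newline (\r\n) - consume the \n too, counting the pair as one newline
                end++;
            }
        }
        else if (thisChar == '\n')
        {
            // Unix newline
            end++;
            consumedNewlines++;
        }
        else if (char.IsWhiteSpace(thisChar))
        {
            // Other whitespace (spaces, tabs) doesn't count towards the limit
            end++;
        }
        else
        {
            // First non-whitespace character: stop here
            break;
        }
    }
    var found = pos != end;
    pos = end;
    return found;
}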
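Putting the flag and the helper together, ParseTagEnd might look roughly like this. It is a hypothetical, simplified version: TagEndSketch is an illustrative name, a set of tag names stands in for each BBTag’s SuppressFirstNewlineAfter property, and ParseLimitedWhitespace is a condensed equivalent of the method above so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class TagEndSketch
{
    // Condensed equivalent of ParseLimitedWhitespace above.
    static bool ParseLimitedWhitespace(string input, ref int pos, int maxNewlinesToConsume)
    {
        int end = pos, newlines = 0;
        while (end < input.Length && newlines < maxNewlinesToConsume && char.IsWhiteSpace(input[end]))
        {
            // Step over the \r of a \r\n pair so the pair counts as one newline.
            if (input[end] == '\r' && end + 1 < input.Length && input[end + 1] == '\n') end++;
            if (input[end] == '\r' || input[end] == '\n') newlines++;
            end++;
        }
        bool found = end != pos;
        pos = end;
        return found;
    }

    // Hypothetical ParseTagEnd: reads "[/name]" at pos and, if that tag is
    // flagged, also eats whitespace up to and including the first newline.
    public static string ParseTagEnd(string input, ref int pos, ISet<string> suppressFirstNewlineAfter)
    {
        if (pos + 1 >= input.Length || input[pos] != '[' || input[pos + 1] != '/') return null;
        int close = input.IndexOf(']', pos);
        if (close < 0) return null;

        string name = input.Substring(pos + 2, close - pos - 2);
        pos = close + 1;

        if (suppressFirstNewlineAfter.Contains(name))
            ParseLimitedWhitespace(input, ref pos, 1);
        return name;
    }
}
```

After "[/list]\r\n\r\nAnd here’s the code snippet:", only the first \r\n is consumed; the blank line that follows survives as text and is still rendered, which is what limiting the count to one newline buys us.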
Perfect. One final thing to do: make sure that any newlines that escape the above treatment are converted into <br /> tags. We can do this in TextNode’s ToHtml method by tacking a simple Replace("\n", "<br />") onto the end.
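A sketch of that last step, assuming the text is HTML-encoded before the replacement (otherwise the inserted <br /> would itself be escaped). TextNodeSketch is a hypothetical stand-in for TextNode, and collapsing \r\n to \n first is our own addition, so that Windows line endings don’t leave a stray \r in front of each <br />.

```csharp
using System;
using System.Net;

public static class TextNodeSketch
{
    // Encode the text first, then turn surviving newlines into <br /> tags.
    public static string ToHtml(string text)
    {
        string encoded = WebUtility.HtmlEncode(text);
        // Assumption: normalise Windows line endings before the replacement.
        return encoded.Replace("\r\n", "\n").Replace("\n", "<br />");
    }
}
```

For example, "a < b\r\nc\nd" comes out as "a &lt; b<br />c<br />d".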