BB-Code Parser
CSM
post Jul 31 2009, 03:43 PM
Post #1






Posts: 2,386
Joined: 1-September 06
From: ̶O̶h̶i̶o̶ Washington
Member No.: 16,587



It's more like "more than a month, less than a year" (and "semi-monthly" doesn't make a good post title) ... but anyway ...

I tend to do things because I think they'd be neat or worth the result, and this time I was too lazy to shop around for the best (or even just a good) BB-Code parser in PHP for a website I'm writing. So I wrote one myself, which proves I'm susceptible to NIH ("Not Invented Here", for those who don't know). It took two different approaches to parsing, but I finally got something I'm happy with.

I thought I'd share it. Since BB-Code isn't well defined, and certain places (like forums) might have codes not used elsewhere (I believe IPB allows custom codes), I tried to make it as flexible as possible. To that end, this parser (really more of a formatter) should be able to handle a wide range of custom codes beyond its extensive set of defaults, as well as other not-quite-BB-Code formats (e.g., :smile:-style codes). I also included a few parsing instructions that influence certain parser behavior. The parser code itself is designed to have no special cases: anything that would have been a special case in a parser restricted to a small, fixed set of BB-Codes is instead available as a feature for custom codes. Writing all the default code implementations bore this out; only the short form of the [url] code turned out not to be workable.
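To illustrate the "no special cases" idea, here is a minimal sketch in JavaScript. It is not the parser's actual code or API: `createFormatter` and the `open`/`close`/`single` handler names are hypothetical. Every code, including smiley-style ones, is just an entry in a map, the start and end delimiters are parameters, and nesting is tracked with a simple stack.

```javascript
// Hypothetical sketch, not the parser's real API: codes are plain data in a
// map, and the start/end delimiters are parameters rather than constants.
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function createFormatter(startCode, endCode, codes) {
  // startCode, optional "/", a code name, optional "=argument", endCode
  const tagPattern = new RegExp(
    escapeRegExp(startCode) + "(/?)(\\w+)(?:=(.*?))?" + escapeRegExp(endCode), "g");

  return function format(input) {
    const open = []; // stack of currently open code names
    let out = "";
    let last = 0;
    let m;
    tagPattern.lastIndex = 0;
    while ((m = tagPattern.exec(input)) !== null) {
      const [raw, slash, name, arg] = m;
      // Only own keys count, so names like "toString" are not treated as codes.
      const code = Object.prototype.hasOwnProperty.call(codes, name) ? codes[name] : undefined;
      out += escapeHtml(input.slice(last, m.index)); // content before the tag
      last = m.index + raw.length;
      if (!code) {
        out += escapeHtml(raw); // unknown code: keep it as literal text
      } else if (code.single) {
        out += code.single(arg); // codes with no closing tag, e.g. smileys
      } else if (!slash) {
        open.push(name);
        out += code.open(arg);
      } else if (open[open.length - 1] === name) {
        open.pop();
        out += code.close();
      } else {
        out += escapeHtml(raw); // mismatched closing tag: literal text
      }
    }
    out += escapeHtml(input.slice(last));
    while (open.length > 0) out += codes[open.pop()].close(); // auto-close
    return out;
  };
}

const bracketFormat = createFormatter("[", "]", {
  b: { open: () => "<strong>", close: () => "</strong>" },
  i: { open: () => "<em>", close: () => "</em>" },
});
const smileyFormat = createFormatter(":", ":", {
  smile: { single: () => '<img src="smile.gif" alt=":)">' }, // hypothetical image
});

console.log(bracketFormat("[b]bold [i]both[/i][/b] & <tag>"));
// <strong>bold <em>both</em></strong> &amp; &lt;tag&gt;
console.log(smileyFormat("hi :smile:"));
// hi <img src="smile.gif" alt=":)">
```

The same function handles both bracket codes and colon codes; only the map and the delimiters change.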

I've uploaded it to my server so it can be viewed as text here:
BB-Code Parser Source
(Someday I'll write a "front end" to that temp location so source code can be viewed with syntax highlighting)

Sorry about the length, but I think just over a third of the file is taken up by the default code implementations. There's also a Tokenizer class I don't use, but I left it in just in case, since it could be used for things like tokenizing a comma-delimited argument.
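The Tokenizer class itself isn't reproduced here. As a rough illustration of the comma-delimited case, a small function (`tokenizeArgument` is an invented name, not part of the source) can split an argument on commas while treating a double-quoted section as one token.

```javascript
// Hypothetical helper (not the Tokenizer class from the source): split a
// code argument on commas, treating double-quoted sections as one token.
function tokenizeArgument(arg) {
  const tokens = [];
  let current = "";
  let quoted = false;
  for (const ch of arg) {
    if (ch === '"') {
      quoted = !quoted; // toggle quoting; the quote character is dropped
    } else if (ch === "," && !quoted) {
      tokens.push(current.trim()); // end of a token; trim surrounding spaces
      current = "";
    } else {
      current += ch;
    }
  }
  tokens.push(current.trim());
  return tokens;
}

console.log(tokenizeArgument('Arial, "Times New Roman", serif'));
// [ 'Arial', 'Times New Roman', 'serif' ]
```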



I may eventually move it to a different directory and post it on my website, but who knows ... I have yet to add Block Puzzler to my website either ...




As a result of starting my server over, links in my posts may not work (especially those in the "temp" subdomain). If there is a link to something you would like a copy of, send me a personal message saying what you're looking for and how I can get it to you, and I'll see if I can find a copy. Thanks for your patience and understanding.


"Not just another open source project. Lend your talent and make a difference!" (Dead)

"The future is now." (No longer community site) (Domain has lapsed)

Published: AtomicComicBlast, Barra de Lenguas, ComicWizard-4.0, MicroColors, PassGen, ScrabbleChecker, SoundBank, Uni, VisualWidget, WarpedReality
Unavailable: Paradigm [clock], Puzzled, SecurityLogger, Wayback Widget
Ready to be published: Cαlcυlατοr, CursorTails, Blackout, Block Puzzler, BombSquad, Palette, SnipIt
ActiveDev:
InactiveDev/Dead: BeatMod, Bubble Pop, Canvas Clock, Canvas Gauges, Canvas Pro, Clipboard, Crayon, Hermes, InTune, Konverter, Magic Deck, OverRuled, Outside, Slither, SystemBeat, Tetresque, Tetrad, Widget, WinSysRemote
Dropped: BlankScreen, Document "Fixer", Intuitive [ -> Blackout], Motion Widget: HHGTTG
CoDevelopment: Atmosphere, Block Puzzler, BombSquad
Miniature Scripts: BinarySearchTree, Calendar, Canvas Gears, Checkbox, File-Browser, LinkedList/Stack, MDI Setup, MiniMax AI, PieGraph, ProgressBar, Slider widget, TabbedPane, Table, Tokenizer, TreeMenu
Java+: Java Music Daemon, ScreenCapture JAR, Widget-Java/Server Bridge Example
"Published" Texts: DynamicWidgetGuide
Konfabulator Libraries: Color-space Library, Javable Widget Project
Widget Tutorials: "Spawning" Widgets, JavaScript Classes
Contests: Widget 4k - "Expanded" [not happening; canceled]
Non-Widget Work: Hazlenut, Konspirators Online, PHP BB-Code Parser, ShortClient, Zap
CSM
post Aug 2 2009, 03:41 AM
Post #2

If any of you got an "early" copy, note that I've updated it. I added support for [left] and [right] tags, improved validation of [color] arguments*, and fixed problems caused by reading undefined map keys and variables.

* Before, anything was allowed. Now the argument must be one of the cross-browser-supported color names, a valid hex color, or an rgb, rgba, hsl, or hsla color (those last four formats can each be turned on or off).
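A hedged sketch of this kind of validation in JavaScript follows. The color-name list is deliberately abbreviated (a real one would list every cross-browser color name), and `isValidColor` and its `allowed` option object are hypothetical names. It checks both the syntax and the numeric ranges of the functional notations, and each notation is off unless enabled.

```javascript
// Hypothetical validator for [color] arguments. COLOR_NAMES is abbreviated;
// a complete list would include every cross-browser CSS color name.
const COLOR_NAMES = new Set(["black", "white", "gray", "silver", "red",
  "maroon", "green", "lime", "blue", "navy", "yellow", "orange", "purple"]);

const INTEGER = /^\d{1,3}$/;
const PERCENT = /^\d{1,3}%$/;
const ALPHA = /^(?:0|1|0?\.\d+|1\.0+)$/;

// True if text matches pattern and its numeric value is at most max.
function within(text, pattern, max) {
  return pattern.test(text) && parseFloat(text) <= max;
}

// allowed: which functional notations are enabled, e.g. { rgb: true, hsla: true }
function isValidColor(value, allowed = {}) {
  const v = value.trim().toLowerCase();
  if (COLOR_NAMES.has(v)) return true;
  if (/^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/.test(v)) return true;

  const m = /^(rgba?|hsla?)\(([^)]*)\)$/.exec(v);
  if (!m || !allowed[m[1]]) return false;

  const notation = m[1];
  const parts = m[2].split(",").map(p => p.trim());
  const hasAlpha = notation.endsWith("a");
  if (parts.length !== (hasAlpha ? 4 : 3)) return false;

  for (let i = 0; i < 3; i++) {
    const ok = notation.startsWith("rgb")
      ? within(parts[i], INTEGER, 255) || within(parts[i], PERCENT, 100)
      : i === 0 ? within(parts[i], INTEGER, 360) : within(parts[i], PERCENT, 100);
    if (!ok) return false;
  }
  return !hasAlpha || ALPHA.test(parts[3]);
}

console.log(isValidColor("Red")); // true
console.log(isValidColor("rgb(255, 0, 0)")); // false: rgb() not enabled
console.log(isValidColor("rgb(255, 0, 0)", { rgb: true })); // true
```

Rejecting anything that isn't recognized (rather than trying to strip bad characters) keeps arbitrary strings out of the generated style attribute.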

PHP has level-based error reporting and I don't usually change the level, so things that generate notices and warnings slip under my radar (which is bad for code I'm putting out for others to use). I was informed that reading an undefined map key produces a notice (yet still returns null, giving the impression that everything is working), and that accessing an undefined variable likewise produces a notice but otherwise implicitly works (PHP 8 later upgraded both to warnings). Blah ... not what I expected. I think I like JS's approach better here: accessing an undeclared variable is an error (a ReferenceError stops execution), while accessing an undefined map key returns undefined. Then again, PHP has no undefined value ...
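For comparison, the JavaScript behavior described above can be checked directly. This is a small illustration with made-up names (`settings`, `notDeclaredAnywhere`). On the PHP side, the usual way to avoid the notice is to test with isset() or array_key_exists() before reading a key.

```javascript
const settings = { escapeContent: true }; // hypothetical settings map

// Reading a missing key is not an error in JavaScript: it yields undefined.
function readMissingKey() {
  return settings.startCode;
}

// Reading a variable that was never declared throws a ReferenceError.
function readUndeclaredVariable() {
  try {
    return notDeclaredAnywhere; // deliberately undeclared
  } catch (e) {
    return e.name;
  }
}

console.log(readMissingKey()); // undefined
console.log("startCode" in settings); // false
console.log(readUndeclaredVariable()); // ReferenceError
```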

Anyway ... I hope the update is useful.

CSM
post Aug 5 2009, 02:39 PM
Post #3

After loads of micro-edits and changes, I have a JS version. I don't guarantee that it will work exactly the same as the PHP version, but it should be very close. I started with the PHP code and converted all operators, reserved words (class, public, etc.), and language features to JS. After fixing many little issues (things I had missed), I came back to it and JSitized it again, converting the remaining PHP functions that had a direct JS equivalent. As a result, some things will seem a bit odd (JSitized PHPisms). A good example is default arguments: where the code actually relied on a default, I converted it to an if statement, and I removed the defaults everywhere else. Also, some functions that don't have a direct JS equivalent now point to a compatibility function that emulates the PHP version as closely as possible.
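A sketch of the two conversion patterns just described, assuming a PHP signature like `function format($text, $escape = true)`. JavaScript of that era had no default parameters, so the default becomes an explicit check, and a missing PHP built-in is replaced by a compatibility function. Both `format` and this `htmlspecialchars` shim are hypothetical; the shim approximates PHP's htmlspecialchars() called with ENT_QUOTES.

```javascript
// Hypothetical conversion of the PHP signature
//   function format($text, $escape = true)
// Pre-ES2015 JavaScript had no default parameters, so the default is
// applied by an explicit check instead.
function format(text, escape) {
  if (typeof escape === "undefined") {
    escape = true; // emulate the PHP default value
  }
  return escape ? htmlspecialchars(text) : text;
}

// Compatibility shim approximating PHP's htmlspecialchars() with ENT_QUOTES:
// &, <, >, double quotes and single quotes become HTML entities.
function htmlspecialchars(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

console.log(format("<b>it's</b>")); // &lt;b&gt;it&#039;s&lt;/b&gt;
console.log(format("<b>it's</b>", false)); // <b>it's</b>
```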

In doing the conversion, I found one or two things that could be improved in the original PHP version (an unexpected bonus!).

Given that the PHP version is the original, and I don't really want to be bound to maintaining two versions of the same code, this one probably won't be updated in response to changes in the PHP source. However, the license is pretty free and you're welcome to hack it up :) I also don't mind taking bug reports for the JS version and fixing them, because I haven't tested it thoroughly.

The parser outputs HTML by default, but it's not hard to make your own formatters, so I don't think it would be hard to create one that outputs XML which can be saved and loaded via Widget.createWindowFromXML(*) ... (though I'd probably think you're a bit crazy if you built a Widget GUI around a custom BB-Code).
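The idea of swapping output formats can be sketched by rendering the same (hypothetical) parse tree with two formatter objects. The tree shape, the `render` function, and the XML element names are made up for illustration; they are not the parser's internals or the Widget engine's actual XML format.

```javascript
// Hypothetical parse tree: strings are content, objects are codes.
const tree = [{ code: "b", children: ["Hello"] }, " & goodbye"];

function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Each formatter knows how to emit content and how to wrap each code.
const htmlFormatter = {
  text: escapeXml,
  b: inner => "<strong>" + inner + "</strong>",
};
const xmlFormatter = {
  text: escapeXml,
  b: inner => "<bold>" + inner + "</bold>", // made-up XML vocabulary
};

function render(nodes, formatter) {
  return nodes
    .map(node => typeof node === "string"
      ? formatter.text(node)
      : formatter[node.code](render(node.children, formatter)))
    .join("");
}

console.log(render(tree, htmlFormatter)); // <strong>Hello</strong> &amp; goodbye
console.log(render(tree, xmlFormatter)); // <bold>Hello</bold> &amp; goodbye
```

Keeping parsing and output separate like this means a new output format only needs a new formatter object, not changes to the parsing code.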

JS BB-Code Parser Source



P.S.: I suppose this thread could be moved to the Widget Classes category, since there's now a JS class/library posted (which was not the original intent) ... but the post title doesn't fit. Oh well. Anyway ...
CSM
post Jan 6 2010, 02:15 AM
Post #4

For those actually using this, or even paying attention to this thread: I reuploaded the files and updated them. When pointing someone else at files, the smart thing to do is to read them over again first (just because); I found several bugs and many not-quite-right comments that way. They're all fixed now.

The bad comments were caused by the usual: updating the code but neglecting the comments. The actual bugs were a few spots where the code still assumed that "[" and "]" would always be the start and end codes (no longer true after I added the ability to change them).
CSM
post Jan 7 2010, 11:39 PM
Post #5

I fixed two bugs and some documentation errors in the JS version of the library. The first bug was a pair of stray "this." references (left over from the conversion from PHP, I assume) that caused an error when the parser encountered an unordered list bb-code. The documentation errors were in the code samples at the beginning and involved improper syntax for instantiating arrays and objects. I also added the "var" keyword where appropriate.

To keep track of changes, I added a date and the URL of each file's location to the file "headers". This should make it easier to determine whether a file has been updated since your local copy was obtained, and where to check for or get a new copy. I will update the previous posts' links to point to the right files immediately after I post this.

I should also note that while it is currently possible to link to the parser file from other websites/HTML files, I ask that you please do not. I do not want to have to add something like the alert dialog in json.org/json2.js. Thank you.

-- Edit --
I forgot to mention previously that I also updated the license for both copies. It is now purely a 3-clause ("modified") BSD license. Before, it was similar to the 3-clause BSD license, but with different wording and conditions for clause 2.
CSM
post Jan 12 2010, 12:46 AM
Post #6

I hadn't given it much thought before, because my original assumption was "it should just work"; however, this parser may not be compatible with itself. What I mean is this: a typical use case might be to set up one instance of the parser to handle :smile:-type codes, and another to handle standard bracket-type codes. The colon parser would be run first, and its output would be fed into the second parser to catch all bracket-based codes. I believe this won't work currently, because all input deemed "content" is escaped. So if the first parser outputs HTML, the second parser will escape that HTML. An option to disable escaping may therefore be necessary so the parser can be chained as in this example. If I find time, I'll evaluate whether a no-escape option is feasible (it should be).
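The chaining problem can be reproduced with two toy passes, written as a hedged sketch rather than the parser's real code. `smileyPass` and `boldPass` are hypothetical single-purpose formatters that each escape whatever they treat as content unless told not to. Escaping in the second pass mangles the HTML produced by the first.

```javascript
function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Pass 1 (hypothetical): replace :smile: and escape everything else.
function smileyPass(input, escape) {
  return input.split(/(:smile:)/)
    .map(part => part === ":smile:"
      ? '<img src="smile.gif" alt=":)">'
      : escape ? escapeHtml(part) : part)
    .join("");
}

// Pass 2 (hypothetical): handle [b]...[/b] and escape everything else.
function boldPass(input, escape) {
  return input.split(/(\[b\].*?\[\/b\])/)
    .map(part => {
      const m = /^\[b\](.*)\[\/b\]$/.exec(part);
      const content = m ? m[1] : part;
      const safe = escape ? escapeHtml(content) : content;
      return m ? "<strong>" + safe + "</strong>" : safe;
    })
    .join("");
}

const input = "[b]hi[/b] :smile: <x>";
const firstPass = smileyPass(input, true);
const doubleEscaped = boldPass(firstPass, true); // the <img> tag gets escaped
const chained = boldPass(firstPass, false); // second pass leaves HTML alone

console.log(chained); // <strong>hi</strong> <img src="smile.gif" alt=":)"> &lt;x&gt;
```

The user-supplied `<x>` is still escaped in the chained result, because the first pass already escaped it.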

-- Edit --
I should also note that turning off escaping only for the second parser does not normally open up the resulting output to "injections" (unwanted HTML that makes it unescaped into the output).* This is because the first parser will handle escaping what it considers to be "content" while leaving the codes for the second parser alone (as long as those codes aren't composed of characters escaped by that parser).

* I have to qualify this statement because if two parsers with escaping turned off are chained together in the manner described above, unintended HTML can make it through. Also, since this parser allows a different set of bb-code implementations to be used instead of, or alongside, the defaults, such a set might not properly escape its input (HTML or otherwise).

-- Edit --
:( I only noticed today (Jan. 13, 2010) how badly the JavaScript version leaks variables. As far as I could tell, they were all used in such a way that changing them would not affect any parser, and their existence outside the parser would not affect other parser instances. At worst, they would overwrite any existing outside variables with the same name. In the PHP version, variables were automatically local to the functions they were used in, due to PHP's scoping rules. When I converted it to JS, more than a few variables apparently didn't get the "var" declaration they needed.
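In non-strict JavaScript, assigning to a name that was never declared with var silently creates a global, which is exactly this kind of leak. One way to make the mistake loud, sketched below with hypothetical function names, is to opt into strict mode, where the same assignment throws a ReferenceError.

```javascript
// Hypothetical example of the bug: `words` is never declared.
function countWordsLeaky(text) {
  "use strict"; // without this line, `words` would silently become a global
  words = text.split(/\s+/);
  return words.length;
}

// Fixed version: the declaration keeps `words` local to the function.
function countWords(text) {
  "use strict";
  var words = text.split(/\s+/);
  return words.length;
}

function throwsReferenceError(fn, arg) {
  try {
    fn(arg);
    return false;
  } catch (e) {
    return e instanceof ReferenceError;
  }
}

console.log(throwsReferenceError(countWordsLeaky, "a b c")); // true
console.log(countWords("a b c")); // 3
console.log(typeof globalThis.words); // undefined
```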

While in there, I also noticed that variables meant to be private to the parser were in fact public (via assignment to "this.property"). This came from a difference between the languages: a PHP class can declare members private and still access them through $this, whereas in JavaScript every property assigned through "this" is public, so private state has to live in local variables captured by closures. I forgot about the difference when converting the constructor and format function code.
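The public-versus-private difference can be sketched with two hypothetical constructors in pre-class JavaScript style. State assigned through `this` is visible to any caller, while state kept in a local variable is reachable only through the functions created alongside it.

```javascript
// Hypothetical constructors illustrating the difference.
function LeakyParser(startCode) {
  this.startCode = startCode;
  this.depth = 0; // intended to be private, but any caller can read or change it
}

function Parser(startCode) {
  var depth = 0; // private: captured by the closures below, not a property
  this.getStartCode = function () { return startCode; };
  this.enter = function () { depth += 1; return depth; };
}

var leaky = new LeakyParser("[");
leaky.depth = 99; // nothing stops outside code from doing this

var parser = new Parser("[");
console.log("depth" in parser); // false
console.log(parser.enter(), parser.enter()); // 1 2
console.log(parser.getStartCode()); // [
```

Modern JavaScript also offers `#private` class fields (standardized in ES2022), which did not exist when this thread was written.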

I'll have a fix up for both of these issues before the end of the day (Eastern Standard Time).
CSM
post Jan 13 2010, 11:50 PM
Post #7

Alright, the fix is up. I believe I fixed all the leaky variables (both accidentally public and non-scoped ones). I did this by recording all the properties of the global "this" object, running the parser, and comparing that list against the properties present afterward. I checked out and fixed every difference. Hopefully I got them all.
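The detection technique described above can be sketched as a small helper (`findLeakedGlobals` is a hypothetical name): snapshot the global object's own property names, run the code, and report any names that appeared. The leaky stand-in is built with the Function constructor only so that its body runs in non-strict mode, where a missing var creates a global.

```javascript
// Hypothetical helper: report global properties created while fn runs.
function findLeakedGlobals(fn) {
  const before = new Set(Object.getOwnPropertyNames(globalThis));
  fn();
  return Object.getOwnPropertyNames(globalThis).filter(name => !before.has(name));
}

// Stand-in for a parser function with a forgotten `var`. Code created by the
// Function constructor is non-strict unless its body says otherwise, so this
// assignment creates a global named `leakedTemp`.
const sloppyFormat = new Function("leakedTemp = 42;");

// Stand-in for correctly scoped code.
const cleanFormat = function () {
  let localTemp = 42;
  return localTemp;
};

const leaked = findLeakedGlobals(sloppyFormat);
const clean = findLeakedGlobals(cleanFormat);
console.log(leaked); // [ 'leakedTemp' ]
console.log(clean); // []
```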
CSM
post Jan 15 2010, 04:15 AM
Post #8

After taking a look at it, it appears that I already have quite a few arguments to the parser, and adding one more for disabling escaping might be one too many. I'm thinking I should convert them to a "settings map" (independent of the one that's passed to all bb-code implementations during parsing) to handle the growing set of parameters.
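A minimal sketch of the settings-map idea, with every name hypothetical: callers pass one object, unspecified settings fall back to defaults, and misspelled keys are rejected instead of being silently ignored.

```javascript
// Hypothetical defaults; the real parser's option names may differ.
const DEFAULT_SETTINGS = Object.freeze({
  startCode: "[",
  endCode: "]",
  escapeContent: true,
});

function createParserSettings(settings = {}) {
  for (const key of Object.keys(settings)) {
    if (!Object.prototype.hasOwnProperty.call(DEFAULT_SETTINGS, key)) {
      throw new Error("Unknown setting: " + key); // catches typos early
    }
  }
  // Later keys win, so caller-supplied values override the defaults.
  return Object.freeze({ ...DEFAULT_SETTINGS, ...settings });
}

const smileySettings = createParserSettings({ startCode: ":", endCode: ":" });
console.log(smileySettings.startCode, smileySettings.escapeContent); // : true
```

Compared with positional parameters, adding a new option later only means adding a key to the defaults; existing callers keep working.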

Also, a new version of both are up. I changed the comments toward the top in the JS version to match the PHP version, and I removed the array_dump function and BBCode_Tokenizer class from the PHP version as they were not being used. The JS version had already axed both.
CSM
post Feb 7 2010, 04:17 AM
Post #9

Fixed a bug in the JS version that would cause issues when using a color tag. New version is up.
CSM
post Feb 7 2010, 08:07 AM
Post #10

I added control over escaping of output to both the PHP and JS versions; escaping can now be disabled. I also made some of the options that should stay fixed impossible to change per format call. I figured changing things such as the start and end codes per call wouldn't be all that useful if the code implementations themselves couldn't be changed as well. I also realize that it might be useful to change the allowed codes per parse call, and that may be added in a future release.
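One way to express "some options are fixed at construction, others may change per call" is a whitelist checked in the format function. This is a sketch under assumptions: the option names and which of them are per-call are made up, and escaping stands in for the actual parsing.

```javascript
// Hypothetical: only these options may be overridden per format() call.
const PER_CALL_OPTIONS = new Set(["escapeContent"]);

function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;")
    .replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

function createFormatter(options) {
  const base = Object.freeze({ startCode: "[", endCode: "]", escapeContent: true, ...options });

  return function format(input, overrides = {}) {
    for (const key of Object.keys(overrides)) {
      if (!PER_CALL_OPTIONS.has(key)) {
        throw new Error('"' + key + '" can only be set when the formatter is created');
      }
    }
    const opts = { ...base, ...overrides };
    // Stand-in for real parsing: only the escaping decision is shown.
    return opts.escapeContent ? escapeHtml(input) : input;
  };
}

const format = createFormatter();
console.log(format("<b>")); // &lt;b&gt;
console.log(format("<b>", { escapeContent: false })); // <b>
```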

Being able to disable escaping of content output lets the parser be used in a "two-pass" system, in which the first parser handles one type of code and the second parser handles another. If escaping isn't disabled on the second parser, it may escape the output of the first. Making this change also squashed a few bugs related to creating a default global code and to determining whether enough codes are provided when format is called.
CSM
post Jun 10 2010, 05:45 AM
Post #11

If anyone still reads this/cares ...

Updated to fix a few bugs I noticed in both versions. Bugs fixed:
  • JS error when using the same start and end character and having an unclosed code for a code that requires a closing code.
    • This may only be triggered if the last code is unclosed, or if there's only one start code in the input; I didn't test that before fixing it.
  • JS error when using the same start and end character and having an incomplete code in the input (a start character with no matching end character) that is not the last start/end character in the input.
    • This case previously worked only when the start and end characters differed.
There is still a "repeat" bug: if the last code in the input is incomplete, the parser appends the broken code (and anything following it) to the end of the output a second time before returning it.
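The scanning rules behind these bugs can be sketched as a small tokenizer (a hypothetical function, not the parser's code). It works when the start and end delimiters are the same character, skips delimiter pairs whose body doesn't look like a code, and emits an incomplete trailing code exactly once, as plain text.

```javascript
// Hypothetical code-body shape: optional "/", a name, optional "=argument".
const CODE_BODY = /^\/?\w+(?:=.*)?$/;

function tokenize(input, startCode, endCode) {
  if (!startCode || !endCode) throw new Error("Delimiters must be non-empty");
  const tokens = [];
  let pos = 0;  // everything before pos has already been emitted
  let scan = 0; // where to search for the next start delimiter
  for (;;) {
    const start = input.indexOf(startCode, scan);
    if (start === -1) break;
    // Search for the end after the start delimiter, so identical start and
    // end characters are not mistaken for each other.
    const end = input.indexOf(endCode, start + startCode.length);
    if (end === -1) break; // incomplete code: the rest becomes plain text
    const body = input.slice(start + startCode.length, end);
    if (!CODE_BODY.test(body)) {
      scan = start + startCode.length; // not a code; keep looking
      continue;
    }
    if (start > pos) tokens.push({ type: "text", value: input.slice(pos, start) });
    tokens.push({ type: "code", value: body });
    pos = scan = end + endCode.length;
  }
  // Emit the remaining text (including any incomplete code) exactly once.
  if (pos < input.length) tokens.push({ type: "text", value: input.slice(pos) });
  return tokens;
}

console.log(tokenize("time: 10:30 :smile:", ":", ":"));
console.log(tokenize("[b]bold [i", "[", "]"));
```

A real implementation would also check each name against the registered codes, since with colon delimiters any `:word:` pair looks like a code.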

This project should really be under version control in a repository, and have test cases to prevent regressions. However, at this point, given how loosely BB-Code is defined and how much the parser can do, writing test cases for it feels (to me, anyway) kinda insurmountable. Something to consider for the future is profiling, to see whether anything can be improved in the speed department.
CSM
post May 28 2012, 08:54 AM
Post #12

This forum will disappear later this year, but an update can't hurt: this project has moved to a public repository hosted on Bitbucket.