mirror of
https://github.com/percona/percona-toolkit.git
synced 2025-09-28 17:15:44 +00:00
War on typos Act 3 (#662)
* Copy NaturalDocs v1.52 from original ZIP UTF-8 encoded LF line ends
* Fix Logo.png

Architecture: File Parsing

####################################################################################

This is the architecture and code path for general file parsing. We pick it up at <NaturalDocs::Parser->Parse()> because, for the purposes of this document, we're not interested in how the files are gathered and their languages determined. We are only interested in the process each individual file goes through once it has been decided that it should be parsed.



Stage: Preparation and Differentiation
_______________________________________________________________________________________________________

<NaturalDocs::Parser->Parse()> can be called from one of two places, <NaturalDocs::Parser->ParseForInformation()> and <NaturalDocs::Parser->ParseForBuild()>, which correspond to the parsing and building phases of Natural Docs. There is no noteworthy work done in either of them before they call Parse().


Stage: Basic File Processing
_______________________________________________________________________________________________________

The nitty-gritty file handling is no longer done in <NaturalDocs::Parser> itself. The introduction of full language support in 1.3 required two completely different code paths for full and basic language support, so the handling moved to NaturalDocs::Languages::Base->ParseFile(). That is effectively a virtual function that leads to <NaturalDocs::Languages::Simple->ParseFile()> for basic language support, or to a version in a package derived from <NaturalDocs::Languages::Advanced> for full language support.

The machinations of how these functions work are for another document, but their responsibility is to feed every comment Natural Docs should be interested in back to the parser via <NaturalDocs::Parser->OnComment()>.
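The hand-off between the language packages and the parser can be pictured as a small class hierarchy with a callback. The Python below is a hypothetical sketch, not Natural Docs' Perl code. LanguageBase, SimpleLanguage, parse_file, and the on_comment callback are stand-ins for NaturalDocs::Languages::Base, NaturalDocs::Languages::Simple, ParseFile(), and OnComment(). This simple parser only recognizes whole-line comments that start with a single symbol such as #.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

# Hypothetical stand-in for OnComment(): receives the comment's lines
# (comment symbols already blanked out) and the line number it starts on.
CommentCallback = Callable[[List[str], int], None]


class LanguageBase(ABC):
    """Plays the role of NaturalDocs::Languages::Base."""

    @abstractmethod
    def parse_file(self, source: str, on_comment: CommentCallback) -> None:
        """Feed every interesting comment in source back via on_comment."""


class SimpleLanguage(LanguageBase):
    """Plays the role of basic language support: whole-line comments only."""

    def __init__(self, symbol: str = "#") -> None:
        self.symbol = symbol

    def parse_file(self, source: str, on_comment: CommentCallback) -> None:
        block: List[str] = []
        start = 0
        for number, line in enumerate(source.splitlines(), start=1):
            if line.lstrip().startswith(self.symbol):
                if not block:
                    start = number
                # Blank out the comment symbol instead of deleting it so the
                # comment's indentation survives.
                block.append(line.replace(self.symbol, " " * len(self.symbol), 1))
            elif block:
                on_comment(block, start)
                block = []
        if block:
            on_comment(block, start)
```

In this sketch the parser side only ever sees the callback, so a full-support language could be another subclass without the caller changing.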


Stage: Comment Processing
_______________________________________________________________________________________________________

<NaturalDocs::Parser->OnComment()> receives the comment without its comment symbols, since those are language-specific. The comment symbols are replaced by spaces rather than removed, so any indentation is preserved. OnComment() is also told whether it's a JavaDoc-styled comment, as that varies by language as well.

OnComment() runs what it receives through <NaturalDocs::Parser->CleanComment()>, which normalizes the text by removing comment boxes and horizontal lines, expanding tabs, etc.
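Here is a rough idea of that normalization as a hypothetical Python sketch, not the actual CleanComment() code. It assumes a tab stop of 4. A line made only of one repeated symbol, at least four long, is treated as a horizontal line; that threshold is arbitrary. Comment boxes are not handled here. Horizontal lines are blanked rather than deleted, which is our own choice so line numbers stay aligned.

```python
import re
from typing import List

# Assumed rule: a horizontal line is one symbol character repeated 4+ times,
# e.g. "----------" or "##########", optionally surrounded by whitespace.
HORIZONTAL_LINE = re.compile(r"^\s*([^\w\s])\1{3,}\s*$")


def clean_comment(lines: List[str], tab_size: int = 4) -> List[str]:
    """Rough stand-in for CleanComment(): expands tabs, blanks horizontal lines."""
    cleaned = []
    for line in lines:
        line = line.expandtabs(tab_size).rstrip()
        # Blank the separator instead of deleting it so line numbers and
        # paragraph breaks stay where they were.
        cleaned.append("" if HORIZONTAL_LINE.match(line) else line)
    return cleaned
```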


Stage: Comment Type Determination
_______________________________________________________________________________________________________

OnComment() sends the comment to <NaturalDocs::Parser::Native->IsMine()> to test if it's definitely Natural Docs content, such as by starting with a recognized header line. If so, it sends it to <NaturalDocs::Parser::Native->ParseComment()>.

If not, OnComment() sends the comment to <NaturalDocs::Parser::JavaDoc->IsMine()> to test if it's definitely JavaDoc content, such as by having JavaDoc tags. If so, it sends it to <NaturalDocs::Parser::JavaDoc->ParseComment()>.

If not, the content is ambiguous. If it's a JavaDoc-styled comment, it goes to <NaturalDocs::Parser::Native->ParseComment()> to be treated as a headerless Natural Docs comment. Otherwise it is ignored, which lets normal comments slip through. Note that the content is only ambiguous if neither parser claims it; there's no test to see whether both do. Instead, Natural Docs always wins because it is asked first.
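The decision order above can be summarized as a small routing function. This Python is a hypothetical sketch. The two IsMine()-style tests are passed in as predicates. The toy predicates included here are assumptions, not the real tests: one accepts a "Keyword: Title" first line with a keyword from a made-up list, and the other accepts any line starting with an @tag.

```python
import re
from enum import Enum
from typing import Callable

Predicate = Callable[[str], bool]


class Route(Enum):
    NATIVE = "Native->ParseComment()"
    JAVADOC = "JavaDoc->ParseComment()"
    NATIVE_HEADERLESS = "Native->ParseComment(), headerless"
    IGNORED = "ignored"


def route_comment(text: str, javadoc_styled: bool,
                  native_is_mine: Predicate, javadoc_is_mine: Predicate) -> Route:
    # Natural Docs is asked first, so it wins if both parsers would claim it.
    if native_is_mine(text):
        return Route.NATIVE
    if javadoc_is_mine(text):
        return Route.JAVADOC
    # Neither claimed it: ambiguous. Only JavaDoc-styled comments are kept.
    return Route.NATIVE_HEADERLESS if javadoc_styled else Route.IGNORED


# Toy predicates for illustration only; the keyword list is made up.
KEYWORDS = {"function", "class", "variable", "section"}


def toy_native_is_mine(text: str) -> bool:
    lines = text.strip().splitlines()
    match = re.match(r"(\w+):\s+\S", lines[0]) if lines else None
    return bool(match) and match.group(1).lower() in KEYWORDS


def toy_javadoc_is_mine(text: str) -> bool:
    return re.search(r"^\s*@\w+", text, re.MULTILINE) is not None
```

A comment such as "Function: Foo" followed by an @param line routes to the native parser even though it also contains a JavaDoc tag. This mirrors the "Natural Docs always wins" rule.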

We will not go into the JavaDoc code path in this document. It simply converts the JavaDoc comment into <NDMarkup> as best it can (which will never be perfect) and adds a <NaturalDocs::Parser::ParsedTopic> to the list for that file. Each of those ParsedTopics will be headerless, as indicated by an undefined <NaturalDocs::Parser::ParsedTopic->Title()>.


Stage: Native Comment Parsing
_______________________________________________________________________________________________________

At this point, parsing is handed off to <NaturalDocs::Parser::Native->ParseComment()>. It searches for header lines within the comment and divides the content into individual topics. It also detects (start code) and (end) sections so that anything that would normally be interpreted as a header line can appear there without breaking the topic.
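Splitting a comment into topics while respecting code sections might look roughly like this hypothetical Python sketch. It makes three assumptions. A header line has the form "Keyword: Title" with a keyword from a small made-up list; the real list is configurable. Only (start code) and (end) are recognized as code-section markers. The header line itself is not kept in the topic body.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical subset of topic keywords.
KEYWORDS = {"function", "class", "variable", "group", "section"}
HEADER = re.compile(r"^\s*(\w+):\s+(\S.*?)\s*$")


@dataclass
class RawTopic:
    keyword: Optional[str]          # None for a headerless leading block
    title: Optional[str]
    body: List[str] = field(default_factory=list)


def split_topics(lines: List[str]) -> List[RawTopic]:
    topics: List[RawTopic] = []
    current: Optional[RawTopic] = None
    in_code = False
    for line in lines:
        marker = line.strip().lower()
        if in_code:
            # Inside (start code) ... (end), header-like lines are just text.
            if marker == "(end)":
                in_code = False
        elif marker == "(start code)":
            in_code = True
        else:
            match = HEADER.match(line)
            if match and match.group(1).lower() in KEYWORDS:
                current = RawTopic(match.group(1), match.group(2))
                topics.append(current)
                continue
        if current is None:
            current = RawTopic(None, None)
            topics.append(current)
        current.body.append(line)
    return topics
```

The code markers themselves stay in the body, so a later block-formatting step (the FormatBody() stage described next) can still see them.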

The content between the header lines is sent to <NaturalDocs::Parser::Native->FormatBody()>, which handles all the block-level formatting such as paragraphs, bullet lists, and code sections. That function in turn calls <NaturalDocs::Parser::Native->RichFormatTextBlock()> on certain snippets of the text to handle all inline formatting, such as bold, underline, and links, both explicit and automatic.

Once the body is in <NDMarkup>, <NaturalDocs::Parser::Native->ParseComment()> makes a <NaturalDocs::Parser::ParsedTopic> and adds it to the list. It keeps track of scope via topic scoping, regardless of whether we're using full or basic language support. Headerless topics are given normal scope even if they might be classes or other scoped types.


Group: Post Processing
_______________________________________________________________________________________________________

After all the comments have been parsed into ParsedTopics and execution has returned to <NaturalDocs::Parser->Parse()>, it's time for some after-the-fact cleanup. Some of it, such as breaking topic lists, determining the menu title, and adding automatic group headings, we won't get into here. Two processes are very relevant, though.


Stage: Repairing Packages
_______________________________________________________________________________________________________

If the file we parsed had full language support, the <NaturalDocs::Languages::Advanced> parser would have done more than just generate various OnComment() calls. It would also return a scope record, represented by <NaturalDocs::Languages::Advanced::ScopeChange> objects, and a second set of ParsedTopics it extracted purely from the code, which we'll refer to as autotopics. The scope record shows, purely from the source, what scope each line of code appears in. In the function <NaturalDocs::Parser->RepairPackages()>, this is combined with the topic scoping to update the ParsedTopics that come from the comments.

If a comment topic changes the scope, that's honored until the next autotopic or scope change from the code. This allows someone to document a class that doesn't appear in the code purely with topic scoping without throwing off anything else. Any other comment topics have their scope changed to the current scope, no matter how it was arrived at. This allows someone to manually document a function without manually documenting its class and still have it appear under that class: the scope record will put it in that class's scope even if topic scoping did not. Essentially, the previous topic scoping is thrown out, which is probably something that could be improved.
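Here is a simplified model of the scope repair as a hypothetical Python sketch, not RepairPackages() itself. It makes several assumptions. The scope record is a sorted list of (line, scope) pairs, and autotopics are reduced to their line numbers. A comment topic that starts a scope, such as a class, carries starts_scope=True. Titles are assumed to already be full scope names. Code events on the same line as a comment topic are applied first.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CommentTopic:
    line: int
    title: str
    starts_scope: bool = False
    scope: Optional[str] = None


def repair_scopes(comments: List[CommentTopic],
                  scope_record: List[Tuple[int, Optional[str]]],
                  autotopic_lines: List[int]) -> None:
    # Order code events (0) before comment topics (1) on the same line.
    events = sorted(
        [(line, 0, "scope", scope) for line, scope in scope_record]
        + [(line, 0, "auto", None) for line in autotopic_lines]
        + [(t.line, 1, "comment", t) for t in comments],
        key=lambda event: (event[0], event[1]),
    )
    code_scope: Optional[str] = None   # what the code says the scope is
    override: Optional[str] = None     # scope set by a comment topic
    for _, _, kind, payload in events:
        if kind == "scope":
            code_scope, override = payload, None
        elif kind == "auto":
            override = None            # the code takes over again
        else:
            topic = payload
            topic.scope = override if override is not None else code_scope
            if topic.starts_scope:
                override = topic.title
```

Because every scope change and autotopic clears the comment topic's override, this sketch reproduces the behavior described above, including throwing out the earlier topic scoping.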

None of this affects the autotopics, which are known to have the correct scoping since they are gleaned from the code with a dedicated parser. But wouldn't manually documented code elements then be duplicated, appearing both in the autotopics and in the comment topics? Yes. That brings us to our next stage, which is...


Stage: Merging Auto Topics
_______________________________________________________________________________________________________

As mentioned above, ParseFile() also returns a set of ParsedTopics gleaned from the code, called autotopics. The function <NaturalDocs::Parser->MergeAutoTopics()> merges this list with the comment topics.

The lists are essentially merged by line number. Since named topics should appear directly above the thing they're documenting, topics are tested that way and combined into one if they match. The description and title of the comment topic are merged with the prototype of the autotopic. JavaDoc-styled comments are also merged in this function, as they should appear directly above the code element they're documenting. Any headerless topics that don't, either because they appear past the last autotopic or directly above another comment topic, are discarded.
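Here is a hypothetical Python sketch of the merge under simplifying assumptions. A comment topic counts as directly above a code element when the autotopic is the next item in line order. A named topic matches only if its title equals the autotopic's. The merged topic takes the comment's title and description plus the autotopic's prototype. The real matching rules may be looser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Topic:
    line: int
    title: Optional[str]           # None means headerless
    body: str = ""
    prototype: Optional[str] = None
    auto: bool = False             # True for autotopics from the code


def merge_auto_topics(comments: List[Topic], autos: List[Topic]) -> List[Topic]:
    # Merge by line number; a comment on the same line sorts before the code.
    merged = sorted(comments + autos, key=lambda t: (t.line, t.auto))
    result: List[Topic] = []
    i = 0
    while i < len(merged):
        topic = merged[i]
        following = merged[i + 1] if i + 1 < len(merged) else None
        if (not topic.auto and following is not None and following.auto
                and topic.title in (None, following.title)):
            # Directly above a matching code element: combine the two.
            result.append(Topic(following.line, topic.title or following.title,
                                topic.body, following.prototype))
            i += 2
            continue
        if topic.auto or topic.title is not None:
            result.append(topic)
        # Otherwise a headerless topic not above a code element is dropped.
        i += 1
    return result
```

Headerless topics only survive by being absorbed into the autotopic below them. This matches the rule that JavaDoc-styled comments must sit directly above what they document.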


Stage: Conclusion
_______________________________________________________________________________________________________

Thus ends all processing by <NaturalDocs::Parser->Parse()>. The file is now a single list of <NaturalDocs::Parser::ParsedTopics> with all the body content in <NDMarkup>. If we were using <NaturalDocs::Parser->ParseForBuild()>, that's pretty much it, and it's ready to be converted into the output format. If we were using <NaturalDocs::Parser->ParseForInformation()>, though, the resulting file is scanned for all relevant information to feed into other packages such as <NaturalDocs::SymbolTable>.

Note that no prototype processing was done in this process, only the possible transferring of prototypes from one ParsedTopic to another when merging autotopics with comment topics. Obtaining and formatting prototypes is handled by <NaturalDocs::Languages::Simple> and by packages derived from <NaturalDocs::Languages::Advanced>.


#
###############################################################################

# This file is part of Natural Docs, which is Copyright © 2003-2010 Greg Valure
# Natural Docs is licensed under version 3 of the GNU Affero General Public License (AGPL)
# Refer to License.txt for the complete details


Title: Language Notes
_______________________________________________________________________________

This is more for my personal reference than anything else.


___________________________________________________________________________

Topic: Prototype Parameter Styles
___________________________________________________________________________

Parameters via Commas, Typed via Spaces:

> FunctionName ( type identifier, type identifier = value, modifier type identifier )
> FunctionName ( identifier, identifier = value )

The general idea is that parameters are separated by commas. Identifiers cannot contain spaces. Types and modifiers,
if available, are separated from the identifiers with spaces. There may be an equals sign to set the default value.

So parsing means splitting by commas, stripping everything past an equals sign for the default value, stripping everything
after the last space for the identifier, and treating the rest as the type. If there are no internal spaces after the default
value is stripped, it's all identifier.

Note that internal parentheses, brackets, braces, and angle brackets should be parsed out. They may be present in default
values or types, and any commas and equals signs in them should not be treated as separators.

Applies to C++, Java, C#, JavaScript, Python, PHP, Ruby.

Applies to Perl as well, even though it doesn't have any real parameter declaration structure. Just adding it with comments
is fine.
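The procedure above translates fairly directly into code. This is a hypothetical Python sketch, not Natural Docs' own implementation. split_top_level() tracks nesting of (), [], {}, and <>, and each parameter comes back as a (type, identifier, default) tuple. Treating < and > as brackets is an assumption that would misfire on defaults such as a < b. String literals containing commas are not handled.

```python
from typing import List, Optional, Tuple

OPENERS = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSERS = set(OPENERS.values())


def split_top_level(text: str, sep: str) -> List[str]:
    """Split text on sep, ignoring any sep nested inside brackets."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(0, depth - 1)
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_comma_style(params: str) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Return (type, identifier, default) for each comma-separated parameter."""
    result = []
    for raw in split_top_level(params, ","):
        pieces = split_top_level(raw, "=")
        declaration = pieces[0].strip()
        default = "=".join(pieces[1:]).strip() if len(pieces) > 1 else None
        if not declaration:
            continue
        # The identifier is everything after the last space; the rest is the type.
        type_, _, identifier = declaration.rpartition(" ")
        result.append((type_.strip() or None, identifier, default))
    return result
```

For example, "std::map<int, int> m = {}" yields ("std::map<int, int>", "m", "{}"), because the comma inside the angle brackets is not treated as a separator.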

Parameters via Semicolons and Commas, Typed via Colons:

> FunctionName ( identifier: type; identifier, identifier: type; identifier: type := value )

Parameters via semicolons, types via colons. However, there can be more than one parameter per type, separated by commas.
Default values via colon-equals.

Applies to Pascal, Ada.
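Here is a hypothetical Python sketch of the semicolon-and-colon style. It assumes no semicolons, colons, or commas appear inside default values or types. Each name in a comma group gets the group's type, and the result is a list of (identifier, type, default) tuples. Mode words such as Ada's "in out" simply stay part of the type.

```python
from typing import List, Optional, Tuple


def parse_pascal_style(params: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (identifier, type, default) for each parameter."""
    result = []
    for group in params.split(";"):
        group = group.strip()
        if not group:
            continue
        # Split off the default first, since ":=" also contains ":".
        declaration, _, default = group.partition(":=")
        names, _, type_ = declaration.partition(":")
        for name in names.split(","):
            result.append((name.strip(), type_.strip() or None, default.strip() or None))
    return result
```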


SQL:

> FunctionName ( identifier type, identifier modifier type, identifier type := value )

Parameters separated by commas. Identifiers come before the types and are separated from them by a space. Default values are
specified with colon-equals.

> FunctionName @identifier type, @identifier modifier type, @identifier type = value

Microsoft's SQL uses equals instead of colon-equals, doesn't need parentheses, and starts its parameter names with an @
symbol.
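Both SQL forms can be handled by one hypothetical Python sketch. It strips optional outer parentheses and splits on commas outside parentheses, so a type such as NUMBER(10, 2) survives. It takes the default after := or, failing that, after =. The first word is treated as the identifier and the rest as modifiers plus type.

```python
from typing import List, Optional, Tuple


def split_commas(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts if p.strip()]


def parse_sql_style(params: str) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (identifier, modifiers and type, default) for each parameter."""
    text = params.strip()
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]        # Microsoft's form has no parentheses
    result = []
    for raw in split_commas(text):
        declaration, default = raw, None
        for operator in (":=", "="):   # check ":=" first; it contains "="
            if operator in raw:
                declaration, default = raw.split(operator, 1)
                default = default.strip()
                break
        words = declaration.split()
        result.append((words[0], " ".join(words[1:]) or None, default))
    return result
```

The @ prefix is left on Microsoft-style identifiers. Whether to strip it is a presentation choice.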


Visual Basic:

> FunctionName ( modifiers identifier as type, identifier = value )

Parameters separated by commas. Default values via equals. However, any number of modifiers may appear before the
identifier. Those modifiers are ByVal, ByRef, Optional, and ParamArray.
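Here is a hypothetical Python sketch for the Visual Basic style. Leading words from the modifier list above are peeled off case-insensitively, since Visual Basic keywords are case-insensitive. The next word is the identifier, and anything after As is the type. Commas inside parentheses are ignored so that generic types like Dictionary(Of String, Integer) stay intact.

```python
from typing import List, Optional, Tuple

VB_MODIFIERS = {"byval", "byref", "optional", "paramarray"}


def split_commas(text: str) -> List[str]:
    """Split on commas that are not inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])
    return [p.strip() for p in parts if p.strip()]


def parse_vb_style(params: str) -> List[Tuple[List[str], str, Optional[str], Optional[str]]]:
    """Return (modifiers, identifier, type, default) for each parameter."""
    result = []
    for raw in split_commas(params):
        declaration, has_default, default = raw.partition("=")
        words = declaration.split()
        modifiers = []
        while words and words[0].lower() in VB_MODIFIERS:
            modifiers.append(words.pop(0))
        identifier = words[0] if words else ""
        is_typed = len(words) > 2 and words[1].lower() == "as"
        type_ = " ".join(words[2:]) if is_typed else None
        result.append((modifiers, identifier, type_, default.strip() if has_default else None))
    return result
```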


Tcl:

> FunctionName { identifier identifier { whatever } } { code }

Identifiers are specified in the first set of braces and have no commas. However, they can be broken out into sub-braces.
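Here is a hypothetical Python sketch for Tcl. It takes the first brace group after the name and splits it into words while respecting nested braces. A sub-braced word is read as a name followed by a default value, which is how Tcl's proc treats {name default}. Backslash escapes and quoting are ignored.

```python
from typing import List, Optional, Tuple


def first_brace_group(text: str) -> str:
    """Return the contents of the first top-level {...} group in text."""
    start = text.index("{")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[start + 1:i]
    raise ValueError("unbalanced braces")


def split_tcl_words(text: str) -> List[str]:
    """Split on whitespace that is not inside braces; braces are kept."""
    words: List[str] = []
    current = ""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                words.append(current)
            current = ""
        else:
            current += ch
    if current:
        words.append(current)
    return words


def parse_tcl_args(declaration: str) -> List[Tuple[str, Optional[str]]]:
    """Return (identifier, default) pairs from 'Name { args } { body }'."""
    params = []
    for word in split_tcl_words(first_brace_group(declaration)):
        if word.startswith("{") and word.endswith("}"):
            inner = split_tcl_words(word[1:-1])
            params.append((inner[0], " ".join(inner[1:]) or None))
        else:
            params.append((word, None))
    return params
```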


___________________________________________________________________________

Topic: Syntax References
___________________________________________________________________________

C++ - http://www.csci.csusb.edu/dick/c++std/syntax.html

C# - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/csspec/html/CSharpSpecStart.asp. Open in IE.

Java - http://cui.unige.ch/db-research/Enseignement/analyseinfo/
Ada - http://cui.unige.ch/db-research/Enseignement/analyseinfo/

SQL - http://cui.unige.ch/db-research/Enseignement/analyseinfo/,
<http://www.cs.umb.edu/cs634/ora9idocs/appdev.920/a96624/13_elems.htm>, or
<http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_tsqlcon_6lyk.asp?frame=true> (open in IE).

JavaScript - http://academ.hvcc.edu/~kantopet/javascript/index.php

Python - http://www.python.org/doc/2.3.4/ref/ref.html

PHP - http://www.php.net/manual/en/langref.php

Visual Basic - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vbls7/html/vbspecstart.asp. Open in IE.

Pascal - <http://pages.cpsc.ucalgary.ca/~becker/231/SyntaxDiagrams/pascal-syntax_files/frame.htm>. Open in IE.

Ruby - http://www.rubycentral.com/book/

ActionScript 2 - <http://download.macromedia.com/pub/documentation/en/flash/fl8/fl8_as2lr.pdf>
ActionScript 3 - <http://download.macromedia.com/pub/documentation/en/flex/2/prog_actionscript30.pdf>
E4X - http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-357.pdf

R - Somewhere on http://www.r-project.org.

ColdFusion - <http://livedocs.macromedia.com/coldfusion/6/Developing_ColdFusion_MX_Applications_with_CFML/contents.htm>

Eiffel - http://www.gobosoft.com/eiffel/syntax/


Architecture: NDMarkup
_______________________________________________________________________________

A markup format used by the parser, both internally and in <NaturalDocs::Parser::ParsedTopic> objects. Text formatted in
NDMarkup will only have the tags documented below.


About: Top-Level Tags

All content will be surrounded by one of the top-level tags. These tags will not appear within each other.

<p></p> - Surrounds a paragraph. Paragraph breaks will replace double line breaks, and single line breaks will
be removed completely.

<code type=""></code> - Surrounds code or text diagrams that should appear literally in the output. The type can
be code, text, or anonymous, in which case it's not specified whether it's code or text.

<h></h> - Surrounds a heading.

<ul></ul> - Surrounds a bulleted (unordered) list.
<dl></dl> - Surrounds a description list, which is what you are reading.

<img mode="inline" target="" original=""> - An inline image. Target contains the image target, and original contains the
original text in case it doesn't resolve.
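Here is a hypothetical Python sketch of producing <p> tags as described. Blocks separated by a blank line become separate paragraphs, and single line breaks are dropped. We assume a dropped line break is replaced by one space so words don't run together. The text is also converted to NDMarkup's amp chars (&amp;, &quot;, &lt;, &gt;).

```python
import re


def to_amp_chars(text: str) -> str:
    # "&" goes first so the entities added afterwards aren't re-encoded.
    return (text.replace("&", "&amp;").replace('"', "&quot;")
                .replace("<", "&lt;").replace(">", "&gt;"))


def paragraphs_to_ndmarkup(text: str) -> str:
    """Blank-line-separated blocks become <p> tags; single breaks are dropped."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Join the block's lines with one space (our reading of "removed").
        joined = " ".join(line.strip() for line in block.splitlines() if line.strip())
        if joined:
            out.append("<p>" + to_amp_chars(joined) + "</p>")
    return "".join(out)
```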


About: List Item Tags

These tags will only appear within their respective lists.

<li></li> - Surrounds a bulleted list item.
<de></de> - Surrounds a description list entry, which is the left side. It will always be followed by a description list description.
<ds></ds> - Surrounds a description list symbol. This is the same as a description list entry, except that the content is also a referenceable symbol. This occurs when inside a list topic. This tag will always be followed by a description list description.
<dd></dd> - Surrounds a description list description, which is the right side. It will always be preceded by a description list entry or symbol.
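
Every <de> or <ds> is guaranteed to be followed by a <dd>, so terms and descriptions can be paired in one pass. The following is a minimal Python sketch; `description_list_items` is a hypothetical helper. Allowing optional whitespace between the closing term tag and <dd> is an added safety margin, not a documented requirement.

```python
import re

# A <de> (entry) or <ds> (symbol) is always followed by a <dd> (description),
# so each term can be paired with its description in a single pass.
DL_ITEM = re.compile(r'<(de|ds)>(.*?)</\1>\s*<dd>(.*?)</dd>', re.DOTALL)


def description_list_items(dl_content):
    """Return (term, description, is_symbol) tuples from inside a <dl>."""
    return [(m.group(2), m.group(3), m.group(1) == 'ds')
            for m in DL_ITEM.finditer(dl_content)]


print(description_list_items('<ds>Parse</ds><dd>Parses a file.</dd>'))
```

The `is_symbol` flag preserves the one difference between <de> and <ds>: whether the term is also a referenceable symbol.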

About: Text Tags

These tags will only appear in paragraphs, headings, or description list descriptions.

<b></b> - Bold
<i></i> - Italics
<u></u> - Underline

<link target="" name="" original=""> - Surrounds a potential link to a symbol; potential because the target is not guaranteed to exist. This tag merely designates an attempted link. Target is what is attempting to be linked to, name is the text that should appear for a successful link, and original is the original text in case the link doesn't resolve.

<url target="" name=""> - An external link. There's no need for an original attribute because it will always be turned into an actual link.
<email target="" name=""> - A link to an e-mail address.

<img mode="link" target="" original=""> - An image link. Target contains the image target, and original contains the original text in case it doesn't resolve.
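
Link properties always appear in the same order, and property values have their double quotes encoded as amp chars. Together, these guarantees let a single fixed pattern pull out every attempted link. The following is a Python sketch; `potential_links` is a hypothetical name, and the sample target and original values are made up for illustration.

```python
import re

# Properties appear in the documented order, in lowercase, with no extra
# whitespace, and any double quote inside a value is stored as &quot;.
# That makes one fixed pattern enough to capture every <link> tag.
LINK = re.compile(r'<link target="([^"]*)" name="([^"]*)" original="([^"]*)">')


def potential_links(ndmarkup):
    """Return (target, name, original) for each attempted symbol link."""
    return LINK.findall(ndmarkup)


print(potential_links('<p>See <link target="Parser.Parse" name="Parse" original="&lt;Parser.Parse&gt;">.</p>'))
```

Resolving the target against the symbol table is a separate step. This only finds the attempts.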


About: Amp Chars

These are the only amp chars supported, and they will appear everywhere. Every other character will appear as is.

&amp; - The ampersand &.
&quot; - The double quote ".
&lt; - The less than sign <.
&gt; - The greater than sign >.
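
Python's `html.unescape` would decode far more entities than NDMarkup defines. The sketch below therefore handles only these four. The order of replacement matters: &amp; must be decoded last and encoded first, or doubly escaped text is corrupted. The function names are hypothetical.

```python
# The four amp chars NDMarkup uses. &amp; is decoded last; otherwise text
# such as "&amp;lt;" would first become "&lt;" and then, wrongly, "<".
AMP_CHARS = [('&quot;', '"'), ('&lt;', '<'), ('&gt;', '>'), ('&amp;', '&')]


def decode_amp_chars(text):
    """Turn NDMarkup amp chars back into the literal characters."""
    for entity, char in AMP_CHARS:
        text = text.replace(entity, char)
    return text


def encode_amp_chars(text):
    """Escape literal characters into amp chars; & is escaped first."""
    for entity, char in reversed(AMP_CHARS):
        text = text.replace(char, entity)
    return text


print(decode_amp_chars('a &lt; b &amp;&amp; c'))
```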

About: Tabs

NDMarkup will not contain tab characters, only spaces. Any tab characters appearing in the source files will be expanded/replaced as necessary.
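
Tabs have to be expanded to the next tab stop, not replaced with a fixed number of spaces. Otherwise text diagrams in <code> blocks lose their alignment. The following is a minimal Python sketch of stop-aware expansion for a single line; the tab width of 4 is an assumed value, not Natural Docs' setting.

```python
def expand_tabs(line, tab_width=4):
    """Replace each tab with enough spaces to reach the next tab stop.

    Expects a single line. tab_width=4 is an assumed value for illustration.
    """
    out = []
    column = 0
    for ch in line:
        if ch == '\t':
            spaces = tab_width - column % tab_width
            out.append(' ' * spaces)
            column += spaces
        else:
            out.append(ch)
            column += 1
    return ''.join(out)


print(repr(expand_tabs('a\tbc\td')))
```

A naive `line.replace('\t', '    ')` would put "bc" and "d" at different columns than the author intended. For a single line, Python's built-in `str.expandtabs` performs the same stop-aware expansion.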


About: General Tag Properties

Since the tags are generated, they will always have the following properties, which make pattern matching much easier.

- Tags and amp chars will always be in all lowercase.
- Properties will appear exactly as documented here. They will be in all lowercase, in the documented order, and will have no extraneous whitespace. Any ampersands, double quotes, or angle brackets in property values will be encoded as amp chars.
- All code is valid, meaning tags will always be closed, <li>s will only appear within <ul>s, etc.

So, for example, you can match description list entries with /<de>(.+?)<\/de>/ and $1 will be the text. No surprises or gotchas, and no need for sophisticated parsing routines.
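
These guarantees mean that converting NDMarkup to plain text also needs only one regular expression. The following is a minimal Python sketch; `to_plain_text` is a hypothetical helper, not a Natural Docs function. Tags are stripped first and amp chars decoded second. If decoding came first, encoded text such as &lt;b&gt; would turn into <b> and then be stripped as if it were a tag.

```python
import re

# Tags are always lowercase and well formed, and property values never contain
# a raw double quote (it is stored as &quot;), so one pattern matches any tag.
TAG = re.compile(r'</?[a-z]+(?: [a-z]+="[^"]*")*>')

# The four amp chars; &amp; is decoded last so "&amp;lt;" becomes "&lt;", not "<".
AMP_CHARS = (('&quot;', '"'), ('&lt;', '<'), ('&gt;', '>'), ('&amp;', '&'))


def to_plain_text(ndmarkup):
    """Strip NDMarkup tags, then decode amp chars (in that order)."""
    text = TAG.sub('', ndmarkup)
    for entity, char in AMP_CHARS:
        text = text.replace(entity, char)
    return text


print(to_plain_text('<p>If <i>a &lt; b</i> &amp;&amp; c</p>'))
```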

Remember that for symbol definitions, the text should appear as is, but internally (such as for the anchor) it needs to be passed through <NaturalDocs::SymbolTable->Defines()> so that the output file is just as tolerant as <NaturalDocs::SymbolTable>.


Architecture: Symbol Management

####################################################################################

This is the architecture and code path for symbol management. This is almost exclusively managed by <NaturalDocs::SymbolTable>, but it's complicated enough that I want a plain-English walkthrough of the code paths anyway.

An important thing to remember is that each section below is simplified initially and then expanded upon in later sections as more facets of the code are introduced. You will not get the whole story of what a function does by reading just one section.



Topic: Symbol Storage
_______________________________________________________________________________________________________

Symbols are indexed primarily by their <SymbolString>, which is the normalized, pre-parsed series of identifiers that make it up. A symbol can have any number of definitions, including none, but can only have one definition per file. If a symbol is defined more than once in a file, only the first definition is counted. Stored for each definition is the <TopicType>, summary, and prototype.

Each symbol that has a definition has one designated as the global definition. This is the one linked to by other files, unless a file has its own definition, which then takes precedence within that file. Which definition is chosen is rather arbitrary at this point; probably the first one that got defined. Similarly, if the global definition is deleted, which one is chosen to replace it is completely arbitrary.

Each symbol also stores a list of references to it. Note that references can be interpreted as multiple symbols, and each of those symbols will store a link back to the reference. In other words, every reference a symbol stores is one that _can_ be interpreted as that symbol, but that is not necessarily the interpretation the reference actually uses. A reference could have a better interpretation it uses instead.

For example, suppose there are two functions, MyFunction() and MyClass.MyFunction(). The reference text "MyFunction()" appearing in MyClass can be interpreted as either MyClass.MyFunction(), or if that doesn't exist, the global MyFunction(). Both the symbols for MyFunction() and MyClass.MyFunction() will store that they are referenced by the link, even though only the class-scoped one is the interpretation the link actually uses.

This is also the reason a symbol can exist that has no definitions: it has references. We want a symbol to be created in the table for each reference interpretation, even if it has no definition yet. These are called potential symbols. They let us know whether a new symbol definition fulfills an existing reference, since it may be a better interpretation for the reference than what is currently used.
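
The following is a minimal Python sketch of the per-symbol record described above. The class and field names are hypothetical, and dotted strings stand in for <SymbolStrings>. The global definition is simply the first one added, which mirrors the "probably the first one that got defined" behavior rather than any guaranteed rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Definition:
    topic_type: str
    summary: str
    prototype: str


@dataclass
class Symbol:
    # File name -> Definition. A file contributes at most one definition.
    definitions: Dict[str, Definition] = field(default_factory=dict)
    # File name of the designated global definition, if any.
    global_definition: Optional[str] = None
    # Every reference that *can* be interpreted as this symbol, whether or
    # not it currently is.
    references: Set[str] = field(default_factory=set)

    def is_potential(self) -> bool:
        """True if the symbol exists only because references mention it."""
        return not self.definitions

    def add_definition(self, file: str, definition: Definition) -> bool:
        if file in self.definitions:
            return False  # only the first definition in a file counts
        self.definitions[file] = definition
        if self.global_definition is None:
            self.global_definition = file  # arbitrary: the first one defined
        return True

    def definition_for(self, file: str) -> Optional[Definition]:
        """The definition a link appearing in `file` should point to."""
        if file in self.definitions:
            return self.definitions[file]  # a file's own definition wins
        if self.global_definition is None:
            return None
        return self.definitions[self.global_definition]


# The table: symbol string -> Symbol. A reference registers itself with every
# symbol it could be interpreted as, which creates potential symbols.
table: Dict[str, Symbol] = {}
for candidate in ('MyClass.MyFunction', 'MyFunction'):
    table.setdefault(candidate, Symbol()).references.add('MyFunction() in MyClass')
```

The reference key 'MyFunction() in MyClass' is a placeholder for a real <ReferenceString>.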



Topic: Reference Storage
_______________________________________________________________________________________________________

References are indexed primarily by their <ReferenceString>, which is actually an elaborate data structure packed into a string. It includes a <SymbolString> of the text that appears in the link and a bunch of other data that determines the rules by which the link can be resolved. For example, it includes the scope it appears in and any "using" statements in effect, which are alternate possible scopes. It includes the type of link it is (text links, the ones you explicitly put in comments, aren't the only kind) and resolving flags which encode the language-specific rules of non-text links. But the bottom line is that the <ReferenceString> encodes everything that influences how it may be resolved, so if two links have the same rules, they're considered two definitions of the same reference. This is the understanding of the word "reference" that will be used in this document.

Like symbols, each reference stores a list of definitions. However, it only stores the file name, as all the other relevant information is encoded in the <ReferenceString> itself. Unlike a symbol, which can be linked to the same way no matter what kind of definitions it has, references that differ in any way might be interpreted differently and so need their own distinct entries in the symbol table.

References also store a list of interpretations. Every possible interpretation of the reference is stored and given a numeric score. The higher the score, the better it suits the reference. In the MyFunction() example from before, MyClass.MyFunction() would have a higher score than just MyFunction() because the local scope should win. Each interpretation has a unique score; there are no duplicates.

So the symbol and reference data structures are complementary. Each symbol has a list of every reference that might be interpreted as it, and every reference has a list of each symbol that it could be interpreted as. Again, objects are created for potential symbols (those with references but no definitions) so that this structure always remains intact.

The interpretation with the highest score which actually exists is deemed the current interpretation of the reference. Unlike symbols, where the next global definition is arbitrary, the succession of reference interpretations is very controlled and predictable.
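
The following is a minimal Python sketch of how a reference's current interpretation falls out of its scored interpretations. The names are hypothetical, dotted strings stand in for <SymbolStrings>, and the scores 2 and 1 are made up for illustration; the real scoring scheme is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Reference:
    # Files that contain this reference. Everything else that affects
    # resolution is encoded in the reference string used as the table key.
    definitions: Set[str] = field(default_factory=set)
    # Candidate symbol -> score. Scores are unique; higher is a better fit.
    interpretations: Dict[str, int] = field(default_factory=dict)

    def current_interpretation(self, defined: Iterable[str]) -> Optional[str]:
        """The highest-scoring interpretation that actually has a definition."""
        defined_set = set(defined)
        existing = [s for s in self.interpretations if s in defined_set]
        return max(existing, key=self.interpretations.__getitem__, default=None)


# "MyFunction()" written inside MyClass: the class-scoped reading scores higher.
ref = Reference(definitions={'MyClass.txt'},
                interpretations={'MyClass.MyFunction': 2, 'MyFunction': 1})
print(ref.current_interpretation({'MyFunction', 'MyClass.MyFunction'}))
```

If MyClass.MyFunction loses its last definition, the same call falls back to MyFunction. This is the predictable succession described above.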


Topic: Change Detection
_______________________________________________________________________________________________________

Change detection is handled in a couple of ways. First, there is a secondary file index in <NaturalDocs::SymbolTable> that stores which symbols and references are stored in each file. It doesn't have any information other than a list of <SymbolStrings> and <ReferenceStrings>, since they can be used in the main structures to look up the details. If a file is deleted, the symbol table can then prune any definitions that should no longer be in the table.

Another way deals with how the information parsing stage works. Files parsed for information just have their symbols and references added to the table, regardless of whether the file is being parsed for the first time or has been parsed before. If it had been parsed before, all the information from the previous parse should be in the symbol table and file indexes already. If a new symbol or reference is defined, that's fine; it's added to the table normally. However, if a symbol is redefined, it's ignored because only the first definition matters. Also, this won't detect things that disappear.

Enter watched files. <NaturalDocs::Parser> tells <NaturalDocs::SymbolTable> to designate a file as watched before it starts parsing it, and then tells it to analyze the changes when it's done. The watched file is a second index of all the symbols and references that were defined since the watch started, including the specific details on the symbol definitions. When the analysis is run, it compares the list of symbols and references to the one in the main file index. Any that appear in the main file index but not the watched one are deleted because they didn't show up the second time around. Any symbol definitions that differ between the watched file and the main file are changed to the watched version, since the first definition that appeared the second time around differs from the original.
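
The watched-file analysis amounts to a few set differences. The following is a minimal Python sketch, assuming a hypothetical `analyze_watched_file` function and data shapes. `old_symbols` stands for the definition details this file previously contributed, as looked up through the main file index.

```python
from typing import Dict, Set, Tuple

Details = Tuple[str, str, str]  # (topic type, summary, prototype)


def analyze_watched_file(old_symbols: Dict[str, Details], old_references: Set[str],
                         watched_symbols: Dict[str, Details], watched_references: Set[str]):
    """Compare what a file contained before with what the re-parse found.

    Returns the symbols and references to delete (they did not show up the
    second time around) and the symbols whose first definition changed.
    New entries need no handling here: they were added to the table normally
    while the file was being parsed.
    """
    deleted_symbols = old_symbols.keys() - watched_symbols.keys()
    deleted_references = old_references - watched_references
    changed_symbols = {s: watched_symbols[s]
                       for s in old_symbols.keys() & watched_symbols.keys()
                       if old_symbols[s] != watched_symbols[s]}
    return deleted_symbols, deleted_references, changed_symbols
```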


Topic: Change Management
_______________________________________________________________________________________________________

When a symbol's global definition changes, either because it switches to another file or because the details of the current file's definition changed (prototype, summary, etc.), it goes through all the references that can be interpreted as that symbol, finds the ones that use it as their current interpretation, and marks all the files that define them for rebuilding. The links in their output files have to be changed to the new definition or at least have their tooltips updated.

When a symbol's last definition is deleted, it goes through all the references that can be interpreted as that symbol, finds the ones that use it as their current interpretation, and has them reinterpreted to the existing interpretation with the next highest score. The files that define them are also marked for rebuilding.

When a potential symbol's first definition is found, it goes through all the references that can be interpreted as it and checks whether it would be a higher-scoring interpretation than their current one. If so, the interpretations are changed and all the files that define them are marked for rebuilding.
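
The following is a minimal Python sketch of these three rules. It uses hypothetical function names and plain dicts in place of the symbol table's structures. The first and third rules are folded into one function: it recomputes the best existing interpretation for each affected reference and rebuilds only where that interpretation changed.

```python
def best_interpretation(ref, ref_interps, defined):
    """Highest-scoring interpretation of `ref` that currently has a definition."""
    candidates = [s for s in ref_interps[ref] if s in defined]
    return max(candidates, key=ref_interps[ref].__getitem__, default=None)


def on_definitions_changed(symbol, symbol_refs, ref_interps, ref_files, current, defined):
    """Call after `symbol` gains its first definition or loses its last one.

    Updates `current` (reference -> symbol) in place and returns the files
    that need rebuilding.
    """
    rebuild = set()
    for ref in symbol_refs.get(symbol, ()):
        best = best_interpretation(ref, ref_interps, defined)
        if best != current.get(ref):
            current[ref] = best
            rebuild |= ref_files[ref]
    return rebuild


def on_global_definition_changed(symbol, symbol_refs, ref_files, current):
    """Links resolving to `symbol` must be regenerated (new target or tooltip)."""
    rebuild = set()
    for ref in symbol_refs.get(symbol, ()):
        if current.get(ref) == symbol:
            rebuild |= ref_files[ref]
    return rebuild
```

For example, suppose MyClass.MyFunction gains its first definition. Then the reference "MyFunction()" in MyClass switches to it, and the files that define the reference are returned for rebuilding. Deleting that definition again reverses the switch.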