Design#
This document outlines the architecture and key design decisions made in the implementation of Clanguru. I’ve tried to keep things simple while adhering to good software design principles where appropriate.
CLangParser#
CLangParser is responsible for parsing C/C++ source files using libclang and creating an internal representation that can be used for documentation generation.
It’s designed to be flexible and extensible, with special consideration for handling tokens, nodes, and their relationships.
classDiagram
class CLangParser {
+load(file: Path, compilation_options_manager: CompilationOptionsManager) TranslationUnit
}
class TranslationUnit {
+raw_tu: _TranslationUnit
+tokens: TokensCollection
+nodes: List[Node]
+source_file: Path
}
class Token {
+raw_token: _Token
+previous_token: Token
+next_token: Token
+is_comment: bool
}
class Node {
+raw_node: Cursor
+previous_node: Node
+next_node: Node
+tokens: TokensCollection
+parent: TranslationUnit
+is_function_definition() bool
}
class Function {
+name: str
+origin: Node
+description_token: Token
+body: str
+is_definition: bool
}
class CppClass {
+name: str
+origin: Node
+description_token: Token
+body: str
}
CLangParser --> TranslationUnit : creates
TranslationUnit --> Token : contains
TranslationUnit --> Node : contains
Node --> Token : contains
Function --|> Node : extends
CppClass --|> Node : extends
Decorator Pattern for Chained Lists#
I wrap each raw libclang token and node in a decorator class that adds links to the previous and next element, forming chained (doubly linked) lists. This makes it easy to traverse the token and node lists in both directions, which is particularly useful for context-aware operations such as finding description comments.
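from dataclasses import dataclass
from typing import Optional
from clang.cindex import Token as _Token  # assumed alias for libclang's raw token type

@dataclass
class Token:
    raw_token: _Token  # the wrapped libclang token
    previous_token: Optional["Token"]  # None for the first token
    next_token: Optional["Token"]  # None for the last token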
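The snippet above shows only the wrapper; the linking step is not shown in this document. A minimal sketch of how such a chain might be built and walked follows. `LinkedToken`, `chain_tokens`, and `walk_backwards` are hypothetical names, and a plain string stands in for the raw libclang token so the example runs without libclang.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


# eq=False keeps identity-based comparison, so comparing two tokens never
# recurses through the whole chain.
@dataclass(eq=False)
class LinkedToken:
    """Hypothetical stand-in for Token; `raw` replaces the libclang token."""

    raw: str
    previous_token: Optional["LinkedToken"] = None
    next_token: Optional["LinkedToken"] = None


def chain_tokens(raw_tokens: List[str]) -> List[LinkedToken]:
    """Wrap each raw token and link neighbours in both directions."""
    tokens = [LinkedToken(raw) for raw in raw_tokens]
    for left, right in zip(tokens, tokens[1:]):
        left.next_token = right
        right.previous_token = left
    return tokens


def walk_backwards(token: LinkedToken) -> Iterator[LinkedToken]:
    """Yield the tokens that precede `token`, nearest first."""
    current = token.previous_token
    while current is not None:
        yield current
        current = current.previous_token
```

Because every element knows its neighbours, a caller holding any single token can inspect its context without access to the full list.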
Handling Function and CppClass Nodes#
Function and CppClass are currently the only node types that are documented. They are handled as follows:
When parsing, we identify nodes that represent functions or classes.
For each of these nodes, we look for a description token:
We search for the first token above the node’s first token.
If this token is a comment, it’s considered the description token.
The description token, along with other relevant information, is stored in the Function or CppClass object.
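The lookup described above could be sketched as follows. This is an illustration under assumptions, not the project's code: `SketchToken` and `find_description_token` are hypothetical names, and the token carries only the fields the lookup needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class SketchToken:
    """Hypothetical stand-in for Token with just the fields used here."""

    spelling: str
    is_comment: bool = False
    previous_token: Optional["SketchToken"] = None


def find_description_token(first_token: SketchToken) -> Optional[SketchToken]:
    """Return the token directly above the node's first token if it is a comment."""
    candidate = first_token.previous_token
    if candidate is not None and candidate.is_comment:
        return candidate
    return None
```

Only the immediately preceding token is checked, so a comment separated from the declaration by other code is not picked up as its description.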
Handling Macro Expansions#
The implementation is designed to work correctly even when declarations are expanded from macros and intermediate tokens are inserted.
The _collect_node_tokens method in CLangParser collects the tokens for a node based on their source locations rather than relying on the exact token sequence from libclang.
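The body of _collect_node_tokens is not shown here, so the following is only a sketch of the general idea of location-based collection. It assumes each token and node exposes start and end file offsets (libclang source locations carry an `offset`); `SketchToken` and `collect_tokens_in_range` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SketchToken:
    """Hypothetical stand-in for a token: spelling plus start/end file offsets."""

    spelling: str
    start: int
    end: int


def collect_tokens_in_range(tokens: List[SketchToken], start: int, end: int) -> List[SketchToken]:
    """Keep the tokens whose extent lies entirely within [start, end]."""
    return [t for t in tokens if start <= t.start and t.end <= end]
```

Filtering by position means a node written as a macro invocation still receives the tokens that appear in the source file at its location.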
Documentation Generator#
The doc_generator uses an intermediate format for storing documentation before rendering it in a specific output format. This design allows for flexibility in adding new output formats without modifying existing code.
Here’s a class diagram illustrating the relationships between these components:
classDiagram
class DocStructure {
+title: str
+sections: List[Section]
+add_section(section: Section)
}
class Section {
+title: str
+content: List[SectionContent]
+subsections: List[Section]
+add_content(content: SectionContent)
+add_subsection(subsection: Section)
}
class SectionContent {
<<interface>>
}
class TextContent {
+text: str
}
class CodeContent {
+code: str
+language: str
+linenos: bool
+highlight_lines: List[int]
}
class OutputFormatter {
<<abstract>>
+format(doc: DocStructure) str
+format_text(text: str) str
+format_code(content: CodeContent) str
+file_extension() str
}
class MarkdownFormatter {
+flavour: MarkdownFlavour
+format(doc: DocStructure) str
+format_text(text: str) str
+format_code(content: CodeContent) str
+file_extension() str
}
class RSTFormatter {
+format(doc: DocStructure) str
+format_text(text: str) str
+format_code(content: CodeContent) str
+file_extension() str
}
DocStructure --> "1..*" Section
Section --> "0..*" SectionContent
Section --> "0..*" Section : subsections
TextContent --|> SectionContent
CodeContent --|> SectionContent
MarkdownFormatter --|> OutputFormatter
RSTFormatter --|> OutputFormatter
Design Principles#
Open/Closed Principle#
The doc_generator adheres to the Open/Closed Principle in several ways:
Intermediate Format: By using DocStructure as an intermediate representation, we can add new output formats without modifying the existing parsing or structure generation code.
OutputFormatter Interface: The OutputFormatter abstract base class allows for the addition of new formatting styles (e.g., HTML, LaTeX) without changing the core documentation generation logic.
Dependency Injection#
Dependency Injection is utilized in the generate_documentation function:
def generate_documentation(translation_unit: TranslationUnit, formatter: OutputFormatter, output_file: Path) -> None:
    doc_structure = generate_doc_structure(translation_unit)
    output_file.write_text(formatter.format(doc_structure))
This function takes an OutputFormatter as a parameter, allowing the caller to inject the desired formatter. This decouples the documentation generation process from the specific output format, making the system more flexible and easier to extend.
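One practical consequence of injecting the formatter is that a test double can be passed in. The sketch below illustrates this with simplified stand-ins: strings replace TranslationUnit and DocStructure, `generate_doc_structure` is a placeholder, and `RecordingFormatter` and `run_with_recording_formatter` are hypothetical names.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Tuple


class OutputFormatter(ABC):
    """Reduced stand-in: only the method generate_documentation calls."""

    @abstractmethod
    def format(self, doc: str) -> str: ...


class RecordingFormatter(OutputFormatter):
    """Hypothetical test double that records what it was asked to format."""

    def __init__(self) -> None:
        self.seen: List[str] = []

    def format(self, doc: str) -> str:
        self.seen.append(doc)
        return f"formatted:{doc}"


def generate_doc_structure(translation_unit: str) -> str:
    # Placeholder: the real function builds a DocStructure from a TranslationUnit.
    return f"doc({translation_unit})"


def generate_documentation(translation_unit: str, formatter: OutputFormatter, output_file: Path) -> None:
    doc_structure = generate_doc_structure(translation_unit)
    output_file.write_text(formatter.format(doc_structure))


def run_with_recording_formatter() -> Tuple[List[str], str]:
    """Inject the test double and return what it saw plus what was written."""
    formatter = RecordingFormatter()
    with tempfile.TemporaryDirectory() as tmp:
        output_file = Path(tmp) / "out.txt"
        generate_documentation("example.c", formatter, output_file)
        return formatter.seen, output_file.read_text()
```

The same call site would accept a MarkdownFormatter or RSTFormatter unchanged, since generate_documentation depends only on the abstract interface.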
Workflow#
generate_doc_structure creates a DocStructure from a TranslationUnit.
The DocStructure is populated with Section objects, which in turn contain SectionContent (either TextContent or CodeContent).
generate_documentation passes this DocStructure to the OutputFormatter it was given.
The OutputFormatter traverses the DocStructure, formatting each section and its content according to the specific output format.
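The population and traversal steps above can be sketched with plain dataclasses. The field names follow the class diagram, but this is a simplified illustration: SectionContent is modelled as a `Union` instead of an interface, CodeContent omits its line-number fields, and `walk_sections` and `build_example_doc` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class TextContent:
    text: str


@dataclass
class CodeContent:
    code: str
    language: str = "c"


# Simplification: the diagram models SectionContent as an interface.
SectionContent = Union[TextContent, CodeContent]


@dataclass
class Section:
    title: str
    content: List[SectionContent] = field(default_factory=list)
    subsections: List["Section"] = field(default_factory=list)

    def add_content(self, content: SectionContent) -> None:
        self.content.append(content)

    def add_subsection(self, subsection: "Section") -> None:
        self.subsections.append(subsection)


@dataclass
class DocStructure:
    title: str
    sections: List[Section] = field(default_factory=list)

    def add_section(self, section: Section) -> None:
        self.sections.append(section)


def walk_sections(sections: List[Section], depth: int = 1) -> Iterator[Tuple[int, Section]]:
    """Depth-first traversal yielding each section with its nesting depth."""
    for section in sections:
        yield depth, section
        yield from walk_sections(section.subsections, depth + 1)


def build_example_doc() -> DocStructure:
    """Populate a small structure the way a generator for one C file might."""
    doc = DocStructure(title="example.c")
    functions = Section(title="Functions")
    main = Section(title="main")
    main.add_content(TextContent("Program entry point."))
    main.add_content(CodeContent("int main(void) { return 0; }"))
    functions.add_subsection(main)
    doc.add_section(functions)
    return doc
```

A formatter can use the depth from such a traversal to choose the heading level for each section in its output format.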
Extending the System#
To add a new output format:
Create a new class that inherits from OutputFormatter.
Implement the required methods (format, format_text, format_code, file_extension).
Use the new formatter with the existing generate_documentation function.
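As an illustration of these steps, here is a sketch of a hypothetical `HtmlFormatter`. The data classes and the abstract base are reduced stand-ins that mirror the class diagram so the example is self-contained; the exact HTML produced and the leading dot in the file extension are assumptions.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class TextContent:
    text: str


@dataclass
class CodeContent:
    code: str
    language: str = "c"


@dataclass
class Section:
    title: str
    content: List[Union[TextContent, CodeContent]] = field(default_factory=list)
    subsections: List["Section"] = field(default_factory=list)


@dataclass
class DocStructure:
    title: str
    sections: List[Section] = field(default_factory=list)


class OutputFormatter(ABC):
    """Stand-in for the abstract base class shown in the diagram."""

    @abstractmethod
    def format(self, doc: DocStructure) -> str: ...

    @abstractmethod
    def format_text(self, text: str) -> str: ...

    @abstractmethod
    def format_code(self, content: CodeContent) -> str: ...

    @abstractmethod
    def file_extension(self) -> str: ...


class HtmlFormatter(OutputFormatter):
    """Hypothetical new output format; no existing class is modified."""

    def format(self, doc: DocStructure) -> str:
        parts = [f"<h1>{escape(doc.title)}</h1>"]
        for section in doc.sections:
            parts.extend(self._format_section(section, level=2))
        return "\n".join(parts)

    def _format_section(self, section: Section, level: int) -> List[str]:
        parts = [f"<h{level}>{escape(section.title)}</h{level}>"]
        for item in section.content:
            if isinstance(item, CodeContent):
                parts.append(self.format_code(item))
            else:
                parts.append(self.format_text(item.text))
        for subsection in section.subsections:
            # HTML defines heading levels only up to h6.
            parts.extend(self._format_section(subsection, min(level + 1, 6)))
        return parts

    def format_text(self, text: str) -> str:
        return f"<p>{escape(text)}</p>"

    def format_code(self, content: CodeContent) -> str:
        language = escape(content.language)
        return f'<pre><code class="language-{language}">{escape(content.code)}</code></pre>'

    def file_extension(self) -> str:
        return ".html"  # assumption: extensions include the leading dot
```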
No changes to existing classes or the core generation logic are required, demonstrating the flexibility of this design.