| 1 | +# DomManager |
| 2 | + |
| 3 | +The primary role of `DomManager` is to provide an interface to the Document Object Model (DOM) |
| 4 | +and to cache as much information about it as possible for optimization. Most of this data is collected |
| 5 | +during initialization, when an instance is created, while the remaining details |
| 6 | +are fetched during interaction. |
| 7 | + |
| 8 | +Simplified structure of `DomManager`: |
| 9 | +```mermaid |
| 10 | +erDiagram |
| 11 | + DomManager ||--|| DomData : contains |
| 12 | + DomData ||--o{ DDStaticElement: includes |
| 13 | + DomData ||--|{ DDDynamicBlock: includes |
| 14 | + DomData ||--o{ DDExtraText: includes |
| 15 | + DDDynamicBlock ||--o{ DDSpanElement: includes |
| 16 | + DDSpanElement ||--o{ DDSpanElement: includes |
| 17 | + DDDynamicBlock ||--o{ DDTextElement: includes |
| 18 | + DDSpanElement ||--o{ DDTextElement: includes |
| 19 | + |
| 20 | + |
| 21 | + DomData { |
| 22 | + number endPos |
| 23 | + number displayedText |
| 24 | + number displayedTextPos |
| 25 | + Array~DDStaticElement|DDDynamicBlock|DDExtraText~ elements |
| 26 | + } |
| 27 | + DDStaticElement { |
| 28 | + HTMLElement node |
| 29 | + number start |
| 30 | + string path |
| 31 | + } |
| 32 | + DDDynamicBlock { |
| 33 | + number start |
| 34 | + number end |
| 35 | + string path |
| 36 | + Array~DDSpanElement|DDTextElement~ children |
| 37 | + } |
| 38 | + DDSpanElement { |
| 39 | + number start |
| 40 | + number end |
| 41 | + HTMLSpanElement node |
| 42 | + Array~DDSpanElement|DDTextElement~ children |
| 43 | + } |
| 44 | + DDTextElement { |
| 45 | + Text node |
| 46 | + number start |
| 47 | + number end |
| 48 | + string[] content |
| 49 | + } |
| 50 | +``` |
| 51 | + |
| 52 | + |
| 53 | +### DomData |
| 54 | + |
| 55 | + |
| 56 | +`DomManager` stores all the data it needs in an instance of the `DomData` class. |
| 57 | +This object stores a representation of the displayed content as users see it, |
| 58 | +which is obtained through the Selection API. It also tracks the positions that come |
| 59 | +after the last processed element, both in the displayed text and in the content of `RichText`. |
| 60 | +All of this is relevant only during initialization. |
| 61 | + |
| 62 | +The last field of `DomData` is an array of elements that represents the DOM tree itself. |
| 63 | +This array is the main point of interest. |
| 64 | + |
| 65 | +The elements contained in `DomData` can be logically divided into two groups. |
| 66 | +The first group is static and never changes; its elements reflect the structure of the DOM. |
| 67 | +The second group stores all highlight spans and texts. |
| 68 | + |
| 69 | + |
| 70 | +### Structural elements |
| 71 | + |
| 72 | +The first group consists of elements of the types `DDStaticElement`, `DDDynamicBlock` and `DDExtraText`. |
| 73 | +They form a flat list at the first level of descendants and represent the default state |
| 74 | +of `RichText`’s content. |
| 75 | + |
| 76 | + |
| 77 | +- `DDStaticElement` contains information about its related tag in the DOM. It holds a reference |
| 78 | +to its HTML node, its start position calculated as a global offset, and its XPath. |
| 79 | +The last two fields are used to find the right |
| 80 | +elements in `DomData`. |
| 81 | +- `DDExtraText` is just a string. It has no real counterpart in the DOM, but it is what we get |
| 82 | +when we use the Selection API to collect the text representation, for example |
| 83 | +when the content itself contains block elements or other line breaks. |
| 84 | +It exists only to make sure that all characters of the displayed text are accounted for |
| 85 | +in the region's text field. |
| 86 | +- `DDDynamicBlock` is a container for managing all real text elements and highlighting spans |
| 87 | +that belong to regions. It provides slots for dynamically changing content. At initialization |
| 88 | +it is related to exactly one text node in the DOM. It stores the start and end |
| 89 | +of the editable block as global offsets, the XPath of its original text element, |
| 90 | +and a set of child elements. |
| 91 | + |
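The lookup over the flat list can be sketched roughly as follows. This is a minimal illustration, not the actual `DomManager` code: the `kind` discriminator, the simplified shapes without DOM node references, the half-open `[start, end)` convention, and the `findBlockByOffset` helper are all assumptions made for the example.

```typescript
// Hypothetical, simplified shapes: only the fields needed for the lookup.
type StaticElement = { kind: "static"; start: number; path: string };
type DynamicBlock = { kind: "block"; start: number; end: number; path: string };
type ExtraText = { kind: "extra"; text: string };
type TopLevelElement = StaticElement | DynamicBlock | ExtraText;

// Return the dynamic block whose [start, end) range contains the global offset
// (the half-open range is an assumption of this sketch).
function findBlockByOffset(
  elements: TopLevelElement[],
  offset: number,
): DynamicBlock | undefined {
  for (const el of elements) {
    if (el.kind === "block" && el.start <= offset && offset < el.end) {
      return el;
    }
  }
  return undefined;
}

// Flat first-level list for the content `a<br>b\nc`.
const topLevel: TopLevelElement[] = [
  { kind: "static", start: 0, path: "/" },
  { kind: "block", start: 0, end: 1, path: "/text()[1]" },
  { kind: "extra", text: "\n" }, // produced by <br>, occupies global offset 1
  { kind: "block", start: 2, end: 5, path: "/text()[2]" },
];

const blockAtThree = findBlockByOffset(topLevel, 3);
const blockAtOne = findBlockByOffset(topLevel, 1);
```

Here `blockAtThree` is the block with path `/text()[2]`, while `blockAtOne` is `undefined` because that offset belongs to the extra text rather than to any editable block.
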
| 92 | + |
| 93 | +### Content elements |
| 94 | + |
| 95 | + |
| 96 | +The second group consists of elements that change dynamically as regions are created and deleted. |
| 97 | +It is represented by elements of the types `DDSpanElement` and `DDTextElement`. |
| 98 | + |
| 99 | + |
| 100 | +- `DDSpanElement` is similar to `DDDynamicBlock`, but it can also be created and deleted |
| 101 | +during annotation. It stores a reference to its highlighting span HTML node and |
| 102 | +has a method that removes this span from the DOM. |
| 103 | +- The content of `DDTextElement` is an array of strings. Each item |
| 104 | +is counted as one character by global offsets |
| 105 | +and is at the same time a substring of the displayed text, |
| 106 | +so the array contains no character that the browser does not render as visible. |
| 107 | + |
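The nesting of spans and text elements can be sketched as a small recursive structure. The shapes below are simplified assumptions for illustration: the DOM node references are omitted, and `isText` and `displayedText` are hypothetical helpers, not methods of the real classes.

```typescript
// Hypothetical, simplified shapes: DOM node references are omitted.
interface TextElement {
  start: number;
  end: number;
  content: string[];
}
interface SpanElement {
  start: number;
  end: number;
  children: Array<SpanElement | TextElement>;
}
type ContentElement = SpanElement | TextElement;

// Text elements are the only ones that carry a `content` array.
function isText(el: ContentElement): el is TextElement {
  return "content" in el;
}

// Concatenate the displayed text of an element and all of its descendants.
function displayedText(el: ContentElement): string {
  if (isText(el)) {
    return el.content.join("");
  }
  return el.children.map((child) => displayedText(child)).join("");
}

// Children of a dynamic block for the text "Text" with a region over "x".
const blockChildren: ContentElement[] = [
  { start: 0, end: 2, content: ["T", "e"] },
  { start: 2, end: 3, children: [{ start: 2, end: 3, content: ["x"] }] },
  { start: 3, end: 4, content: ["t"] },
];

const wholeText = blockChildren.map((child) => displayedText(child)).join("");
```

Walking the children in order reproduces the displayed text of the block, and a span contributes exactly the text of the region it highlights.
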
| 108 | + |
| 109 | +### Examples |
| 110 | + |
| 111 | +#### Simple HTML |
| 112 | +The simple markup `<p>The <b>HTML</b></p>` is converted as follows: |
| 113 | + |
| 114 | + |
| 115 | +```mermaid |
| 116 | +flowchart TD |
| 117 | + content["<p>The <b>HTML</b></p>"] |
| 118 | + body["<sup>0 </sup>DDStaticElement<br>path: '/'"] |
| 119 | + p["<sup>0 </sup>DDStaticElement<br>path: '/p[1]'"] |
| 120 | + the["<sup>0 </sup>DDDynamicBlock<sup> 4</sup><br>path: '/p[1]/text()[1]'"] |
| 121 | + b["<sup>4 </sup>DDStaticElement<br>path: '/p[1]/b[1]'"] |
| 122 | + html["<sup>4 </sup>DDDynamicBlock<sup> 8</sup><br>path: '/p[1]/b[1]/text()[1]'"] |
| 123 | + content --> body |
| 124 | + content--> p |
| 125 | + content--> the |
| 126 | + content--> b |
| 127 | + content--> html |
| 128 | + t_the["<sup>0 </sup>DDTextElement<sup> 4</sup><br>content: ['T', 'h', 'e', ' ']"] |
| 129 | + the--> t_the |
| 130 | + t_html["<sup>4 </sup>DDTextElement<sup> 8</sup><br>content: ['H', 'T', 'M', 'L']"] |
| 131 | + html--> t_html |
| 132 | +``` |
| 133 | + |
| 134 | +#### A text with a region |
| 135 | + |
| 136 | +The text `"Text"` with a region over `"x"` is represented as: |
| 137 | + |
| 138 | + |
| 139 | +```mermaid |
| 140 | +flowchart TD |
| 141 | + content["Te<mark>x<sup>label_x</sup></mark>t"] |
| 142 | + body["<sup>0</sup> DDStaticElement<br>path: '/'"] |
| 143 | + text["<sup>0</sup> DDDynamicBlock <sup>4</sup><br>path: '/text()[1]'"] |
| 144 | + content --> body |
| 145 | + content --> text |
| 146 | + t_text_Te["<sup>0</sup> DDTextElement <sup>2</sup><br>content: ['T','e']"] |
| 147 | + text --> t_text_Te |
| 148 | + s_span_x["<sup>2</sup> DDSpanElement <sup>3</sup>"] |
| 149 | + text --> s_span_x |
| 150 | + t_text_x["<sup>2</sup> DDTextElement <sup>3</sup><br>content: ['x']"] |
| 151 | + s_span_x --> t_text_x |
| 152 | + t_text_t["<sup>3</sup> DDTextElement <sup>4</sup><br>content: ['t']"] |
| 153 | + text--> t_text_t |
| 154 | +``` |
| 155 | + |
| 156 | +#### Replacing characters |
| 157 | + |
| 158 | +The tricky content `a<br>b\nc` is represented as: |
| 159 | + |
| 160 | +```mermaid |
| 161 | +flowchart TD |
| 162 | + content["a<br/>b#92;nc"] |
| 163 | + body["<sup>0</sup> DDStaticElement<br>path: '/'"] |
| 164 | + a["<sup>0</sup> DDDynamicBlock <sup>1</sup><br>path: '/text()[1]'"] |
| 165 | + n["#92;#92;n"] |
| 166 | + bc["<sup>2</sup> DDDynamicBlock <sup>5</sup><br>path: '/text()[2]'"] |
| 167 | + content --> body |
| 168 | + content --> a |
| 169 | + content --> n |
| 170 | + content --> bc |
| 171 | + t_a["<sup>0</sup> DDTextElement <sup>1</sup><br>content: ['a']"] |
| 172 | + a --> t_a |
| 173 | + t_bc["<sup>2 </sup>DDTextElement<sup> 5</sup><br>content: ['b', ' ', 'c']"] |
| 174 | + bc --> t_bc |
| 175 | +``` |
| 176 | +- `\n` is converted to a space character, since that is how the browser displays it. |
| 177 | +- `<br>` becomes the extra text element `\n`, since it is displayed as a line break. |
| 178 | + |
| 179 | +#### Edge cases |
| 180 | +There could be more complicated cases, for example when HTML is not well-formed. |
| 181 | +```html |
| 182 | +<p>This |
| 183 | +is part<br/> of |
| 184 | +<abbr tytle="HyperText Markup Language"><b>HTML</b></abbr> |
| 185 | +</p> |
| 186 | +``` |
| 187 | +It is displayed in the browser as: |
| 188 | + |
| 189 | +This is part<br> |
| 190 | +of <b>HTML</b> |
| 191 | + |
| 192 | +And results in: |
| 193 | +```mermaid |
| 194 | +flowchart TD |
| 195 | + content["<p>This is part<br/> of <abbr tytle="HyperText Markup Language"><b>HTML</b></abbr> |
| 196 | +</p>"] |
| 197 | + body["<sup>0</sup> DDStaticElement<br>path: '/'"] |
| 198 | + p["<sup>0</sup> DDStaticElement<br>path: '/p[1]'"] |
| 199 | + ThisIsPart["<sup>0</sup> DDDynamicBlock <sup>12</sup><br>path: '/p[1]/text()[1]'"] |
| 200 | + ThisIsPart_text["<sup>0</sup> DDTextElement <sup>12</sup><br>content: ['T','h','i','s',' ','i','s',' ','p','a','r','t']"] |
| 201 | + extra1["#92;#92;n"] |
| 202 | + of["<sup>13</sup> DDDynamicBlock <sup>18</sup><br>path: '/p[1]/text()[2]'"] |
| 203 | + of_text["<sup>13</sup> DDTextElement <sup>18</sup><br>content: ['','o','f',' ','']"] |
| 204 | + abbr["<sup>18</sup> DDStaticElement<br>path: '/p[1]/abbr[1]'"] |
| 205 | + b["<sup>18</sup> DDStaticElement<br>path: '/p[1]/abbr[1]/b[1]'"] |
| 206 | + html["<sup>18</sup> DDDynamicBlock <sup>22</sup><br>path: '/p[1]/abbr[1]/b[1]/text()[1]'"] |
| 207 | + html_text["<sup>18</sup> DDTextElement <sup>22</sup><br>content: ['H','T','M','L']"] |
| 208 | + empty["<sup>22</sup> DDDynamicBlock <sup>23</sup><br>path: '/p[1]/text()[3]'"] |
| 209 | + empty_text["<sup>22</sup> DDTextElement <sup>23</sup><br>content: ['']"] |
| 210 | + content --> body |
| 211 | + content --> p |
| 212 | + content --> ThisIsPart |
| 213 | + ThisIsPart --> ThisIsPart_text |
| 214 | + content --> extra1 |
| 215 | + content --> of |
| 216 | + of --> of_text |
| 217 | + content --> abbr |
| 218 | + content --> b |
| 219 | + content --> html |
| 220 | + html --> html_text |
| 221 | + content --> empty |
| 222 | + empty --> empty_text |
| 223 | +``` |
| 224 | + |
| 225 | +The second text node has the content `['','o','f',' ','']`. |
| 226 | + |
| 227 | +The empty string in the first position appears because the browser does not display |
| 228 | +a space at the beginning of the tag content. |
| 229 | + |
| 230 | +The empty string in the last position appears because the browser knows about the line break |
| 231 | +in the original HTML and counts it as a character, but does not display it. |
| 232 | + |
| 233 | +### Content field |
| 234 | +Displayed text is stored in the `content` field of elements. It is represented as an array of strings. |
| 235 | +Each item in the array is a character displayed in the browser. |
| 236 | + |
| 237 | +Some of the items are empty strings, which means that they are not displayed in the browser |
| 238 | +and cannot be obtained through the Selection API. But they are present in the `textContent` of the DOM's text nodes. |
| 239 | +So to keep that information we store them in the `content` field as placeholders. |
| 240 | +At the same time, they can be used to calculate the global offset or the range offset in the displayed text. |
| 241 | + |
| 242 | +Suppose the text for annotating is `<p>🐱\nmeans cat.</p>`. The whole content will be: |
| 243 | +`['🐱', ' ', 'm', 'e', 'a', 'n', 's', ' ', 'c', 'a', 't', '.']` |
| 244 | +When we create a region over the word `cat` we can: |
| 245 | +- get the displayed text of the region by joining the content array from the 9th to the 11th element |
| 246 | +(this is how it is displayed in the browser) |
| 247 | +- get the global offset of the region, which is exactly the number of elements in the content array before |
| 248 | +the region start and before the region end ([8, 11]) |
| 249 | +- get the offset of the range related to the region. For that we sum the lengths |
| 250 | +of the content items of all preceding elements, counting each empty string as one |
| 251 | +(hidden) character ([9, 12]; `🐱` takes two UTF-16 code units) |
| 252 | + |
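The three calculations above can be sketched in a few lines. This is only an illustration under stated assumptions: `regionText` and `toRangeOffset` are hypothetical helper names, and the range offset is assumed to be measured in UTF-16 code units, as JavaScript string lengths are.

```typescript
// Content array from the example above; an empty string would mark a hidden character.
const content: string[] = ["🐱", " ", "m", "e", "a", "n", "s", " ", "c", "a", "t", "."];

// Displayed text for the global range [start, end).
function regionText(items: string[], start: number, end: number): string {
  return items.slice(start, end).join("");
}

// Convert a global offset into an offset in UTF-16 code units.
// An empty placeholder still counts as one (hidden) character.
function toRangeOffset(items: string[], globalOffset: number): number {
  let offset = 0;
  for (const item of items.slice(0, globalOffset)) {
    offset += item === "" ? 1 : item.length;
  }
  return offset;
}

// Region over the word "cat": global offsets [8, 11].
const catText = regionText(content, 8, 11);
const catRange = [toRangeOffset(content, 8), toRangeOffset(content, 11)];
```

The global offsets count array items, so the emoji counts once, while the range offsets count code units, so the emoji counts twice and both range offsets shift by one.
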