JupyterLab DocumentSearch SVG Bug
Issue: #17349
Problem
The DocumentSearch extension searches through unsupported nodes inside Jupyter Notebooks. A user reported this after searching for the title of a Plotly chart: it took them half an hour to work out that the search for the plot title was what was stripping the plot.
This was traced to the plot containing an SVG tag that was being searched even though SVG is in the unsupported nodes list. The issue is fairly trivial to reproduce: embed an SVG tag in a cell output and search for content inside that tag, such as the contents of a text tag.
Root Cause
Stepping through the debugger showed that the search compared nodeName against an uppercase name, but the SVG tag’s node name was lower case.
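The mismatch can be sketched without a browser. The list name, its contents, and the helper below are hypothetical stand-ins for illustration, not the actual DocumentSearch code, and plain objects stand in for DOM nodes.

```javascript
// Hypothetical list of element names the search should skip,
// written in upper case as HTML elements report them.
const UNSUPPORTED = ['SCRIPT', 'STYLE', 'SVG'];

// Case-sensitive check: the kind of comparison described above.
function isSkippedStrict(node) {
  return UNSUPPORTED.includes(node.nodeName);
}

// Plain objects stand in for DOM nodes so the sketch runs outside a browser.
const htmlScript = { nodeName: 'SCRIPT' }; // HTML element: reported in upper case
const inlineSvg = { nodeName: 'svg' };     // SVG element: reported in lower case

console.log(isSkippedStrict(htmlScript)); // true
console.log(isSkippedStrict(inlineSvg));  // false, so the SVG subtree gets searched
```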
The SVG node name can be lower case because the uppercase guarantee only covers HTML elements: for an HTML element in an HTML document, Element.nodeName (which corresponds to Element.tagName) is returned in upper case. SVG is based on XML rather than HTML, so the case is preserved during parsing, which is why Element.tagName can be lower case for SVG elements. An element is parsed using the XML parser when the document type accepts XML input, such as text/xml.
Weirdly enough, when the node is set using Element.innerHTML, the string is parsed as HTML, and the DOMParser results for the text/html MIME type show that the SVG’s tagName always comes out lower case regardless of the case of the input string.
d = new DOMParser(); d.parseFromString('<svg></svg>', 'text/html').body.firstChild.nodeName
// "svg"
d = new DOMParser(); d.parseFromString('<SVG></SVG>', 'text/html').body.firstChild.nodeName
// "svg"
d = new DOMParser(); d.parseFromString('<svg></svg>', 'text/xml').firstChild.nodeName
// "svg"
d = new DOMParser(); d.parseFromString('<SVG></SVG>', 'text/xml').firstChild.nodeName
// "SVG"
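Since nodeName can come back as either "svg" or "SVG" depending on the parser, one way to make the unsupported-node check robust is to normalize the case before comparing. This is a minimal sketch under that assumption and not necessarily the change that was applied in JupyterLab; the set name, its contents, and the helper are hypothetical.

```javascript
// Hypothetical set of element names to skip, stored in upper case.
const UNSUPPORTED_UPPER = new Set(['SCRIPT', 'STYLE', 'SVG']);

// Normalize nodeName before the lookup so the parser's casing no longer matters.
function isSkipped(node) {
  return UNSUPPORTED_UPPER.has(node.nodeName.toUpperCase());
}

// The two nodeName values produced by the parser calls above,
// wrapped in plain objects so the sketch runs outside a browser.
console.log(isSkipped({ nodeName: 'svg' })); // true
console.log(isSkipped({ nodeName: 'SVG' })); // true
console.log(isSkipped({ nodeName: 'div' })); // false
```

Normalizing at the comparison keeps the list itself unchanged, at the cost of one string conversion per visited node.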