jQuery Source Code Series (5): Sizzle, Continued

newtrek, published 2019-08-21 15:13 / 1158 views

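You're welcome to visit my column for the other articles in this series.

The select function

The previous article covered what the tokenize function does: it produces a tokens array, and we looked at how that array is put together. This article covers how the tokens array is processed.

DOM elements can be related to each other in a few ways: >, + and ~, plus the space (descendant) combinator. Tokens in the tokens array have a type that distinguishes tags, attributes and combinators, and Sizzle has a set of rules for telling them apart. One example is the Expr object from the previous chapter, which really is important: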

Expr.relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
};

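Expr.relative identifies the combinators and classifies them by dir, the direction in which to walk the DOM (toward the parent or toward the previous sibling). first: true means only the nearest element in that direction is checked.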
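To make the two fields concrete, here is a minimal sketch of how a dir/first pair could drive a walk through the tree. It is not Sizzle's actual code: the nodes are plain objects standing in for DOM elements, matchesRelative is a hypothetical helper name, and text nodes (which the real implementation has to skip) are ignored.

```javascript
// Same table as Expr.relative above.
var relative = {
  ">": { dir: "parentNode", first: true },
  " ": { dir: "parentNode" },
  "+": { dir: "previousSibling", first: true },
  "~": { dir: "previousSibling" }
};

// Hypothetical helper: starting from `node`, walk along `dir` and report
// whether a node on that path satisfies `test`.
// With `first: true`, only the nearest node in that direction is checked.
function matchesRelative(node, combinator, test) {
  var rule = relative[combinator];
  var cur = node[rule.dir];
  while (cur) {
    if (test(cur)) {
      return true;
    }
    if (rule.first) {
      return false;
    }
    cur = cur[rule.dir];
  }
  return false;
}

// Plain objects standing in for <section><div><p></p></div></section>
var section = { tag: "section", parentNode: null };
var div = { tag: "div", parentNode: section };
var p = { tag: "p", parentNode: div };

function isSection(node) {
  return node.tag === "section";
}

// "section > p": the direct parent of p is the div, so this fails.
var childResult = matchesRelative(p, ">", isSection);
// "section p": section is an ancestor of p, so this succeeds.
var descendantResult = matchesRelative(p, " ", isSection);
```

The only difference between ">" and " " is the first flag, which is why one table entry per combinator is enough.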

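Let's go over the tokens array again. It is a nested array (one array of tokens per comma-separated group); for now we ignore commas and assume there is only one group. If we match from right to left, then for div > div.seq h2 ~ p we first get a token of type TAG. A token of type ~ can already be identified with the relative object, so now let's look at the Expr.find object: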

Expr.find = {};
Expr.find["ID"] = function( id, context ) {
  if ( typeof context.getElementById !== "undefined" && documentIsHTML ) {
    var elem = context.getElementById( id );
    return elem ? [ elem ] : [];
  }
};
Expr.find["CLASS"] = support.getElementsByClassName && function( className, context ) {
  if ( typeof context.getElementsByClassName !== "undefined" && documentIsHTML ) {
    return context.getElementsByClassName( className );
  }
};
Expr.find["TAG"] = function(){...};

The actual jQuery source also handles browser compatibility. Take find["ID"] as an example:

if(support.getById){
  Expr.find["ID"] = function(){...}; // the version shown above
}else{
  // Compatibility with IE 6 and 7
  Expr.find["ID"] = function( id, context ) {
    if ( typeof context.getElementById !== "undefined" && documentIsHTML ) {
      var node, i, elems,
        elem = context.getElementById( id );

      if ( elem ) {

        // Verify the id attribute
        node = elem.getAttributeNode("id");
        if ( node && node.value === id ) {
          return [ elem ];
        }

        // Fall back on getElementsByName
        elems = context.getElementsByName( id );
        i = 0;
        while ( (elem = elems[i++]) ) {
          node = elem.getAttributeNode("id");
          if ( node && node.value === id ) {
            return [ elem ];
          }
        }
      }

      return [];
    }
  };
}

Conceptually, the find object boils down to:

Expr.find = {
  "ID": document.getElementById,
  "CLASS": document.getElementsByClassName,
  "TAG": document.getElementsByTagName
}

Expr.filter will be covered later.

The select source

Before the source, let's look at a few regular expressions.
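As a rough illustration of how a token's type selects a finder, the sketch below dispatches on token.type against a find table. Everything here is an assumption for demonstration: fakeContext is a stand-in object rather than a real document, and findByToken is a hypothetical helper, not a Sizzle function.

```javascript
// Stand-in for a document; real code would call the DOM methods of `document`.
var fakeContext = {
  getElementById: function (id) {
    return id === "main" ? { id: "main" } : null;
  },
  getElementsByTagName: function (tag) {
    return tag === "p" ? [{ tag: "p" }, { tag: "p" }] : [];
  }
};

// Simplified finders keyed by token type, in the spirit of Expr.find.
var find = {
  ID: function (id, context) {
    var elem = context.getElementById(id);
    return elem ? [elem] : [];
  },
  TAG: function (tag, context) {
    return context.getElementsByTagName(tag);
  }
};

// Hypothetical helper: returns undefined when no finder exists for the type.
function findByToken(token, context) {
  var finder = find[token.type];
  return finder ? finder(token.matches[0], context) : undefined;
}

var byId = findByToken({ type: "ID", value: "#main", matches: ["main"] }, fakeContext);
var byTag = findByToken({ type: "TAG", value: "p", matches: ["p"] }, fakeContext);
var byAttr = findByToken({ type: "ATTR", value: "[href]", matches: ["href"] }, fakeContext);
```

Token types without a finder (ATTR here) yield undefined, which is the signal to keep looking at other tokens.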

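var runescape = /\\([\da-f]{1,6}[\x20\t\r\n\f]?|([\x20\t\r\n\f])|.)/gi;
// Handles escape sequences: tokens that contain a backslash
runescape.exec("\\ab"); // ["\\ab", "ab", undefined]
var rsibling = /[+~]/; // matches + and ~

matchExpr["needsContext"] = /^[\x20\t\r\n\f]*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\([\x20\t\r\n\f]*((?:-\d)?\d*)[\x20\t\r\n\f]*\)|)(?=[^-]|$)/i;
// needsContext matches selectors that are incomplete without a context:
// a leading combinator, or a positional pseudo such as :first or :eq(n)
matchExpr["needsContext"].test(" + p"); // true
matchExpr["needsContext"].test(":first p"); // true
matchExpr["needsContext"].test(":first-child p"); // false, the (?=[^-]|$) lookahead rejects :first-child
// A selector can end up incomplete like this, possibly because a leading #ID was stripped off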

The runescape regex is almost always used together with replace:

// Define funescape before using it: a var-assigned function is still undefined above its assignment
var funescape = function (_, escaped, escapedWhitespace) {
  var high = "0x" + escaped - 0x10000;
  // NaN means non-codepoint
  // Support: Firefox<24
  // Workaround erroneous numeric interpretation of +"0x"
  return high !== high || escapedWhitespace ? escaped : high < 0 ?
  // BMP codepoint
  String.fromCharCode(high + 0x10000) :
  // Supplemental Plane codepoint (surrogate pair)
  String.fromCharCode(high >> 10 | 0xD800, high & 0x3FF | 0xDC00);
};

var str = "\\ab";
str.replace(runescape, funescape); // "«" (U+00AB)

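I can't fully follow this one myself, so I'll leave it for you to work out, haha~ In short, funescape turns a CSS escape such as \ab into the character it encodes.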
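A few concrete inputs make the three branches of funescape easier to see. The sketch below repeats the regex and callback so that it runs on its own; unescapeSelector is a hypothetical wrapper name used only for this illustration.

```javascript
var runescape = /\\([\da-f]{1,6}[\x20\t\r\n\f]?|([\x20\t\r\n\f])|.)/gi;

function funescape(_, escaped, escapedWhitespace) {
  // Parse the escape as hex and shift it so that BMP code points go negative.
  var high = "0x" + escaped - 0x10000;
  return high !== high || escapedWhitespace ?
    // Not a hex escape (NaN) or an escaped whitespace: keep the character itself.
    escaped :
    high < 0 ?
      // BMP code point: a single UTF-16 code unit.
      String.fromCharCode(high + 0x10000) :
      // Supplementary plane: build a surrogate pair.
      String.fromCharCode(high >> 10 | 0xD800, high & 0x3FF | 0xDC00);
}

// Hypothetical wrapper around the replace call.
function unescapeSelector(text) {
  return text.replace(runescape, funescape);
}

// "\41 " is the hex escape for U+0041; the trailing space ends the escape.
var hexEscape = unescapeSelector("\\41 bc");
// "\." is an escaped literal dot, as in an id like "foo.bar".
var literalEscape = unescapeSelector("foo\\.bar");
// "\1f600" lies outside the BMP and becomes the pair D83D DE00.
var astralEscape = unescapeSelector("\\1f600");
```

Here hexEscape is "Abc", literalEscape is "foo.bar", and astralEscape is the two-unit string "\uD83D\uDE00".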

var select = Sizzle.select = function (selector, context, results, seed) {
  var i, tokens, token, type, find, compiled = typeof selector === "function" && selector,
    match = !seed && tokenize((selector = compiled.selector || selector));

  results = results || [];

  // A length of 1 means there is no comma (a single group); Sizzle tries to optimize this case
  if (match.length === 1) {
    tokens = match[0] = match[0].slice(0);
    // The first token is an ID selector: take a shortcut and narrow the context
    if (tokens.length > 2 && (token = tokens[0]).type === "ID" && context.nodeType === 9 && documentIsHTML && Expr.relative[tokens[1].type]) {
      // Use the element with that ID as the new context
      context = (Expr.find["ID"](token.matches[0].replace(runescape, funescape), context) || [])[0];
      if (!context) {
        // If even the leading ID cannot be found, return right away
        return results;

      // selector was passed in as a precompiled function; it still verifies ancestry, so step up one level
      } else if (compiled) {
        context = context.parentNode;
      }

      selector = selector.slice(tokens.shift().value.length);
    }

    // If the selector does not need context, fetch a seed set from right to left; again a performance optimization
    i = matchExpr["needsContext"].test(selector) ? 0 : tokens.length;
    while (i--) {
      token = tokens[i];

      // Stop as soon as we hit a combinator such as + or ~
      if (Expr.relative[(type = token.type)]) {
        break;
      }
      if ((find = Expr.find[type])) {
        // Search, expanding context for leading sibling combinators
        if ((seed = find(
        token.matches[0].replace(runescape, funescape), rsibling.test(tokens[0].type) && testContext(context.parentNode) || context))) {
          // testContext returns the context only if it supports getElementsByTagName
          // If seed is empty or no tokens remain, we can return early
          tokens.splice(i, 1);
          selector = seed.length && toSelector(tokens);
          // An empty selector means nothing is left to match, so return directly
          if (!selector) {
            push.apply(results, seed);
            return results;
          }
          break;
        }
      }
    }
  }

  // Compile and execute a filtering function if one is not provided
  // Provide `match` to avoid retokenization if we modified the selector above
  (compiled || compile(selector, match))(
  seed, context, !documentIsHTML, results, !context || rsibling.test(selector) && testContext(context.parentNode) || context);
  return results;
}

The toSelector function joins the tokens that remain (those not yet consumed) back into a selector string:

function toSelector(tokens) {
  var i = 0,
    len = tokens.length,
    selector = "";
  for (; i < len; i++) {
    selector += tokens[i].value;
  }
  return selector;
}
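Putting the right-to-left loop and toSelector together, the following sketch shows how one token can be turned into a seed set while the rest is rebuilt into a shorter selector. It is a simplification under stated assumptions: the "document" is a small in-memory array, only a TAG finder exists (as if getElementsByClassName were unsupported), and pickSeed is a hypothetical name, not part of Sizzle.

```javascript
// Combinator types, used only to know where to stop.
var relative = { ">": true, " ": true, "+": true, "~": true };

// Tiny in-memory stand-in for the elements of a page.
var elements = [
  { tag: "div", className: "" },
  { tag: "p", className: "note" },
  { tag: "p", className: "" }
];

// Only TAG has a finder in this sketch.
var find = {
  TAG: function (tag) {
    return elements.filter(function (el) {
      return el.tag === tag;
    });
  }
};

function toSelector(tokens) {
  var i = 0, len = tokens.length, selector = "";
  for (; i < len; i++) {
    selector += tokens[i].value;
  }
  return selector;
}

// Hypothetical helper mirroring the seed loop in select.
function pickSeed(tokens) {
  var i, token, finder, seed = null;
  tokens = tokens.slice(0); // work on a copy, like match[0].slice(0)
  i = tokens.length;
  while (i--) {
    token = tokens[i];
    if (relative[token.type]) {
      break; // stop at the first combinator from the right
    }
    if ((finder = find[token.type])) {
      seed = finder(token.matches[0]);
      tokens.splice(i, 1); // this token is now covered by the seed
      break;
    }
  }
  return { seed: seed, selector: toSelector(tokens) };
}

// Tokens for "div > p.note"
var tokens = [
  { type: "TAG", value: "div", matches: ["div"] },
  { type: ">", value: " > " },
  { type: "TAG", value: "p", matches: ["p"] },
  { type: "CLASS", value: ".note", matches: ["note"] }
];

var picked = pickSeed(tokens);
```

With these inputs the CLASS token is skipped (no finder), the TAG token p supplies a seed of two elements, and the remaining selector is "div > .note", which is what would then be handed to compile.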

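At the very end another function appears: compile, Sizzle's compilation function. It is covered in the next chapter.

By this point everything that can be optimized has been: selector, context and seed. If execution reaches compile, the state of these variables is:

selector may no longer be the original one, after having tokens trimmed from its head and tail;

match is still the result of tokenize, although match[0] is a copy from which the consumed tokens have been removed;

seed is the seed set: the collection of DOM elements waiting to be matched;

context may already be the head element (#ID);

results is unchanged.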

You may also have noticed that compile is a function that returns a function, and the returned function is called immediately: compile()().
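The compile()() form can be illustrated with a much smaller closure. The sketch below is only an analogy: compileSketch is a hypothetical function that builds a matcher from a tag name, whereas the real compile works from a selector and its tokens.

```javascript
// Hypothetical, heavily simplified "compile": returns a matcher function.
function compileSketch(tagName) {
  // The returned closure remembers tagName and can be reused for many seeds.
  return function (seed, results) {
    for (var i = 0; i < seed.length; i++) {
      if (seed[i].tag === tagName) {
        results.push(seed[i]);
      }
    }
    return results;
  };
}

var seed = [{ tag: "p" }, { tag: "div" }, { tag: "p" }];

// First call builds the matcher, second call runs it: compileSketch()().
var results = compileSketch("p")(seed, []);
```

Both calls run synchronously; the point of the split is that the matcher built by the first call can be kept and reused without rebuilding it.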

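Summary

select does roughly the following:

It assigns the result of running tokenize on selector to match, so match is really the array of token groups;

When match has length 1 and the first token is an ID, it optimizes context by assigning the element matched by the ID to context;

If the selector does not match the needsContext regex, it builds a seed set: the set of all DOM elements matched by the rightmost findable token;

Finally comes the compile function, which takes a lot of arguments...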

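References

jQuery 2.0.3 Source Analysis: The Sizzle Engine - Parsing Principles

The source for this article is on GitHub; stars are welcome.

You're welcome to visit my blog to discuss.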

When reposting, please cite this article's address: https://www.ucloud.cn/yun/88135.html
