Afs*_*ani 17 javascript node.js
我试图全面了解我的问题.我需要编写一个Node.js程序,它应该能够检测函数的所有依赖项.
例如
function a() {
//do something
b();
};
function b() {
console.log("Hey, This is b");
};
Run Code Online (Sandbox Code Playgroud)
在上面的例子中,我需要一个像这样的JSON:
{
"a": {
dependencies: ["b"],
range: [1, 4]
},
"b": {
dependencies: [],
range: [5, 8]
}
}
Run Code Online (Sandbox Code Playgroud)
在dependencies属性中我需要有一个在函数内部调用的函数数组,range我的意思是函数定义的行范围.
我需要一个解决方案来实现这一目标.是否有Node.js的工具或插件?
Zir*_*rak 47
(I apologise in advance: I usually try and make my answers humorous to ease the reader through them, but I couldn't successfully do so in this case. Consider that a double apology for the length of this answer.)
This is not an easy problem. Instead of solving it in full, we will limit its scope - we will only solve the portion of the problem we care about. We will do so by parsing the input with a JavaScript parser and going over it with a simple recurive-descent algorithm. Our algorithm will analyse the program's scope and correctly identify function calls.
All the rest is just filling in the blanks! The result is at the bottom of the answer, so I recommend you grab to the first comment if you don't want to read through.
As Benjamin Gruenbaum's answer says, this is a very, very hard problem because of JavaScript's dynamic nature. However, what if instead of making a solution which'll work for 100% of programs, we instead do it for a subset of programs, if we limit ourselves to handle certain things?
The most important limitation:
eval.如果我们包括eval,它就会陷入混乱.这是因为eval允许你使用任意字符串,这使得跟踪依赖性成为不可能,而无需检查每个可能的输入.在NodeJS中没有document.writes,setTimeout只接受一个函数,所以我们不必担心这些.但是,我们也不允许使用vm模块.以下限制是为了简化该过程.它们可能是可以解决的,但解决它们超出了这个答案的范围:
obj[key]()让我很难引入这个限制,但它确实可以解决某些情况(例如key = 'foo'但不是key = userInput())var self = this.绝对可以使用完整的范围解析器解决.(a, b)()And finally, limitations to the implementation in this answer - either because of complexity constraints or time constraints (but they are very solvable):
foo.bar() or this.foo() would've at least doubled the program complexity. Put in enough time, and it's very doable.with statement, catch blocks). We don't deal with them.In this answer, I'll outline (and provide) a proof-of-concept parser.
Given a program, how can we decipher its function dependencies?
//A. just a global function
globalFunction();
//B. a function within a function
var outer = function () {
function foo () {}
foo();
};
//C. calling a function within itself
var outer = function inner () {
inner();
};
//D. disambiguating between two identically named functions
function foo () {
var foo = function () {};
foo();
}
foo();
Run Code Online (Sandbox Code Playgroud)
为了理解一个程序,我们需要将它的代码分开,我们需要理解它的语义:我们需要一个解析器.我选择了橡子,因为我从未使用它并听到了好评.我建议你稍微玩一下,看看SpiderMonkeys的AST中的程序是什么样的.
现在我们有一个神奇的解析器将JavaScript转换为AST(一个抽象语法树),我们将如何逻辑处理查找依赖关系?我们需要做两件事:
我们可以看出为什么上面的例子D可能含糊不清:有两个函数叫做foo,我们怎么知道哪一个foo()意味着什么?这就是我们需要实施范围界定的原因.
由于解决方案分为两部分,让我们这样解决.从最大的问题开始:
所以...我们有一个AST.它有一堆节点.我们如何建立范围?好吧,我们只关心功能范围.这简化了流程,因为我们知道我们只需要处理功能.但在我们讨论如何使用范围之前,让我们定义制作范围的函数.
范围有什么作用?这不是一个复杂的存在:它有一个父范围(或者null如果它是全局范围),它有它包含的项目.我们需要一种方法来向范围中添加内容,并从一个方面获取内容.我们这样做:
var Scope = function (parent) {
var ret = { items : {}, parent : parent, children : [] };
ret.get = function (name) {
if (this.items[name]) {
return this.items[name];
}
if (this.parent) {
return this.parent.get(name);
}
//this is fake, as it also assumes every global reference is legit
return name;
};
ret.add = function (name, val) {
this.items[name] = val;
};
if (parent) {
parent.children.push(ret);
}
return ret;
};
Run Code Online (Sandbox Code Playgroud)
As you may have noticed, I'm cheating in two aspects: First, I'm assigning child scopes. That is to make it easier for us measly humans to see that things are working (otherwise, all scoping would be internal, we'd only see the global scope). Second, I'm assuming the global scope contains all - that is, if foo isn't defined in any scope, then it must be an existing global variable. That may or may not be desirable.
OK, we have a way to represent scopes. Don't crack open the champagne yet, we still have to actually make them! Let's see how a simple function declaration, function f(){} looks like in AST:
{
"type": "Program",
"start": 0,
"end": 14,
"body": [{
"type": "FunctionDeclaration",
"start": 0,
"end": 14,
"id": {
"type": "Identifier",
"start": 9,
"end": 10,
"name": "f"
},
"params": [],
"body": {
"type": "BlockStatement",
"start": 12,
"end": 14,
"body": []
}
}]
}
Run Code Online (Sandbox Code Playgroud)
That's quite a mouthful, but we can brave through it! The juicy part is this:
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f"
},
"params": [ ... ],
"body": { ... }
}
Run Code Online (Sandbox Code Playgroud)
We have a FunctionDeclaration node with an id property. That id's name is our function's name! Let's assume we have a function walk which takes care of walking over nodes, and currentScope and currentFuncName variables, and we've just arrived at parsing our function declaration node. How do we do it? Code speaks louder than words:
//save our state, so we will return to it after we handled the function
var cachedScope = currentScope,
cachedName = currentFuncName;
//and now we change the state
currentScope = Scope(cachedScope);
currentFuncName = node.id.name;
//create the bindings in the parent and current scopes
//the following lines have a serious bug, we'll get to it later (remember that
// we have to meet Captain Crunchypants)
cachedScope.add(currentFuncName, currentName);
currentScope.add(currentFuncName, currentName);
//continue with the parsing
walk(node.body);
//and restore the state
currentScope = cachedScope;
currentFuncName = cachedName;
Run Code Online (Sandbox Code Playgroud)
But wait, what about function expressions? They behave a bit differently! First and foremost, they don't necessarily have a name, and if they do, it's only visible inside them:
var outer = function inner () {
//outer doesn't exist, inner is visible
};
//outer is visible, inner doesn't exist
Run Code Online (Sandbox Code Playgroud)
Let's make another huge assumption that we've dealt with the variable declaration part - we created the proper binding at the parent scope. Then, the logic above for handling functions changes only slightly:
...
//and now we change the state
currentScope = Scope(cachedScope);
//we signify anonymous functions with <anon>, since a function can never be called that
currentFuncName = node.id ? node.id.name : '<anon>';
...
if (node.id) {
currentScope.add(currentFuncName, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
cachedScope.add(currentFuncName, currentFuncName);
}
...
Run Code Online (Sandbox Code Playgroud)
And believe it or not, that's more or less the entire scope handling mechanism in the final solution. I expect as you add things like objects it'll get more a bit more complicated, but it not by much.
It's time to meet Captain Crunchpants. The very observant listener will by now have remembered example D. Let's freshen our memory:
function foo () {
function foo () {}
foo();
}
foo();
Run Code Online (Sandbox Code Playgroud)
In parsing that, we need a way to tell the outer foo and the inner foo apart - otherwise, we won't be able to know which of these foo calls, and our dependency finder will be toast. Furthermore, we won't be able to tell them apart in the dependency management - if we just add to the results by function name, we'll get overwriting. In other words, we need an absolute function name.
I chose to represent nesting with separation with a # character. The above, then, has a function foo, with an inner function foo#foo, with a call to foo#foo and a call to foo. Or, for a less confusing example:
var outer = function () {
function inner () {}
inner();
};
outer();
Run Code Online (Sandbox Code Playgroud)
Has a function outer and a function outer#inner. There's a call to outer#inner and a call to outer.
So, let's create this function which takes the previous name, and the current function's name, and mushes them together:
function nameToAbsolute (parent, child) {
//foo + bar => foo#bar
if (parent) {
return parent + '#' + name;
}
return name;
}
Run Code Online (Sandbox Code Playgroud)
And modify our function handling pseudo-code (which is about to come to life! I promise!):
...
currentScope = Scope(cachedScope);
var name = node.id ? node.id.name : '<anon>';
currentFuncName = nameToAbsolute(cachedName, name);
...
if (node.id) {
currentScope.add(name, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
cachedScope.add(name, currentFuncName);
}
Run Code Online (Sandbox Code Playgroud)
Now we're talking! It's time to move on to actually doing something! Maybe I've lied to you all along and I know nothing, maybe I failed miserably and I continued writing until now because I knew nobody will read this far and I'll get many upvotes because it's a long answer!?
HAH! Keep dreming! There's much more to come! I didn't sit on this for a few days for no reason! (As an interesting social experiment, could anyone upvoting comment, saying something around the lines "Captain Crunchpants was happy to see you"?)
On a more serious note, we should begin making the parser: What holds our state and walks over nodes. Since we'll have two parsers at the end, scope and dependency, we'll make a "master parser" which calls each one when needed:
var parser = {
results : {},
state : {},
parse : function (string) {
this.freshen();
var root = acorn.parse(string);
this.walk(root);
return this.results;
},
freshen : function () {
this.results = {};
this.results.deps = {};
this.state = {};
this.state.scope = this.results.scope = Scope(null);
this.state.name = '';
},
walk : function (node) {
//insert logic here
},
// '' => 'foo'
// 'bar' => 'bar#foo'
nameToAbsolute : function (parent, name) {
return parent ? parent + '#' + name : name;
},
cacheState : function () {
var subject = this.state;
return Object.keys( subject ).reduce(reduce, {});
function reduce (ret, key) {
ret[key] = subject[key];
return ret;
}
},
restoreState : function (st) {
var subject = this.state;
Object.keys(st).forEach(function (key) {
subject[key] = st[key];
});
}
};
Run Code Online (Sandbox Code Playgroud)
That's a bit of cruft, but hopefully it's understandable. We made state into an object, and to make it flexible, cacheState and restoreState are simply cloning/merging.
Now, for our beloved scopeParser:
var scopeParser = {
parseFunction : function (func) {
var startState = parser.cacheState(),
state = parser.state,
name = node.id ? node.id.name : '<anon>';
state.scope = Scope(startState.scope);
state.name = parser.nameToAbsolute(startState.name, name);
if (func.id) {
state.scope.add(name, state.name);
}
if (func.type === 'FunctionDeclaration') {
startState.scope.add(name, state.name);
}
this.addParamsToScope(func);
parser.walk(func.body);
parser.restoreState(startState);
}
};
Run Code Online (Sandbox Code Playgroud)
The casually observant reader will notice that parser.walk is empty. Time to fill 'er up!
walk : function (node) {
var type = node.type;
//yes, this is tight coupling. I will not apologise.
if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
scopeParser.parseFunction(node)
}
else if (node.type === 'ExpressionStatement') {
this.walk(node.expression);
}
//Program, BlockStatement, ...
else if (node.body && node.body.length) {
node.body.forEach(this.walk, this);
}
else {
console.log(node, 'pass through');
}
//...I'm sorry
}
Run Code Online (Sandbox Code Playgroud)
Again, mostly technicalities - to understand these, you need to play with acorn. We want to make sure we iterate and walk into nodes correctly. Expressions Nodes like (function foo() {}) has an expression property we walk over, BlockStatement Nodes (e.g. the actual body of a function) and Program Nodes have a body array, etc.
Since we have something resembling logic, let's try:
> parser.parse('function foo() {}').scope
{ items: { foo: 'foo' },
parent: null,
children:
[ { items: [Object],
parent: [Circular],
children: [],
get: [Function],
add: [Function] } ],
get: [Function],
add: [Function] }
Run Code Online (Sandbox Code Playgroud)
Neat! Play around with function declarations and expressions, see that they're nested correctly. We did however forget to include variable declaration:
var foo = function () {};
bar = function () {};
Run Code Online (Sandbox Code Playgroud)
A good (and fun!) exercise is adding them yourself. But don't worry - they'll be included in the final parser;
Who'd believe!? We're done with scopes! D-O-N-E! Let's do a cheer!
Oh oh oh...where did you think you're going!? We only solved part of the problem - we still have to find the dependencies! Or did you forget all about it!? Fine, you can go to the toilet. But it better be #1.
Wow, did you even remember we had section numbers? On an unrelated note, when I typed the last sentence, my keyboard made a sound reminiscent of the first note of the Super Mario Theme Song. Which is now stuck in my head.
Ok! So, we have our scopes, we have our function names, it's time to identify function calls! This will not take long. Doing acorn.parse('foo()') gives:
{
"type": "Program",
"body": [{
"type": "ExpressionStatement",
"expression": {
"type": "CallExpression",
"callee": {
"type": "Identifier",
"name": "f"
},
"arguments": []
}
}]
}
Run Code Online (Sandbox Code Playgroud)
So we're looking for a CallExpression. But before we go all walk over it, let's first review our logic. Given this node, what do we do? How do we add a dependency?
This is not a difficult problem, as we already took care of all the scoping. We add to the dependencies of the containing function (parser.state.name) the scope resolution of callExpression.callee.name. Sounds simple!
var deps = parser.results.deps,
scope = parser.state.scope,
context = parser.state.name || '<global>';
if (!deps[context]) {
deps[context] = [];
}
deps[context].push(scope.get(node.callee.name));
Run Code Online (Sandbox Code Playgroud)
There're, once again, a trick with handling the global context. If the current state is nameless, we assume it's the global context and give it a special name <global>.
Now that we have that, let's build our dependencyParser:
var dependencyParser = {
parseCall : function (node) {
...the code above...
}
};
Run Code Online (Sandbox Code Playgroud)
Truly beautiful. We still need to modify parser.walk to include CallExpressions:
walk : function (node) {
...
else if (type === 'CallExpression') {
dependencyParser.parseCall(node);
}
}
Run Code Online (Sandbox Code Playgroud)
And try it out on example D:
> parser.parse('function foo() { var foo = function () {}; foo(); } foo()').deps
{ foo: [ 'foo#foo' ], '<global>': [ 'foo' ] }
Run Code Online (Sandbox Code Playgroud)
HAHA! IN YOUR FACE, PROBLEM! WOOOOOOOOOOO!
You may commence celebrations. Remove your pants, run around in the city, claim you're the town chicken and burn stray garbage cans (Zirak and Affiliates in no way support arson of any kind or indecent exposure. Any action taken by oh, say, any reader is not to be blamed upon Zirak and/or Affiliates).
But seriously now. We solved a very, very limited subset of the problem, and to solve it for a small percentage of real-case scenarios there are a lot of things which have to be done. This is not a discouragement - quite the opposite! I urge you to try and do this. It's fun! (Zirak and Affiliates are in no way responsible for any mental breakdown as a result from trying to to what was just said)
Presented here is the source code of the parser, sans any NodeJS specific stuff (i.e. requiring acorn or exposing the parser):
var parser = {
results : {},
state : {},
verbose : false,
parse : function (string) {
this.freshen();
var root = acorn.parse(string);
this.walk(root);
return this.results;
},
freshen : function () {
this.results = {};
this.results.deps = {};
this.state = {};
this.state.scope = this.results.scope = Scope(null);
this.state.name = '';
},
walk : function (node) {
var type = node.type;
//yes, this is tight coupling. I will not apologise.
if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
scopeParser.parseFunction(node)
}
else if (type === 'AssignmentExpression') {
scopeParser.parseBareAssignmentExpression(node);
}
else if (type === 'VariableDeclaration') {
scopeParser.parseVarDeclaration(node);
}
else if (type === 'CallExpression') {
dependencyParser.parseCall(node);
}
else if (node.type === 'ExpressionStatement') {
this.walk(node.expression);
}
//Program, BlockStatement, ...
else if (node.body && node.body.length) {
node.body.forEach(this.walk, this);
}
else if (this.verbose) {
console.log(node, 'pass through');
}
//...I'm sorry
},
// '' => 'foo'
// 'bar' => 'bar#foo'
nameToAbsolute : function (parent, name) {
return parent ? parent + '#' + name : name;
},
cacheState : function () {
var subject = this.state;
return Object.keys( subject ).reduce(reduce, {});
function reduce (ret, key) {
ret[key] = subject[key];
return ret;
}
},
restoreState : function (st) {
var subject = this.state;
Object.keys(st).forEach(function (key) {
subject[key] = st[key];
});
}
};
var dependencyParser = {
//foo()
//yes. that's all.
parseCall : function (node) {
if (parser.verbose) {
console.log(node, 'parseCall');
}
var deps = parser.results.deps,
scope = parser.state.scope,
context = parser.state.name || '<global>';
if (!deps[context]) {
deps[context] = [];
}
deps[context].push(scope.get(node.callee.name));
}
};
var scopeParser = {
// We only care about these kinds of tokens:
// (1) Function declarations
// function foo () {}
// (2) Function expressions assigned to variables
// var foo = function () {};
// bar = function () {};
//
// Do note the following property:
// var foo = function bar () {
// `bar` is visible, `foo` is not
// };
// `bar` is not visible, `foo` is
/*
function foo () {}
=>
{
"type": 'FunctionDeclaration',
"id": {
"type": Identifier,
"name": 'foo'
},
"params": [],
"body": { ... }
}
(function () {})
=>
{
"type": "FunctionExpression",
"id": null,
"params": [],
"body": { ... }
}
*/
parseFunction : function (func) {
if (parser.verbose) {
console.log(func, 'parseFunction');
}
var startState = parser.cacheState(),
state = parser.state,
name = this.grabFuncName(func);
state.scope = Scope(startState.scope);
state.name = parser.nameToAbsolute(startState.name, name);
if (func.id) {
state.scope.add(name, state.name);
}
if (func.type === 'FunctionDeclaration') {
startState.scope.add(name, state.name);
}
this.addParamsToScope(func);
parser.walk(func.body);
parser.restoreState(startState);
},
grabFuncName : function (func) {
if (func.id) {
return func.id.name;
}
else if (func.type === 'FunctionExpression') {
return '<anon>';
}
else {
//...this shouldn't happen
throw new Error(
'scope.parseFunction encountered an anomalous function: ' +
'nameless and is not an expression');
}
},
/*
[{
"type": "Identifier",
"name": "a"
}, {
"type": "Identifier",
"name": "b"
}, {
"type": "Identifier",
"name": "c"
}]
*/
addParamsToScope : function (func) {
var scope = parser.state.scope,
fullName = parser.state.name;
func.params.forEach(addParam);
function addParam (param) {
var name = param.name;
scope.add(name, parser.nameToAbsolute(fullName, name));
}
},
parseVarDeclaration : function (tok) {
if (parser.verbose) {
console.log(tok, 'parseVarDeclaration');
}
tok.declarations.forEach(parseDecl, this);
function parseDecl (decl) {
this.parseAssignment(decl.id, decl.init);
}
},
// Lacking a better name, this:
// foo = function () {}
// without a `var`, I call a "bare assignment"
parseBareAssignmentExpression : function (exp) {
if (parser.verbose) {
console.log(exp, 'parseBareAssignmentExpression');
}
this.parseAssignment(exp.left, exp.right);
},
parseAssignment : function (id, value) {
if (parser.verbose) {
console.log(id, value, 'parseAssignment');
}
if (!value || value.type !== 'FunctionExpression') {
return;
}
var name = id.name,
val = parser.nameToAbsolute(parser.state.name, name);
parser.state.scope.add(name, val);
this.parseFunction(value);
}
};
var Scope = function (parent) {
var ret = { items : {}, parent : parent, children : [] };
ret.get = function (name) {
if (this.items[name]) {
return this.items[name];
}
if (this.parent) {
return this.parent.get(name);
}
//this is fake, as it also assumes every global reference is legit
return name;
};
ret.add = function (name, val) {
this.items[name] = val;
};
if (parent) {
parent.children.push(ret);
}
return ret;
};
Run Code Online (Sandbox Code Playgroud)
Now if you'll excuse me, I need a long shower.
没有.
对不起,这在使用eval的动态语言中的理论水平上是不可能的.好的IDE可以检测基本的东西,但是有些东西你根本无法检测到:
我们来看你的简单案例:
function a() {
//do something
b();
};
Run Code Online (Sandbox Code Playgroud)
让我们复杂一点:
function a() {
//do something
eval("b();")
};
Run Code Online (Sandbox Code Playgroud)
现在我们必须检测字符串中的内容,让我们前进一步:
function a() {
//do something
eval("b"+"();");
};
Run Code Online (Sandbox Code Playgroud)
现在我们必须检测字符串concats的结果.让我们再做几个:
function a() {
//do something
var d = ["b"];
eval(d.join("")+"();");
};
Run Code Online (Sandbox Code Playgroud)
还不开心吗?我们编码吧:
function a() {
//do something
var d = "YigpOw==";
eval(atob(d));
};
Run Code Online (Sandbox Code Playgroud)
现在,这些是一些非常基本的案例,我可以根据需要将它们复杂化.这里真的是到处跑的代码没有办法 -你必须对所有可能的输入运行它,并检查,我们都知道,这是不切实际的.
将依赖关系作为参数传递给函数并使用控制反转.始终明确您的更复杂的依赖关系而不是隐含的.这样你就不需要工具来知道你的依赖是什么:)