Deprecated code analyzer for perl
After reading NPEREZ's blog article DarkPAN SchmarkPAN -- STOP THE MEME, it suddenly came to me:
We need a tool that enables users to profile their codebase and verify whether they are using any removed, deprecated or changed features of a new version of the perl interpreter.
This would make the task of identifying troublesome constructs in code much easier for your average perl programmer. I, for one, do not know when a particular feature of perl was introduced, or when another particular construct was deprecated (or removed). And even when I read the perl changelog, I don't always know how to test whether a particular construct is in use in my code.
Wouldn't it be possible to make a tool (perhaps named perldelta or perllint) that scans a piece of code (but doesn't run it) and reports removed and deprecated constructs? To me this seems like a mix between warnings and Perl::Critic, but focused on identifying removed or deprecated constructs in the perl interpreter itself rather than advocating general code style (as Perl::Critic does).
The tool could by default run against the release it is packaged with, but optionally also against older releases (with a --version option) so that you could test which version of perl really would cause problems for your app.
I'm not sure if all of the features that are deprecated/removed can (easily) be tested with this approach, but I'm fairly certain that it would go a long way towards enabling deprecation cycles in core perl development.
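To make the idea concrete, here is a minimal sketch of such a scanner using plain regexes over source text. A real tool would parse the code properly (for example with PPI) rather than pattern-match lines, and the construct table and version notes below are illustrative, not exhaustive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative table: pattern => human-readable description.
# The version notes are approximate and would come from perldelta data.
my @checks = (
    [ qr/\$\[/,              'use of the $[ array-base variable (deprecated in 5.12)' ],
    [ qr/\bfields::phash\b/, 'pseudo-hash creation (removed in the 5.9.x series)'     ],
);

sub scan_source {
    my ($source) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\n/, $source) {
        $line_no++;
        for my $check (@checks) {
            my ($pattern, $description) = @$check;
            push @hits, "line $line_no: $description"
                if $line =~ $pattern;
        }
    }
    return @hits;
}

my $old_code = <<'END';
local $[ = 1;    # old-style array base
print "hello\n";
END

print "$_\n" for scan_source($old_code);
```

Run against the sample snippet, this reports a hit on line 1 for the $[ variable.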
Another useful function this tool could have was to determine the minimum (or possibly maximum) useful version of the perl interpreter that your code can use.
- Let's say that your code uses pseudo-hashes (I don't even know how to identify those). They were removed somewhere in the 5.9.x-series.
- This means that your code is incompatible with 5.9 or newer.
- Let's also assume that you use some useful UTF-8 construct that makes you require 5.8.x.
- That means that as long as you have that pseudo-hash code in there, you're limited to running your code on perl 5.8.x.
- Let us then assume that a new (junior) programmer enters the stage and has heard about this fancy given/when thingy in perl 5.10 and starts using that in the same codebase.
- Suddenly you have a codebase that is in fact completely incompatible with any single perl version, even though each of its parts is perfectly compatible with some perl version.
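The reasoning in the list above is just interval intersection: each detected construct constrains the usable interpreter to a [min, max] version range, and the codebase as a whole is limited to the intersection of all the ranges. A sketch, with version numbers encoded numerically (5.008 for 5.8) purely for comparison:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $INF = 9**9**9;    # stands in for "no upper bound"

# Illustrative constraints matching the scenario above: pseudo-hashes
# cap you at 5.8.x, a UTF-8 feature needs 5.8+, given/when needs 5.10+.
my @constraints = (
    { construct => 'pseudo-hashes',      min => 0,     max => 5.008 },
    { construct => 'some UTF-8 feature', min => 5.008, max => $INF  },
    { construct => 'given/when',         min => 5.010, max => $INF  },
);

# Intersect all ranges; an empty result means no version fits.
sub feasible_range {
    my ($min, $max) = (0, $INF);
    for my $c (@_) {
        $min = $c->{min} if $c->{min} > $min;
        $max = $c->{max} if $c->{max} < $max;
    }
    return $min <= $max ? ($min, $max) : ();
}

my @range = feasible_range(@constraints);
print @range
    ? "usable perl versions: $range[0] .. $range[1]\n"
    : "no single perl version can run this codebase\n";
```

With all three constraints the intersection is empty, which is exactly the junior-programmer scenario described above; drop the pseudo-hash constraint and a usable range reappears.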
With a tool like what I have described it would be easy to determine a specific version requirement for your codebase (or CPAN module), and it would be even easier to pinpoint things that bite you now, or could come back to bite you in the near future.
This tool would also benefit distribution packagers in determining incompatibilities when they plan on upgrading to a new version of perl (which we should hope they would do frequently).
What I like about Ubuntu, as Gabor Szabo wrote about, compared to e.g. Debian, is that it has regular releases every 6 months. It makes me certain that if a certain feature comes along in some popular package, it is never more than 6 months away. Imagine if we could have this kind of predictability with perl. It would be almost like Christmas, only twice a year.
I'm not saying that this tool necessarily needs to be bundled with perl, but it should be trivial to get it running on your machine so that you can profile your code and get the necessary advice. Chromatic says that the pumpkings need more help, and I believe this is a way to help them.
If the tool could also be made to output some kind of structured output (somewhat similar in concept to Debian popularity-contest) we could harness statistics from both CPAN and the DarkPAN (I really dislike that name) to figure out what features of the language are actually in use without divulging any actual copyrighted/restricted code. Imagine, for a second, that you run this tool against your code and then you upload a set of structured data that identifies what kind of code constructs you use in that code (this could happen automatically if you have it set to do that). If any of those constructs become deprecated or removed in the future you would get an email that says so (based on your opt-in/out settings).
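As a sketch of what that structured output could look like, here is a report that carries only construct counts and version information, never code text or identifying data. All field names here are invented for illustration; a real schema would need community agreement.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;    # in core since perl 5.14

# Hypothetical per-file report: how often each construct appears.
my %construct_count = (
    'array-base-$[' => 2,
    'pseudo-hash'   => 0,
    'given-when'    => 5,
);

my $report = {
    schema_version => 1,
    perl_version   => "$]",           # interpreter the scan ran under
    counts         => \%construct_count,
};

# canonical() gives a stable key order, handy for diffing reports.
print JSON::PP->new->canonical->pretty->encode($report);
</imports>
```

A central service receiving such reports could then notify submitters (per their opt-in settings) when one of their counted constructs becomes deprecated.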
If you want to go all out you could even store checksums (SHA1) of the files it is run against and store that value in the structured data, that way you could look up reports against a specific file without actually running it. Imagine running something like this recursively on a big project and have it generate a report of your entire codebase (including CPAN deps) and then with very little effort be able to see the obvious blockers towards upgrading to the next version of perl.
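The checksum part needs nothing beyond core modules. A sketch, walking a project tree and recording one SHA1 per perl source file so that reports can later be matched to files by checksum without ever transmitting the code itself (the file-extension filter is just an illustrative choice):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use Digest::SHA;    # in core since perl 5.9.3

# Map each .pl/.pm/.t file under $root to the SHA1 of its contents.
sub checksum_tree {
    my ($root) = @_;
    my %sha_for;
    find(sub {
        return unless -f $_ && /\.(?:pl|pm|t)\z/;
        $sha_for{$File::Find::name} =
            Digest::SHA->new(1)->addfile($_)->hexdigest;
    }, $root);
    return \%sha_for;
}
```

Called as e.g. `checksum_tree('lib')`, this returns a hashref of path-to-checksum pairs ready to embed in the structured report.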
I also see that this could be used to better determine Kwalitee of a CPAN module.
If we actually implement those last parts about uploading structured data to a central repository, it is absolutely vital that it contains no trace of either the code or the identity of its owner. If it does, few companies would be willing to submit data to the repository. Identifying a particular report/datadump with a company, individual or named codebase should be completely optional. Anonymization is vital here.
I'm not exactly sure how we would be able to handle foreign code (C, C++, Java, etc.), or how XS ties into the picture, but I have a feeling it should be doable.
Please let me (and others) hear what you have to say about this subject in the comments.
First is Perl::Critic policies targeted at modernizing source code. This can provide a solution to many, but definitely not all, compatibility problems.
Second is a Perl::Critic result database with features for anonymity.
These technical bits are not that hard, but they are still projects in their own right.
But the hard subproject is getting community participation.
I don't see how we can get better participation from the darkpan community. There are an unknown number of walled gardens where people assume that their way is the right one. There are probably countless more environments where people use Perl but are unaware of the Perl community. How do you reach these people?
To make average users aware of the tool at all is the hard part, as you say. Information about the tool should of course follow the normal perl documentation (hopefully some basic outline in the README). The tool should be easily installable on several platforms, hopefully packaged as a standalone PAR (with instructions on how to build, for the paranoid). Someone needs to write good articles on risk management and change management that mention the tool, so that project managers that normally don't see any of our articles start to think "are we using those methods to mitigate our risk factors?"
But really, first we need to make the tools. Then we need to make it easy to install, run and understand them. And finally we need to market them.
We cannot hope to change their walled garden ways, but at least we can try to inform them, and hope that they will understand the benefit of analyzing their code. Adam Kennedy mentioned that they had 300k lines of code. What if this tool could be run on that codebase and tell them that they had 10 occurrences of pseudo-hashes, 25 occurrences of @_[x] etc. (I'm not suggesting they have). It would be a tremendous help for them in tracking down obsolete constructs before they even start running their test suite. I'm just suggesting that we make things easier, not necessarily perfect.
WRT our code, running these kinds of automated tools is exactly what we do.
We've run Perl::Critic over our code a few times to cherry pick some specific policies we'd like to deal with.
We've also used Ovid's SQL injection detector a few times to look for (extremely) old code that did unsafe things.
Apart from Perl::Critic, take a look at Perl::MinimumVersion for another example.
You only need two things.
1. A collection of PPI search expressions.
2. A set of replacement expressions for the subset that you can automatically (and safely) upgrade.
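A minimal example of the first item, a "PPI search expression": finding every use of the $[ magic variable in a chunk of source. Note that PPI is a CPAN module, not in core, and only the search half is shown; safe automatic replacement would need extra context checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PPI;    # CPAN module, not in core

# Parse source without running it and return the line numbers where
# the $[ magic variable appears.
sub find_array_base {
    my ($source) = @_;
    my $doc = PPI::Document->new(\$source)
        or die "could not parse source";
    my $hits = $doc->find(sub {
        $_[1]->isa('PPI::Token::Magic') && $_[1]->content eq '$['
    }) || [];
    return map { $_->line_number } @$hits;
}

my $code = "local \$[ = 1;\nprint \$x[0], \"\\n\";\n";
print "\$[ used on line $_\n" for find_array_base($code);
```

Because PPI works on a parse tree rather than raw text, a search like this won't be fooled by `$[` appearing inside strings or comments.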
WRT doing this in the large, I've already been experimenting with something like this.
I have a 7.5 GB MD5-indexed PPI::Cache of documents, and a metrics/detection plugin API.
One initial experiment was to locate all existing uses of the soon-to-be-deprecated $[ variable.
What was interesting was that a bunch were typos. So for some future automated replace code, you'd also want to try and look around to isolate known-good replace situations, and then bail out on the ones you can't be sure of.
But this whole problem is quite tractable, especially for the smaller deprecations.
I think what the doomsayers among us are trying to argue is that just because users don't talk to you doesn't mean they aren't important.
The kind of automated tool you suggest would be hugely valuable. It lets us fix the DarkPAN without having to see it at all.
This sort of tool can allow DarkPAN shops, who may not be able to talk to us because of stupid legal concerns, to test their own code and lessen the fear of upgrading.