Why Do Developers Use Trivial Packages?

By Rabe Ab­dalkareem, Olivier Nourry, Sultan We­haibi, Suhaib Mu­jahid, and Emad Shihab.
In Joint Meeting of the European Soft­ware En­gin­eering Con­fer­ence and the ACM SIG­SOFT Sym­posium on the Found­a­tions of Soft­ware En­gin­eering (ESEC/F­SE). 2017.

We saw last time that de­velopers are often wary of in­tro­du­cing new de­pend­en­cies un­less they’re really worth it, due to the in­ev­it­able cost of main­ten­ance. Why then do de­velopers also de­pend on so-c­alled “trivial pack­ages”? The left-pad fiasco of last year brought to light how ex­treme this situ­ation really is: a package providing 11 lines of code to left pad a string was pulled from npm, breaking thou­sands of other pack­ages which, dir­ectly or in­dir­ectly, de­pended on it.

This is the question which this survey paper sets out to answer. First we get a quantitative analysis of trivial package use across 230,000 npm packages and 38,000 applications, then a survey of 88 Node.js developers about why they use trivial packages and what drawbacks they see.

What do we mean by a “trivial package”? The authors randomly selected 16 npm packages with between 4 and 250 lines of code and sent out a survey asking whether each package was trivial or not, and why. The survey got 12 responses. Here’s an example, the is-positive package:

module.exports = function (n) {
  return toString.call(n) === '[object Number]' && n > 0;
};

Based on the survey re­sponses, the au­thors iden­ti­fied both length and cyc­lo­matic com­plexity of a package to be con­trib­uting factors to its tri­vi­al­ity:

Our survey in­dic­ates that size and com­plexity are com­monly used meas­ures to de­termine if a package is trivial. Based on our ana­lysis, pack­ages that have ≤ 35 JavaS­cript LOC and a Mc­Cabe’s cyc­lo­matic com­plexity ≤ 10 are con­sidered to be trivial.

You can quibble over this defin­i­tion (I might con­sider a longer but low-­com­plexity package to be trivial, for in­stance), but tri­vi­ality is ul­ti­mately a judge­ment call. No matter what metric the au­thors pick, there will be some who dis­agree.
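To make the threshold concrete, here is a minimal sketch of the rule as a predicate. The two limits come from the quote above; the shape of the `metrics` object and the example numbers are my own assumptions, and the paper does not say how its tooling represents a package.

```javascript
// Thresholds from the paper's definition of a trivial package.
const MAX_LOC = 35;
const MAX_CYCLOMATIC_COMPLEXITY = 10;

// `metrics` is a hypothetical record with `loc` and `cyclomaticComplexity`
// fields; it is not a format used by the authors.
function isTrivial(metrics) {
  return (
    metrics.loc <= MAX_LOC &&
    metrics.cyclomaticComplexity <= MAX_CYCLOMATIC_COMPLEXITY
  );
}

// Illustrative numbers only: a three-line package with one `&&` branch,
// and a longer package that is simple but exceeds the length limit.
console.log(isTrivial({ loc: 3, cyclomaticComplexity: 2 })); // true
console.log(isTrivial({ loc: 120, cyclomaticComplexity: 4 })); // false
```

The second call is exactly the case I quibbled about: low complexity, but too long to count as trivial under this rule.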

How pre­valent are they? The au­thors fetched the latest ver­sion of every npm package as of the 5th of May 2016, giving 231,092 pack­ages, after re­moving 21,904 with no code. They also fetched all Node.js/npm ap­plic­a­tions on Git­Hub, giving 38,807 ap­plic­a­tions, after fil­tering out 76,814 with fewer than 100 com­mits or only one de­veloper.

Percentage of Published Trivial Packages on npm

Of the npm packages, an incredible 38,845 (16.8%) are trivial packages. Furthermore, if we look at the proportion of published trivial packages over time, we see that it’s going up! The graph is jagged up until npm banned unpublishing packages in response to the left-pad incident. I suspect this means that a lot of people used to publish, and then almost immediately remove, trivial packages. Currently, roughly 15% of the packages added each month are trivial packages.

Rather than looking at the en­tire data­base of pack­ages, we can also look at the most pop­ular:

npm posts the most de­pended-upon pack­ages on its web­site. We meas­ured the number of trivial pack­ages that exist in the top 1,000 most de­pended-upon pack­ages; we find that 113 of them are trivial pack­ages. This finding shows that trivial pack­ages are not only pre­valent and in­creasing in num­ber, but they are also very pop­ular among de­velopers, making up 11.3% of the 1,000 most de­pended on npm pack­ages.

When it comes to ap­plic­a­tions, the au­thors parsed the source code, looking for im­port state­ments, to handle cases where a pro­ject’s pack­age.json file (con­taining metadata for npm to build and run it) spe­cifies a de­pend­ency which isn’t used any­where. This gives, for each ap­plic­a­tion, a set of de­pend­en­cies which are used:

Fi­nally, we meas­ured the number of pack­ages that are trivial in the set of pack­ages used by the ap­plic­a­tions. Note that we only con­sider npm pack­ages since it is the most pop­ular package man­ager for Node.js pack­ages and other package man­agers only manage a subset of pack­ages. We find that of the 38,807 ap­plic­a­tions in our data set, 4,256 (10.9%) dir­ectly de­pend on at least one trivial pack­age.
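The step of keeping only dependencies that are actually imported might look roughly like the sketch below. This is my own illustration, not the authors' tooling: it uses regular expressions where a real analysis would more likely use a parser, and the function names, package versions, and sample source are all made up.

```javascript
// Collect package names from require() calls and import statements.
// Regexes are a simplification and will miss dynamic or computed imports.
function findImportedPackages(source) {
  const patterns = [
    /require\(\s*['"]([^'"]+)['"]\s*\)/g,
    /import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/g,
  ];
  const names = new Set();
  for (const pattern of patterns) {
    for (const match of source.matchAll(pattern)) {
      const specifier = match[1];
      // Relative and absolute paths point at local files, not packages.
      if (specifier.startsWith(".") || specifier.startsWith("/")) continue;
      // Reduce "pkg/sub/path" to "pkg" and "@scope/pkg/sub" to "@scope/pkg".
      const parts = specifier.split("/");
      names.add(
        specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0]
      );
    }
  }
  return names;
}

// Keep only the declared dependencies that some source file imports.
function usedDependencies(packageJson, sources) {
  const declared = Object.keys(packageJson.dependencies || {});
  const imported = new Set();
  for (const source of sources) {
    for (const name of findImportedPackages(source)) imported.add(name);
  }
  return declared.filter((name) => imported.has(name));
}

// Hypothetical application: is-positive is declared but never imported.
const packageJson = {
  dependencies: { "left-pad": "^1.0.0", "is-positive": "^3.0.0" },
};
const sources = [
  'const leftPad = require("left-pad");\nconst helper = require("./helper");',
];
console.log(usedDependencies(packageJson, sources)); // [ 'left-pad' ]
```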

How do de­velopers feel about them? Given how pop­ular trivial pack­ages are, we might sus­pect that de­velopers don’t con­sider them a prob­lem. This is in sharp con­trast to some view­points in How to Break an API, where de­velopers were wary of in­tro­du­cing new de­pend­en­cies. This part of the study was con­ducted as a survey of 88 de­velopers.

The reasons given for using trivial packages are:

  • Trivial pack­ages provide well im­ple­mented and tested code (48 re­spond­ents)
  • Use of trivial pack­ages in­creases pro­ductivity (42 re­spond­ents)
  • Use of trivial pack­ages out­sources the main­ten­ance burden for that code to the package au­thors (8 re­spond­ents)
  • Use of trivial pack­ages helps read­ab­ility and re­duces com­plexity (8 re­spond­ents)
  • Use of a trivial pack­age, over a large lib­rary or frame­work, im­proves ap­plic­a­tion per­form­ance (3 re­spond­ents)

Only 7 re­spond­ents said they saw no reason to use trivial pack­ages.

The au­thors also asked for the draw­backs of using trivial pack­ages. Now we get some view­points closer to How to Break an API. The draw­backs given are:

  • The over­head of mon­it­oring de­pend­en­cies for up­dates (49 re­spond­ents)
  • The main­ten­ance burden of breaking changes (16 re­spond­ents)
  • De­creased build per­form­ance, due to the over­head of fetching and building more de­pend­en­cies (14 re­spond­ents)
  • De­creased de­veloper per­form­ance, due to needing to read more doc­u­ment­a­tion (11 re­spond­ents)
  • A missed learning op­por­tun­ity: it’s easier to use a package to solve a problem than to figure it out your­self (8 re­spond­ents)
  • Po­ten­tial se­curity risks in third-­party code (7 re­spond­ents)
  • Li­censing is­sues (3 re­spond­ents)

Only 7 re­spond­ents said they saw no draw­backs to using trivial pack­ages.

Are they well tested? Over half of the re­spond­ents said that a reason to use trivial pack­ages is that the code is per­ceived to be well im­ple­mented and tested. But is that really the case?

npm requires that developers provide a test script name with the submission of their packages (listed in the package.json file). In fact, 81.2% (31,521 out of 38,845) of the trivial packages in our dataset have some test script name listed. However, since developers can provide any script name under this field, it is difficult to know if a package is actually tested.
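To see why a listed script name says so little, consider a check that only looks at the `scripts.test` field of package.json. This is a sketch of my own, not the authors' method. The placeholder string is what I believe `npm init` writes by default, so treat it as an assumption to verify against your npm version.

```javascript
// Assumed default written by `npm init`; check this against your npm version.
const NPM_INIT_PLACEHOLDER = 'echo "Error: no test specified" && exit 1';

// True if package.json lists any non-empty test script at all.
function hasTestScriptName(packageJson) {
  const script = packageJson.scripts && packageJson.scripts.test;
  return typeof script === "string" && script.trim() !== "";
}

// Slightly stricter: the script exists and is not the assumed placeholder.
// This still cannot tell whether the script runs any meaningful tests.
function looksLikeRealTestScript(packageJson) {
  return (
    hasTestScriptName(packageJson) &&
    packageJson.scripts.test.trim() !== NPM_INIT_PLACEHOLDER
  );
}

const untouched = { scripts: { test: NPM_INIT_PLACEHOLDER } };
console.log(hasTestScriptName(untouched)); // true
console.log(looksLikeRealTestScript(untouched)); // false
```

A package that keeps the placeholder counts as having a test script name, yet it has no tests at all.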

So the au­thors turn to the npms tool to col­lect met­rics about the trivial pack­ages in their data­set:

We ex­amine whether a package is really well tested and im­ple­mented from two as­pects; first, we check if a package has tests written for it. Second, since in many cases, de­velopers con­sider pack­ages to be ‘de­ploy­ment tested’, we also con­sider the usage of a package as an in­dic­ator of it being well tested and im­ple­men­ted. To care­fully ex­amine whether a package is really well tested and im­ple­men­ted, we use the npm on­line search tool (known as npms) to measure various met­rics re­lated to how well the pack­ages are tested, used and val­ued. To provide its ranking of the pack­ages, npms mines and cal­cu­lates a number of met­rics based on de­vel­op­ment (e.g., tests) and usage (e.g., no. of down­loads) data.

They used three npms met­rics to eval­uate how tested a package is:

  • “Tests”, a weighted sum of the size of the tests, the cov­erage per­cent­age, and the build status
  • “Com­munity in­terest”, de­rived from pop­ularity on GitHub
  • “Down­load count”, the number of down­loads in the last three months

The res­ults are not so prom­ising:

As an initial step, we calculate the number of trivial packages that have a Tests value greater than zero, which means trivial packages that have some form of tests. We find that only 45.2% of the trivial packages have tests, i.e., a Tests value > 0.

So much for well tested!

Distribution of Tests, Community Interest, and Download Count metrics

The au­thors also com­pare the met­rics of trivial pack­ages with non­trivial pack­ages. We see that the dis­tri­bu­tions are sim­ilar, though non­trivial pack­ages have a greater me­dian, which could easily be due to the size and com­plexity dif­fer­ence. The au­thors find that the dif­fer­ences are stat­ist­ic­ally sig­ni­fic­ant, but with small ef­fect size.
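For readers unfamiliar with effect sizes, here is a sketch of one common non-parametric measure, Cliff's delta, which asks how often a value from one group exceeds a value from the other. This summary does not say which measure the authors used, so take the choice of Cliff's delta, the commonly cited interpretation thresholds, and the sample numbers as my assumptions.

```javascript
// Cliff's delta: (pairs where x > y, minus pairs where x < y) / all pairs.
// Ranges from -1 to 1; 0 means neither group tends to be larger.
function cliffsDelta(xs, ys) {
  let greater = 0;
  let less = 0;
  for (const x of xs) {
    for (const y of ys) {
      if (x > y) greater += 1;
      else if (x < y) less += 1;
    }
  }
  return (greater - less) / (xs.length * ys.length);
}

// Commonly cited thresholds for interpreting the magnitude of delta.
function effectSizeLabel(delta) {
  const magnitude = Math.abs(delta);
  if (magnitude < 0.147) return "negligible";
  if (magnitude < 0.33) return "small";
  if (magnitude < 0.474) return "medium";
  return "large";
}

// Made-up scores, not data from the paper.
const nontrivialScores = [0, 1, 2, 3];
const trivialScores = [0, 0, 1, 2];
const delta = cliffsDelta(nontrivialScores, trivialScores);
console.log(delta, effectSizeLabel(delta));
```

With large samples, even a small delta can be statistically significant, which is how a result can be both significant and small.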

How much ef­fort is needed to keep up with new re­leases? The most cited draw­back for using trivial pack­ages was the extra over­head of needing to keep everything up­-to-d­ate.

Number of Releases for Trivial Packages Compared to Nontrivial Packages

There are a couple of ways to look at the im­pact of de­pend­en­cies. Firstly, the au­thors com­pare the number of re­leases. Trivial pack­ages tend to have fewer re­leases, so it seems that if you’re going to have a de­pend­ency, from a purely main­ten­ance per­spect­ive, a trivial de­pend­ency is the better op­tion.

The fact that the trivial packages are updated less frequently may be attributed to the fact that trivial packages ‘perform less functionality’, hence they need to be updated less frequently.

Distribution of Direct & Indirect Dependencies for Trivial and Nontrivial Packages

Next the au­thors con­sider how many de­pend­en­cies (direct and in­dir­ect) trivial and non­trivial pack­ages have. In­tro­du­cing extra de­pend­en­cies in­creases the com­plexity of the de­pend­ency chain, so all else being equal, we would prefer to have fewer de­pend­en­cies.

The au­thors group pack­ages into four cat­egories by number of de­pend­en­cies:

  • 0: 56.3% of trivial pack­ages, 34.8% of non­trivial pack­ages
  • 1–10: 27.9% of trivial pack­ages, 30.6% of non­trivial pack­ages
  • 11–20: 4.3% of trivial pack­ages, 7.3% of non­trivial pack­ages
  • More: 11.5% of trivial pack­ages, 27.3% of non­trivial pack­ages
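Counting direct and indirect dependencies amounts to walking the dependency graph. The sketch below shows one way to do that and to assign the four categories listed above. The graph format and the package names are invented for illustration, and I am not claiming this is how the authors computed their numbers.

```javascript
// Count every package reachable from `name`, directly or indirectly.
// `graph` maps a package name to the names of its direct dependencies.
function countTransitiveDependencies(graph, name) {
  const seen = new Set();
  const stack = [...(graph[name] || [])];
  while (stack.length > 0) {
    const dep = stack.pop();
    if (seen.has(dep)) continue; // shared dependencies are counted once
    seen.add(dep);
    stack.push(...(graph[dep] || []));
  }
  seen.delete(name); // a cycle back to the start is not a dependency
  return seen.size;
}

// The four categories used in the list above.
function dependencyCategory(count) {
  if (count === 0) return "0";
  if (count <= 10) return "1-10";
  if (count <= 20) return "11-20";
  return "More";
}

// Hypothetical graph: tiny-b is small but pulls in three other packages.
const graph = {
  "tiny-a": [],
  "tiny-b": ["helper-1", "helper-2"],
  "helper-1": ["helper-3"],
  "helper-2": ["helper-3"],
  "helper-3": [],
};
console.log(countTransitiveDependencies(graph, "tiny-a")); // 0
console.log(countTransitiveDependencies(graph, "tiny-b")); // 3
console.log(dependencyCategory(countTransitiveDependencies(graph, "tiny-b"))); // 1-10
```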

So de­velopers should be­ware extra de­pend­en­cies! Even though the source of a trivial package may be small, it may pull in many ad­di­tional pack­ages!

Trivial pack­ages have fewer re­leases and de­velopers are less likely to be ver­sion locked than non-trivial pack­ages. That said, de­velopers should be careful when using trivial pack­ages, since in some cases, trivial pack­ages can have nu­merous de­pend­en­cies. In fact, we find that 43.7% of trivial pack­ages have at least one de­pend­ency and 11.5% of trivial pack­ages have more than 20 de­pend­en­cies.

The bottom line. The final sentence of the paper is short, snappy, and neatly summarises all of what came before:

Hence, de­velopers should be careful about which trivial pack­ages they use.

It prob­ably goes without say­ing, but I would apply this warning to all pack­ages, trivial and non­trivial.
