Xervo

NPM Install JSONStream

JSONStream is one of the oldest and most useful libraries in npm. It lets you process JSON from any readable stream in real time, without buffering the whole document in memory. It can parse several layouts as they arrive (elements within an array, line-delimited JSON, etc.).

A useful API to play with when using JSONStream is npm’s “skim database.” It’s a public CouchDB instance with all the package metadata and none of the tarballs. Since it’s just CouchDB, it comes with an HTTP JSON interface and several streaming options.

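var request = require('request')
  , jsonStream = require('JSONStream')
  ;
// Stream the response body straight into the parser instead of buffering it
request({url: 'http://isaacs.couchone.com/registry/_all_docs'})
  // Emit each element of the top-level "rows" array as it is parsed
  .pipe(jsonStream.parse('rows.*'))
  .on('data', function (data) {
    // Each row's id is a package name
    console.log(data.id)
  })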

This prints a stream of package names from the npm database to the console. The parse path “rows.*” tells JSONStream to emit each value in the “rows” property as soon as it has been parsed. The response from this npm endpoint is one gigantic object; in fact, it’s probably too big to parse if we weren’t streaming it.
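To make the “rows.*” path concrete, here is a rough sketch of the response shape and of the values that end up in each 'data' event. It deliberately does not use JSONStream: the sample body is made up (the package names and revisions are placeholders), and the plain array access only mimics what the path selects on a small, fully loaded object.

```javascript
// Abbreviated, made-up sample in the shape of a CouchDB _all_docs response
var response = {
  total_rows: 2,
  offset: 0,
  rows: [
    { id: 'example-a', key: 'example-a', value: { rev: '1-abc' } },
    { id: 'example-b', key: 'example-b', value: { rev: '3-def' } }
  ]
}

// 'rows.*' selects every element under the top-level "rows" property.
// With JSONStream each element arrives as its own 'data' event; here we
// simply copy the array because the sample is already in memory.
var emitted = response.rows.slice()

// The handler in the example above logs data.id for each emitted row
var ids = emitted.map(function (row) { return row.id })
console.log(ids)
```

The difference with the streaming version is that the real response never has to exist as a single object in memory.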

JSONStream can also parse individual JSON objects as a stream.

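// Reuses request and jsonStream from the first example
request({url: 'http://skimdb.npmjs.com/registry/_changes?feed=continuous'})
  // No path: every top-level JSON object in the stream is emitted
  .pipe(jsonStream.parse())
  .on('data', function (data) {
    console.log(data.id)
  })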

CouchDB’s “continuous” feed is a series of line-delimited JSON objects. JSONStream will detect the separator if you don’t pass any directives to .parse(). The nice thing about this example is that the feed stays open and prints new package names as they are published or updated in the npm registry.
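For intuition about what parsing a line-delimited feed involves, here is a minimal sketch that uses no libraries. It is not JSONStream's implementation; createLineParser is a hypothetical helper, and the sample chunks are made up. The key detail is that a network chunk can end in the middle of a line, so the incomplete tail has to be kept until the next chunk arrives.

```javascript
// Hypothetical helper: returns a push(chunk) function that parses every
// complete line seen so far and remembers any incomplete trailing line.
function createLineParser() {
  var buffered = ''
  return function push(chunk) {
    buffered += chunk
    var lines = buffered.split('\n')
    // The last piece is either '' or a partial line; keep it for the next chunk
    buffered = lines.pop()
    var complete = lines.filter(function (line) {
      // Skip blank lines (CouchDB can send these as heartbeats)
      return line.trim() !== ''
    })
    return complete.map(function (line) { return JSON.parse(line) })
  }
}

var push = createLineParser()
// The second object is split across two chunks on purpose
var first = push('{"seq":1,"id":"foo"}\n{"seq":2,')
var second = push('"id":"bar"}\n\n')
console.log(first)
console.log(second)
```

The first call yields only the object for "foo"; the object for "bar" appears once its closing half arrives in the second chunk.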

Let’s do something more interesting and print only the packages that depend on `request`.

request({url: 'http://skimdb.npmjs.com/registry/_changes?feed=continuous&include_docs=true'})
  .pipe(jsonStream.parse())
  .on('data', function (data) {
    var doc = data.doc
    // Some entries carry no doc, dist-tags or versions, so check before indexing
    if (doc && doc['dist-tags'] && doc['dist-tags'].latest && doc.versions) {
      var latest = doc.versions[doc['dist-tags'].latest]
      if (latest && latest.dependencies && latest.dependencies.request) {
        console.log(data.id)
      }
    }
  })

By adding “&include_docs=true” we get the package metadata and can now write a smart filter to find out if the latest version of this package requires `request`.
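The filter itself can be pulled out into a small function and tried against in-memory documents before it is pointed at the live feed. This is only a sketch: dependsOn is a hypothetical helper, and the sample documents are made up and heavily abbreviated compared with real registry metadata.

```javascript
// Hypothetical helper: true if the latest version of a package document
// lists `name` among its dependencies.
function dependsOn(doc, name) {
  var tags = doc && doc['dist-tags']
  if (!tags || !tags.latest || !doc.versions) return false
  var latest = doc.versions[tags.latest]
  if (!latest || !latest.dependencies) return false
  return Object.prototype.hasOwnProperty.call(latest.dependencies, name)
}

// Made-up, abbreviated package documents
var sampleDocs = [
  { _id: 'uses-request',
    'dist-tags': { latest: '1.0.0' },
    versions: { '1.0.0': { dependencies: { request: '~2.0.0' } } } },
  { _id: 'no-deps',
    'dist-tags': { latest: '0.1.0' },
    versions: { '0.1.0': {} } },
  { _id: 'deleted-package', _deleted: true }
]

var matches = sampleDocs
  .filter(function (doc) { return dependsOn(doc, 'request') })
  .map(function (doc) { return doc._id })
console.log(matches)
```

In the streaming version, the same check would run inside the 'data' handler with data.doc as its first argument.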

You can see how easy JSONStream makes working with this kind of data. None of this would be possible if we had to pull all the data from these API endpoints into memory.
