Hernan Vivani's Blog

Linux, Big Data, AWS, Astronomy, Running, Cycling… and more

Posted on May 28, 2015 by hvivani

If you want to explore how to parallelize data ingestion into Elasticsearch, please have a look at this post I wrote for Amazon AWS:

It explains how to index Common Crawl metadata into Elasticsearch using the Cascading connector, reading directly from the S3 data source.
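To give a rough idea of what parallel ingestion involves, here is a minimal sketch in plain Python. It is not the Cascading job from the post: it only shows how records could be split into batches and serialized, in parallel, into the newline-delimited format that the Elasticsearch `_bulk` endpoint accepts. The record fields, the index name `commoncrawl-demo`, and the helper names are all assumptions made for illustration, and actually sending the payloads to a cluster is left out.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def chunked(records, size):
    """Yield consecutive batches of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_bulk_body(batch, index_name):
    """Serialize one batch as an Elasticsearch _bulk NDJSON payload."""
    lines = []
    for doc in batch:
        # Each document is preceded by an action line naming the target index.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def build_payloads(records, index_name, batch_size=2, workers=4):
    """Build the bulk payloads in parallel; sending them is up to the caller."""
    batches = list(chunked(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the payloads in the same order as the batches.
        return list(pool.map(lambda b: to_bulk_body(b, index_name), batches))


# Hypothetical records standing in for Common Crawl metadata entries.
sample = [{"url": f"http://example.com/{i}", "status": 200} for i in range(5)]
payloads = build_payloads(sample, "commoncrawl-demo")
```

Each element of `payloads` could then be posted to the `_bulk` endpoint by a separate worker. In the Cascading approach described in the post, the framework takes care of this kind of splitting and distribution for you.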

The Cascading source code is available here.