Reactivesearch: Feature: Use _msearch to send bulk requests

Created on 15 Mar 2018 · 15 Comments · Source: appbaseio/reactivesearch

Issue Type:

Enhancement

Platform:

Both

Description:

Use _msearch to reduce the number of search requests sent to the ES backend. See #294 for the current behavior: we send a number of requests proportional to the number of components connected to the one that sees an interaction.
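For context, Elasticsearch's _msearch endpoint accepts newline-delimited JSON (ndjson, sent as `application/x-ndjson`): each search contributes a header line followed by a query line, and the body must end with a newline. Below is a minimal sketch of batching several component queries into one such body. `buildMsearchBody`, the `books` index, and the queries are hypothetical illustrations, not Reactivesearch's internals.

```javascript
// Hypothetical helper (not part of Reactivesearch): serialize several
// searches into one Elasticsearch _msearch request body.
function buildMsearchBody(searches) {
  const lines = [];
  for (const { header, body } of searches) {
    // Each search is a pair of lines: a header (e.g. { index }) and the query.
    lines.push(JSON.stringify(header), JSON.stringify(body));
  }
  // _msearch bodies are ndjson and must be terminated by a newline.
  return lines.join("\n") + "\n";
}

// Usage: two component queries sent in a single round trip.
const payload = buildMsearchBody([
  { header: { index: "books" }, body: { query: { match: { title: "dune" } } } },
  { header: { index: "books" }, body: { size: 0, aggs: { authors: { terms: { field: "authors.keyword" } } } } },
]);
console.log(payload);
```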

Reactivesearch version: x.y.z

latest

enhancement


All 15 comments

Released in v2.5.0 🎉

Hi @siddharthlatest & @metagrover,

Thanks for working on this feature :)

I tested it locally with v2.6.0 and I am facing 2 issues:

  1. I am using a proxy app to customize the query (for security), and it threw an error saying that the last line has to be a newline

    • So I changed the encoding of the request from application/json to application/x-ndjson and added a newline after stringifying the request, but it still didn't work
    • __Q:__ What are your guidelines or recommendations for updating the query in a proxy app when using reactivesearch with multi-request queries?
  2. To test the new feature anyway, I pointed my app directly at Elasticsearch instead of the broken proxy app, and it worked, but I noticed that I am getting multiple msearch queries

    • I thought something was wrong with my code, so I checked the hosted Booksearch demo, but it was using v2.0.0-beta, so I cloned it locally, updated the library to v2.6.0, tested it, and still got multiple `msearch` queries

    [Screenshot, 2018-04-24: multiple msearch requests]

    • __Q:__ Are there any requirements or configurations needed to use the new msearch?
    • And can I control it, i.e. toggle it on and off? (One use case would be testing the query of each component independently.)
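The newline error in point 1 comes from the ndjson format _msearch expects: header/query line pairs, each line a complete JSON object, with a final newline. A proxy could run a pre-flight check like the sketch below before forwarding a rewritten body. `checkMsearchBody` is a hypothetical name, and it assumes blank lines can simply be ignored.

```javascript
// Hypothetical pre-flight check for an _msearch body before proxying it.
// Returns a list of problems; an empty list means the body looks well-formed.
function checkMsearchBody(text) {
  const problems = [];
  if (!text.endsWith("\n")) {
    problems.push("body must end with a newline");
  }
  // Ignore blank lines (e.g. the empty string after the final "\n").
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  if (lines.length % 2 !== 0) {
    problems.push("expected header/query line pairs, got an odd number of lines");
  }
  lines.forEach((line, i) => {
    try {
      JSON.parse(line);
    } catch (err) {
      problems.push(`non-blank line ${i + 1} is not valid JSON`);
    }
  });
  return problems;
}

console.log(checkMsearchBody('{"index":"books"}\n{"query":{"match_all":{}}}\n')); // []
console.log(checkMsearchBody('{"index":"books"}\n{"query":{"match_all":{}}}'));
```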

Again thanks for the amazing work you guys do in @appbaseio :) love using your tools <3

Hi @a-magdy, thanks for bringing this up!

> I updated the encoding of the request from application/json to application/x-ndjson and added a newline after stringifying the request, but it didn't work either. What are your guidelines or recommendations for updating the query in a proxy app with reactivesearch and multi-request queries?

What errors are you seeing? Make sure that you are parsing the x-ndjson data correctly. We will cover this properly in our docs, and I will post the link here once they are up! Here is an example of how this could look:

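```javascript
// LoopBack-style phase registration: body parsers are added to the 'parse' phase
const bodyParser = require('body-parser');

// to support JSON-encoded bodies
app.middleware('parse', bodyParser.json({
  limit: '50mb'
}));

// to support x-ndjson: the body is kept as a raw string in req.body
app.middleware('parse', bodyParser.text({
  type: 'application/x-ndjson'
}));
```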
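With `bodyParser.text()`, `req.body` holds the raw ndjson string, so "parsing correctly" means splitting it into lines and parsing each line on its own. A minimal sketch; `parseNdjson` is a hypothetical helper, and the header contents in the usage example are illustrative only.

```javascript
// Hypothetical helper: turn the raw string that bodyParser.text() leaves in
// req.body into an array of parsed JSON objects (one per non-blank line).
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "") // skip blank lines, incl. the one after the final "\n"
    .map((line) => JSON.parse(line));
}

const raw = '{"preference":"SearchResult"}\n{"query":{"match_all":{}},"size":10}\n';
const [header, query] = parseNdjson(raw);
console.log(header.preference, query.size); // SearchResult 10
```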

Regarding multiple _msearch queries

Yes, this is expected behaviour. When components mount, their queries are fired independently of other components (in mounting order). The combined msearch querying comes into play when a chain of queries needs to be fired because of an interaction on a source component. For example, suppose that X is watched by Y and Z. When X is updated, the core of the system knows its subscribers and can fire a single combined query for Y and Z. This isn't feasible at mount time, because mounting is when the core learns which components are present and what their subscription model looks like.
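The subscription idea described above (X watched by Y and Z) can be modelled roughly as follows. This is a hypothetical sketch, not Reactivesearch's actual code: the `watches` map stands in for the knowledge the core only has after mounting, and the `preference` header per component is an assumption for illustration.

```javascript
// Hypothetical model of the subscription idea (not Reactivesearch's code):
// each component lists the components it watches; when a source updates,
// the queries of all its subscribers are batched into one _msearch body.
const watches = {
  Y: ["X"],
  Z: ["X"],
  X: [],
};

function subscribersOf(source) {
  return Object.keys(watches).filter((id) => watches[id].includes(source));
}

function combinedRequestFor(source, queryFor) {
  const lines = [];
  for (const id of subscribersOf(source)) {
    lines.push(JSON.stringify({ preference: id }), JSON.stringify(queryFor(id)));
  }
  return lines.join("\n") + "\n";
}

// When X changes, one request carries both Y's and Z's queries.
const body = combinedRequestFor("X", (id) => ({ query: { match_all: {} }, size: id === "Y" ? 10 : 0 }));
console.log(body.trim().split("\n").length); // 4
```

Until every component has mounted, the `watches` map is incomplete, which is why the initial queries go out one by one.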

I'm open to suggestions if you have a better way to achieve this 😄

Hope that helps!

I'd highly recommend trying Server Side Rendering, where we make only one _msearch query to feed data to all the reactive components.

👉 Read all about it here in the introductory blog post.

@metagrover Thanks for your reply

I am manipulating the request body after parsing it as JSON with bodyParser. It looked like this:

```javascript
import express from 'express';
import cors from 'cors';
import bodyParser from 'body-parser';
import proxy from 'http-proxy-middleware';

const app = express();

app.use(cors());

app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: false }));

app.use((req, res, next) => {

  // Here I check my custom authorization header and manipulate the `req.body` JSON object

  next();
});

// Define proxy options
const options = {
  target: _elastic_host_url_,
  changeOrigin: true,               // needed for virtual hosted sites
  // ws: true,                      // proxy websockets
  onProxyReq: (proxyReq, req, res, options) => {
    if (req.body) {
      // bodyParser has already consumed the request stream, so re-serialize the body
      const bodyData = JSON.stringify(req.body);
      console.log({ bodyData });
      // In case the content-type is application/x-www-form-urlencoded, change it to application/json
      proxyReq.setHeader('Content-Type', 'application/json');
      proxyReq.setHeader('Content-Length', Buffer.byteLength(bodyData));
      // Stream the content
      proxyReq.write(bodyData);
    }
  }
};

app.use("*", proxy(options));
```

That worked fine with the JSON format.
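One detail in the proxy code above: `Content-Length` is computed with `Buffer.byteLength()` rather than `bodyData.length`. The header counts bytes, while a string's `.length` counts UTF-16 code units, so the two differ as soon as a query contains non-ASCII text and a `.length`-based header would be too short. A small illustration with a made-up query:

```javascript
// Why Content-Length uses Buffer.byteLength(): the header counts bytes,
// while String.prototype.length counts UTF-16 code units.
const bodyData = JSON.stringify({ query: { match: { title: "Caf\u00e9" } } });

console.log(bodyData.length);             // 36 (characters)
console.log(Buffer.byteLength(bodyData)); // 37 (UTF-8 bytes; "é" takes two)
```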

I've been testing with the new format, hoping that the bodyParser libs would have something to help with ndjson, but I couldn't find any references to ndjson, only to json.
I even tried writing my own ndjson middleware by copying the json one and changing the line that parses the body to use an ndjson parser, but with no success.

I also tried other tools, such as can-ndjson-stream and parse-ndjson, but I couldn't get them to run in the proper order.

My latest attempt, which I think I'll go with for now, updates the code above to:

```javascript
import express from 'express';
import cors from 'cors';
import bodyParser from 'body-parser';
import proxy from 'http-proxy-middleware';

const app = express();

app.use(cors());

// app.use(bodyParser.json());
app.use(bodyParser.text({ type: 'application/x-ndjson' }));
app.use(bodyParser.urlencoded({ extended: false }));

app.use((req, res, next) => {

  // Here I check my custom authorization header

  // Now req.body is a multi-line string in ndjson format (one JSON object per line),
  // so I am thinking of splitting it on `\n` and manipulating every 2nd line
  // with the same logic as a single request body

  next();
});

// Define proxy options
const options = {
  target: _elastic_host_url_,
  changeOrigin: true,               // needed for virtual hosted sites
  // ws: true,                      // proxy websockets
  onProxyReq: (proxyReq, req, res, options) => {
    if (req.body) {
      // TODO: test this. req.body is now a raw ndjson string and JSON.stringify() would wrap it
      // in quotes; it has to be sent as ndjson (don't forget the trailing `\n`)
      const bodyData = JSON.stringify(req.body);
      console.log({ bodyData });
      // proxyReq.setHeader('Content-Type', 'application/json');
      proxyReq.setHeader('Content-Type', 'application/x-ndjson');
      proxyReq.setHeader('Content-Length', Buffer.byteLength(bodyData));
      // Stream the content
      proxyReq.write(bodyData);
    }
  }
};

app.use("*", proxy(options));
```
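The splitting plan from the comments in the code above (split on `\n`, then manipulate every 2nd line) could look roughly like this. `mapMsearchQueries` is a hypothetical name; the sketch assumes each header and each query sits on its own line and that blank lines can be dropped. Its result is already a valid ndjson string, so it would be written to `proxyReq` as-is rather than passed through `JSON.stringify()`.

```javascript
// Hypothetical helper: apply `transform` to every query line (the 2nd line of
// each header/query pair) of an ndjson _msearch body, leaving headers as-is.
function mapMsearchQueries(text, transform) {
  const lines = text.split("\n").filter((line) => line.trim() !== "");
  const out = lines.map((line, i) => {
    const obj = JSON.parse(line);
    // Even indexes are headers, odd indexes are query bodies.
    return JSON.stringify(i % 2 === 1 ? transform(obj) : obj);
  });
  // Keep the trailing newline that _msearch requires.
  return out.join("\n") + "\n";
}

// Usage: cap the page size of every search before forwarding it.
const incoming = '{"preference":"results"}\n{"query":{"match_all":{}},"size":500}\n';
const outgoing = mapMsearchQueries(incoming, (q) => ({ ...q, size: Math.min(q.size ?? 10, 100) }));
console.log(outgoing);
```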

Without any manipulation, though, it works just by updating the middlewares and the encoding in the proxy config.

As for the multiple _msearch requests, I understand it now: it combines searches after the initial load :+1: Really nice :)

And as for __SSR__, I think it's a fantastic idea and I will definitely try it; I think it'll be really beneficial in different use cases :) 🎉

This is nice @a-magdy, I was trying something similar with:

```javascript
app.use(bodyParser.text({ type: 'application/x-ndjson' }));
```

But the requests were not going through (probably because of the ndjson -> text conversion). Will try to play around with your config 🙂

I got req.body to look something like this:

```text
"
{..json object..}
{..json object..}

"
```

If it doesn't work for you, please let me know.

I'll try to share the snippet once it is done

Here's how I'm doing it now, let me know if you think it can be improved.

@divyanshu013 Thanks for sharing it

I tried it locally and it works in my app; I think it is good as is :+1:

Here is my version though:

```javascript
import _ from 'lodash';
import express from "express";
import proxy from "http-proxy-middleware"; // https://github.com/chimurai/http-proxy-middleware
import btoa from "btoa";
import cors from 'cors';
import jwt from 'jwt-simple';
import bodyParser from 'body-parser';

const options = {
  target: _elastic_host_,
  changeOrigin: true,               // needed for virtual hosted sites
  // ws: true,                      // proxy websockets
  onProxyReq: (proxyReq, req, res, options) => {
    // Only forward string bodies produced by bodyParser.text(); for other
    // requests body-parser leaves req.body as {}, which Buffer.byteLength() rejects
    if (typeof req.body === 'string' && req.body.length > 0) {
      // The ndjson body is already a raw string, so forward it unchanged
      const bodyData = req.body;

      proxyReq.setHeader('Content-Type', 'application/x-ndjson');
      proxyReq.setHeader('Content-Length', Buffer.byteLength(bodyData));

      // Stream the content
      proxyReq.write(bodyData);
    }
  }
};

const app = express();

app.get("/ping", (req, res) => {
  res.status(200);
  res.json({ pong: new Date().toISOString() });
  res.end();
});

app.use(cors());

app.use(bodyParser.text({ type: 'application/x-ndjson' }));

app.use(bodyParser.urlencoded({ extended: false }));

/* This is how we can extend this logic to do extra stuff before
 * sending requests to our backend, for example verifying
 * access tokens or performing some other task */
app.use((req, res, next) => {
  const token = req.headers && (req.headers.Authorization || req.headers.authorization);

  if (!token) {
    // Respond and stop; throwing after sending would trigger a second response attempt
    return res.sendStatus(403);
  }

  let decoded;
  try {
    decoded = jwt.decode(token, _secret_); // throws if the signature does not verify
  } catch (err) {
    return res.sendStatus(403);
  }

  // Note: this compares `exp` with Date.now() (milliseconds); standard JWT `exp` is in seconds
  if (!decoded || !decoded.iss || decoded.exp <= Date.now()) {
    return res.sendStatus(403);
  }

  // TODO write logic that splits ndjson requests and applies the transformer to every 2nd JSON line

  return next();
});

const templateBodyWithCountAggregation = {
  aggs: {
    "by_index": {
      "terms": {
        "field": "_index"
      }
    }
  }
};

// // TODO enable/disable this using environment variables
// // Add a doc_count aggregation here if it doesn't already exist
// app.use((req, res, next) => {

//   // console.log({ body: req.body });
//   // At this point there should be a req.body JSON object
//   // TODO this is commented out until it is fixed for ndjson requests (with multiple JSON lines)
//   req.body = _.merge(req.body, templateBodyWithCountAggregation);

//   next();
// })

/* Here we proxy all the requests from reactivesearch to our backend */
app.use("*", proxy(options));

const port = process.env.__PORT__;
app.listen(port, () =>
  console.log(`Server running at http://localhost:${port} 🚀`)
);
```
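The commented-out middleware above merges `templateBodyWithCountAggregation` into `req.body` as if it were a single JSON object, which no longer holds for ndjson. One option is to apply the aggregation per query line instead. A minimal sketch of such a per-query transform; `withCountAggregation` is a hypothetical name, and it only checks the `aggs` key (Elasticsearch also accepts the longer `aggregations` spelling, which this sketch ignores). Unlike `_.merge`, it returns a new object instead of mutating the input.

```javascript
// Hypothetical per-query transform for the commented-out middleware:
// add the by_index count aggregation unless the query already defines it.
const templateBodyWithCountAggregation = {
  aggs: { by_index: { terms: { field: "_index" } } },
};

function withCountAggregation(query) {
  const aggs = query.aggs || {};
  if (aggs.by_index) return query; // don't override an existing definition
  return { ...query, aggs: { ...aggs, ...templateBodyWithCountAggregation.aggs } };
}

const q = withCountAggregation({ query: { match_all: {} } });
console.log(JSON.stringify(q.aggs)); // {"by_index":{"terms":{"field":"_index"}}}
```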

Looks great 😁

I am glad it made you laugh :D !!

@a-magdy Somewhat off-topic, but I'm curious - in practice, how easy/difficult have you found implementing and maintaining a proxy to secure your search data to be? I'm interested in using Reactivesearch but am concerned about that aspect.

@mhurne It wasn't that hard to get started; if you know the basics of Node.js, it shouldn't be a problem.

They already provide a starter proxy server here: https://github.com/appbaseio-apps/reactivesearch-proxy-server/

and it is ready for you to implement your logic: https://github.com/appbaseio-apps/reactivesearch-proxy-server/blob/master/index.js#L38

I didn't find it difficult at all to get started, and creating a proxy server is a great idea, not just for security but also for rate limiting (if you want to implement it).

If you face any challenges, the folks here are very helpful; I've had a great experience whenever I had a question.
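As an example of the rate limiting mentioned above, a proxy like the ones in this thread could put a small middleware in front of the proxy route. This is a hypothetical fixed-window sketch, not part of the reactivesearch-proxy-server repo: counts live in memory, so it only works for a single process, and the map is never pruned. An existing, maintained middleware package may be preferable in production.

```javascript
// Hypothetical fixed-window rate limiter for an Express-style proxy.
// Counts are kept in memory per client IP (single process only; never pruned).
function createRateLimiter({ windowMs, max }) {
  const hits = new Map(); // key -> { count, windowStart }
  return function rateLimit(req, res, next) {
    const key = req.ip;
    const now = Date.now();
    const entry = hits.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      // First request in a new window for this client
      hits.set(key, { count: 1, windowStart: now });
      return next();
    }
    entry.count += 1;
    if (entry.count > max) {
      return res.sendStatus(429); // Too Many Requests
    }
    return next();
  };
}

// Usage (registered before the proxy route):
// app.use(createRateLimiter({ windowMs: 60000, max: 100 }));
```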

Thanks for the info, @a-magdy
