Data validation in REST APIs: Don’t trust client input
REST APIs have become more and more common in web applications and tools across the Internet, especially in open, social contexts. REST-style architectures have also clearly been growing in enterprise applications, probably as a lighter-weight alternative for exposing services and/or building consumer-based applications. REST solutions are easy to visualize, largely because thinking in terms of resources is, in most cases, a natural way to think and very close to the domain. Combined with the simple, fast, and pragmatic way a REST service can be implemented, this is probably one of the main reasons why so many applications in different scenarios have adopted the REST style in some form.
Although a REST-style architecture shouldn’t be hard to define and implement (otherwise software developers would avoid it), several things must be considered when adopting a REST solution, such as: modeling your resources and their types, REST enterprise development, resource reuse, consumer-based services, data representation, standardization, the URI strategy and the flexible programming model that complements it, and others. Beyond these, once you go deeper and start to design and implement the REST-based services or API, other concerns come up, for example error/notification messages and input data validation.
It is important to keep in mind that the REST services will probably be consumed by several different clients (or maybe not), so the quality and integrity of the data the services receive cannot be guaranteed. This differs from a typical web application, where the presentation layer usually has a validation mechanism that checks what will be sent (which, of course, does not remove the need for validation in other layers). Taking this into account, a validation mechanism that covers different possibilities becomes important in the REST API, guaranteeing that clients are always notified correctly instead of just getting an “exception” back, and providing a way to keep track of errors once they happen. When something unexpected happens (invalid data or system trouble, for example), let the clients know that the system identified a problem or special situation by sending a proper message, wrapped in a previously agreed format, in the HTTP response body together with the appropriate HTTP status code.
Another important point is to avoid risks on the service side by writing safe code: understand the potential risks to the software, make sure the appropriate mitigations are in place, and apply data validation mechanisms, which also help code quality, so that the service is not affected by badly structured data, wherever it comes from.
Turning to input data validation in REST, the API needs to ensure that this “layer” of the application is robust against all client input, whether it comes from external entities or from known applications. Two approaches are especially useful (though several others can be considered):
- Integrity checks – ensure that the data has not been tampered with.
- Validation – ensure that the data is strongly typed, has the correct syntax, is within length boundaries, contains only permitted characters, and, if numeric, is correctly signed and within range boundaries, etc.
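As a minimal sketch of an integrity check, assume the client and server share a secret key and the client sends an HMAC-SHA256 signature of the raw request body alongside it (how the signature travels, for example in a header, and the key itself are hypothetical here, not part of any specific framework). The server recomputes the signature and compares it in constant time:

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it would come from secure configuration.
SHARED_SECRET = b"example-shared-secret"


def sign(body: bytes, secret: bytes = SHARED_SECRET) -> str:
    """Compute the hex HMAC-SHA256 signature a trusted client would send with the body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_untampered(body: bytes, signature: str, secret: bytes = SHARED_SECRET) -> bool:
    """Recompute the signature over the raw body and compare it in constant time."""
    expected = sign(body, secret)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    body = b'{"name": "mary", "projectId": 55}'
    signature = sign(body)
    print(is_untampered(body, signature))                                  # True
    print(is_untampered(b'{"name": "mary", "projectId": 56}', signature))  # False
```

Note that an integrity check only says the data was not changed in transit; it says nothing about whether the content is valid, so it complements validation rather than replacing it.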
It is important to define the error message format, because the HTTP status code combined with the error message “envelope” is how a REST API notifies clients in detail about what happened. Keep in mind that a single input validation can break several different validation rules, so a strategy that reports all errors found during processing in a single response can be useful. In addition, including a request ID in the error message helps to keep track of what happened in the logs.
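The sketch below combines the validation rules listed earlier with an error envelope: every rule is checked, all violations are collected, and a single response is built with an HTTP status code and a request ID. The field names (`username`, `age`), the limits, and the envelope keys are illustrative assumptions, not a standard format; 400 Bad Request is used here, though some APIs prefer 422.

```python
from __future__ import annotations

import json
import re
import uuid
from typing import Any

# Illustrative rule: only letters, digits and underscores, 3 to 20 characters.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_user(payload: dict[str, Any]) -> list[dict[str, str]]:
    """Check every rule and collect all violations instead of stopping at the first."""
    errors: list[dict[str, str]] = []

    username = payload.get("username")
    if not isinstance(username, str):
        errors.append({"field": "username", "message": "must be a string"})
    elif not USERNAME_PATTERN.fullmatch(username):
        errors.append({"field": "username",
                       "message": "must be 3-20 characters: letters, digits or '_'"})

    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool):
        errors.append({"field": "age", "message": "must be an integer"})
    elif not 0 <= age <= 150:
        errors.append({"field": "age", "message": "must be between 0 and 150"})

    return errors


def error_response(errors: list[dict[str, str]],
                   request_id: str | None = None) -> tuple[int, str]:
    """Wrap all errors in one envelope with a request ID that can also be logged."""
    request_id = request_id or str(uuid.uuid4())
    envelope = {"status": 400, "requestId": request_id, "errors": errors}
    return 400, json.dumps(envelope)
```

For example, `error_response(validate_user({"username": "m@ry", "age": -1}))` returns one response listing both the `username` and the `age` violations. Returning every violation at once lets the client fix its request in a single round trip, and logging the same request ID on the server ties the response to the log entries.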
Validation can be performed on every tier as desired; however, each layer should contain its own specific validations, complementing the others as the request is processed. Thinking specifically about the REST “layer”, the validation approach could be:
- Accept known good – Only known data is accepted and passed on through the layers. Data received in some representation (for example, JSON or XML) must be parsed into the specific type, and only the fields relevant to the triggered operation need to be checked. There are many useful validation frameworks that can be “injected” into the code to validate the received data, following the domain definitions and keeping data integrity across the layers.
- Make clean – To obtain a “safe” input, try to clean it by stripping out or removing undesired data.
- No validation – Although this approach must be considered unsafe and is strongly discouraged, especially because it leads to non-defensive code and fragile services in the REST API, a team might consider it for a fully secure and known environment with known client applications, implicitly pushing input validation to other “layers”; for example, relying only on database constraints.
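To contrast the first two approaches, here is a minimal sketch: `parse_project` follows “accept known good” by parsing the JSON body into a typed object and keeping only the expected fields, while `sanitize_name` follows “make clean” by stripping characters outside a permitted set. The `Project` type, its fields, and the permitted character set are hypothetical.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical domain type: the only shape the service accepts."""
    name: str
    description: str


def parse_project(raw_body: str) -> Project:
    """Accept known good: parse into the domain type and keep only expected fields.

    Raises ValueError (json.JSONDecodeError is a subclass) for anything else.
    """
    data = json.loads(raw_body)
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    name = data.get("name")
    description = data.get("description", "")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    if not isinstance(description, str):
        raise ValueError("description must be a string")
    # Any other field (e.g. "owner" or "isAdmin") is ignored and never passed on.
    return Project(name=name, description=description)


# Make clean: remove every character outside this permitted set.
DISALLOWED = re.compile(r"[^A-Za-z0-9 _.-]")


def sanitize_name(value: str) -> str:
    """Strip undesired characters instead of rejecting the input outright."""
    return DISALLOWED.sub("", value).strip()
```

With these definitions, `sanitize_name("<b>Apollo</b>")` returns `"bApollob"`: the markup characters are gone, but the result is not what anyone meant. Cleaning can silently turn bad input into different data, which is one reason accepting only known-good input is generally the safer default.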
Last but not least in this discussion is URI checking: depending on the API method and the URI form, specific validation can be necessary. Remember that most REST frameworks on different platforms provide a mechanism to convert a “parameter” that forms part of a URI into a specific type. For example, in /users/mary/project/55, the project ID parameter, “55”, will be converted to a long in the method responsible for it. But if the client application tries /users/mary/project/55XYZ, it will probably receive an error from the framework during the conversion, instead of a message informing it that the provided ID is not a valid project ID. Adopting a strategy of receiving all parameters as strings and then checking and validating them internally in the code makes it possible to send correct, custom messages to the client application. It always depends on how much detail the software wants to take over.
Along the same lines, the relationship between “mary” and “project 55” must be checked, to guarantee that “mary” has rights on “project 55” and can retrieve or edit data there. This simple case is worth raising because, in many cases, the REST API enforces “resource integrity”: data exists only within a specific resource. For example, /users/john/projects/55 must not be allowed, because project 55 is one of “mary”’s projects and “john” cannot do anything there, so specific restriction rules must be in place in the persistence layer.
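A minimal sketch of this approach, assuming the framework hands the path segments to the handler as plain strings: the project ID is validated in code, so the client gets a custom message instead of a framework conversion error, and an ownership check enforces resource integrity. The in-memory `PROJECT_OWNERS` map is a hypothetical stand-in for the persistence-layer rule, and the status codes and messages are illustrative.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the persistence layer: project ID -> owner.
PROJECT_OWNERS: dict[int, str] = {55: "mary"}

# ASCII digits only, at most 18 of them, so the value always fits in a signed 64-bit long.
PROJECT_ID_PATTERN = re.compile(r"[0-9]{1,18}")


def get_project(username: str, project_id_raw: str) -> tuple[int, dict]:
    """Handle GET /users/{username}/projects/{projectId} with parameters received as strings."""
    # Validate the raw string ourselves instead of letting a framework
    # conversion to a number fail with a generic error.
    if not PROJECT_ID_PATTERN.fullmatch(project_id_raw):
        return 400, {"error": f"'{project_id_raw}' is not a valid project ID"}
    project_id = int(project_id_raw)

    owner = PROJECT_OWNERS.get(project_id)
    if owner is None:
        return 404, {"error": f"project {project_id} not found"}
    # Resource integrity: the project must belong to the user named in the URI.
    if owner != username:
        return 403, {"error": f"user '{username}' has no access to project {project_id}"}
    return 200, {"projectId": project_id, "owner": owner}
```

Here `get_project("mary", "55XYZ")` yields a 400 with a message naming the bad ID, and `get_project("john", "55")` yields a 403. Some APIs deliberately answer 404 instead of 403 in the second case so that they do not reveal that the project exists.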
