HTTP: redirects, POST and GET requests, and the “lost” data

By | 11/23/2019
 

We have a web application which has to accept POST-requests from clients.

In front of this application, there is some proxy service, no matter which – initially, we faced the issues on an AWS’s Application Load Balancer, then I reproduced them with NGINX, and it will “work” for any other proxying system.

Besides proxying – this service also performs an HTTP (80) to HTTPS (443) redirect.

The problem appears accurately during such a redirect:

  1. a client sends a POST requests via HTTP
  2. a proxy returns 301 or 302 redirect to HTTPS
  3. the client then sends a request via HTTPS, but:
    1. in some cases this POST becomes GET
    2. or it still will be POST but all its data will be “lost”

A testing environment setup

For the testing purpose, let’s use the next configuration:

  1. a NINGX accepts API-requests
  2. NGINX passes the via proxy_pass by HTTP to a backend-service
    • to reproduce the POST to GET problem – a backend with a Go application in Docker container will be used
    • to reproduce the “lost” data and empty Сontent-length – a Python script will be used to run as a web-server

NGINX

Just an common config – NGINX, listen on 80, makes redirects to HTTPS:

server {

    listen 80;

    server_name dev.poc.example.com;

    location / {

        return 302 https://dev.poc.example.com$request_uri;
    }
}
...

And a server 443 {} – also common config with a proxy_pass to the backend on an 8081 port:

...
server {

    listen       443 ssl;
    server_name  dev.poc.example.com;

    ...

    location / {

        proxy_pass http://localhost:8081;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

    }
}

Go-app in Docker

We have multiple containers in a stack there, but currently we are interested in a go-queue-consumer with a local NGINX (the whole initial system still in the Proof of Concept state now, so don’t wonder about too many NGINXes here).

We’re just interested in its logs with a POST or GET methods.

Python web-server

And another web-server to play the backend role to see the data length issue – quickly googled Python script:

#!/usr/bin/env python3
"""
Very simple HTTP server in python for logging requests
Usage::
    ./server.py [<port>]
"""
from http.server import BaseHTTPRequestHandler, HTTPServer
import logging

class S(BaseHTTPRequestHandler):
    def _set_response(self):
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()

    def do_GET(self):
        logging.info("GET request,\nPath: %s\nHeaders:\n%s\n", str(self.path), str(self.headers))
        self._set_response()
        self.wfile.write("GET request for {}".format(self.path).encode('utf-8'))

    def do_POST(self):
        content_length = int(self.headers['Content-Length']) # <--- Gets the size of data
        post_data = self.rfile.read(content_length) # <--- Gets the data itself
        logging.info("POST request,\nPath: %s\nHeaders:\n%s\n\nBody:\n%s\n",
                str(self.path), str(self.headers), post_data.decode('utf-8'))

        self._set_response()
        self.wfile.write("POST request for {}".format(self.path).encode('utf-8'))

def run(server_class=HTTPServer, handler_class=S, port=8081):
    logging.basicConfig(level=logging.INFO)
    server_address = ('', port)
    httpd = server_class(server_address, handler_class)
    logging.info('Starting httpd...\n')
    try:
        httpd.serve_forever()
    except KeyboardInterrupt:
        pass
    httpd.server_close()
    logging.info('Stopping httpd...\n')

if __name__ == '__main__':
    from sys import argv

    if len(argv) == 2:
        run(port=int(argv[1]))
    else:
        run()

Examples: reproduce the Problem

A POST‘s data lost after a redirect

The first thing we faced with was the fact that after HTTP to HTTPS redirects – our POST requests lose their data.

To reproduce it – let’s use the Python script mentioned above.

Run it on the 8081 port:

[simterm]

root@ip-10-0-15-118:/home/admin# ./test_post.py 8081
INFO:root:Starting httpd...

[/simterm]

And run curl with POST and some data in --data:

[simterm]

$ curl -vL -X POST http://dev.poc.example.com/ -d "param1=value1&param2=value2"
...
*   Trying 52.***.***.224:80... 
...
> Content-Length: 27
...
< HTTP/1.1 302 Moved Temporarily
...
< Content-Length: 161
< Connection: keep-alive
< Location: https://dev.poc.example.com/
...
> POST / HTTP/1.1
> Host: dev.poc.example.com
> User-Agent: curl/7.67.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 502 Bad Gateway
< Server: nginx/1.10.3
...
< Content-Type: text/html
< Content-Length: 173

[/simterm]

Let’s see what’ happening here:

  1. a POST request is sent via HTTP, Content-Length: 27
  2. a 302 redirect issued to HTTPS
  3. and a POST request is sent via HTTPS, Content-Length: 173

In the NGINX logs, we can see a standard HTTP POST:

[simterm]

...
==> /var/log/nginx/dev.poc.example.com-error.log <==
2019/11/23 09:52:41 [error] 19793#19793: *51100 upstream prematurely closed connection while reading response header from upstream, client: 194.***.***.26, server: dev.poc.example.com, request: "POST / HTTP/1.1", upstream: "http://127.0.0.1:8081/", host: "dev.poc.example.com"

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:09:52:41 +0000] "POST / HTTP/1.1" 502 173 "-" "curl/7.67.0"
...

[/simterm]

But let’s take a look at the stderr of the Python-server running:

[simterm]

...
Exception happened during processing of request from ('127.0.0.1', 38224)
Traceback (most recent call last):
  File "/usr/lib/python3.5/socketserver.py", line 313, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 341, in process_request
    self.finish_request(request, client_address)
  File "/usr/lib/python3.5/socketserver.py", line 354, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib/python3.5/socketserver.py", line 681, in __init__
    self.handle()
  File "/usr/lib/python3.5/http/server.py", line 422, in handle
    self.handle_one_request()
  File "/usr/lib/python3.5/http/server.py", line 410, in handle_one_request
    method()
  File "./test_post.py", line 22, in do_POST
    content_length = int(self.headers['Content-Length']) # <--- Gets the size of data
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
...

[/simterm]

Pay attention at those lines:


content_length = int(self.headers[‘Content-Length‘])
TypeError: int() argument must be a string, a bytes-like object or a number, not ‘NoneType

E.g., the application got an empty/missing Content-Length field.

Still, if issue a new request directly via HTTP (with the redirect disabled on NGINX) or via HTTPS – everything will work, as expected:

[simterm]

$ curl -vL -X POST https://dev.poc.example.com/ -d "param1=value1&param2=value2"
...
> POST / HTTP/1.1
> Host: dev.poc.example.com
> User-Agent: curl/7.67.0
> Accept: */*
> Content-Length: 27
> Content-Type: application/x-www-form-urlencoded
> 
* upload completely sent off: 27 out of 27 bytes
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.10.3
< Date: Sat, 23 Nov 2019 09:55:07 GMT
< Content-Type: text/html
< Transfer-Encoding: chunked
< Connection: keep-alive
< 
* Connection #0 to host dev.poc.example.com left intact
POST request for /

[/simterm]

And the Python’s application stdout:

[simterm]

...
INFO:root:POST request,
Path: /
Headers:
Host: dev.poc.example.com
X-Real-IP: 194.***.***.26
X-Forwarded-For: 194.***.***.26
X-Forwarded-Proto: https
Connection: close
Content-Length: 27
User-Agent: curl/7.67.0
Accept: */*
Content-Type: application/x-www-form-urlencoded

Body:
param1=value1&param2=value2

127.0.0.1 - - [23/Nov/2019 09:55:07] "POST / HTTP/1.0" 200 -

[/simterm]

Now, let’s go to the next second problem.

A POST became GET

During debugging the problem discussed above, when a backend didn’t receive data during POST I found another “exciting” behavior of a request during HTTP-redirects.

Let’s see how our POST request will become… a GET request!

The details and the root cause will be examined later on this post, and now let’s reproduce the problem using two HTTP-clients – curl and Postman.

curl

Run a request with the POST type via HTTP with -L to follow a redirect to the HTTPS.

In this case, we are using not the Python script, used above, but our real backend Docker container to demonstrate its work and to check its logs.

The data itself, as well as any errors, have no value here – we are interested only in requests methods used – POST and GET.

[simterm]

$ curl -vL -X POST http://dev.poc.example.com/skin/api/v1/receipt -d "{}"
...
> POST /skin/api/v1/receipt HTTP/1.1
> Host: dev.poc.example.com
> User-Agent: curl/7.67.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 400 Bad Request
< Server: nginx/1.10.3
< Date: Sat, 23 Nov 2019 10:07:37 GMT
< Content-Type: application/json; charset=utf-8
< Content-Length: 58
< Connection: keep-alive
< 
* Connection #1 to host dev.poc.example.com left intact
{"message":"Validation failed: unable to parse json body"}

[/simterm]

Once again – do not look at the errors here – instead, let’s go to the NGINX’s logs:

[simterm]

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:10:07:37 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 400 58 "-" "curl/7.67.0"

[/simterm]

Seems all good here? A POST was sent – a POST was received.

Just to check – let’s run curl with explicitly specified GET type to see the backend’s response and NGINX logs in this case:

[simterm]

$ curl -L -X GET  http://dev.poc.example.com/skin/api/v1/receipt -d "{}"
404 page not found

[/simterm]

And NGINX logs with this GET:

[simterm]

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:10:07:37 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 400 58 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:09:57 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "-" "curl/7.67.0"

[/simterm]

All good, all seems to be correct – no problems at all, huh?

Postman

But now, let’s use Postman to send the same request to the same endpoint: a POST via HTTP to trigger and follow a redirect to the HTTPS:

And?…

And – let’s see NGINX logs now:

[simterm]

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:10:07:37 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 400 58 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:09:57 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:11:44 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "http://dev.poc.example.com/skin/api/v1/receipt" "PostmanRuntime/7.19.0"

[/simterm]

Errr…

What?!?

But I sent an explicitly specified POST request?

Once again:

  1. make a request with curl, HTTP => HTTPS redirect issued, POST in logs – all good
  2. make a request with Postman, HTTP => HTTPS redirect issued, but GET in logs – WTF???

POST, GET and a “lost” data

Well, now we can understand where our data went – because our POST became GET.

The root cause, 3xx redirects, and HTTP RFC

Actually, both problems are caused by the same reason.

Let’s start our investigation by reading the 301 code description in the RFC 2616 – https://tools.ietf.org/html/rfc2616#section-10.3.2, especially its Note:

Note: When automatically redirecting a POST request after
receiving a 301 status code, some existing HTTP/1.0 user agents
will erroneously change it into a GET request.

I.e., some clients after sending POST and receiving 301 – will change the requests’ type to the GET.

But this only the very beginning!

During the further reading, at the 302 code description in the same RFC 2016 – https://tools.ietf.org/html/rfc2616#section-10.3.3 we’ll see another Note:

Note: RFC 1945 and RFC 2068 specify that the client is not allowed
to change the method on the redirected request. However, most
existing user agent implementations treat 302 as if it were a 303
response, performing a GET on the Location field-value regardless
of the original request method. The status codes 303 and 307 have
been added for servers that wish to make unambiguously clear which
kind of reaction is expected of the client.

RFC 1945 about 3хх redirects – https://tools.ietf.org/html/rfc1945#section-9.3

RFC 2068 about 3хх redirects – https://tools.ietf.org/html/rfc2068#section-10.3.2

I.e., RFC 1945 and RFC 2068 states that a client has no change type for a redirected request, but most of them will treat the 302 code as… 303!

Let’s go to the next part and read about the 303 code – https://tools.ietf.org/html/rfc2616#section-10.3.4:

The response to the request can be found under a different URI and
SHOULD be retrieved using a GET method on that resource.

I.e., when a client considers that it received the 303 code – it always will perform the GET.

So, what we saw with the Postman case (and our mobile application clients where we faced that issue at first):

  1. a client performs POST by HTTP
  2. gets a redirect to the HTTPS with either 301 or 302
  3. considers it as 303
  4. and changes its own request’s type sent via HTTPS from to the GET, “losing” all original’s data

The Solution

I found the solution after reading the Mozilla’s documentation (although there was a hint in the RFC 2016 Notes for the 302), which says about 301 and 302 redirects:

It is therefore recommended to set the 302 code only as a response for GET or HEAD methods and to use 307 Temporary Redirect instead, as the method change is explicitly prohibited in that case.

Okay – let’s update our NGINX and change code 302 to the 307:

server {

    listen 80;
...
    location / {

        # return 302 https://dev.poc.example.com$request_uri;
        return 307 https://dev.poc.example.com$request_uri; 
    }
}
...

Reload configs, and repeat the request from curl:

[simterm]

$ curl -L -X POST http://dev.poc.example.com/skin/api/v1/receipt -d "{}"
{"message":"Validation failed: fields 'hardware_id' and 'receipt' are mandatory"}

[/simterm]

And we got a valid response from the backend, which just asks for a normal data now.

Redirect worked, a POST request was received.

And the NGINX log:

[simterm]

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:10:07:37 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 400 58 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:09:57 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:11:44 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "http://dev.poc.example.com/skin/api/v1/receipt" "PostmanRuntime/7.19.0"
194.***.***.26 - - [23/Nov/2019:10:35:51 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 422 81 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:36:00 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 422 81 "-" "curl/7.67.0"

[/simterm]

Repeat the request from the Postman:

NGINX logs:

[simterm]

==> /var/log/nginx/dev.poc.example.com-access.log <==
194.***.***.26 - - [23/Nov/2019:10:07:37 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 400 58 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:09:57 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:11:44 +0000] "GET /skin/api/v1/receipt HTTP/1.1" 404 18 "http://dev.poc.example.com/skin/api/v1/receipt" "PostmanRuntime/7.19.0"
194.***.***.26 - - [23/Nov/2019:10:35:51 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 422 81 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:36:00 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 422 81 "-" "curl/7.67.0"
194.***.***.26 - - [23/Nov/2019:10:37:57 +0000] "POST /skin/api/v1/receipt HTTP/1.1" 422 81 "http://dev.poc.example.com/skin/api/v1/receipt" "PostmanRuntime/7.19.0"

[/simterm]

All works, as expected.

AWS Application Load Balancer redirects

Unfortunately, I wasn’t able to find how to set 307 when using AWS ALB:

That’s all, folks!

Useful links

Those problems on other sites

Helped to find the solution