Solving a Payment Processing Problem: When API Response Sizes Go Overboard
In software development, even a tiny detail—like the size of a data response—can lead to a cascade of challenges. Recently, our team encountered an issue while integrating with QuickBooks Online (QBO) that blocked one of our client's customers from processing invoice payments for months. Here’s a behind-the-scenes look at how we tackled the problem.
The Problem: An “Unexpected EOF” Error
When making an API call to QBO to fetch payment information for invoices, our system was returning the error message:
“Failed to read both success and failure payloads from HTTP response: unexpected EOF”
What Does “EOF” Mean?
In programming, EOF stands for “End of File.” An “unexpected EOF” means the program reached the end of the data stream sooner than it expected, as if the file or data ended abruptly. This error was not just a minor hiccup: it stopped a customer’s payments from processing and disrupted their business operations for several months.
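To see what that failure mode looks like in practice, here is a minimal, self-contained Go example (the payload is made up): feeding a JSON decoder a stream that stops partway through produces exactly this kind of error.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

func main() {
	// A hypothetical payload that has been cut off partway through the stream.
	truncated := `{"Invoice": {"Id": "42", "TotalAmt"`

	var payload map[string]any
	err := json.NewDecoder(strings.NewReader(truncated)).Decode(&payload)

	// Decoding a stream that ends too early surfaces as an unexpected EOF.
	fmt.Println(err) // unexpected EOF
}
```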
The Technical Background: Understanding Data Limits
What could be causing this problem? Early on, we wondered whether the service was running in a container or environment with a small memory allocation, making this a legitimate out-of-memory error. There were no indications of that in the logs or monitoring software, and the configured limit seemed well above any problematic threshold, but it remained an option on the table.
After digging into the code, we discovered that our client's system wraps API responses with a Go standard-library helper called io.LimitReader. It reads data from a stream but only up to a set number of bytes, guarding against excessive memory use, whether that happens organically or through a deliberately oversized payload. Initially, this limit was set to 5 MiB (mebibytes; 1 MiB is 1,048,576 bytes, almost but not quite the same size as a standard megabyte).
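The pattern looked roughly like the sketch below. The package name, constant, and function are illustrative stand-ins rather than the client's actual code; they just show how io.LimitReader caps what the JSON decoder is allowed to read.

```go
// A minimal sketch of the pattern, assuming illustrative names; this is not
// the client's actual code.
package qboclient

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// maxResponseBytes caps how much of an API response we are willing to read.
// 5 << 20 is 5 MiB.
const maxResponseBytes = 5 << 20

// decodeJSON reads at most maxResponseBytes from the response body and
// decodes it into v. A payload larger than the cap gets cut off early, and
// the decoder then reports an unexpected EOF.
func decodeJSON(resp *http.Response, v any) error {
	limited := io.LimitReader(resp.Body, maxResponseBytes)
	if err := json.NewDecoder(limited).Decode(v); err != nil {
		return fmt.Errorf("failed to read payload from HTTP response: %w", err)
	}
	return nil
}
```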
This code was deep in the codebase, used to parse every JSON response object returned from external APIs. Working in code that underpins core processes of a live system is tricky: change the wrong thing and you can easily take down the entire system. You have to take your time, make sure you understand what you are changing, and run plenty of tests before pushing anything to production. We were understandably hesitant to touch this area.
Early logs didn’t show the exact size of the response from QBO. However, when we replicated the API call using Postman (a popular tool for testing APIs), we estimated the response size at around 5.5 MiB. With that in mind, our first thought was simple: increase the limit to 7 MiB. We deliberately kept the increase modest to manage the inherent risks of working in such a core system.
The Debugging Journey: From 7 MiB to 20 MiB
First Attempt: Increasing to 7 MiB
- Expectation: Since our Postman tests showed a 5.5 MiB response, a 7 MiB limit should have been sufficient.
- Reality: The issue persisted, and payments still couldn’t be processed. We quickly reverted the change while we continued investigating. We were essentially back to the drawing board: if this piece of code wasn’t the culprit, there was no other obvious source of the issue.
The Breakthrough: Reproducing the Issue in JavaScript
To get a clearer picture, we built a small script in JavaScript. This script mimicked our API call and processed the response in small chunks, allowing us to measure the exact size of the data returned.
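The original script was written in JavaScript, but the idea translates directly to Go, the language used elsewhere in this post. The sketch below is a rough equivalent, with a placeholder URL and credentials: stream the body in small chunks and total the bytes actually received.

```go
// A rough Go equivalent of the JavaScript measurement script: stream the
// response in small chunks and total the bytes actually received. The URL
// and token are placeholders.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	req, err := http.NewRequest("GET", "https://example.com/qbo/payments", nil) // placeholder endpoint
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Authorization", "Bearer <token>") // placeholder credentials

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	buf := make([]byte, 32*1024) // read in 32 KiB chunks
	var total int64
	for {
		n, readErr := resp.Body.Read(buf)
		total += int64(n)
		if readErr == io.EOF {
			break
		}
		if readErr != nil {
			log.Fatal(readErr)
		}
	}

	fmt.Printf("response size: %d bytes (%.2f MiB)\n", total, float64(total)/(1<<20))
}
```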
What We Discovered:
- The API call wasn’t just fetching a single invoice. Instead, it was an aggregate of several requests:
- Paginated Calls: QBO uses multiple paginated calls to retrieve data in smaller segments.
- Data Aggregation: Our client's system then grouped together payment IDs from multiple invoices into one large API call.
- This is where our initial investigation went wrong: by taking only the first page of invoice results and pulling the payment IDs from there, we missed a significant chunk of the data returned by the follow-up call for the payment details. As a result, our measured size was smaller than the real one.
- The actual size of the response came out to be 7.24 MiB—slightly over our initial 7 MiB cap. So close and yet so far.
The Final Fix: Setting the Limit to 20 MiB
Armed with this insight, we decided to update the limit to 20 MiB.
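In code terms, the fix came down to raising a single cap. Continuing the illustrative sketch from earlier (again, not the client's actual constant):

```go
// Continuing the illustrative sketch from earlier; names are stand-ins,
// not the client's actual code.
package qboclient

// Raised from 5 << 20 (5 MiB) so that the largest aggregated QBO responses
// we observed (around 7.24 MiB) fit with comfortable headroom.
const maxResponseBytes = 20 << 20 // 20 MiB
```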
What about our earlier concerns about setting the limit too high? We consulted with the infrastructure team to confirm that the deployed service had enough RAM to absorb the increase, which it did. We also triple-checked the logs and metrics of the deployed system to make sure we weren’t already seeing memory pressure, which we weren’t. We were well within the current system limits on all fronts, which alleviated our concerns and gave us the confidence to move forward with the change.
This change:
- Immediately Resolved the Error: Payments could now be processed without encountering the “unexpected EOF” issue.
- Future-Proofed Our Integration: The higher limit gives us room to request more data from QBO in a single call, which is beneficial because it means:
- Fewer API Calls: Instead of making many small requests, we can batch them together (see the sketch after this list).
- Efficiency Gains: Batching helps us stay within QBO’s strict API rate limits, improving overall performance.
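To make the batching point concrete, here is a small, hypothetical helper in the same illustrative package. It is not QBO's API or our client's actual code; it just shows how grouping payment IDs into larger batches reduces the number of calls we have to make against QBO's rate limits.

```go
// A hypothetical helper, in the same illustrative package, showing the
// batching idea; this is not QBO's API or the client's actual code.
package qboclient

// chunkIDs splits a list of payment IDs into batches of at most batchSize,
// so that one API call can cover many IDs instead of one call per ID.
// A higher response-size limit lets batchSize grow, which means fewer
// calls counted against QBO's rate limits.
func chunkIDs(ids []string, batchSize int) [][]string {
	if batchSize <= 0 {
		batchSize = 1
	}
	var batches [][]string
	for len(ids) > 0 {
		n := batchSize
		if n > len(ids) {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}
```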
Despite the small size of the resulting code change, this was a huge win for our client and their customers. The issue was buried deep in the codebase and had gone unresolved for many months. Had we not hit it in this one instance and tracked it down, it could have reared its head again further down the line and caused even more damage.
Lessons Learned and the Path Forward
This experience highlights a couple of key lessons:
- Always Validate Assumptions: Initial tests using tools like Postman are invaluable, but they might not capture the full complexity of real-world API calls.
- Build Custom Diagnostic Tools: Sometimes, the best way to understand a problem is to recreate it in a controlled environment—just as we did with our JavaScript script.
- Plan for Growth: By setting a higher data limit, we not only solved the immediate problem but also opened up opportunities for more efficient API usage in the future.
By walking through this debugging process, we’ve learned how even seemingly small technical constraints can have significant impacts. We’re now better equipped to handle similar challenges in the future, ensuring that our integrations are robust, efficient, and ready for the demands of real-world business operations.
Whether you’re a seasoned developer or someone who’s just curious about what goes on behind the scenes in software development, we hope this story provides a clear picture of the challenges and triumphs that come with building reliable systems. Happy coding!