How to Bypass Server Cache When Retrieving Web Content?

08 Jul, 2018

Once at a job interview I was asked how to bypass the server cache when retrieving web content. Let’s picture a situation: caching is enabled on the server side, and you have no access to reset the server cache. How can you get a web page while bypassing the server cache?

Resetting the browser cache and clearing cookies are client-side operations familiar to anyone who develops or tests web applications and websites. Content management systems (CMS) and e-commerce platforms usually provide administrative tools for resetting the server cache. But can you bypass that cache if you have no access to those tools?

How to bypass the server cache

You certainly can. All you have to do is add a GET parameter to the URL. The URL is now different, so a server cache that stores pages by their full URL typically has no saved copy for it, and the browser receives a freshly generated page.

For example:

https://www.google.com/ => https://www.google.com/?1

If the URL already contains some parameters, you can add a ‘fake’ parameter:

https://www.google.com/search?q=hello => https://www.google.com/search?q=hello&1=1
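The two cases above (a URL with and without existing parameters) can be handled by one small helper. This is only a sketch: the function name `add_cache_buster`, the parameter name `_cb`, and the timestamp default are illustrative assumptions, not part of any standard.

```python
import time
from urllib.parse import urlencode, urlsplit, urlunsplit


def add_cache_buster(url, name="_cb", value=None):
    """Return `url` with one extra query parameter appended.

    `name` and the default timestamp value are hypothetical choices;
    any parameter the site does not already use would do.
    """
    if value is None:
        # A changing value gives a different URL on later calls.
        value = str(time.time_ns())
    parts = urlsplit(url)
    extra = urlencode({name: value})
    # Keep the existing query string untouched and append to it.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit(parts._replace(query=query))


print(add_cache_buster("https://www.google.com/", value="1"))
print(add_cache_buster("https://www.google.com/search?q=hello", value="1"))
```

With a fixed value the result is the same URL every time, so if that URL gets cached too, a new value is needed for the next request. The timestamp default is one simple way to get that.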

Why does this trick work?

Parameters in links are not just something additional or optional. They are very important for content management. For example, parameters in a link can carry a page number for paginated output, a product or service code, a search query, filter conditions, a sorting order, and other data. The content of the page may vary significantly for different parameters. It would be strange if a website displayed the same product no matter which one was selected, or showed only the first page while you scrolled through a list. Developers and testers who have ever checked a website for broken hyperlinks using Xenu, Screaming Frog SEO Spider, or similar utilities have probably noticed that an online store with 20-30 dynamically generated pages can contain tens of thousands of unique links that differ only in their parameters. For this reason a server cache usually has to treat URLs with different parameters as different pages, so a URL with a parameter it has not seen before has no cached copy and the page is generated again.
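The reasoning above can be illustrated with a toy cache. This is a simplified model, assuming the cache stores one page per full URL including the query string. Real server caches may build their keys differently, and `ToyPageCache`, `render`, and the `/catalog` URLs are made up for the example.

```python
class ToyPageCache:
    """Toy cache that stores one rendered page per full URL (path + query)."""

    def __init__(self, render):
        self._render = render  # function that builds a page for a URL
        self._store = {}       # full URL -> cached page

    def get(self, url):
        if url not in self._store:
            # Cache miss: build the page now and remember it.
            self._store[url] = self._render(url)
        return self._store[url]


content = {"version": 1}


def render(url):
    return f"{url} rendered from content v{content['version']}"


cache = ToyPageCache(render)
cache.get("/catalog?page=2")              # page is cached with content v1
content["version"] = 2                    # content changes, cache is not reset
stale = cache.get("/catalog?page=2")      # same URL: old copy from the cache
fresh = cache.get("/catalog?page=2&1=1")  # new URL: rendered again

print(stale)
print(fresh)
```

The second request returns the old copy because its URL is already a key in the cache, while the third uses a key the cache has never seen.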

When will this technique not work?

If the load balancer on the server side is not set up correctly, the server response can be wrong too.
If the changes you want to see are not in the page content but in the files loaded by the page (js, css, iframe, images, …), it may take more effort to get their modified copies.
The parameter is not carried over automatically: you have to add a GET parameter to each link you follow to get the modified copies of the pages you need.
If the site has strict protection against XSS (cross-site scripting), any unexpected or invalid GET parameter may redirect to a 404 page.
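For the points about loaded files and followed links, one possible approach is to collect every `href` and `src` URL from a page and add the parameter to each of them. The sketch below assumes plain HTML and the same `1=1` fake parameter as above. `LinkCollector`, `bust`, `busted_urls`, and the `example.com` addresses are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def bust(url, token):
    """Append `token` (for example "1=1") to the query string of `url`."""
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


def busted_urls(html, base_url, token="1=1"):
    """Return absolute http(s) URLs found in `html`, each with `token` added."""
    collector = LinkCollector()
    collector.feed(html)
    result = []
    for raw in collector.urls:
        absolute = urljoin(base_url, raw)
        # Skip mailto:, javascript: and other non-HTTP links.
        if urlsplit(absolute).scheme in ("http", "https"):
            result.append(bust(absolute, token))
    return result


page = (
    '<link href="/style.css">'
    '<script src="app.js?v=3"></script>'
    '<a href="/about">About</a>'
    '<a href="mailto:info@example.com">Mail</a>'
)
for url in busted_urls(page, "https://example.com/shop/"):
    print(url)
```

This only covers URLs written directly in the markup. Files referenced from inside CSS or loaded by scripts would need separate handling.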

You can read more here:

https://stackoverflow.com/questions/7062680/html-link-that-bypasses-cache
