Monday, June 23, 2014

Curl showing garbage when downloading a URL

I am new to 'curl'. Late, I know, but better late than never.

Anyway, I ran into an issue downloading from a certain URL: the output, whether saved to a file or printed to the terminal, showed garbage like this:
▒▒O▒▒p\YF▒!▒a▒▒▒1▒_>▒▒▒W▒nS▒h▒▒▒▒Li▒▒+▒n▒GBw▒▒@8▒}▒W▒▒▒▒1(▒H7▒▒▒w:F~▒▒rFϒ1Sl+▒㎉▒0-1▒()▒▒
▒E▒G<3xggy
▒c▒▒oS▒▒▒!▒U]▒l▒▒▒Vy▒Y▒+N▒݄҅▒▒▒$+▒]▒▒▒,▒q▒▒▒▒{BJ▒▒#p ▒V▒~]?▒j▒▒▒5▒?▒4▒4▒q▒lS▒▒{4m▒▒▒
▒^▒E▒▒▒}tA90▒▒Y▒▒▒ɍ@mkZ▒$nX▒ɤ6▒▒▒O[root@h10141 ~]# PuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTYPuTTY

It was puzzling because the same URL displayed without any problem in a standard web browser.

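As it turns out, the content at that URL is served compressed (gzip). A browser decompresses such content automatically, but curl by default saves the bytes exactly as it receives them, hence the garbage. To make curl request compressed content and decompress it for you, add the '--compressed' option, for example: curl --compressed -o output.xml <URL>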
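If you already saved the garbage to a file, you can confirm it is gzip and recover the content yourself. Below is a minimal Python sketch (my own illustration, not how curl implements '--compressed') that checks for the two-byte gzip signature 0x1f 0x8b and decompresses with the standard gzip module. The function names and file names are hypothetical.

```python
import gzip

# Every gzip stream begins with these two "magic" bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """Return True if the data starts with the gzip signature."""
    return data[:2] == GZIP_MAGIC


def decode_body(data: bytes) -> bytes:
    """Decompress gzip data; return anything else unchanged."""
    if looks_gzipped(data):
        return gzip.decompress(data)
    return data


def decode_file(in_path: str, out_path: str) -> None:
    """Read a file saved by curl (without --compressed) and write the decoded content."""
    with open(in_path, "rb") as src:
        raw = src.read()
    with open(out_path, "wb") as dst:
        dst.write(decode_body(raw))
```

For example, after curl -o body.bin <URL>, calling decode_file("body.bin", "body.xml") should give you readable XML. Checking the magic bytes rather than trusting the file name means plain, uncompressed downloads pass through untouched.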

So if a URL returns garbage, check the response headers. You can use the '-D <filename>' (--dump-header) option to save the headers to a file, then look for a Content-Encoding line saying the content is compressed, as in the example below:

HTTP/1.1 200 OK
Server: HP HTTP Server; 
Content-Type: text/xml
Content-Encoding: gzip
Content-Length: 5392
Cache-Control: must-revalidate, max-age=0
Pragma: no-cache
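Checking the header dump can also be scripted. Here is a minimal Python sketch, assuming the headers were saved with something like curl -D headers.txt -o body.bin <URL> (file names hypothetical). Since a dump can contain more than one response, for instance when redirects are followed, it only reports the Content-Encoding of the last response.

```python
from typing import Optional


def content_encoding(header_dump: str) -> Optional[str]:
    """Return the Content-Encoding of the final response in a header dump,
    or None if the final response does not declare one."""
    encoding: Optional[str] = None
    for line in header_dump.splitlines():  # handles the CRLF line endings curl writes
        if line.startswith("HTTP/"):
            # A new status line starts a new response (e.g. after a redirect),
            # so forget anything seen in earlier responses.
            encoding = None
            continue
        name, sep, value = line.partition(":")
        # Header names are case-insensitive.
        if sep and name.strip().lower() == "content-encoding":
            encoding = value.strip().lower()
    return encoding


def is_compressed(header_dump: str) -> bool:
    """True if the final response says its body is encoded (gzip, deflate, ...)."""
    enc = content_encoding(header_dump)
    return enc is not None and enc != "identity"
```

Run against the headers shown above, content_encoding() would return "gzip", which is the cue to re-run curl with '--compressed'.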