Not an actual problem here, as I've already found a solution/workaround, but still: I'd like to know what you think of the following.

Over on “Linux Command Line Cheat Sheets”, someone was kind enough to tell us about the “Linux Command Line Cheat Sheets” page. I looked it up and I think it's a great summary of all the important stuff I may need.

For easy future reference I tried downloading the page with wget, but to my astonishment I got this terminal output:

    wget ""

Well, I then employed lynx for that matter (I'd like to stay within the terminal). All I needed to do was hit the “d” key from the lynx history, and I immediately got a nice HTML document of the respective page which I could easily save.

So lynx doesn't seem to have any problems with downloading the page, but wget doesn't have the right permissions. That's odd, as I normally employ wget for downloading almost anything and, to the best of my recollection, I never ran into any difficulties.

Many thanks in advance and many greetings.

If it works with lynx and not with wget, it could still be a problem at their end, or just a mismatch between what you ask and what they allow.

… and suddenly it occurred to me to use another user-agent (the one for Firefox) with wget, as Firefox is perfectly able to access and display the page. Here's the terminal output:

    firejail wget --debug --user-agent="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/.61 Chrome/.61 Safari/537.36" ""
    Setting --user-agent (useragent) to Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/.61 Chrome/.61 Safari/537.36
    DEBUG output created by Wget 1.21.2 on linux-gnu.
    Reading HSTS entries from /home/rosika/.wget-hsts
    Converted file name 'index.html' (UTF-8) -> 'index.html' (UTF-8)
    Handshake successful; connected socket 3 to SSL handle 0x0000559baa4b5df0
    X509 certificate successfully verified and matches host
    ---request begin---
    GET /linux-command-line-cheat-sheet/ HTTP/1.1
    Registered socket 3 for persistent reuse.

If for any reason wget isn't allowed to download something, try setting another user-agent. I don't know if it works all the time, but it should be worth a try.

Many thanks again for your help, Neville.

I am not up on user-agents… could you explain what they do?

They are there to introduce your browser to the server. Based on that, the server may decide to serve different content: normally, say, if there's a feature in Mozilla which Internet Explorer does not have, the server will render a page for Mozilla that benefits from that feature, but will not try to use it for Internet Explorer. This requires proper cooperation between the website's developers, the hosting server's maintenance team, and of course the developers of the browser…

Sometimes all this is misused, and website owners sniff the user agent string just to restrict access to content for no real reason. For example, my bank considered my browser at the time, Firefox ESR, too old, insecure and incompatible, so they served me a page that reminded me to update my browser. They sniffed the agent string but checked only the version number, and indeed, that was way lower than the current Firefox version. They did not make any further checks, such as whether it runs on Windows, or whether it is a “normal” version or the ESR. Of course, after I changed the user agent string, everything worked smoothly, which clearly proves that “incompatible” was a wrong statement.

Sometimes the website owner tries to sniff the OS from the user agent string, and so may serve the good content only to Windows users. One site (I can't tell now which one it was, something entertaining/streaming) worked well with Windows browsers but not on Linux: changing the user agent string to lie about the browser (pretending to run on Windows) solved the problem. Of course, if there's a real reason (for example, the site requires a Widevine level which is currently not supported on Linux), the site will not work in a Linux-based browser regardless of the user agent string.

The user-agent should be specified as a field in the header. The list of HTTP header fields is worth a look; you'd probably be interested in the request-specific fields, which include User-Agent. The simplest way to do what you want is to create a dictionary and specify your headers directly, for example with the requests library.
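The advice in this thread, to put the user agent into a dictionary of headers, breaks off right after `import requests`. As a rough sketch of the same idea, the example below uses `urllib.request` from the Python standard library instead, so it runs without extra packages. With requests, the equivalent call would be `requests.get(url, headers=headers)`. The URL, the user-agent string and the function names `build_request` and `fetch` are assumptions made for illustration; none of them come from the thread.

```python
import urllib.request

# Assumed values for illustration only: neither the URL nor this exact
# user-agent string appears in the thread.
EXAMPLE_URL = "https://example.com/"
EXAMPLE_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def build_request(url, user_agent):
    """Return a GET request whose User-Agent header is set explicitly."""
    # The headers live in a plain dictionary, as the answer suggests.
    headers = {"User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


def fetch(url, user_agent, timeout=10.0):
    """Download the page body as bytes, presenting the given user agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Build the request without sending it and show the header that
    # would go out. urllib stores the header name as "User-agent".
    request = build_request(EXAMPLE_URL, EXAMPLE_USER_AGENT)
    print(request.get_header("User-agent"))
```

Calling `fetch(EXAMPLE_URL, EXAMPLE_USER_AGENT)` would actually download the page. As with the wget workaround, whether a different user agent gets past a given server depends entirely on what that server checks.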