[TUT] How to WEBSCRAPE albums from bandcamp using Python ^_^
#1
Greetings my friends. I know that a little or some of you are fans of indie artists or disc jockeys at bandcamp. Well, ever thought of getting the albums and playlists in mp3 format for free?

Well here is one of the scripts I have made during my spare time when I became obsessed with the site and the music they produce. I have been using this code for almost a year now and feel free to use it. Cool

Tools of trade:
- Python
- youtube-dl
- BeautifulSoup (python library)
- urllib2 (python library)

Source code (Python 2.7.x):
Code:
#!/usr/bin/python

from BeautifulSoup import BeautifulSoup as bs
import urllib2, re, os

artist_link = "https://YOURARTIST.bandcamp.com"
album_dict = {}

src = bs(urllib2.urlopen(artist_link + "/music"))
for link in src.findAll("a", attrs={"href" : re.compile("/album/")}):
    str_link = str(link.get("href"))
    key = str_link.replace("/album/", "")
    val = artist_link + str_link
    albums_dict[key] = val
    
for album in albums_dict:
    if not os.path.exists(album) and not os.path.isdir(album):
        cmd = "youtube-dl -o \"./"
        cmd += album + "/"
        cmd += "%(playlist_index)s %(title)s.%(ext)s\" "
        cmd += albums_dict[album]
        os.mkdir(album)
        os.system(cmd)

Steps:
1.) Install python, the two important libraries and its prerequisites.
2.) Install youtube-dl, and its prerequisites.
3.) Change YOURARTIST using your target artist.
4.) Kindly double click it or execute it via command-line to see the logs.
5.) Once download has been started, you will notice that it will create a directory where the source code is and it will create folders of each album then place each song to its respective album.

Notes to be aware of:
This script was tested using Cygwin, WSL (Windows Subsystem for Linux), Xubuntu, Ubuntu, Gentoo, and Slackware. Haven't tried it on other operating systems so feel free to use my code. Please do enhance and make the source code much better because sometimes this does not work if the bandcamp artist is using a custom theme (by default this works on most of the artist pages) or you have reached your maximum internet usage or bandwidth (Lols. Big Grin ). Please do reply to my thread for further questions and suggestions. Thank you very much for reading my thread. Wink

Credits To:
- Vector and Enmafia2 for encouraging me to post a thread and for helping me cope up with my anxiety.
- To the authors of python and extended libraries from the pip repository.
- To the authors of youtube-dl and its prerequisites.
- Bandcamp's developers who did not notice this simple security hole.
Reply
#2
Code:
if not os.path.exists(album) and not os.path.isdir(album):

use `os.path.isfile(...)`

Code:
import urllib2, re, os

split imports over lines opposed to using `import x, y, z`

Code:
for album in albums_dict:
    if not os.path.exists(album) and not os.path.isdir(album):

factorize to

Code:
albums_dict = filter(lambda k: os.path.isfile(k), albums_dict)
for album in albums_dict:
    <...>

Code:
cmd = "youtube-dl -o \"./"
        cmd += album + "/"
        cmd += "%(playlist_index)s %(title)s.%(ext)s\" "
        cmd += albums_dict[album]
        os.mkdir(album)
        os.system(cmd)

the youtube-dl is an actual module you can import, i would suggest using the pure version over using `os.system` to call youtube-dl

even if not that, at least use `subprocess.check_output(...)` or `subprocess.call`

also try not to use urllib2 and use requests instead.

but over-all it's pretty good and i know you didn't ask explicitly for a code review but i'll just give one anyway just so you have a heads up
Reply
#3
Hey there, thank you for making suggestions towards my code. Well, I tried using other libraries for url purposes but I found myself more comfortable using urllib2. Big Grin And also, for the separation of libraries, I think that is up to the programmer which he/she finds comfortable then others will just have to adjust their code according to their contentment and needs. It's just like putting comments and not putting comments. Big Grin Big Grin

Below is something I have not thought about and it is highly appreciated.
(10-31-2017, 01:41 AM)уназед Wrote: factorize to

Code:
albums_dict = filter(lambda k: os.path.isfile(k), albums_dict)
for album in albums_dict:
   <...>

Thanks for teaching me this portion of the code which I found very useful and I could also implement and update my other projects with this. Highly appreciate your suggestions and efforts! Cool Big Grin
Reply
#4
Hey grimmbot, thanks for sharing. Looks like a handy program to have. I am wondering though, if you are just performing OS commands with `os.system()` why not write a shell script? Using Python in this manner seems like a roundabout way of going by things.
Reply
#5
(10-31-2017, 04:41 PM)Vector Wrote: Hey grimmbot, thanks for sharing. Looks like a handy program to have. I am wondering though, if you are just performing OS commands with `os.system()` why not write a shell script? Using Python in this manner seems like a roundabout way of going by things.

Well, youtube-dl is the only part that requires shell scripting and it is purely dependent on BeautifulSoup (for getting the specific 'href' tags and parsing it through regular expressions to get what we need then download each mp3 files through a shell command). Well, for starters especially beginners and non-shell dependent users in the forum, it might be hard for them if I were to wrote non-python script. Hehehe. Wink

Actually, this script I shared with you guys is part of a bigger personal script that comes along with other tools and it has a graphical user interface using Tkinter.  Big Grin Currently, I'm thinking of migrating the bigger script to a C/C++ or Java application (for speed and portability purposes), reverse engineer the libraries it is using or use other libraries then build and share it to you guys here.
Reply
#6
WOW! What a coincidence! I'm working in a project with beautifulsoup at the moment too! haha
That's definitely a great script, looks awesome Tongue

And also, thanks for introducing me to bandcamp, I didn't know about that website and it's great for the genres I usually listen to Smile
Good job and keep it up!
Reply
#7
(11-01-2017, 04:04 AM)grimmbot Wrote:
(10-31-2017, 04:41 PM)Vector Wrote: Hey grimmbot, thanks for sharing. Looks like a handy program to have. I am wondering though, if you are just performing OS commands with `os.system()` why not write a shell script? Using Python in this manner seems like a roundabout way of going by things.

Well, youtube-dl is the only part that requires shell scripting and it is purely dependent on BeautifulSoup (for getting the specific 'href' tags and parsing it through regular expressions to get what we need then download each mp3 files through a shell command). Well, for starters especially beginners and non-shell dependent users in the forum, it might be hard for them if I were to wrote non-python script. Hehehe. Wink

Actually, this script I shared with you guys is part of a bigger personal script that comes along with other tools and it has a graphical user interface using Tkinter.  Big Grin Currently, I'm thinking of migrating the bigger script to a C/C++ or Java application (for speed and portability purposes), reverse engineer the libraries it is using or use other libraries then build and share it to you guys here.

I see, well keep me posted on the rest of the project.
Reply
#8
(11-02-2017, 03:54 PM)Vector Wrote: I see, well keep me posted on the rest of the project.

Absolutely Vector! Big Grin
Reply
#9
It is great that useful shares are shared by many people in this article.
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python Ebook Collection [89 Files] Insider 15 90,924 08-12-2021, 08:02 PM
Last Post: zzeuss
  NSA Python Training Insider 4 29,875 08-12-2021, 02:14 AM
Last Post: hworth
  Having an issue writing a python script with vim FancyBear 4 23,596 01-03-2021, 11:27 PM
Last Post: FancyBear
  Python Data structures and algorithms resources skinnyj0shua 1 18,094 12-23-2020, 12:52 PM
Last Post: enmafia2