1. Goals:

People often search for songs on the Internet. Some know the exact information for the song they are looking for, for example, the title, singer, or album name. Some search for the song they haphazardly hear about. They may only know some fragments or parts of the lyrics, the genres (pop, rock, R&B, or so on), or what language it is. Sometimes they even do not know the exact song they want. In this project, we want to turn the web music search problem into music database with querying. We want to build a music database manage system generally having 4 main entities, which are 1). Song, 2). Artist, 3).Company, and 4). Album. This music database can be used for searching for songs with different sorts of queries. For preparation we have to gather music data and do some preprocessing. Then we will create a database management system and import and store our music data in a well-designed database. Then we design a website as an interface, which connects the database with built in applications for customers’ querying implementation. It allows users to search for songs in our static database. The backend will find the relevant results and return them on the web page.

2. Approach to solve the problem:

a. Database

Considering the lyric information may be the bottleneck of the efficiency of the query, we use MongoDB, which is a document-based database system. Our project includes 4 entities: Company, Artist, Company, and Album. Each of them contains several attributes and the features are listed below. The MongoDB allows us to use bash to import data into the data base.
E.g. add an artist into database Artist

1
2
3
4
5
6
7
8
db.Artist.insert({ID: ‘1’,
Name: ‘Taylor Swift’,
Gender: ‘Female’,
Nation: 'U.S.A',
Date: ‘1989-12-13’,
BeginDate: ‘2006’,
Type: [‘Pop’, ‘Country’]
})

E.g. add a song into database Song

1
2
3
4
5
6
7
db.Song.insert({ID: ‘1’,
Name: ‘Love Story’,
Time: ‘3.9’,
Language: ‘English’,
Type: ‘Pop’,
Lyric: ‘We were both young when I first saw you...’
})

b. Query function

We use java to program the function. As our goal is for users to query the target by the information they provide, we use a function with several parameters to The peus-do code is listed below.

1
2
3
4
5
6
7
8
9
10
11
queryforsong(singerName, songName, songLanguage, composerName, lyricistName, songType, songLyric) {
find singerName == Artist.Name;
group Song by singerName;
find songName == Song.Name;
group Song by songName;
find songLanguage == Song.Language;
group Song by songLanguage;
……
if only one song in the filtered data
return song;
}

c. Update function

If a user cannot find a certain song, he or she can update the database and add a song with some information. Specifically, one can add a song by offering 1) songName, 2) singerName, 3) songLyric and ect. hirachecally, and, if a user want to do this operation he need to give at least the songName.
The information the user offered will not be added into the database directly. Instead, the information will be sent to the administrators and they can decide whether to add it to the database or not.

d. Delete function

Also the user can delete information he doesn’t want from the database. Also, hirachecally, if the user deletes a company, it will delete all artists, albums and songs connected. This helps managing the database as the user want.

3. Design of Database: (ER diagram)

Primary Entities

Each Primary Entity has a main table with the same name as the entity, containing its basic data. In this project, we basically creates the music database with 4 entities: company, artist, album, and song.

Company

A company is a record company, with the certain name, location and the date of foundation, which has a lot of artists. The prime key is company’s id.

Artist

An artist is generally a musician, a group of musicians, or another music professional (composer, engineer, illustrator, producer, etc.). The designed structure includes name, birthday, found date, gender, nationality, type and an ID as primary key. There’s a multi-valued attribute in artist entity, which is ‘type’ attribute.

Album

An album is specifically a product in the market. The type of album can be “Album”, “Single” or “EP”, which is the domain of this attribute. The entity is release by either a group or a single artist. Suppose five artists consist a group and the group released one album. Then the album will directly linked with the group. It will not have direct relationship with these five individuals, since we view this album as the group’s creation, not any single artist of this group.

Song

A song with a primary key ID, including name, time, language, lyric and type as its attributes, is generally sung by an artist or a group. And it also has a composer and a lyricist from the Artist entity.
Here shows the ER model.

4. Data:

a). What is the data for our project:

The data we need in this project is all about music, singer, album, etc. We need the data about songs (must have singers, we exclude pure music in this project), artists (include singers, lyricists, and composers, sometimes one artist can be more than one roles), group (consists of several artists), albums (each song must belongs to one or more albums), and company (every company has one or more artists).
i). For songs entity, we need song id, title/name, singer id, composer id, album id, releasing date, language, time (how long the song lasts), type/genre, and lyrics, with song id with prime key, and singer id, composer id, album id as foreign keys, pointing to artists, groups and albums entities.
ii). For artists entity, we need artist id, artist name, nationality, birth date, strat date, type, group id, company id and gender, with artist id as its prime key and group id and company id as foreign keys pointing to group entity and company entity. The type attribute is a multi-valued attribute because an artist can sometimes have multiple roles.
iii). For album entity, we need album id, name, type, and date, with album id as prime key.
iv). For company entity, we need company id, name, region, and found date, with company id as its prime key.

b). Where is our data from:

There are several ways that we can get the data we want. First, we’ve searched online and found some websites for collecting the music data, e.g. MusicBrainz and iTunes. MusicBrainz also offers API through which we can easily accumulate our desired information. Also, we need to search for some missing parts that those data do not cover, for example, the company entity and its attributes that the music data neglects.

c). How to implement the data:

We want to collect data as comprehensive as possible. However, it’s impossible that these data perfectly match our project goal. So we need to modify or fill up some missing information. For example, for every possible ids (including song id, group id, artist id, company id, and album id) we want to define them for our database system, even if the data has its original ids as key for each entity. For some attributes like tpyes/genres, we need to set their domains and adjust every piece of input data if necessary. For every attribute concerning date, we need to adjust the input format so that the database system could become very regulated. In this data pre-processing part, we may apply regular expression with python. After those, we can import the data into our database.

5. Architectural overview:

We design a webpage as interface, which connects the database with built in applications for customers’ querying implementation. It allows users to search for songs in our static database. The web is connected with our back-end database with JDBC (Java database connectivity) in java. When user inputs the searching for information, the system will translate into query language for database to find the targets. In database system, each entity and relation is connected via foreign keys, which is shown in the relation model graph.

6. Code explanation

6.1 Spring framework

We use Spring framework/J2EE to implement the program. Here shows the packets we queries from the maven.com. The version we choose is spring 3.2.3.RELEASE and the most relative packet for our program is mongo-java-driver as JDBC for MongoDB version 2.13.0 and spring-data-mongodb for the framework to deal with data from MongoDB. The advantage to use such MVC framework is that we can easily get resources online and manage them with versions we need.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
<dependency>
<groupId>org.mongodb</groupId>
<artifactId>mongo-java-driver</artifactId>
<version>2.13.0</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-aop</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-beans</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-commons-core</artifactId>
<version>1.4.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-mongodb</artifactId>
<version>1.4.2.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-expression</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-web</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-webmvc</artifactId>
<version>3.2.3.RELEASE</version>
</dependency>

Besides the resources, we edit the application servlet, application context and MongoDB config as well to support our program. And the controller we set represents the connection between different web pages with different queries.

6.2 Web page

As time limit, we don’t decorate the web page much. just use some simple CSS styles and script to support the page connection. Below shows the .jsp for seeking certain data by the information user supplied. As we have four main databases connected by super keys, user can query what they want as free as possible.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
<script>
function setaction(a) {
if (a==0) {
document.mform.action="${pageContext.request.contextPath}/music/findsongpage";
} else if (a==1) {
document.mform.action="${pageContext.request.contextPath}/music/findartistpage";
} else if (a==2) {
document.mform.action="${pageContext.request.contextPath}/music/findalbumpage";
} else {
document.mform.action="${pageContext.request.contextPath}/music/findcompanypage";
}
document.mform.submit;
}
</script>
<body>
<p>welcome and enjoy your searching</p>
<div id="center">
<form name="mform" action="${pageContext.request.contextPath}/music/findpage" method="post">
<p align="left">Song Information:</p>
name: <input type="text" name="sname" /> <br />
time: <input type="text" name="stime" value="0"/> <br />
language: <select name="slanguage">
<option value=""></option>
<option value="english">english</option>
<option value="chinese">chinese</option>
</select> <br />
writer: <input type="text" name="swriter" /> <br />
singer: <input type="text" name="ssinger" /> <br />
lyrics: <textarea cols="30" rows="10" name="slyrics"></textarea> <br />

<p align="left">Artist Information:</p>
name: <input type="text" name="arname" /> <br />
birthday: <input type="text" name="arbirthday" value="19000101"/> <br />
gender: <select name="argender">
<option value=""></option>
<option value="Male">Male</option>
<option value="Female">Female</option>
</select> <br />
nation: <select name="arnation">
<option value=""></option>
<option value="U.S.A">U.S.A.</option>
<option value="China">China</option>
</select> <br />

<p align="left">Album Information:</p>
name: <input type="text" name="alname" /> <br />
<input type="hidden" name="altype" value="" />
pop: <input type="checkbox" name="altype" value="pop" />
rock: <input type="checkbox" name="altype" value="rock"/>
country: <input type="checkbox" name="altype" value="country"/>
blues: <input type="checkbox" name="altype" value="blues"/>
<br />
releaseday: <input type="text" name="alreleasedate" value="19000101"/> <br />

<p align="left">Company Information:</p>
name: <input type="text" name="cname" /> <br />
nation: <select name="cnation">
<option value=""></option>
<option value="U.S.A">U.S.A.</option>
<option value="China">China</option>
</select> <br />
foundyear: <input type="text" name="cfoundyear" value="1900"/> <br />

<input type="submit" value="search song" onclick='setaction(0)'/>
<input type="submit" value="search artist" onclick='setaction(1)'/>
<br />
<input type="submit" value="search album" onclick='setaction(2)'/>
<input type="submit" value="search company" onclick='setaction(3)'/>
</form>
<a href="${pageContext.request.contextPath}/music/index">back to homepage</a><br /><br />
</div>
</div>
</body>

6.3 Query function

We create a service center to manage the different queries from user operation. The interface is like below:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
public interface UserService {
//User Login
public void registerUser(User user);
public User findUserByName(String name);
public User loginUser(User user);

//Song
public void addSong(Song song);
public void addArtist(Artist artist);
public void addAlbum(Album album);
public void addCompany(Company company);

public long getNextSongID();
public long getNextArtistID();
public long getNextAlbumID();
public long getNextCompanyID();

public long getSongNo();
public long getArtistNo();
public long getAlbumNo();
public long getCompanyNo();

public List<Song> findSong(Song song, Artist artist, Album album, Company company);
public List<Artist> findArtist(Song song, Artist artist, Album album, Company company);
public List<Album> findAlbum(Song song, Artist artist, Album album, Company company);
public List<Company> findCompany(Song song, Artist artist, Album album, Company company);

public String delete(Song song, Artist artist, Album album, Company company);
public void update(String cname, String arname, int arbirthday);
}

As is listed, the main functions we implemented are “add data”, “delete data”, “find data” and “update data”. We will talk about these 4 queries later.
As JDBC for MongoDB has supplied us with some basic functions. The main obstacles is to implement “JOIN” operation which is useful and common in relational database systems because MongoDB, an NOSQL database system, is a document-based system which is helpful for storing text information while is week in relational connections. As we designed the 4 collections hirachically joined, i.e. artist has an attribute named companyID to set this artist to specific company, all 4 kinds of queries satiesfy that strategy, that is, when user add a song, it will generate a new album by a new artists from a new company (if non of these have been stored before). The delete and update queries are similar to the add function. Specially for the find query, as we designed that user can query for whatever they want from the database by whatever information they filled, the sequences of filtering for different 4 collections are different.
Here shows the query for seeking for a song.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
public List<Song> findSong(Song song, Artist artist, Album album, Company company) {
List<Song> list1;
List<Song> list2;
List<Company> clist;
List<Artist> arlist;
List<Album> allist;

if (company.getCfoundyear() != 1900) {
clist = mongoTemplate.find(new Query(Criteria.where("cname").regex(company.getCname(), "i")
.andOperator(Criteria.where("cnation").regex(company.getCnation())
.andOperator(Criteria.where("cfoundyear").is(company.getCfoundyear())))), Company.class, COMPANY_COLLECTION);
} else {
clist = mongoTemplate.find(new Query(Criteria.where("cname").regex(company.getCname(), "i")
.andOperator(Criteria.where("cnation").regex(company.getCnation()))), Company.class, COMPANY_COLLECTION);
}
Set<Long> cid = new HashSet<Long>();
cid.add((long) 0);
for (int i=0; i<clist.size(); i++)
cid.add(clist.get(i).getCuid());

if (artist.getArbirthday() != 19000101) {
arlist = mongoTemplate.find(new Query(Criteria.where("arname").regex(artist.getArname(), "i")
.andOperator(Criteria.where("arbirthday").is(artist.getArbirthday())
.andOperator(Criteria.where("argender").regex(artist.getArgender())
.andOperator(Criteria.where("arnation").regex(artist.getArnation())
.andOperator(Criteria.where("arcompanyID").in(cid))
)))), Artist.class, ARTIST_COLLECTION);
} else {
arlist = mongoTemplate.find(new Query(Criteria.where("arname").regex(artist.getArname(), "i")
.andOperator(Criteria.where("argender").regex(artist.getArgender())
.andOperator(Criteria.where("arnation").regex(artist.getArnation())
.andOperator(Criteria.where("arcompanyID").in(cid))
))), Artist.class, ARTIST_COLLECTION);
}
Set<Long> arid = new HashSet<Long>();
arid.add((long) 0);
for (int i=0; i<arlist.size(); i++)
arid.add(arlist.get(i).getAruid());

if (album.getAlreleasedate() != 19000101) {
allist = mongoTemplate.find(new Query(Criteria.where("alname").regex(album.getAlname(), "i")
.andOperator(Criteria.where("altype").regex(album.getAltype())
.andOperator(Criteria.where("alreleasedate").is(album.getAlreleasedate())
.andOperator(Criteria.where("alartistID").in(arid))
))), Album.class, ALBUM_COLLECTION);
} else {
allist = mongoTemplate.find(new Query(Criteria.where("alname").regex(album.getAlname(), "i")
.andOperator(Criteria.where("altype").regex(album.getAltype())
.andOperator(Criteria.where("alartistID").in(arid))
)), Album.class, ALBUM_COLLECTION);
}
Set<Long> alid = new HashSet<Long>();
alid.add((long) 0);
for (int i=0; i<allist.size(); i++)
alid.add(allist.get(i).getAluid());

if (song.getStime() > 0) {
list1 = mongoTemplate.find(new Query(Criteria.where("sname").regex(song.getSname(), "i")
.andOperator(Criteria.where("stime").gte(song.getStime()-0.25)
.andOperator(Criteria.where("stime").lte(song.getStime()+0.25)
.andOperator(Criteria.where("slanguage").regex(song.getSlanguage()))))), Song.class, SONG_COLLECTION);
} else {
list1 = mongoTemplate.find(new Query(Criteria.where("sname").regex(song.getSname(),"i")
.andOperator(Criteria.where("slanguage").regex(song.getSlanguage()))), Song.class, SONG_COLLECTION);
}
list2 = mongoTemplate.find(new Query(Criteria.where("swriter").regex(song.getSwriter(), "i")
.andOperator(Criteria.where("ssinger").regex(song.getSsinger(), "i")
.andOperator(Criteria.where("slyrics").regex(song.getSlyrics(), "i")
.andOperator(Criteria.where("salbumID").in(alid))))), Song.class, SONG_COLLECTION);

Set<Long> sid = new HashSet<Long>();
for (int i=0; i<list2.size(); i++)
sid.add(list2.get(i).getUid());
int a=0;
while (a<list1.size()) {
if (sid.contains(list1.get(a).getUid())) {
a++;
continue;
}
list1.remove(a);
}

return list1;
}

7. Conclusion

We designed the web application for users to manage their own music database system. The reason why we choose MongoDB instead of relational database such as MySQL is that when searching by lyrics, the query speed of NOSQL document-based MongoDB is faster than MySQL. We tested the performance of the response time and consequntly MongoDB is one time faster than MySQL for the same 4 GB data. The difficulty of using MongoDB is that it doesn’t supply “JOIN” operation while we successfully implement that.