The History is Pretty Neat!
Vocaloid is a singing voice synthesizer that grew out of research led by Kenmochi Hideki at the Pompeu Fabra University in Barcelona, Catalonia, Spain, beginning in 2000. The project was backed by the Yamaha Corporation, which helped develop the software, although it was not originally intended to become as popular or as commercial a product as it is today. It was released on January 15, 2004, and the stable release as of today is Vocaloid 4. It was intended for professional musicians as well as light computer users, with the developers promoting the idea that “the only limits are the users’ own skills.”
It is currently available in Japanese, English, Korean, Spanish, Chinese, and Catalan. The operating systems that can run the program include Windows 2000/XP/Vista/7/8 and Apple iOS (as iVocaloid, though that product is exclusive to Japan).
Users type in lyrics and a melody on a piano-roll-style interface, and the program synthesizes a song from specially recorded vocals of voice actors or singers. Users can change the stress of pronunciations, add effects (such as vibrato), or change the overall dynamics of the voice. The vocals are also referred to as ‘a singer in a box’.
It was originally available only in English, with the first singers in a box being Leon, Lola, and Miriam. The later Japanese version added Meiko and Kaito, while the Spanish update that came with Vocaloid 3 added Bruno, Clara, and Maika. Later, the Chinese update added Luo Tianyi and Yanhe, and the Korean update added SeeU. The most popular Vocaloid singer, however, is none other than diva pop star Hatsune Miku from Japan.
How does it work?
The system uses concatenative synthesis: it splices and processes vocal fragments extracted from human singing voices to produce a realistic voice, and it layers on additional information to create vocal expressions such as vibrato (in short, it’s a smart recording program that you can mess around with). The Vocaloid synthesis technology was originally named “Frequency-Domain Singing Articulation Splicing and Shaping,” but the name was too difficult for most users to remember, so Yamaha dropped it and no longer uses it even on its websites.
The Vocaloid 2 synthesis engine was designed for singing, not for reading text out loud; software such as Vocaloid-flex and Voiceroid was developed for that purpose instead. The voices also cannot naturally replicate singing expressions like hoarse voices or shouting.
The main parts of the system are the Score Editor, the Singer Library, and the Synthesis Engine. The Synthesis Engine receives score information from the Score Editor, selects appropriate samples from the Singer Library, and concatenates them to output the synthesized voice. Yamaha ensured that there is almost no difference in the Score Editor and the Synthesis Engine among different Vocaloid 2 products. Currently, the system operates in Japanese and English, but other languages may become available in the future.
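The flow between the three parts can be pictured with a small Python sketch. This is only a toy model, not Yamaha's implementation: the score format, the `singer_library` dictionary, and the `synthesis_engine` function are all made up for illustration, and plain strings stand in for audio data.

```python
# Toy model of the three parts; strings stand in for audio data.

# Score Editor output: (phonetic symbol, MIDI pitch) pairs.
# This format is an assumption made for the example.
score = [("s", 60), ("I", 60), ("N", 62)]

# Singer Library: phonetic symbol -> recorded sample (placeholder strings).
singer_library = {"s": "smp_s", "I": "smp_I", "N": "smp_N"}


def synthesis_engine(score, library):
    """Select a sample for each score event and concatenate the results."""
    selected = [f"{library[symbol]}@{pitch}" for symbol, pitch in score]
    return "+".join(selected)


print(synthesis_engine(score, singer_library))
```

The point of the sketch is the division of labor: the editor only describes what should be sung, the library only stores recordings, and the engine is the one place where selection and concatenation happen.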
The Score Editor is a piano-roll-style editor that allows the user to input notes, lyrics, and certain expressions. The lyrics are then automatically converted into phonetic symbols using the built-in pronunciation dictionary, and the user can edit these symbols directly. The editor also supports ReWire, so it can be synchronized with a DAW, and it works with a MIDI keyboard, giving the user a real-time
Each Vocaloid license comes with a Singer Library that covers all possible combinations of phonemes (pronunciations) of the target language as a set of diphones (the stitching together of two sounds). For example, the word “sing” is synthesized from the sequence of diphones “#-s, s-I, I-N, N-#”, where # marks the silence at the start and end of the word. The system can change the pitch of the fragments to fit the melody, but for the result to sound natural, samples in three or four different pitch ranges need to be stored in the library. Japanese singers usually need fewer diphones, as Japanese basically has only three patterns of diphones containing a consonant: voiceless-consonant, vowel-consonant, and consonant-vowel. English, however, has many closed syllables ending in a consonant, as well as consonant-consonant and consonant-voiceless diphones. As a result, more diphones have to be recorded for an English library than for a Japanese one, which is why a Japanese library is not really suitable for singing in English, contrary to what most would assume.
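The “sing” example above can be reproduced in a few lines of Python. This is a minimal sketch under stated assumptions: the one-word `PRONUNCIATION` dictionary, the `to_diphones` and `nearest_range` helpers, and the three pitch-range centers are hypothetical and are not taken from the Vocaloid software.

```python
# Hypothetical one-entry pronunciation dictionary; a real one is far larger.
PRONUNCIATION = {"sing": ["s", "I", "N"]}

# Assumed centers (MIDI note numbers) of three recorded pitch ranges.
PITCH_RANGE_CENTERS = (55, 62, 69)


def to_diphones(word, dictionary=PRONUNCIATION):
    """Return the diphone sequence for a word, using '#' at both boundaries."""
    phonemes = ["#"] + dictionary[word.lower()] + ["#"]
    # Pair each phoneme with the one that follows it.
    return [f"{a}-{b}" for a, b in zip(phonemes, phonemes[1:])]


def nearest_range(pitch, centers=PITCH_RANGE_CENTERS):
    """Pick the recorded range closest to the target pitch."""
    return min(centers, key=lambda center: abs(center - pitch))


print(to_diphones("sing"))  # ['#-s', 's-I', 'I-N', 'N-#']
print(nearest_range(60))    # 62
```

A word with n phonemes always yields n + 1 diphones here, which gives a feel for why a language with more phonemes and more consonant clusters needs a much bigger library.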
Other software made after Vocaloid includes:
Vocaloid-flex: A speech synthesizer that brings the tone closer to a natural human voice, for use in other programs (this was used in Metal Gear Solid: Peace Walker).
VocaListener: Allows realistic songs to be produced.
MikuMikuDance: A 3D modeling system to move characters, stages and props as well as enter music in the background and render into videos (also known as MMD).
NetVocaloid: An online service for synthesizing singing voices over the internet; after 2012, Yamaha no longer offered it on their website.
MMDAgent: Allows users to interact with the 3D models of the Vocaloid mascots.
NetVocalis: Similar to VocaListener
Vocaloid Editor for Cubase: A version of the Vocaloid editor that runs inside the Cubase DAW.
Vocalodama: An iOS game app using the Vocaloid software.
Vocaloid Net: A replacement for the NetVocaloid service that added cloud storage.
Vocaloid First: Offered as a free version of Vocaloid that contains the VY1 vocal in low-quality form, released for the iPhone.
Hardware versions include the Vocaloid-Board and the eVocaloid.
The software became very popular in Japan upon the release of Crypton Future Media’s Hatsune Miku Vocaloid 2 software, and her success led to the popularity of the Vocaloid software in general. The Japanese video sharing website NicoNico Douga also played a large part in its rise in popularity. A user of the Hatsune Miku singer released a video that showed “Hachune Miku,” a super deformed Miku who held a Welsh onion (also known as negi or leek) and sang the Finnish song “Ievan Polkka,” much like the flash animation “Loituma Girl.” As the popularity of the Vocaloid software grew, NicoNico Douga became a place for collaborative creation where users made 2D and 3D animations and remixes to turn into videos. The software has also been used to tell stories through song and verse, as in the popular Story of Evil series. A theatre production based on the song “Cantarella” also hit the stage, running at Shibuya’s Space Zero theatre in Tokyo from August 3 to August 7, 2011. After a while, the videos found their way onto YouTube and file sharing sites, and Vocaloid’s influence spread like wildfire across the US and China, making its way to Europe and other countries.