Once you have enough audio samples, it’s time to copy them over to your computer to train the model — either use scp to

Author : anino.bola82e
Publish Date : 2021-01-07 12:35:55

There’s not much use, however, in a script that simply prints a message to the standard output if our baby is crying: we want to be notified! Let’s use Platypush to cover this part. In this example, we’ll use the Pushbullet integration to send a message to our mobile when a cry is detected. Let’s install Redis (used by Platypush to receive messages) and Platypush with the HTTP and Pushbullet integrations:

The boring part comes now: labeling the recorded audio files — and it can be particularly masochistic if they contain hours of your own baby’s cries. Open each of the dataset audio files either in your favourite audio player or in Audacity and create a new labels.json file in each of the samples directories. Identify the exact time where the cries start and where they end, and report them in labels.json as a key-value structure in the form time_string -> label. Example:
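As an illustration, here is a minimal sketch of what such a labels.json could look like and how to write it from Python. The first two timestamps follow the ranges this article describes for its example (negative from 00:00, positive from 02:13); the 04:57 entry, the sample_1 directory name and the write_labels helper are assumptions made only for this sketch.

```python
import json
import tempfile
from pathlib import Path

# Each key is the time (mm:ss) at which a new label starts; the label holds
# until the next key. The 04:57 entry is an assumed continuation.
labels = {
    "00:00": "negative",
    "02:13": "positive",
    "04:57": "negative",
}


def write_labels(sample_dir: Path, labels: dict) -> Path:
    """Write labels.json into a sample directory and return its path."""
    sample_dir.mkdir(parents=True, exist_ok=True)
    labels_file = sample_dir / "labels.json"
    labels_file.write_text(json.dumps(labels, indent=2))
    return labels_file


# Demo in a temporary directory rather than in the real dataset
with tempfile.TemporaryDirectory() as tmp:
    path = write_labels(Path(tmp) / "sample_1", labels)
    loaded = json.loads(path.read_text())

print(loaded)
```

Writing the file by hand in a text editor works just as well; the script only helps if you already keep your timestamps somewhere else.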

The model is designed to work on frequency samples instead of raw audio. The reason is that, if we want to detect a specific sound, that sound will have a specific “spectral” signature, i.e. a base frequency (or a narrow range where the base frequency may usually fall) and a specific set of harmonics bound to the base frequency by specific ratios. Moreover, the ratios between such frequencies are affected neither by amplitude (the frequency ratios are constant regardless of the input volume) nor by phase (a continuous sound will have the same spectral signature regardless of when you start recording it). This amplitude and time invariance makes the approach much more likely to produce a robust sound detection model than simply feeding raw audio samples to a model. The model can also be simpler (we can easily group frequencies into bins without affecting the performance, so we can effectively perform dimensionality reduction), much lighter (the model will have between 50 and 100 frequency bands as input values, regardless of the sample duration, while one second of raw audio usually contains 44100 data points, and the length of the input increases with the duration of the sample) and less prone to overfitting.
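To make the invariance argument concrete, here is a small NumPy sketch. It builds a synthetic sound (a 500 Hz base frequency plus two harmonics, all values chosen arbitrarily for the demo), then compares the normalized spectrum of a quiet version with that of a louder version recorded a few milliseconds later. The tone and signature helpers are illustrative names, not part of any library.

```python
import numpy as np

sample_rate = 44100                        # data points per second of raw audio
t = np.arange(sample_rate) / sample_rate   # time axis for one second


def tone(base_freq, amplitude, delay=0.0):
    """Synthetic sound: a base frequency plus two weaker harmonics."""
    shifted = t + delay                    # start "recording" a bit later
    return amplitude * (
        np.sin(2 * np.pi * base_freq * shifted)
        + 0.5 * np.sin(2 * np.pi * 2 * base_freq * shifted)
        + 0.25 * np.sin(2 * np.pi * 3 * base_freq * shifted)
    )


def signature(audio):
    """Magnitude spectrum, normalized to its strongest component."""
    spectrum = np.abs(np.fft.rfft(audio))
    return spectrum / spectrum.max()


quiet = signature(tone(500, amplitude=0.1))
loud_later = signature(tone(500, amplitude=1.0, delay=0.0037))

# The three strongest components sit at the base frequency and its harmonics
peaks = np.sort(np.argsort(quiet)[-3:])
print(peaks)
print(np.allclose(quiet, loud_later, atol=1e-6))
```

The raw waveforms of the two versions differ at almost every data point, while their normalized spectra match, which is the property the model relies on.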

After running this script (and after you’re happy with the model’s accuracy) you’ll find your new model saved under ~/models/sound-detect. In my case it was sufficient to collect ~5 hours of sounds from my baby’s room and define a good frequency range to train a model with >98% accuracy. If you trained this model on your computer, just copy it to the Raspberry Pi and you’re ready for the next step.

Whether you used micmon-datagen or the micmon Python API, at the end of the process you should find a bunch of .npz files under ~/datasets/sound-detect/data, one for each labelled audio file in the original dataset. We can use this dataset to train our neural network for sound detection.
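If you are curious about what ended up in those files, NumPy can list the arrays stored in any .npz archive. The sketch below runs on a stand-in file created on the spot: the array names (samples, labels) and their shapes are invented for the demo and are not necessarily what micmon writes, so check data.files on your own files.

```python
import tempfile
from pathlib import Path

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.npz"
    # Stand-in file: array names and shapes here are invented for the demo
    np.savez_compressed(
        path,
        samples=np.random.rand(30, 100),
        labels=np.zeros(30),
    )

    # np.load on an .npz file returns a lazy, dict-like archive
    with np.load(path) as data:
        summary = {name: data[name].shape for name in data.files}

print(summary)
```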

Once you have labelled all the audio samples, let’s proceed with generating the dataset that will be fed to the TensorFlow model. I have created a generic library and set of utilities for sound monitoring called micmon. Let’s start by installing it:





















Install the Pushbullet app on your smartphone and head to pushbullet.com to get an API token. Then create a ~/.config/platypush/config.yaml file that enables the HTTP and Pushbullet integrations:

In the example above, all the audio segments between 00:00 and 02:12 will be labelled as negative, all the audio segments between 02:13 and 04:56 will be labelled as positive, and so on.
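One way to read that rule in code: each key marks the point where a label starts, and the label stays valid until the next key. The sketch below looks up the label for any offset with a binary search. The helper names and the 04:57 entry are assumptions made for the illustration, and this is not necessarily how micmon parses the file.

```python
from bisect import bisect_right


def to_seconds(time_string: str) -> int:
    """Convert an mm:ss string to a number of seconds."""
    minutes, seconds = time_string.split(":")
    return int(minutes) * 60 + int(seconds)


def label_at(labels: dict, t: float) -> str:
    """Return the label that applies t seconds after the start of the file."""
    starts = sorted((to_seconds(key), label) for key, label in labels.items())
    idx = bisect_right([start for start, _ in starts], t) - 1
    if idx < 0:
        raise ValueError("t is before the first labelled time")
    return starts[idx][1]


labels = {"00:00": "negative", "02:13": "positive", "04:57": "negative"}

print(label_at(labels, 60))     # 01:00
print(label_at(labels, 133))    # 02:13
print(label_at(labels, 296))    # 04:56
```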

Let’s now create a Platypush hook to react to the event and send a notification to our devices. First, prepare the Platypush scripts directory if it hasn’t been created already:

micmon provides the logic to calculate the FFT (Fast Fourier Transform) of some segments of the audio samples, group the resulting spectrum into bands with low-pass and high-pass filters and save the result to a set of compressed numpy (.npz) files. You can do it from the command line through the micmon-datagen command:

In the example above we generate a dataset from raw audio samples stored under ~/datasets/sound-detect/audio and store the resulting spectral data to ~/datasets/sound-detect/data. --low and --high respectively identify the lowest and highest frequency to be taken into account in the resulting spectrum. The default values are respectively 20 Hz (lowest frequency audible to a human ear) and 20 kHz (highest frequency audible to a healthy and young human ear).

However, you will usually want to restrict this range so that it captures as much as possible of the sound you want to detect and as little as possible of any other background audio and unrelated harmonics. I have found in my case that a 250–2500 Hz range is good enough to detect baby cries. Baby cries are usually high-pitched (consider that the highest note an opera soprano can reach is around 1000 Hz), and you will usually want to at least double the highest frequency to make sure that you get enough higher harmonics (the harmonics are the higher frequencies that actually give a timbre, or colour, to a sound), but not go so high that you pollute the spectrum with harmonics from other background sounds. I also cut anything below 250 Hz: a baby’s cry probably won’t have much happening at those low frequencies, and including them may also skew detection. A good approach is to open some positive audio samples in e.g. Audacity or any equalizer/spectrum analyzer, check which frequencies are dominant in the positive samples and center your dataset around those frequencies.

--bins specifies the number of groups for the frequency space (default: 100). A higher number of bins means a higher frequency resolution/granularity, but if it’s too high it may make the model prone to overfitting.
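To give an idea of what the three options do to the data, here is a rough NumPy sketch of the same kind of processing: compute the spectrum, keep only the frequencies between low and high, and average them into a fixed number of bins. It is not micmon’s actual implementation; the spectral_bands function, the equal-width averaging and the final normalization are assumptions made for the illustration.

```python
import numpy as np


def spectral_bands(audio, sample_rate, low=250, high=2500, bins=100):
    """Group the magnitude spectrum between low and high Hz into bins."""
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1 / sample_rate)
    # Drop everything outside the [low, high] range
    in_range = (freqs >= low) & (freqs <= high)
    # Split the remaining frequencies into (nearly) equal groups and
    # average each group into a single value
    groups = np.array_split(spectrum[in_range], bins)
    bands = np.array([group.mean() for group in groups])
    # Normalize so the result does not depend on the input volume
    peak = bands.max()
    return bands / peak if peak > 0 else bands


sample_rate = 44100
t = np.arange(2 * sample_rate) / sample_rate      # a 2-second segment
# A 1000 Hz tone plus a loud 50 Hz hum that falls below the low cutoff
audio = np.sin(2 * np.pi * 1000 * t) + 0.8 * np.sin(2 * np.pi * 50 * t)

bands = spectral_bands(audio, sample_rate)
print(bands.shape)       # one value per bin, whatever the segment length
print(bands.argmax())    # index of the bin that contains the 1000 Hz tone
```

The 50 Hz hum never reaches the output because it sits below the 250 Hz cutoff, and the 88200 raw data points of the segment are reduced to 100 input values.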

The script splits the original audio into smaller segments and calculates the spectral “signature” of each of those segments. --sample-duration specifies how long each of these segments should be (default: 2 seconds). A higher value may work better with sounds that last longer, but it will increase the time-to-detection (the model has to wait for a longer segment before it can make a prediction) and it will probably fail on short sounds. A lower value may work better with shorter sounds, but the captured segments may not have enough information to reliably identify the sound if the sound is longer.
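The splitting step itself is simple enough to sketch with NumPy. The split_segments helper below is a hypothetical name, and dropping the trailing remainder is an assumption of this sketch rather than documented micmon behaviour.

```python
import numpy as np


def split_segments(audio, sample_rate, sample_duration=2.0):
    """Split audio into consecutive segments of sample_duration seconds.

    A trailing remainder shorter than one segment is dropped.
    """
    segment_len = int(sample_rate * sample_duration)
    n_segments = len(audio) // segment_len
    return audio[: n_segments * segment_len].reshape(n_segments, segment_len)


sample_rate = 44100
audio = np.zeros(7 * sample_rate)             # 7 seconds of silence

segments = split_segments(audio, sample_rate)
short_segments = split_segments(audio, sample_rate, sample_duration=0.5)
print(segments.shape)
print(short_segments.shape)
```

With 2-second segments the model can give its first answer only after 2 seconds of audio; with 0.5-second segments it answers four times as often, but each answer is based on a quarter of the information.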

Save the script above as e.g. ~/bin/micmon_detect.py. The script only triggers an event if at least positive_samples samples are detected over a sliding window of window_length seconds (that’s to reduce the noise caused by prediction errors or temporary glitches), and it only triggers an event when the current prediction goes from negative to positive or the other way around. The event is then dispatched to Platypush over the RedisBus. The script should also be general-purpose enough to work with any sound model (not necessarily that of a crying infant), any positive/negative labels, any frequency range and any type of output event.
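The debouncing logic described here can be sketched independently of micmon and Platypush. In the sketch below the class name, its default values and the idea of sizing the window as window_length / sample_duration predictions are assumptions; instead of dispatching an event over the RedisBus, update simply returns the new state whenever it changes.

```python
from collections import deque


class TransitionDetector:
    """Debounce per-segment predictions and report only state changes."""

    def __init__(self, window_length=10.0, sample_duration=2.0,
                 positive_samples=2, positive_label="positive"):
        # Number of predictions that fit in the sliding window
        size = max(1, int(window_length / sample_duration))
        self.window = deque(maxlen=size)
        self.positive_samples = positive_samples
        self.positive_label = positive_label
        self.state = False          # False = negative, True = positive

    def update(self, prediction):
        """Feed one prediction; return the new state on a change, else None."""
        self.window.append(prediction == self.positive_label)
        new_state = sum(self.window) >= self.positive_samples
        if new_state != self.state:
            self.state = new_state
            return new_state
        return None


detector = TransitionDetector(window_length=10, sample_duration=2,
                              positive_samples=2)
predictions = ["negative", "positive", "negative", "positive", "positive",
               "negative", "negative", "negative", "negative", "negative"]

events = []
for index, prediction in enumerate(predictions):
    change = detector.update(prediction)
    if change is not None:
        events.append((index, change))

print(events)   # [(3, True), (8, False)]
```

The isolated positive at index 1 is ignored, the state flips to positive only once two positives fall inside the same window, and it flips back only after the positives have slid out of it.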

Let’s store them all under the same directory, e.g. ~/datasets/sound-detect/audio. Also, let’s create a new folder for each of the samples. Each folder will contain an audio file (named audio.mp3) and a labels file (named labels.json) that we’ll use to label the negative/positive audio segments in the audio file. So the structure of the raw dataset will be something like:
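Before labelling and generating the dataset, it can be handy to check that every sample folder follows this layout. The sketch below reports the folders that are missing either of the two files. The sample_1 and sample_2 folder names are placeholders, and the demo runs on a throwaway directory; point audio_root at ~/datasets/sound-detect/audio to check the real dataset.

```python
import tempfile
from pathlib import Path


def incomplete_samples(audio_root: Path) -> list:
    """Return the sample folders missing audio.mp3 or labels.json."""
    missing = []
    for sample_dir in sorted(p for p in audio_root.iterdir() if p.is_dir()):
        has_audio = (sample_dir / "audio.mp3").is_file()
        has_labels = (sample_dir / "labels.json").is_file()
        if not (has_audio and has_labels):
            missing.append(sample_dir.name)
    return missing


# Demo: two sample folders, only the first one has its labels file
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("sample_1", "sample_2"):
        (root / name).mkdir()
        (root / name / "audio.mp3").touch()
    (root / "sample_1" / "labels.json").write_text("{}")
    result = incomplete_samples(root)

print(result)   # ['sample_2']
```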
