Add support for barely pronounced vowels #12

Open
postkevone opened this issue Apr 22, 2021 · 14 comments
Labels
enhancement New feature or request

Comments

@postkevone

On Wadoku it is possible to see when a certain mora's vowel is not pronounced:

Screenshot 2021-04-22 at 6 14 02 PM

Screenshot 2021-04-22 at 6 13 52 PM
In both cases the "u" is not pronounced.

In the NHK dictionary those morae are shown as below:
5240a1ba-01b8-4121-a685-e0729315a417

It would be great if you could add this feature to your add-on; it would make the pitch accent display more accurate.

@IllDepence
Owner

Hi kebifurai,

thanks for the input. I like the idea. For the moment I'll make some notes here on what would need to happen to implement the feature.

  • Find an appropriate way to visually convey barely pronounced mora(?) in pitch accent illustrations and update SVG_pitch accordingly
  • Find out how barely pronounced vowels are denoted in the wadoku XML dump
  • Adjust the parsing script
  • Generate updated wadoku_pitchdb.csv, update user_pitchdb.csv

If you have any input for any of the above, don't hesitate to let me know. (:

@postkevone
Author

Thank you for your reply.

After exploring the XML dump a bit, I found out that those vowels are preceded by [Dev]:

<hatsuon>[Dev]しゅく'じつ</hatsuon>

<hatsuon>た[Dev]すけ</hatsuon>

<hatsuon>く'[Dev]ちく</hatsuon>

<hatsuon>[Dev]き・さま</hatsuon>
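
The four examples above suggest a simple parsing rule. Here is a minimal sketch in Python, assuming that `[Dev]` always directly precedes the affected mora, that `'` and `・` are the only other markers to strip (they are the only ones visible in these examples), and that small kana attach to the preceding kana. `split_morae` and `parse_hatsuon` are hypothetical names, not functions from the add-on's parsing script.

```python
import re

# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def split_morae(kana):
    """Split a kana string into morae, attaching small kana to the previous one."""
    morae = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae


def parse_hatsuon(hatsuon):
    """Return (morae, devoiced_indices) for the text of a <hatsuon> element."""
    morae = []
    devoiced = []
    pending = False
    # The capture group keeps the [Dev] markers in the split result.
    for part in re.split(r"(\[Dev\])", hatsuon):
        if part == "[Dev]":
            pending = True
            continue
        # Assumption: ' and ・ are markers that do not count as morae.
        chunk = split_morae(re.sub(r"['・]", "", part))
        if pending and chunk:
            devoiced.append(len(morae))  # index of the mora right after [Dev]
            pending = False
        morae.extend(chunk)
    return morae, devoiced
```

For example, `parse_hatsuon("た[Dev]すけ")` yields the morae た, す, け with index 1 flagged as devoiced. The dump may contain other markers that this sketch does not handle.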

@TheScientist14

I'd really appreciate the feature as well.
To visually convey these vowels, I feel like circling the kana with either a solid or a dashed stroke would be appropriate.
That's how it is done in Japanese: ㋜, ㋛, ㋡, ㋗, ㋠, ㋖, ㋪, ㋫, etc. (pronunciations are usually written in katakana).

btw, love your add-on

@IllDepence
Owner

Thanks for the input!

One key problem I see with circling is that しゅ is a common candidate for having a barely pronounced vowel. So with e.g. 祝福

skfk

both しゅ and ふ would need a circle. "Circling" しゅ completely would result in some kind of oval, while circling only the し would be hard without crossing over the ゅ.

Looking at how Wadoku does it, they additionally show the pronunciation in rōmaji and grey out the vowel. I feel greying out is fairly intuitive for "barely pronounced", but for the add-on, accent visualization + kana + rōmaji would be a bit noisy.

@TheScientist14

TheScientist14 commented Jun 1, 2021

Yeah, true. Though why not grey out the kana?

@TheScientist14

If that's not possible, circling the circle on the pitch accent graph doesn't seem like a bad solution to me either.

@IllDepence
Owner

circling the circle

I feel that wouldn't be very intuitive.

why not grey out the kana

I played around with greying out the kana and circle. Example:

skfk

Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced but well ... an okay compromise I guess.

A slightly more subtle option, hinting that only the vowel part is barely pronounced, might be to grey out just the right half of the circle.

skfk_half
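
To make the two variants concrete, here is a rough sketch of how a single mora's dot could be emitted as an SVG fragment. It is only an illustration: `mora_circle_svg`, its parameters, and the colour values are assumptions, not the add-on's actual drawing code.

```python
def mora_circle_svg(x, y, r=5, devoiced=False, half=False):
    """Return an SVG fragment for one mora's dot in a pitch accent graph.

    devoiced=False          -> plain black dot
    devoiced=True           -> fully grey dot
    devoiced=True, half=True -> black dot with a grey right half
    """
    black, grey = "#000", "#999"  # illustrative colours only
    if not devoiced:
        return f'<circle cx="{x}" cy="{y}" r="{r}" fill="{black}" />'
    if not half:
        return f'<circle cx="{x}" cy="{y}" r="{r}" fill="{grey}" />'
    # Arc from the top of the dot to the bottom; sweep flag 1 runs clockwise
    # on screen, so the closed path covers the right half of the dot.
    right_half = (
        f'<path d="M {x},{y - r} A {r},{r} 0 0,1 {x},{y + r} Z" '
        f'fill="{grey}" />'
    )
    return f'<circle cx="{x}" cy="{y}" r="{r}" fill="{black}" />' + right_half
```

The kana label below the graph could be greyed the same way by switching its `fill` attribute.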

Thoughts?


@kebifurai do you have a link to the NHK app that's using the dashed circles? Or, even better, some resource (website/book/...) discussing or explaining that kind of notation? If there is a conventional way to denote barely pronounced vowels in Japanese, I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)

@TheScientist14

TheScientist14 commented Jun 2, 2021

If you decide to use katakana, you could use the characters that I sent in my first comment.
Here is the list of every katakana that exists as a circled character (src):

㋕, ㋖, ㋗, ㋘, ㋙
㋚, ㋛, ㋜, ㋝, ㋞
㋟, ㋠, ㋡, ㋢, ㋣
㋤, ㋥, ㋦, ㋧, ㋨
㋩, ㋪, ㋫, ㋬, ㋭
㋮, ㋯, ㋰, ㋱, ㋲
㋳,   , ㋴,   , ㋵
㋶, ㋷, ㋸, ㋹, ㋺
㋻, ㋼,   , ㋽, ㋾

Side note: not every kana in this list is usable; only kana ending in 'u' or 'i' can be devoiced. Idk why the other ones exist...
For every kana that is not in this list, I suggest surrounding it with parentheses, like this:
(シュ)、(フィ)、(プ)、(ピ)
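
A small sketch of that fallback rule, using Python's standard `unicodedata` module to find the circled form by its Unicode name. `mark_devoiced` is a hypothetical helper; the approach relies on the circled characters being named "CIRCLED KATAKANA X" where the plain ones are named "KATAKANA LETTER X".

```python
import unicodedata


def mark_devoiced(mora):
    """Return the circled katakana for a mora, or the mora in parentheses.

    Only single plain katakana have a circled counterpart; digraphs such
    as シュ and kana such as プ or ピ fall back to parentheses.
    """
    prefix = "KATAKANA LETTER "
    if len(mora) == 1:
        name = unicodedata.name(mora, "")
        if name.startswith(prefix):
            try:
                return unicodedata.lookup("CIRCLED KATAKANA " + name[len(prefix):])
            except KeyError:
                pass  # no circled form exists for this kana
    return "(" + mora + ")"
```

With this, ス maps to ㋜, while シュ and プ come back as (シュ) and (プ).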

Actually, greying out the kana and circle feels good to me.

N.B.: It appears that only キ、ク、シ、シュ、ス、チ、ツ、ヒ、フ、フィ、ピ、プ can be devoiced. (src)
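
If that list holds, it could double as a sanity check on parsed or manually entered data. A minimal sketch, assuming morae arrive as a list of kana strings plus a list of devoiced indices (both hypothetical data shapes); the set is copied from the list above and is not claimed to be exhaustive.

```python
# Morae that can be devoiced, per the list above (katakana).
DEVOICEABLE = {"キ", "ク", "シ", "シュ", "ス", "チ", "ツ", "ヒ", "フ", "フィ", "ピ", "プ"}


def hira_to_kata(text):
    """Convert hiragana to katakana; other characters pass through unchanged."""
    # Hiragana ぁ..ゖ sit exactly 0x60 code points below their katakana forms.
    return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in text)


def suspicious_devoicing(morae, devoiced_indices):
    """Return the devoiced indices whose mora is not in DEVOICEABLE."""
    return [i for i in devoiced_indices
            if hira_to_kata(morae[i]) not in DEVOICEABLE]
```

An empty result means every flagged mora is one of the twelve listed above.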

I believe the NHK dictionary @kebifurai quoted is the one in this app; not sure, though.

@postkevone
Author

@kebifurai do you have a link to the NHK app that's using the dashed circles? Or, even better, some resource (website/book/...) discussing or explaining that kind of notation? If there is a conventional way to denote barely pronounced vowels in Japanese, I'd prefer to take inspiration from that. (Side note: I'm considering switching to katakana, given that @TheScientist14 pointed out it's common and the 大辞林 I point to in the README does so.)

Unfortunately the app is paid and only for iOS: https://www.monokakido.jp/ja/dictionaries/nhkaccent2/index.html

You can also take a look at this Anki add-on: https://ankiweb.net/shared/info/1225470483
Here you can see a configuration similar to the one used in the NHK dictionary: https://tatsumoto-ren.github.io/blog/useful-anki-add-ons-for-japanese.html#japitch

@IllDepence added the enhancement (New feature or request) label on Sep 4, 2021

@redpanda1234

I played around with greying out the kana and circle. Example:

skfk

Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced but well ... an okay compromise I guess.

Honestly I like this idea a lot. It's similar to what the people running suzuki kun do so I would support this as a solution (maybe with a slightly lighter shade of gray for the circles). My only question is what the manual-entry syntax would look like. Do you think it would make sense to just do this with upper case vs. lower case letters? E.g.

"H" = high + voiced 
"h" = high + devoiced
"L" = low + voiced 
"l" = low + devoiced 

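The upper/lower case idea could be parsed along these lines. This is a sketch only: `parse_case_pattern` is a hypothetical name, and the add-on's real manual-entry handling may look quite different.

```python
def parse_case_pattern(pattern):
    """Parse a pattern such as 'hLL' into (level, devoiced) pairs.

    Upper case means voiced, lower case means devoiced, as proposed above.
    """
    result = []
    for ch in pattern:
        level = ch.upper()
        if level not in ("H", "L"):
            raise ValueError(f"unexpected character {ch!r} in pitch pattern")
        result.append((level, ch.islower()))
    return result
```

One caveat with this syntax: a lower-case "l" is easy to confuse with "1" or "I" in some fonts.
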
@TheScientist14

I played around with greying out the kana and circle. Example:
skfk
Does look okay I think. Not 100% clean because it's actually only the vowel part that's barely pronounced but well ... an okay compromise I guess.

Honestly I like this idea a lot. It's similar to what the people running suzuki kun do so I would support this as a solution (maybe with a slightly lighter shade of gray for the circles). My only question is what the manual-entry syntax would look like. Do you think it would make sense to just do this with upper case vs. lower case letters? E.g.

"H" = high + voiced 
"h" = high + devoiced
"L" = low + voiced 
"l" = low + devoiced 

Imo, it doesn't need to be indicated in the manual entry. If you really want to, maybe you could surround it with parentheses?
Like so: (H) L
But, to me, it is not related to the pitch.
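
For reference, the parenthesis variant is also easy to parse. A minimal sketch, assuming whitespace is insignificant and each pair of parentheses wraps exactly one H or L; `parse_paren_pattern` is a hypothetical name.

```python
import re

# One token is either a parenthesised (devoiced) level or a bare level.
TOKEN = re.compile(r"\(([HL])\)|([HL])")


def parse_paren_pattern(pattern):
    """Parse a pattern such as '(H) L' into (level, devoiced) pairs."""
    compact = "".join(pattern.split())  # drop all whitespace
    result = []
    pos = 0
    while pos < len(compact):
        match = TOKEN.match(compact, pos)
        if match is None:
            raise ValueError(f"cannot parse {compact[pos:]!r} in pitch pattern")
        devoiced = match.group(1) is not None
        result.append((match.group(1) or match.group(2), devoiced))
        pos = match.end()
    return result
```

With this, `(H) L` parses as a devoiced high mora followed by a voiced low one.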

@redpanda1234

redpanda1234 commented Oct 4, 2022

Hi, sorry, I didn't see that you'd replied to this. Which is too bad since you were so prompt!! Apologies!!!

But, to me, it is not related to the pitch

I guess it might not be directly related to pitch, but I feel like the point of this tool is to help people hone their pronunciation to be closer to that of a native speaker, and devoicing is an important part of that. So I think it makes sense to include it as a feature.

In my mind the alternative is to have two separate tools with which to practice each. I can't think of a good reason to do that instead of practicing both at the same time.

Imo, it doesn't need to be indicated in the manual-entry.

The textbook I'm using frequently has words or phrases that don't play well with the automation script. This happens maybe ~30-50% of the time. Also, there are a handful of words for which the automation script appears to get pitch information that doesn't match that of my textbook. In these cases I look up pitch accent + devoicing information manually and enter it. Since this happens so frequently I think it's a reasonable feature to add.

If you really want to, maybe you could surround it with parentheses?

Sure, I'd be fine with that!

@redpanda1234

pitch-accent

alright how about something like this

@redpanda1234

redpanda1234 commented Oct 9, 2022

I also modified the code to
(a) ignore characters in the pitch pattern string past 1 + number of mora, and
(b) write the kana / pitch pattern / pitch accent image to fields in the card, since I have to do a lot of manual pitch entries in my use case and was getting a bit annoyed that I had to re-enter the whole pitch pattern and reading from scratch whenever I'd make a small mistake in one spot.
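
Point (a) amounts to a one-line truncation. A sketch, assuming the pattern is a plain string with one character per mora plus one trailing character (what that extra position stands for is not stated here); `trim_pattern` is a hypothetical name, and a parenthesised syntax would need to be tokenized before trimming.

```python
def trim_pattern(pattern, morae):
    """Drop pattern characters beyond 1 + the number of morae."""
    return pattern[: len(morae) + 1]
```

For a three-mora word, `trim_pattern("LHHLLLL", ["た", "す", "け"])` keeps only the first four characters.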

Looks something like this: link

Haven't tested it with the batch processing mode
