add gen3 Nintendo Mii data file formats #355

Closed
wants to merge 2 commits into from

HEYimHeroic

added the Mii Studio Mii data file format: the Mii Studio (studio.mii.nintendo.com) stores Miis in this format. you can pull the Mii data from your browser's local storage, or by using this JavaScript code written by me and polished by bendevnul: https://github.com/RiiConnect24/mii2studio#importingexporting-to-mii-studio (instructions are on the page)

added the SwitchDB Mii data file format: the Nintendo Switch's internal Mii database uses a different format than the one used in games. this is the format of the Miis stored in the internal database. it's highly compact, and can fit a Mii name of up to 10 characters (20 bytes). it also includes some currently unknown data :( but this data doesn't seem to be relevant, since even the Mii Studio format doesn't store it.

added the Switch Mii data file format: it stores exactly the same data as the SwitchDB format; the difference is that this format is only used in games like Super Smash Bros. Ultimate and Mario Kart 8 Deluxe, while SwitchDB (as explained above) is only used in the internal Mii database.

i documented and created the .ksy files for all of these formats. if needed, i can provide default and example files.

Contributor

@dgelessus dgelessus left a comment

Thank you for the PR! This looks good overall. I've left a few comments about style/formatting, and some suggestions for using Kaitai Struct's validation features.

@@ -0,0 +1,142 @@
meta:
  id: miidata_ms
Contributor

The id of a spec has to match the name of the KSY file, i.e. you should probably rename this file to "miidata_ms.ksy".

Also, at the moment the compiler doesn't support namespaces for KSYs properly, so even though they are structured into subdirectories in the kaitai_struct_formats repo, all KSYs end up in a single shared top-level namespace. Because of this, it's better to make top-level ids a bit longer and more descriptive. In this case miidata_miistudio would be better than miidata_ms.

Suggested change
-  id: miidata_ms
+  id: miidata_miistudio

(Similar for the two other files.)

@@ -0,0 +1,142 @@
meta:
  id: miidata_ms
  endian: le
Contributor

If I'm reading the KSY correctly, the format doesn't use any multi-byte numbers (they are all u1). In that case you don't need to set an endianness, because u1 is the same for BE and LE.

Suggested change
-  endian: le

(Same for the two other files.)
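
A quick illustration with Python's standard struct module (not part of the KSY) of why the byte order cannot matter for single-byte fields, and where it starts to matter:

```python
import struct

# A u1 field is a single byte, so byte order cannot change its value.
raw = bytes([0x2A])
(le_value,) = struct.unpack("<B", raw)
(be_value,) = struct.unpack(">B", raw)
print(le_value, be_value)  # 42 42

# Byte order only starts to matter for multi-byte fields such as a u2.
raw2 = bytes([0x01, 0x02])
le_u2 = struct.unpack("<H", raw2)[0]
be_u2 = struct.unpack(">H", raw2)[0]
print(le_u2, be_u2)  # 513 258
```

If a u2/u4 field is ever added to one of these specs, the endian key would be needed again.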

meta:
  id: miidata_ms
  endian: le
seq:
Contributor

Could you take your description from the commit/PR message and add it as a doc (before seq)? That way others can read it without looking at the commit history, and it will appear on https://formats.kaitai.io/ once the PR is merged.

(Same for the two other files.)

seq:
  - id: facial_hair_color
    type: u1
    doc: Facial hair color. Ranges from 0 to 99. Not ordered the same as visible in editor.
Contributor

This isn't properly documented yet, but Kaitai Struct 0.9 added the valid attribute, which can be used to check that an integer field is in the expected range of values. For this field you would write it like this:

  - id: facial_hair_color
    type: u1
    valid:
      min: 0
      max: 99

Then if the value in the data is not in that range, the parser will throw an exception.

valid also supports a few other options - in kaitai-io/kaitai_struct#435 you can find a detailed description. But I think for the fields in this format you only need min and max.

Suggested change
-    doc: Facial hair color. Ranges from 0 to 99. Not ordered the same as visible in editor.
+    valid:
+      min: 0
+      max: 99
+    doc: Facial hair color. Not ordered the same as visible in editor.

(Similar for all other fields with "Ranges from x to y" in their doc.)
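
For a rough idea of what such a range check amounts to, here is a hand-written Python sketch of the behavior described above. read_u1_in_range and ValidationError are hypothetical names; the code generated by the Kaitai Struct compiler uses its own runtime exception classes.

```python
class ValidationError(ValueError):
    """Hypothetical stand-in for the exception a generated parser raises."""


def read_u1_in_range(data: bytes, offset: int, lo: int, hi: int) -> int:
    # Read one unsigned byte, then apply the same idea as
    # `valid: {min: lo, max: hi}`: reject values outside [lo, hi].
    value = data[offset]
    if value < lo:
        raise ValidationError(f"value {value} at offset {offset} is less than min {lo}")
    if value > hi:
        raise ValidationError(f"value {value} at offset {offset} is greater than max {hi}")
    return value


facial_hair_color = read_u1_in_range(bytes([42]), 0, 0, 99)
print(facial_hair_color)  # 42

try:
    read_u1_in_range(bytes([150]), 0, 0, 99)
except ValidationError as e:
    print(e)  # value 150 at offset 0 is greater than max 99
```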

    doc: Favorite color. Ranges from 0 to 11.
  - id: gender
    type: u1
    doc: Mii gender. 0 = male, 1 = female.
Contributor

It probably makes sense to define an enum for this field.

Actually, there are a few other fields in the format that could also use enums (colors, beard/eye/eyebrow/etc. types), but those fields have so many possible values and are probably difficult to describe in text sometimes, so it might not be worth the effort for those fields.

(Same for the gender fields in the other files.)
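
In the KSY this means adding an enum: key to the field that refers to a mapping defined under the top-level enums: key. As a loose Python analogy (not the code the compiler actually generates), mapping the raw byte onto an IntEnum looks like this; the class and function names are illustrative:

```python
from enum import IntEnum


class Gender(IntEnum):
    # Values taken from the field's doc: 0 = male, 1 = female.
    male = 0
    female = 1


def parse_gender(raw: int) -> Gender:
    # Raises ValueError for bytes that are not a known gender value.
    return Gender(raw)


print(parse_gender(1).name)  # female
```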

Comment on lines +154 to +156
    type: u1
    repeat: expr
    repeat-expr: 4
Contributor

Suggested change
-    type: u1
-    repeat: expr
-    repeat-expr: 4
+    size: 4

Comment on lines +6 to +8
    type: u1
    repeat: expr
    repeat-expr: 16
Contributor

Suggested change
-    type: u1
-    repeat: expr
-    repeat-expr: 16
+    size: 16

Comment on lines +16 to +18
    type: u1
    repeat: expr
    repeat-expr: 3
Contributor

If you're relatively sure that these bytes should always be zero, you can use contents to check that:

Suggested change
-    type: u1
-    repeat: expr
-    repeat-expr: 3
+    contents: [0, 0, 0]
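
For illustration, the check that contents performs can be sketched in plain Python (expect_contents is a hypothetical name, not part of the Kaitai runtime). Note how strict it is: a single non-zero byte in a real file makes the whole parse fail.

```python
def expect_contents(data: bytes, offset: int, expected: bytes) -> int:
    # Mimic `contents: [...]`: the bytes at `offset` must match `expected`
    # exactly, otherwise parsing fails. Returns the offset after the field.
    actual = data[offset:offset + len(expected)]
    if actual != expected:
        raise ValueError(
            f"expected {expected.hex()} at offset {offset}, got {actual.hex()}"
        )
    return offset + len(expected)


print(expect_contents(b"\x07\x00\x00\x00\x05", 1, bytes([0, 0, 0])))  # 4
```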

Member

> If you're relatively sure that these bytes should always be zero, you can use contents to check that

Yes, "if you're relatively sure", but I don't think we can assume anything about unknown data.

Comment on lines +33 to +35
    type: u1
    repeat: expr
    repeat-expr: 2
Contributor

Suggested change
-    type: u1
-    repeat: expr
-    repeat-expr: 2
+    contents: [0, 0]

Comment on lines +164 to +166
    type: u1
    repeat: expr
    repeat-expr: 1
Contributor

Suggested change
-    type: u1
-    repeat: expr
-    repeat-expr: 1
+    contents: [0]

@HEYimHeroic
Author

HEYimHeroic commented Nov 17, 2020

wow, what an incredibly detailed list of suggestions! i'll look more closely at your suggested changes soon, but in the meantime, here's some notes i felt like i should mention:

> Actually, there are a few other fields in the format that could also use enums (colors, beard/eye/eyebrow/etc. types), but those fields have so many possible values and are probably difficult to describe in text sometimes, so it might not be worth the effort for those fields.

i actually do have a few separate files available that properly line up everything that isn't in the same order as in the editors. it's simply a text file named "maps.txt" which lists every selection's ID in the order it appears in the editor, but it's so large that i left it out, since i couldn't find a way to include it properly. same with the rotation values, but that might be easier. i've attached the maps.txt and rotation.txt files to this post - i'm fairly new to creating Kaitais, but you've laid out such detail, it seems a lot more possible now :)

> The field's id is mole_size, but the doc calls it a "beauty mark". For consistency it would be better to use the same term in both the id and the doc.

this was done so the multiple files would have no naming issues when carrying over values. for some reason, the Mii Studio calls some things by different names (such as moles being "beauty marks" and all facial hair being under "beard"), even though they're exactly the same things that earlier editors called by other names. as a standalone file, however, it does make sense to match up the terms. thanks!

> Could you mention in the doc how the length of the name is determined, i.e. if you enter a name shorter than 10 characters, how do you find out where the name ends?

> If it's zero-terminated (as is commonly done), you're supposed to be able to use type: strz here. Unfortunately this is currently broken for strings in UTF-16 and UTF-32 encoding - see kaitai-io/kaitai_struct#187. In that case as a workaround you can leave it as type: str and note in the doc that it's zero-terminated.

i see... so, for now, i would use the str workaround? just making sure ^^

> If you're relatively sure that these bytes should always be zero, you can use contents to check that:

relatively sure, yes ^^ having 00 buffers like this isn't new to the Mii data formats, as i believe all previous versions of the format (Wii, Wii U/3DS, etc.) also had 00 buffers. of course, there aren't many Miis found in Switch games, so the data is limited... but in all cases i've seen, including user-created Miis extracted from save files, i've found these values to always be 00. it's helpful to know that many of these values can have caps and limits. i had no idea Kaitai was capable of throwing errors when a value is outside an expected range! (again, very new to these)

there was also one more thing i wanted to ask. having 16 different values for the SwitchDB and Switch's unknown_data seems tedious, but representing them using decimal numbers is even worse, so i separated them. is there a way i'd be able to represent those 16 bytes as one hexadecimal value? thanks.

also, i'm glad it's okay that there's unknown data at all. i could elaborate more on the unknown data and things like that, so i plan to do so. i was slightly worried only the Mii Studio Kaitai would be allowed, since it's the only one of the three that doesn't have any byte left unknown... but i'm glad to see they're all okay. ^^

here are the files i mentioned above, with many notes, heh:
rotation.txt
maps_Studio.txt
maps_Switch.txt

EDIT: i should probably mention larsenv's Wii and Wii U/3DS Mii data format Kaitais - he made those quickly to add easy support for a tool we worked on. i was going to take a closer look at those formats later and create proper Kaitais for them, complete with docs and things like that. i just wanted to get a few of the ones i had already done in, because again i wasn't totally sure this would be accepted anyway haha ^^

@larsenv

larsenv commented Nov 17, 2020

@HEYimHeroic You know you can reply to the code comments themselves, right?

@generalmimon
Member

@HEYimHeroic For your own convenience, remember a rule of thumb when making pull requests from your fork repository: never commit directly on master of your fork. Create a topic branch for every PR you create. See https://contribute.jquery.org/commits-and-pull-requests/#never-commit-on-master:

> When you're working on a fork, you should always think of your master branch as a "landing place" for upstream changes. You should only ever make your commits to topic branches, and your own commits should only ever end up on master after they've been merged in upstream by a maintainer.

@dgelessus
Contributor

> i actually do have a few separate files available that properly line up everything that isn't in the same order as in the editors. it's simply a text file named "maps.txt" which lists every selection's ID in the order it appears in the editor, but it's so large that i left it out, since i couldn't find a way to include it properly. same with the rotation values, but that might be easier. i've attached the maps.txt and rotation.txt files to this post - i'm fairly new to creating Kaitais, but you've laid out such detail, it seems a lot more possible now :)

It's possible to include and use simple lookup tables in a KSY, using value instances:

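instances:
  face_type_editor_order:
    # quote the array literal, otherwise YAML parses it as a YAML list
    # instead of passing it to Kaitai Struct as an expression
    value: '[0x00, 0x01, 0x08, 0x02, 0x03, 0x09, 0x04, 0x05, 0x0a, 0x06, 0x07, 0x0b]'
  face_type_editor_position:
    value: face_type_editor_order[face_type]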

Though you might have to "invert" the lookup table first - if I'm reading the text files right, your lookup tables translate the position in the editor to the byte stored in the data, which is the opposite of what you need to do when reading the data (translating the stored byte to an editor position).

> this was done so the multiple files would have no naming issues when carrying over values. for some reason, the Mii Studio calls some things by different names (such as moles being "beauty marks" and all facial hair being under "beard"), even though they're exactly the same things that earlier editors called by other names. as a standalone file, however, it does make sense to match up the terms. thanks!

Ah I see, that makes sense. Might still be a good idea to point out in the doc when one system/software uses a different name than others when talking about the same thing.

> i see... so, for now, i would use the str workaround? just making sure ^^

Yes, exactly. (I just wanted to mention the strz bug with UTF-16 so you don't run into it by accident.)
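
As a rough sketch of what "leave it as type: str and note that it's zero-terminated" means for code that uses the spec, here is a Python helper that trims the fixed-size name at the first NUL character. It assumes the 20-byte name field is UTF-16LE (the encoding isn't stated in this thread) and that unused code units are zero-filled; decode_mii_name is a hypothetical name.

```python
NAME_SIZE = 20  # 10 UTF-16 code units, per the PR description


def decode_mii_name(raw: bytes, encoding: str = "utf-16le") -> str:
    # Manual equivalent of strz for a fixed-size UTF-16 field: walk the
    # 2-byte code units and stop at the first 0x0000 unit. Searching for a
    # single zero byte would be wrong, because ASCII characters in UTF-16LE
    # have a zero high byte.
    if len(raw) != NAME_SIZE:
        raise ValueError(f"expected {NAME_SIZE} bytes, got {len(raw)}")
    for i in range(0, NAME_SIZE, 2):
        if raw[i:i + 2] == b"\x00\x00":
            raw = raw[:i]
            break
    return raw.decode(encoding)


# Sample data: a 6-character name padded with zeros to 20 bytes.
padded = "Heroic".encode("utf-16le").ljust(NAME_SIZE, b"\x00")
print(decode_mii_name(padded))  # Heroic
```

A name that uses all 10 characters has no terminator at all, so the loop simply decodes the whole field.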

> there was also one more thing i wanted to ask. having 16 different values for the SwitchDB and Switch's unknown_data seems tedious, but representing them using decimal numbers is even worse, so i separated them. is there a way i'd be able to represent those 16 bytes as one hexadecimal value? thanks.

I'm not sure what you mean here exactly, sorry... The way you've written unknown_data currently (using type: u1 and repeat-expr: 16), it will be parsed as an array of 16 integers (one for each byte). If you use size: 16 (with no type), it will instead be parsed as a raw byte array and stored as your language's type for that (e.g. bytes in Python). Kaitai Struct doesn't support integers larger than 64 bits (8 bytes), so you can't parse all 16 bytes into a single integer.

Regarding decimal vs. hexadecimal, that only depends on your code and how it displays the values - the parser doesn't affect how your language prints integers and byte arrays by default. If you're working in the Web IDE, you can control how it displays your data by adding -webide-representation attributes in your KSY.
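
To make the repeat-expr: 16 vs. size: 16 difference concrete, here is a plain-Python sketch that uses io.BytesIO in place of a Kaitai stream; the 16 bytes are placeholder values, not real Mii data:

```python
import io

data = bytes(range(16))  # placeholder for the 16 unknown bytes

# `type: u1` + `repeat: expr` + `repeat-expr: 16`: one integer per byte.
stream = io.BytesIO(data)
as_list = [stream.read(1)[0] for _ in range(16)]

# `size: 16` with no `type`: one raw byte string.
stream = io.BytesIO(data)
as_bytes = stream.read(16)

print(type(as_list).__name__, len(as_list))    # list 16
print(type(as_bytes).__name__, len(as_bytes))  # bytes 16
# Indexing a bytes object still yields ints, so nothing is lost.
print(as_bytes[15] == as_list[15])             # True
```

Either shape holds the same information; the choice only decides whether your code gets a list of integers or a single byte string.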

> also, i'm glad it's okay that there's unknown data at all. i could elaborate more on the unknown data and things like that, so i plan to do so. i was slightly worried only the Mii Studio Kaitai would be allowed, since it's the only one of the three that doesn't have any byte left unknown... but i'm glad to see they're all okay. ^^

It's not a problem if your spec skips some parts of the data where the exact meaning isn't known yet. We don't require anyone to 100 % reverse-engineer a format before submitting it to kaitai_struct_formats 🙂

@HEYimHeroic
Author

> @HEYimHeroic You know you can reply to the code comments themselves, right?

yes, i know... but leaving like 5 separate comments seems tedious and would spam notifications, heh

> @HEYimHeroic For your own convenience, remember a rule of thumb when making pull requests from your fork repository: never commit directly on master of your fork. Create a topic branch for every PR you create.

ah, i see... if my contribution history tells you anything, i'm also new to doing things like this on GitHub. sorry ^^ if there's a way i can revert it or something, let me know.

> Though you might have to "invert" the lookup table first - if I'm reading the text files right, your lookup tables translate the position in the editor to the byte stored in the data, which is the opposite of what you need to do when reading the data (translating the stored byte to an editor position).

so, the editor may show:

0x00, 0x01, 0x02,
0x03, 0x04, 0x05

but i should put the lookup table as:

0x05, 0x04, 0x03,
0x02, 0x01, 0x00

is that right?

> I'm not sure what you mean here exactly, sorry...

the proper way to store information like this (previous iterations of the format point to this) is to store them as a HEX string. for example, a Mii ID would be properly, accurately stored as something like 097A2DF5, not 159,002,101. i believe at least part of this unknown_data is something similar to the Mii ID used in previous formats, and storing at least 16 HEX values would be nicer and more appropriate than 16 decimal numbers. ^^

@dgelessus
Contributor

re. "inverting" the lookup table: I meant that in the mathematical sense of "make a function go the other way" - I don't know any better way to describe it, sorry 😄 As an example, the face array from your maps_Studio.txt looks like this:

face: [
    0x00,0x01,0x08,0x02,0x03,0x09,
    0x04,0x05,0x0a,0x06,0x07,0x0b
]

and it describes this mapping between the position of each face type in the editor and the byte value stored in the data:

Position in editor | Byte value in data
------------------ | ------------------
0                  | 0
1                  | 1
2                  | 8
3                  | 2
4                  | 3
5                  | 9
6                  | 4
7                  | 5
8                  | 10 (0x0a)
9                  | 6
10                 | 7
11                 | 11 (0x0b)

The face array lets you easily translate an editor position to a byte value using face[editor_pos], e.g. face[5] == 0x09. But you can't easily do a lookup in the other direction, i.e. translate a byte value to an editor position. To do that you would need another lookup table that looks something like this:

face_inverse: [
    0, 1, 3, 4, 6, 7,
    9, 10, 2, 5, 8, 11
]

And with that array you can translate a byte value to an editor position using face_inverse[byte_value], e.g. face_inverse[0x09] == 5. This could be integrated into the KSY using instances, to allow your code to directly read the editor positions, instead of having to do the conversion manually.

Anyway, I don't really know how you're using the KSY in your code, so maybe what I'm suggesting here isn't useful to you after all 😄 If you're not sure, you can just leave this out of the KSY and keep the arrays in your code.
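
If the arrays stay in your own code, the inverse tables don't have to be written by hand. Here is a short Python sketch that derives face_inverse from face, assuming each table is a permutation of 0..n-1 (every byte value appears exactly once), which holds for the face table above; invert is a hypothetical helper name.

```python
# Editor position -> stored byte, as in maps_Studio.txt.
face = [
    0x00, 0x01, 0x08, 0x02, 0x03, 0x09,
    0x04, 0x05, 0x0A, 0x06, 0x07, 0x0B,
]


def invert(table):
    # Build stored byte -> editor position. Assumes `table` is a
    # permutation of 0..len(table)-1: a byte value that is too large raises
    # IndexError, and a duplicated value leaves a gap that raises ValueError.
    inverse = [None] * len(table)
    for editor_pos, byte_value in enumerate(table):
        inverse[byte_value] = editor_pos
    if None in inverse:
        raise ValueError("table is not a permutation of 0..n-1")
    return inverse


face_inverse = invert(face)
print(face_inverse)                 # [0, 1, 3, 4, 6, 7, 9, 10, 2, 5, 8, 11]
print(face[5], face_inverse[0x09])  # 9 5
```

The result matches the hand-written face_inverse array above, and the same helper should work for the other maps as long as the permutation assumption holds for them.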

> the proper way to store information like this (previous iterations of the format point to this) is to store them as a HEX string. for example, a Mii ID would be properly, accurately stored as something like 097A2DF5, not 159,002,101. i believe at least part of this unknown_data is something similar to the Mii ID used in previous formats, and storing at least 16 HEX values would be nicer and more appropriate than 16 decimal numbers. ^^

That's just a question of how you print/display/format the value. Kaitai Struct doesn't specifically store the number as decimal or hexadecimal - it just stores it as a normal integer, which is normally by default shown as decimal. I don't know what language you're using the KSY with, but in Python for example you could do print(f"{mii_id:>08X}") to show the ID as 8 hex digits like in your example (with mii_id = 159002101 this outputs 097A2DF5). For the unknown_data it's similar - Kaitai Struct isn't storing 16 decimal numbers, just 16 numbers, which will be shown as decimal by default, but you can make your code display them differently if it helps you with figuring out what's actually stored in that data.
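
Along the same lines, here is a short Python sketch showing a few ways to print such values as hex. unknown_data here is made-up sample data shaped like the 16-byte field, not bytes from a real Mii:

```python
mii_id = 159002101
print(f"{mii_id:08X}")  # 097A2DF5

# Made-up sample shaped like the 16-byte unknown_data field.
unknown_data = [0x09, 0x7A, 0x2D, 0xF5] + [0] * 12

# One 32-digit hex string for all 16 bytes (097A2DF5 followed by 24 zeros).
print(bytes(unknown_data).hex().upper())

# Or one two-digit hex value per byte, separated by spaces.
print(" ".join(f"{b:02X}" for b in unknown_data))
```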

Contributor

@KOLANICH KOLANICH left a comment

IMHO these should be moved into their own directory.

@HEYimHeroic
Author

okay, sorry i took so long, i just got finished with moving. i'll approve the current changes, and make some more edits from there. after that, it should be ready to go ^^

@generalmimon
Member

Closing this PR, as it's superseded by #400. However, from what I've seen, this thread contains some useful review comments which are not pointed out or addressed in #400.
