
2nd Community Review of IPFSCN #236

Closed
ipfscn opened this issue Nov 29, 2024 · 6 comments

ipfscn commented Nov 29, 2024

Allocator Compliance Report: https://compliance.allocator.tech/report/f03019942/1732839477/report.md
Previous review: #173

filplus-bookkeeping/IPFSCN#31
filplus-bookkeeping/IPFSCN#33
filplus-bookkeeping/IPFSCN#35
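
For reference, a minimal sketch of pulling the linked compliance report down for offline review (assuming the compliance.allocator.tech URL above is still live; the local filename is an arbitrary choice):

```python
# Minimal sketch: download the compliance report linked above for offline review.
# The URL is copied from the comment; the output filename is an arbitrary choice.
import urllib.request

REPORT_URL = "https://compliance.allocator.tech/report/f03019942/1732839477/report.md"

with urllib.request.urlopen(REPORT_URL) as resp:
    report_md = resp.read().decode("utf-8")

# Keep a local copy so the SP retrieval and distribution tables can be inspected later.
with open("f03019942_report.md", "w", encoding="utf-8") as fh:
    fh.write(report_md)

print(report_md[:400])  # quick preview of the report header
```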

filecoin-watchdog added the Refresh and Awaiting Community/Watchdog Comment labels on Nov 29, 2024
Kevin-FF-USA self-assigned this on Dec 3, 2024

filecoin-watchdog commented Dec 6, 2024

@ipfscn
Allocator Application
Compliance Report
1st Review
1st Review score: 5 PiB

5 PiB granted to new clients:

Client Name                    DC        Status
Smithsonian Institution        1.75 PiB  New
National Library of Medicine   1.75 PiB  New
Radio telescope                1.48 PiB  New

Smithsonian Institution

  • In the form fields “Share a brief history of your project and organization” and “Describe the data being stored onto Filecoin,” the content is identical, containing only a description of the Smithsonian’s mission.
  • Was KYC (Know Your Customer) verification conducted?
  • This dataset has already been stored multiple times: LDN, Allocator.tech #27, Allocator.tech #25
  • The second report showed very low retrieval rates for 2 out of 5 SPs, with the rest below 75%. Despite this, the allocator stated that the performance looked normal and granted another allocation without further comment.
  • The latest report mentions 6 additional SPs, but these were not updated in the GitHub issue. Retrieval rates vary significantly, ranging from 11% to 60%. None of the SPs have a 0% retrieval rate, but overall performance is mixed across 11 SPs.

National Library of Medicine

  • Was KYC verification conducted?
  • This dataset has been stored before: Allocator.tech, LDN
  • The second report revealed partial mismatches between SPs used for deals and the provided list. Three out of six SPs had retrieval rates below 5%. The allocator did not comment on this and granted another allocation without addressing the issue.
  • The latest report shows some improvement, but two SPs still have retrieval rates below 6%.
  • The allocator application stated that “customers are distributed across three different continents, the number of SPs is more than five, and the SPs come from three different companies.” However, this was not fulfilled, as SPs are only from two continents and four countries.

Radio telescope

  • There is not enough information about data preparation.
  • This dataset has been stored before: LDN
  • Was KYC verification conducted?
  • The fourth report showed that four out of seven SPs had retrieval rates below 6%. When asked about this, the client stated that newly added SPs support Spark, but no explanation was given for the low retrieval rates or any planned improvements.
  • The latest report indicates continued cooperation with SPs that have poor retrieval performance. Four out of eight SPs have retrieval rates below 10%.
  • Two SPs do not match the provided list.
  • The allocator application stated that “customers are distributed across three different continents, the number of SPs is more than five, and the SPs come from three different companies.” This was not achieved, as the SPs are only from two continents and four countries.

filecoin-watchdog added the Awaiting Response from Allocator label and removed the Awaiting Community/Watchdog Comment label on Dec 6, 2024

ipfscn commented Dec 9, 2024

  1. Smithsonian Institution
    [DataCap Application] Smithsonian Institution filplus-bookkeeping/IPFSCN#31
    Hello, we have signed this LDN for them a total of 3 times.
    Before signing, we asked how they prepared the data and whether the SPs they collaborate with support Spark.
    The check for the first round of 256 TiB was normal: the partnered SPs were the same as the disclosed SPs, and all of them support Spark.
    After the 256 TiB was used up, the client left a message asking for continued support, so we proceeded to sign another 512 TiB.

Based on the data from the first and second rounds, we confirmed that the SPs the client cooperated with matched the ones they disclosed. We also checked the nodes and found no VPN issues. However, the bot returned no data for two consecutive checks, so we only performed the third signature of 1 PiB after seeing bot data for the third time.
After the third signature, the client added some new SPs.
Although these SPs all support Spark, the client did not disclose them on GitHub in advance, and the newly added SPs may have VPN issues.
We have now asked the client about this on GitHub and are awaiting their response. We will advise the client not to cooperate with any SP that uses a VPN.
Meanwhile, we plan to shut down this LDN.

  2. [DataCap Application] NLM Public Dataset
    [DataCap Application] NLM Public dataset filplus-bookkeeping/IPFSCN#35
    Our KYC process for clients is primarily conducted on GitHub. Before signing, we asked about their technical details, whether their SPs support Spark, and whether they could start immediately.
    We have signed this LDN for them a total of 3 times, with allocations of 256 TiB, 512 TiB, and 1 PiB respectively.
    We pay close attention to the bot's data, so before each signature we enter "checker:manualTrigger" to check the client's situation. In fact, the reports displayed by the bot were quite good: the client has not encountered any CID issues, nor have there been any issues with data backup, duplicate data, or VPN usage.
    However, we must admit that we have not been diligent enough in our due diligence. We should pay more attention to mismatches between the SPs our clients disclose and the SPs they actually work with.
    Next, we will work harder and actively address our own issues.

  3. [DataCap Application] Radio telescope
    [DataCap Application] Radio telescope filplus-bookkeeping/IPFSCN#33
    We asked this client the same questions before signing, including about data preparation, whether their SPs support Spark, and other topics.
    We ask these questions of all our clients, and we only proceed with signing after the client has answered them.
    For all clients, our first-round allocation is 256 TiB, the second round is 512 TiB, the third round is 1 PiB, and the maximum will not exceed 2 PiB.
    Because our remaining allowance was exhausted, we only signed 750 TiB in the third round.
    The SPs disclosed by the client are essentially the same as the actual partnered SPs. Any newly added SPs were disclosed in advance, and explanations were provided for the relatively low retrieval rates of some SPs. We also specifically asked the client whether the newly added SPs support Spark.

Thank you for raising the issue of duplicate datasets. We had not previously paid attention to this issue and will focus on it in the next round.
We will also focus on the specific retrieval-rate figures for each SP, rather than merely requiring SPs to support Spark.
We will be more diligent and attentive towards our clients and conduct more rigorous due diligence.
Thank you.

filecoin-watchdog commented

@ipfscn
You mentioned:

Our KYC process for clients is primarily conducted on GitHub.

Could you provide an example of what you mean? For KYC purposes, the focus is on verifying that the client is a real person or entity, not on clarifying the information provided in the form. I assume you’re aware of this distinction, so I’d appreciate it if you could elaborate to help me better understand your perspective.


ipfscn commented Dec 12, 2024

Hello, respected governance team.
After a client applies to us, we ask them on GitHub:

  1. About their technical solution and details.
  2. Whether they support Spark retrieval.
    We have a total of three clients in this round, and we asked all of them these questions before allocating to them.
    Taking the first client as an example:
    [DataCap Application] Smithsonian Institution filplus-bookkeeping/IPFSCN#31
    We asked them four questions in total before our first two signatures:
    two technical questions, one about Spark, and one about whether they could start immediately (timing).
    (screenshot attached)

We only signed after they had replied to all of them.
First signature: 256 TiB
Second signature: 512 TiB
Third signature: 1 PiB
I checked the bot data before the second and third signatures:
The disclosed SPs and the actual cooperating SPs were the same.
Most SPs support Spark.
There were no issues such as CID problems or duplicate data.
After the third signature, we found that the client had added new SPs, but they did not disclose them in advance.
We also found that these SPs may have used a VPN.
So we asked again. Although we have seen their answers, we may not continue to support them.
(screenshot attached)

Next, we will scrutinize our clients more rigorously and carry out our work more diligently. Thank you.

filecoin-watchdog commented

@ipfscn The above explanation doesn't cover the KYC process.

galen-mcandrew commented

As noted, there seems to be a gap in the diligence investigations happening here. For example, how are you verifying who this client is, their data preparation methods, and their claims about mailing hard drives? Do you have any additional evidence that supports their claims? Have you seen any additional information about this custom-built data processing tool? How do you know this is actually the dataset that is being claimed?

In addition to this missing KYC/KYB diligence, there are additional issues that need to be addressed:

  • extraneous and redundant datasets across the Filecoin ecosystem
  • low retrieval rates
  • inaccurate and inconsistent SP lists and disclosures
  • inaccurate SP regional distribution
  • noncompliance with original allocator application, such as the client diligence section

Given these flags and discrepancies, we are requesting 2.5 PiB for this allocator.
