Some missing context is that the data is shared via the DeepSeek app's use of ByteDance analytics/configuration frameworks. So not a backroom deal where DeepSeek handed over the chat history for its user base, but rather ongoing analytics data being sent from the DeepSeek mobile app.
Here's the SecurityScoreCard article that brought attention to this: https://securityscorecard.com/blog/a-deep-peek-at-deepseek/#...
Besides the usual analytics data (device metadata, user behavior, app performance, errors, etc.), it's possible raw chat data is being shared as well, but there's no smoking gun.
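For a rough sense of what that category of data looks like, here's a purely illustrative sketch of a mobile analytics event (the field names are my own invention, not ByteDance's SDK schema): device and session metadata plus behavioral and performance events, rather than message contents.

    # Purely illustrative analytics event -- field names are made up, not taken
    # from ByteDance's SDK. "Usual" analytics data is device/session metadata
    # plus behavioral and performance events, not chat contents.
    event = {
        "device": {"model": "iPhone15,2", "os": "iOS 18.1", "locale": "ko_KR"},
        "session": {"id": "hypothetical-session-id", "app_version": "1.0.13"},
        "event": {"name": "chat_screen_opened", "ts": 1739750400},
        "perf": {"cold_start_ms": 840, "crashed": False},
    }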
We analyzed the iOS app[1] and observed similar traffic as well as a number of basic security issues (hardcoded encryption keys, use of 3DES and some traffic over HTTP).
[1] https://www.nowsecure.com/blog/2025/02/06/nowsecure-uncovers...
Thanks for writing this article! I quite enjoyed it.
question: does the DeepSeek app's use of hardcoded encryption keys rise beyond an attempt to obfuscate and protect its private API endpoints? I believe this is an attempt to make abusing their mobile app's private web APIs more difficult, since even with cert pinning disabled and HTTPS MITM'd you still can't observe the real traffic and replicate their requests.
If all it's doing is obfuscation, though, then I don't understand why pointing out that the keys are hardcoded is meaningful. It certainly doesn't engender trust. But if the app's binary is ultimately decoding some encrypted data, it needs the key, meaning it's ultimately available to the reverse engineer. Whether it's hardcoded or not doesn't matter.
It's a bad look, but if the app used the latest tech and assigned each client its own symmetric encryption key for a session, wouldn't you still be able to access the same data? What would be meaningfully different from a security perspective if they had done this obfuscation better?
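To make that concrete, here's a rough sketch (my own illustration with made-up key material, not DeepSeek's actual scheme) of why a hardcoded key only obfuscates: once the key is lifted from the binary, captured payloads decrypt offline, and a per-session key derived by the same reversible client code wouldn't change that.

    # Rough sketch, not DeepSeek's actual code or key: a hardcoded 3DES key
    # recovered from the app binary lets anyone decrypt intercepted payloads.
    # Requires pycryptodome (pip install pycryptodome).
    from Crypto.Cipher import DES3
    from Crypto.Util.Padding import pad, unpad

    HARDCODED_KEY = b"0123456789abcdef01234567"  # hypothetical 24-byte key
    IV = b"\x00" * 8                             # hypothetical static IV

    def encrypt(plaintext: bytes) -> bytes:
        cipher = DES3.new(HARDCODED_KEY, DES3.MODE_CBC, IV)
        return cipher.encrypt(pad(plaintext, DES3.block_size))

    def decrypt(ciphertext: bytes) -> bytes:
        cipher = DES3.new(HARDCODED_KEY, DES3.MODE_CBC, IV)
        return unpad(cipher.decrypt(ciphertext), DES3.block_size)

    # A reverse engineer with the same extracted key can replay this against
    # traffic captured after disabling cert pinning:
    print(decrypt(encrypt(b'{"event": "app_open"}')))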
I thought Apple disallowed apps using HTTP years ago?
Apple disallows HTTP by default; you can flip a bit in the config to allowlist some or all endpoints for HTTP. It's not clear what the App Store actually does with this info when you submit.
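Concretely, the "bit in the config" is App Transport Security in the app's Info.plist. A per-domain exception looks roughly like the snippet below (standard ATS keys; the domain is just a placeholder), and NSAllowsArbitraryLoads is the global opt-out. As the replies note, it's unclear how strictly review polices the justification for these.

    <key>NSAppTransportSecurity</key>
    <dict>
        <key>NSExceptionDomains</key>
        <dict>
            <key>example.com</key>
            <dict>
                <key>NSExceptionAllowsInsecureHTTPLoads</key>
                <true/>
            </dict>
        </dict>
    </dict>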
Despite their stated goal of enforcing it in 2017, it is still not a hard requirement. Back then, about 80% of the apps we tested disabled ATS either partially or fully [1]. It's rare to see Apple walk something back [2], but here is a blog from the time that discussed the delay [3].
[1] https://www.nowsecure.com/blog/2017/12/29/enable-ios-app-tra...
[2] https://developer.apple.com/news/?id=12212016b
[3] https://www.klundberg.com/blog/app-transport-security-delay/
Before you submit, not when or after
So DeepSeek is not sharing more data than most advertising-funded apps in the world?
Only if they were breaking the law too.
Interesting, but I don't think those details will be ameliorative to the people who are concerned (e.g. U.S. Congress).
In fact, I wonder if it may further underscore their concerns, given that it surfaces the interconnectedness between all of these firms.
Would you say that US-based apps that use e.g. Google Analytics, and therefore share information with Google, "surface the interconnectedness between all of these firms" and are a good reason to e.g. ban apps from US-based developers?
Not the OP, but yes, I would; this is why I approve of GDPR and the cookie popup rules, and am actively angry at every company that thinks it's legit to share browsing habits with more "trusted partner" companies than there were students in my secondary school.
My comment starts with the reality that some people (e.g. U.S. Congress) find cause for concern WRT Chinese apps.
This is the reason, say, revelations about interconnectedness matter when it comes to Chinese apps versus U.S. apps.
You may disagree about whether there should be cause for concern, but that's another matter.
But, if you're asking me if I personally think there's cause for concern around allowing a foreign adversary access to your citizenry via social media platforms, then the answer is yes.
And, of course, China itself also believes it's a problem, which is why U.S. social media is banned there.
Yep.
No one cares about the details. (Heck, I'd be willing to wager good money that the politicians and most of their staffers don't even understand the details). In the end, it's just one more reason that Chinese models will not be legal in the US in the near future.
> it's just one more reason that Chinese models will not be legal in the US in the near future
This isn't about the model, it's about the mobile app.
The open source model weights are different from the website and the app. The model cannot track you.
Not just Congress, even techies can be confused about these things.
Yes, exactly, what the guy above was saying is that they're just looking for excuses to keep people from using the Chinese thing.
Protectionism can be dumb, but if competition from China is decimating the US LLM market, making the cheaper, better competitors illegal probably sounds like sound advice to someone like Trump?
Or Obama. See: Wolf Amendment. [1]
Following typical tropes about China, "we" decided to ban space cooperation with them because they were just going to steal American space tech or whatever. That's why, to this day, you never see Chinese astronauts on the ISS. Of course China then became the 2nd largest player in space, behind only SpaceX, launched and crewed their own space station, sent a rover to Mars, carried out unprecedented sample return missions from the far side of the Moon, and just generally ran circles around the US sans SpaceX.
If it weren't for this dumb law, it's likely NASA would have been able to use Russia, China, and SpaceX as redundancies for getting Americans to the ISS as one country/company fell out of favor with this administration or that. As it was, we ended up turning to Boeing for redundancy. For those who don't follow space news, the two astronauts Boeing [barely] sent to the ISS are still stranded up there after their vessel was deemed too dangerous to return in.
I've often wondered what it would have been like to live in Rome circa 460.
[1] - https://en.wikipedia.org/wiki/Wolf_Amendment
yeah, mostly picking on Trump due to the recent absurd tariff logic (see: Europe VAT, lmfao)
anyways, hoping it's not so bad!
All the major US tech firms are extremely competition averse. They are all cozying up to Trump so they can maintain their cartels.
Yeah, they act holier-than-thou when someone else takes data but then turn around and do it themselves; I think that's called hypocrisy. Besides, once data goes to your ISP it's gone, so aren't we better off just limiting the data that we want to keep private?
Plot twist: all these people sharing on Twitter yet another creative way of mentioning Xi and Tiananmen in a conversation without triggering the protection (counting to 11 in Roman numerals, leetspeak, etc.) were in fact collecting the training data for the next-gen LLM-based protection. Well played!
Yes, they probably all do that. Anthropic promised to pay the winner that broke all their protections. That way they get tens of thousands of free workers trying to get the money. Much cheaper than $300k engineers.
People had the same theory for chatgpt, except rather than Xi and Tiananmen, it was how to make meth and anti-"woke" topics.
A US Tiananmen-comparable example would be ChatGPT censoring George Floyd's death or the killing of Native Americans, etc. ChatGPT doesn't censor these topics.
Huh? TPTB in the US do not try to censor those topics; if anything they encourage discussion of them (or at least did until this year). US "AI" systems censor much the same topics as US social networks, just as Chinese "AI" systems censor much the same topics as Chinese social networks.
Related:
South Korea bans new DeepSeek AI downloads
https://news.ycombinator.com/item?id=43076325
Major Chinese tech companies often collaborate with government entities, potentially compromising user privacy. Given China's regulatory environment, where authorities can access data held by domestic firms, users worldwide should exercise caution when engaging with platforms from such backgrounds.
Which countries don’t have a process for the same thing…specifically?
m8 they shared their database with the entire world: https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepse...
"Unintentionally exposed" and "deliberately gave" are two meaningfully different actions, both of which are examples of why much better regulation and legislation of individuals rights over their data are needed.
Shouldn't this be the other way around? TikTok has the most user data for any LLM to train with. I bet they will make a killing with it, unless of course the CCP decrees that they share it for free.
Firstly, Bytedance is far more than just Tiktok.
Secondly, most data in China is shared among most companies anyway, because, firstly, the government (not necessarily the CCP) orders most companies to share data with "technological leaders" and "strategically important" companies, and secondly because computer security is mostly an alien concept to Chinese companies.
Copyright (broadly speaking, most restrictions on unrestricted dissemination of data) is what is killing the US economy.
> computer security is mostly an alien concept to Chinese companies.
that's the main reason
i don't know how the situation is elsewhere, but in China, 2/3 of startups expose their databases on the public internet with a password of 'abc123'
How do you know that?
i'm the one who set the password
security is not a requirement for many startups, velocity is
> i'm the one who set the password
You, personally, set the password of a public internet-facing database to 'abc123'?
And if you really did, how much do you estimate that increased your 'velocity'?
you built an internal project, co-hosted with a database, with the password 'abc123'
a month later, your manager decided to share it with other teams; the decision was made in a meeting you weren't invited to
when the manager came to you, you asked:
- how about giving me a week to make it a SaaS, with authn/authz
- no, we don't have the time, just tell them the endpoint and the password
another month later, something changed: your company built a partnership with another company, and your manager decided to share the project with teams in the other company
you asked:
- how about we do something like virtual network peering so that we can share a connected network with our partner
- it's complex, we cannot change the network status of our partner, and we don't have a responsible role for this work, just give them the endpoint and the password
the password 'abc123' is just an analogy; in this case, there's no password at all
And virtual network peering requires a license in China anyway.
I can literally hear Jimmy O. Yang as Jian-Yang in Silicon Valley narrating this...!
I have no idea why people are downvoting first-hand information. Take an upvote!
loads of rubbish XD The government does not know what data is important, and most of the time does not know who the technological leaders are
Of course they don't know, but they really don't have to.
They designate who is strategic, and those designated strategic tell them what kind of data they need.
What useful textual user data do you see coming from TikTok? All the text seems very low quality, to the point where I naively assume that including it in training data would decrease performance.
As the sibling commenter mentioned, the video data itself is useful as we see a rise in multimodal models, but also..
(1) all videos are captioned, automatically and then often again manually by the content creator. This data alone is extremely valuable for training purposes.
(2) the videos contain great information about slang terms and youth vernacular, which is unique data that is harder to find elsewhere.
(3) young people seem to use TikTok as a search engine, so presumably some of the videos' content must be valuable enough as an information source, similar to YouTube.
arguably video data is more valuable
whoosh
There's nothing technical about it. Funny, given how many people mentioned propaganda by DeepSeek. You're seeing the counter-strike.
> These references suggest deep integration with ByteDance's analytics and performance monitoring infrastructure
I mean when I visit a random website or open a random app, I kind of expect that it will use something like Google Analytics or Firebase Crashlytics so that my "user data" is shared with Google.
If the article wants me to feel outraged about this practice, I don't. I understand that analytics and performance monitoring are often outsourced to a third party, often without a choice of turning off the analytics and performance monitoring features in the first place.
I use the DeepSeek app happily without giving it any data I consider private. I have a separate local DeepSeek distilled model for that.
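(If anyone wants the same setup: the distilled weights are published on the Ollama registry, so, assuming the tags haven't changed, something like

    ollama run deepseek-r1:7b

pulls and runs one of the Qwen-based distills entirely locally.)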
If they sold it, isn't that like what... literally almost every single company does nowadays unless you pay up?
I don't think so. You can't buy user data from Google or Facebook or Apple or Microsoft and they probably have more of it than anybody else.
You and I can't buy it because they don't want their competitors getting it. But they'll happily use it to target ads at you, and the US government has access to it and can use it to decide who they want to send their CIA kidnap-torture squads after.
No, but they let you leverage it.
Which is in this case a pretty important distinction. Letting another company leverage user data within the bounding zone which you've defined is not the same thing as is being alleged here, which is actually sharing data.
It's quite literally the difference between exposing a public API and actually handing over the contents of the database.
Facebook buys (or at least used to buy) data from other brokers. So you can think ByteDance = Facebook, DeepSeek = data broker in this scenario.
> Letting another company leverage user data within the bounding zone which you've defined is not the same thing as is being alleged here, which is actually sharing data.
Both are real violations of users.
I didn't say they weren't, but it's an important distinction nonetheless.
*even if you pay up.
I think the problem can be solved easily by forcing the company behind DeepSeek to simply redirect all the data they've gathered on their users directly into a CIA database. Surely this will be considered a good compromise.
This is my surprised face -_-
If you're shocked or even the slightest bit surprised, then I can't imagine how blissful your life is to be so unaware about how much corporations are sharing data with each other.
Like, I wholeheartedly expect that if I mention Beyblade toys on Facebook, then the next time I visit Amazon, they'll be suggesting Beyblades even if I've never even searched Amazon for toys, let alone Beyblade.
That's literally Meta's business model; they will happily explain how that's going for them in public investor calls every few months.
With deepseek and bytedance things are a lot less clear cut.
ByteDance's entire business model is based on user targeting and showing people things they might enjoy watching, so they can push more ads to them. I wouldn't be surprised if they bought the data to train their own LLMs.
The terms of use of Deepseek make it very clear they will sell your data.
How so, less clear-cut? Mysterious and Chinese-y, perhaps?
The ostensible business models of the companies at play.
Stop looking for any opportunity to cry Sinophobia.
Don't just restate it using different words! Precisely how's that different to the business models at play at Meta etc?
Supposing you were an investor in either or both, is that the question you waited three months to ask?
what is less clear cut? you can safely assume they do at least the same things as meta.
I recently had an experience that genuinely surprised me: I was watching a Peruvian video on YouTube, and I clicked on the creator's Instagram profile link in the description. Literally a few minutes later I received a promotional email with services and investment opportunities from an official Peruvian government email. Somehow opening an Instagram profile of a Peruvian creator got me tagged as a potential investor? But the most shocking part was how quickly this all happened.
Apparently Peru's poise presents preparedness because it is preferred for the presence of the Galápagos to the west. [sic]
You could literally talk about Beyblade toys on a WhatsApp video call and you'd be getting Amazon ads for dem blades the next day.
There's basically no credible evidence of this happening. All there is are vague anecdotes which are easily explained with confirmation bias and/or the birthday paradox.
How much did Zuckerberg pay for WhatsApp again?
If the argument is that there's no credible evidence, retorting with a vague question doesn't really help your case. If anything it reinforces the original claim that there's no credible evidence.
If Rupert Murdoch bought an independent news agency, would you expect the agency to remain unbiased in its reporting?
Being "biased" isn't remotely close to outright lying. Despite all the exasperation about Fox News being "fake news" or whatever, they very rarely outright lie.
https://www.astralcodexten.com/p/the-media-very-rarely-lies
This is how you're playing this argument out??
Weird hill to die on, man. Like, sure credible evidence is one of the most important things in the world... but what, are you honestly saying that you're going to be surprised if WhatsApp turned out to be leaking data?
We don't need the pitchforks just yet, sure, but shit, you have to remain realistic about these things.
>but what, are you honestly saying that you're going to be surprised if WhatsApp turned out to be leaking data?
Your words, not mine. I never made such claims, and you're trying to move the goalposts from "Meta does this" to "I'll be surprised if Meta does this".
I'm not moving goalposts. I didn't accuse WhatsApp of leaking data, stop twisting other people's words.
I think you mean "I'll NOT be surprised if Meta does this", which is the reasonable position of any rational person to take.
I'm allowed to extrapolate expectations of future behaviour, based on past behaviour. Doing otherwise is naive, dangerously so if you're responsible for someone else's security or privacy.
WhatsApp is secured by the Signal protocol's encryption
That's nice. I'll remember that the next time I talk about Beyblades to a friend on WhatsApp and see ads for them on Reddit the next day.
The truth is even worse; Reddit has enough of a profile built on you that they can predict your penchant for Beyblades without even needing your WhatsApp chats.
Gasp! Even if I am only talking about the Beyblades on WhatsApp?
WhatsApp is a closed-source client that you cannot trust to faithfully and correctly implement the protocol, or be free of backdoors that allow Meta to snoop on your conversations.
At least according to Meta's marketing, WhatsApp is E2E encrypted. And they make ads just for this -- you can literally see billboards in NYC that advertise the encrypted-messaging part of the product. It would completely destroy WhatsApp's and Meta's brand if there were a backdoor somewhere. Well, Meta was never a great company to begin with, but nobody would ever lie about it and destroy their brand this way.
And I truly believe Meta has an incentive to keep it that way. They had to reveal a conversation on Facebook Messenger on the topic of abortion after the police asked for it, which resulted in someone being put in jail. Regardless of Meta's (or rather Zuck's) ever-changing political positions, they don't want to have liability over anything like this. They want to walk away and just say to the cops: look, it's all encrypted, there's nothing we can share with you.
Better keep conspiracy theories to yourself. It's ok to question things, but better back that up with evidence.
Mark Zuckerberg famously had a term for people like you who trusted him.
In case this is not clear enough: you'd better come up with some real arguments with concrete evidence, or move on. Nobody has time for meaningless speculation.
This is a rare example where I actually think Meta should be providing the evidence, and not the other way around.
When you have a history of doing greasy shit, you don't get the benefit of the doubt.
What evidence would be convincing?
How about a breakdown of the exact ways WhatsApp makes them money, and how the $20 bil valuation when Meta purchased it made sense?
Things can be shocking (as in: causing indignation or disgust), yet totally unsurprising. In fact, I'd argue that most newsworthy events tend to be both terrible and entirely expected, given incentives and the way the world is set up to work.
>...corporations are sharing data with each other.
>I wholeheartedly expect that if I mention Beyblade toys on Facebook...
Isn't the lede here that this isn't just some random data sharing agreement between companies, but that these are both Chinese companies, and the recipient of the data has been banned in the U.S. precisely because of data concerns?
Lol I'm so glad I'm running it offline with Ollama.
That won’t protect you from its propaganda/censorship. Some versions of DeepSeek’s models have bias built in - as in it’s not just implemented by their service/app. But offline does protect you from privacy/security issues.
>Some versions of DeepSeek’s models have bias built in
Is this empirically supported? IME it doesn't hesitate to talk about Tiananmen Square or whatever, so if there's "bias" it must be very well hidden.
The 1.1B model which I used refuses to talk about Tiananmen Square.
Yes if you search, lots of people have shared evidence of this. But it depends on which model you’re using, as some seemingly don’t have the bias built into their training.
And they also shared how trivial it is to bypass. It is one of the most uncensored released models that you can run privately from my understanding.
If you have a better model for the OP to run, please present it.
Edit: https://huggingface.co/perplexity-ai/r1-1776
Just released.
Here is my story. I needed to buy a center console for my car (purchased a while ago at a used car lot). Went to Amazon and made my selection. Next thing I see is a warning: this particular console will not fit your car, which is MAKE: XXXY, MODEL: YYYY, YEAR: ZZZZ. How's that for data sharing?
At some point you entered your car data while searching for another car part on Amazon. Amazon caches this information.
I've never entered ANY car data online. Most likely it was sold to Amazon by my insurance company
Nope.
wait, is this a bad thing for US Tech companies? Guess if that's a yes this is good for Europe?
The US decided ByteDance and TikTok were national security threats, so presumably this is not a "good thing" from the US's perspective.
again, ... bad for who? For Trump and his Silicon Valley sycophants? If that is a yes, then I don't see why this is a bad thing for Europe.
I guess we know how many bots are commenting on this article based on how many of them are talking about the US, when the article is about South Korea?
Does that make you a bot for not realizing the article also talks about US researchers coming to a similar conclusion?
For a single sentence in an entire article about South Korea, and a pundit comment by a company whose job it is to look at this sort of thing? If you pick the US as your thing to comment on in an article like this, maybe you're just a bot, or even a hired hand, making comments about the US without bothering to understand that you're obviously commenting out of place.
shocked pikachu face
Reminder on various DeepSeek problems:
1. DeepSeek is full of propaganda/censorship (https://arstechnica.com/ai/2025/01/the-questions-the-chinese...)
2. They already had a serious security and privacy issue when they left their database wide open and leaked everyone’s chat history (https://www.wiz.io/blog/wiz-research-uncovers-exposed-deepse... )
3. Multiple teams of security researchers found code that links DeepSeek to the Chinese government through China Mobile, who is banned from operating in the US (https://www.pbs.org/newshour/world/researchers-link-deepseek...)
4. Other countries like South Korea are banning DeepSeek already over privacy concerns (https://mashable.com/article/south-korea-blocks-deepseek)
Trump needs to enforce PAFACA and ban TikTok, but also ban DeepSeek, which has the same exact issues since it is also effectively operated by a foreign adversary and poses various security threats.
>1. DeepSeek is full of propaganda/censorship (https://arstechnica.com/ai/2025/01/the-questions-the-chinese...)
Your source doesn't even mention "propaganda". Moreover, while the censorship is concerning, I don't see how it practically affects users. If I want to know how to center a div, who cares if it's cagey about what happened in 1989?
"Western" AI is also arguably full of "propaganda/censorship". Remember when chatgpt just came out, and conservatives were lambasting it for being "woke"?
>3. Multiple teams of security researchers found code that links DeepSeek to the Chinese government through China Mobile, who is banned from operating in the US (https://www.pbs.org/newshour/world/researchers-link-deepseek...)
Seems like a nothingburger?
"Neither Feroot nor the other researchers observed data transferred to China Mobile when testing logins in North America, but they could not rule out that data for some users was being transferred to the Chinese telecom."
>4. Other countries like South Korea are banning DeepSeek already over privacy concerns (https://mashable.com/article/south-korea-blocks-deepseek)
That's what this thread is about. Many commenters have mentioned why South Korea's actions were dumb.
> "Western" AI is also arguably full of "propaganda/censorship". Remember when chatgpt just came out, and conservatives were lambasting it for being "woke"?
Clearly a false equivalence. You think government propaganda compelled by a dictatorship with access to a military and nukes is the same thing?
Which government are you talking about? Like the US banning TikTok because Israel did not like it [1]?
Let's be honest here: China censors directly. The US censors indirectly, through private companies [2] and through covert use of force [3]. If you had a pro-Russian stance in 2022 or a pro-Palestine stance, you would see your content censored in very subtle ways in the US.
1. https://www.kenklippenstein.com/p/tiktok-ban-fueled-by-israe... 2. https://indi.ca/why-i-left-medium-they-defenestrated-me/ 3. https://www.kenklippenstein.com/p/the-fbi-knocked-on-my-door
It is surprising how people in the US cannot see the disinformation campaign they are being subjected to.
You mean how every single US tech company shares data with Google and Meta? How you browse a website and, in an instant, ads show up in Meta products? "user behaviour and device metadata [are] likely sent to ByteDance servers", lol; all your user behavior and device metadata are sent to Google and Meta servers. South Korea is too afraid to say the same thing about the USA. And surprised Pikachu face about all the downvotes in this thread on users pointing out the same thing about US tech companies, lol; propaganda and ethnonationalism is a powerful force.
There was data sharing between Twitter, Google and Facebook as well.
USA has to make a decision between safety and national security.
Sorry how is a story about South Korea about the US?
Does DeepSeek only share South Korean data?
Am I (an American) "allowed" to complain if I also complain about the American tech industry doing the same thing? All of it is bad.
Mark Zuckerberg doesn't get to complain about this. Consumers absolutely do.
(a) The BBC is not America, (b) South Korea is not America. It helps to at least browse the first paragraph before commenting.
From the 5th paragraph of the article, Americans are complaining:
> Since then, multiple countries have warned that user data may not be properly protected, and in February a US cybersecurity company alleged potential data sharing between DeepSeek and ByteDance.
Alleging something is not complaining. It’s just…a statement. What do you want?
com·plain
/kəmˈplān/
express dissatisfaction or annoyance about something
Nothing in the sentence you quoted suggests that anyone is complaining about anything.
We're leveling allegations, but without dissatisfaction or annoyance?
"Hey you, you're sharing our data! No further comments. Carry on!"
No, an allegation is a claim that something is wrong.
To be fair, British (e.g. Experian) and SK (e.g. Samsung) companies do the same.
I'll take "headlines that mean nothing when the world has given up the moral high ground" for $200, Alex
Bruhh, your iPhone and Android will literally “share” what you are saying, even in private, with anyone they can find, for advertising… so this should not be surprising
... in the same way a lot of website in this world 'shared user data' with Google.
Through Google Analytics.
Yeah, believe it or not. ByteDance has a cloud offering. And it includes a frontend APM product. And DeepSeek used that. How surprising! A Chinese company used a Chinese cloud.
Oh, and chat.deepseek.com resolves to a Huawei Cloud IP address in China. It resolves to Cloudflare outside of mainland China, but who knows, maybe they just decided to wrap it with another CDN and their servers are still on Huawei Cloud. So they send data to Huawei, too. I repeat, H-U-A-W-E-I. That cursed telecom equipment company in the States.
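If anyone wants to check the resolution claim from their own vantage point, here's a quick sketch (results depend on your resolver and region; I'm not asserting what you'll see):

    # Quick DNS check for chat.deepseek.com from wherever you're sitting.
    # The comment above claims Huawei Cloud IPs inside mainland China and
    # Cloudflare IPs elsewhere; your results will vary by resolver/region.
    import socket

    addrs = {info[4][0] for info in
             socket.getaddrinfo("chat.deepseek.com", 443, proto=socket.IPPROTO_TCP)}
    for ip in sorted(addrs):
        print(ip)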
> A Chinese company used a Chinese cloud.
Using a Chinese company for everything except distillation, for which they used OpenAI, lol.
Like there is no data sharing between OpenAI and other big tech in the US....